- Large RISC-V IPI rework to make way for a new interrupt
architecture
- More Loongarch fixes from Lianmin Lv, fixing issues in the so
called "dual-bridge" systems.
- Workaround for the nvidia T241 chip that gets confused in
3 and 4 socket configurations, leading to the GIC
malfunctionning in some contexts
- Drop support for non-firmware driven GIC configurarations
now that the old ARM11MP Cavium board is gone
- Workaround for the Rockchip 3588 chip that doesn't
correctly deal with the shareability attributes.
- Replace uses of of_find_property() with the more appropriate
of_property_read_bool()
- Make bcm-6345-l1 request its MMIO region
- Add suspend support to the SiFive PLIC
- Drop support for stih415, stih416 and stid127 platforms
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmRCjF0PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDfAcQAMCqxwZcMPtMDcj1pfuiPM4mcPnb81nB80ms
0Clc+O/fLvW4bduVQN9S+P4aJ3WNYUw0nTrF13CzUZpPOLGt8NlylSZ0+EQV6+th
60+YoJ0X5ovlQ6tMUzhJjuU2HFQxXyvNmkXsrr4xaPlZaOt01OCNGhKVwuq+f7nJ
U87id3M3Nt6dSj//NonBdy+61y/XZiPQXZZP6act71f5Rz+REnBdEESzWLbzV9jo
UiVZD24OWLE+A0N4JqTXI4ruXIaOdwfOi31hN/OF3JrWWrM99RYK/pLJM/mEvCdw
HA0acS/uj9wQLUD9OFPCKE3aGOA4rdcWUrwzQvINZMIezHgaWfuOT/J5BbF9iH+Y
nrySLQB9xl9TVtMsfCOmjc+yPDDQ0AycrV++pn6CH59dx4WR3e2wTe68arHMoO9h
ljvEHVb2inDYJqHhyYKhNehlD1YGNFcOT4QRrduWxQezl3RKdYEjJngmm7QBfRT1
ee62Jm2RodPAknuDnnJNsFnHAE8emu9RLdKviEmPT66oclhw2sgn/V2JC+O+Oe0e
TkXrn4AzrIpEjbEUMJAow7RTSpzIGx921fcqbSpZrf6k75rgxaKBPPxtV4uANfII
gbOKklj15pljuPuxsputWzkdU9xVvgdg/5CHw6+AA68rNeB16NbTMe9RsiYNzlF6
7XBGnW3Q
=TsID
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip changes from Marc Zyngier:
- Large RISC-V IPI rework to make way for a new interrupt
architecture
- More Loongarch fixes from Lianmin Lv, fixing issues in the so
called "dual-bridge" systems.
- Workaround for the nvidia T241 chip that gets confused in
3 and 4 socket configurations, leading to the GIC
malfunctionning in some contexts
- Drop support for non-firmware driven GIC configurarations
now that the old ARM11MP Cavium board is gone
- Workaround for the Rockchip 3588 chip that doesn't
correctly deal with the shareability attributes.
- Replace uses of of_find_property() with the more appropriate
of_property_read_bool()
- Make bcm-6345-l1 request its MMIO region
- Add suspend support to the SiFive PLIC
- Drop support for stih415, stih416 and stid127 platforms
Link: https://lore.kernel.org/lkml/20230421132104.3021536-1-maz@kernel.org
The AIA specification introduce per-HART AIA CSRs which primarily
support:
* 64 local interrupts on both RV64 and RV32
* priority for each of the 64 local interrupts
* interrupt filtering for local interrupts
This patch virtualize above mentioned AIA CSRs and also extend
ONE_REG interface to allow user-space save/restore Guest/VM
view of these CSRs.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
To support 64 VCPU local interrupts on RV32 host, we should use
bitmap for irqs_pending and irqs_pending_mask in struct kvm_vcpu_arch.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We implement ONE_REG interface for AIA CSRs as a separate subtype
under the CSR ONE_REG interface.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
To make the CSR ONE_REG interface extensible, we implement subtype
for the CSR ONE_REG IDs. The existing CSR ONE_REG IDs are treated
as subtype = 0 (aka General CSRs).
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
To incrementally implement AIA support, we first add minimal skeletal
support which only compiles and detects AIA hardware support at the
boot-time but does not provide any functionality.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The hgatp.VMID mask defines are used before shifting when extracting
VMID value from hgatp CSR value so based on the convention followed
in the other parts of asm/csr.h, the hgatp.VMID mask defines should
not have a _MASK suffix.
While we are here, let's use GENMASK() for hgatp.VMID and hgatp.PPN.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We have two extension names for AIA ISA support: Smaia (M-mode AIA CSRs)
and Ssaia (S-mode AIA CSRs).
We extend the ISA string parsing to detect Smaia and Ssaia extensions.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zbb extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We add ONE_REG interface to enable/disable SBI extensions (just
like the ONE_REG interface for ISA extensions). This allows KVM
user-space to decide the set of SBI extension enabled for a Guest
and by default all SBI extensions are enabled.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
While alphabetized lists tend to become unalphabetized almost
as quickly as they get fixed up, it is preferred to keep select
lists in Kconfigs in order. Let's fix KVM's up.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can
detect if the results of vma_lookup() (e.g. vma_shift) become stale
before it acquires kvm->mmu_lock. This fixes a theoretical bug where a
VMA could be changed by userspace after vma_lookup() and before KVM
reads the mmu_invalidate_seq, causing KVM to install page table entries
based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it
is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid
inducing spurious fault retries).
It's unlikely that any sane userspace currently modifies VMAs in such a
way as to trigger this race. And even with directed testing I was unable
to reproduce it. But a sufficiently motivated host userspace might be
able to exploit this race.
Note KVM/ARM had the same bug and was fixed in a separate, near
identical patch (see Link).
Link: https://lore.kernel.org/kvm/20230313235454.2964067-1-dmatlack@google.com/
Fixes: 9955371cc0 ("RISC-V: KVM: Implement MMU notifiers")
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Tested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
Alexandre Ghiti <alexghiti@rivosinc.com> says:
After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.
The first patch allows to build relocatable kernels but is not selected
by default. That patch is a requirement for KASLR.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile-time, and uses it for riscv.
* b4-shazam-merge:
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To circumvent an issue where placing the relocations inside the init
sections produces empty relocations, use --emit-relocs. But to avoid
carrying those relocations in vmlinux, use an intermediate
vmlinux.relocs file which is a copy of vmlinux *before* stripping its
relocations.
Suggested-by: Björn Töpel <bjorn@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Relocating kernel at runtime is done very early in the boot process, so
it is not convenient to check for relocations there and react in case a
relocation was not expected.
There exists a script in scripts/ that extracts the relocations from
vmlinux that is then used at postlink to check the relocations.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230329045329.64565-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This config allows to compile 64b kernel as PIE and to relocate it at
any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata are embedded into
the kernel.
Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at and that the compiler
options are those used in arm64 which uses the same RELA relocation
format.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti <alexghiti@rivosinc.com> says:
As described in patch 2, our current kasan implementation is intricate,
so I tried to simplify the implementation and mimic what arm64/x86 are
doing.
In addition it fixes UEFI bootflow with a kasan kernel and kasan inline
instrumentation: all kasan configurations were tested on a large ubuntu
kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.
inline ubuntu config + uefi:
sv39: OK
sv48: OK
sv57: OK
outline ubuntu config + uefi:
sv39: OK
sv48: OK
sv57: OK
Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check:
KASAN failure expected in "set_bit(nr, addr)", but none occurrred
Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the
userspace abi
https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/
so that we can finally increase the command line to fit all kasan kernel
parameters.
All of this should hopefully fix the syzkaller riscv build that has been
failing for a few months now, any test is appreciated and if I can help
in any way, please ask.
* b4-shazam-merge:
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If KASAN is enabled, VMAP_STACK depends on KASAN_VMALLOC so enable
KASAN_VMALLOC with KASAN so that we can enable VMAP_STACK by default.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The KASAN shadow region was moved next to the kernel mapping but the
ptdump code was not updated and it appears to break the dump of the kernel
page table, so fix this by moving the KASAN shadow region in ptdump.
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The EFI stub must not use any KASAN instrumented code as the kernel
proper did not initialize the thread pointer and the mapping for the
KASAN shadow region.
Avoid using the generic strcmp function, instead use the one in
drivers/firmware/efi/libstub/string.c.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The early virtual address should lie in the kernel address space for
inline kasan instrumentation to succeed, otherwise kasan tries to
dereference an address that does not exist in the address space (since
kasan only maps *kernel* address space, not the userspace).
Simply use the very first address of the kernel address space for the
early fdt mapping.
It allowed an Ubuntu kernel to boot successfully with inline
instrumentation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Our previous kasan population implementation used to have the final kasan
shadow region mapped with kasan_early_shadow_page, because we did not clean
the early mapping and then we had to populate the kasan region "in-place"
which made the code cumbersome.
So now we clear the early mapping, establish a temporary mapping while we
populate the kasan shadow region with just the kernel regions that will
be used.
This new version uses the "generic" way of going through a page table
that may be folded at runtime (avoid the XXX_next macros).
It was tested with outline instrumentation on an Ubuntu kernel
configuration successfully.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is a preliminary work that allows to make the code more
understandable.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti <alexghiti@rivosinc.com> says:
This patchset intends to improve tlb utilization by using hugepages for
the linear mapping.
As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
take care of isolating the kernel text and rodata so that they are not
mapped with a PUD mapping which would then assign wrong permissions to
the whole region: it is achieved the same way as arm64 by using the
memblock nomap API which isolates those regions and re-merge them afterwards
thus avoiding any issue with the system resources tree creation.
arch/riscv/include/asm/page.h | 19 ++++++-
arch/riscv/mm/init.c | 102 ++++++++++++++++++++++++++--------
arch/riscv/mm/physaddr.c | 16 ++++++
drivers/of/fdt.c | 11 ++--
4 files changed, 118 insertions(+), 30 deletions(-)
* b4-shazam-merge:
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
Link: https://lore.kernel.org/r/20230324155421.271544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
During the early page table creation, we used to set the mapping for
PAGE_OFFSET to the kernel load address: but the kernel load address is
always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
PAGE_OFFSET is).
But actually we don't have to establish this mapping (ie set va_pa_offset)
that early in the boot process because:
- first, setup_vm installs a temporary kernel mapping and among other
things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes
advantage of the discovered system memory to create the linear
mapping.
During the first phase, we don't know the start of the system memory and
then until the second phase is finished, we can't use the linear mapping at
all and phys_to_virt/virt_to_phys translations must not be used because it
would result in a different translation from the 'real' one once the final
mapping is installed.
So here we simply delay the initialization of va_pa_offset to after the
system memory discovery. But to make sure noone uses the linear mapping
before, we add some guard in the DEBUG_VIRTUAL config.
Finally we can use PUD/P4D/PGD hugepages when possible, which will result
in a better TLB utilization.
Note that:
- this does not apply to rv32 as the kernel mapping lies in the linear
mapping.
- we rely on the firmware to protect itself using PMP.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Rob Herring <robh@kernel.org> # DT bits
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
No change intended, it just splits the linear mapping creation from
setup_vm_final: this prepares for upcoming additions to the linear
mapping creation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Use directly phys_ram_base instead, riscv_pfn_base is just the pfn of
the address contained in phys_ram_base.
Even if there is no functional change intended in this patch, actually
setting phys_ram_base that early changes the behaviour of
kernel_mapping_pa_to_va during the early boot: phys_ram_base used to be
zero before this patch and now it is set to the physical start address of
the kernel. But it does not break the conversion of a kernel physical
address into a virtual address since kernel_mapping_pa_to_va should only
be used on kernel physical addresses, i.e. addresses greater than the
physical start address of the kernel.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Other extensions only capitalise the first letter in the text visible
in Kconfig menus, and provide a short comment about the extension's
meaning. Do the same for Svnapot & Svpbmt.
The precedent for capitalisation in the Kconfig text was set by Zicbom
& sorta followed for Zicboz. The RVI styling used for multi-letter
extensions only capitalises the first letter, so do the same here.
If nothing else, my OCD likes it when the extensions follow a consistent
pattern.
While editing one of the lines, reformat the "spelling" of 64-bit.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230405-pucker-cogwheel-3a999a94a2f2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V now builds the sched domain based on the simple possible map.
Enable SCHED_MC to make the building based on cpu_coregroup_mask()
which also takes care of the NUMA and cores with LLC.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230310110336.970985-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V now manages CPU topology using arch_topology which provides
CPU capacity and frequency related interfaces to access the cpu/freq
invariant in possible heterogeneous or DVFS-enabled platforms.
Here adds topology.h file to export the arch_topology interfaces for
replacing the scheduler's constant-based cpu/freq invariant accounting.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Ley Foon Tan <lftan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230323123924.3032174-1-suagrfillet@gmail.com
[Palmer: Fix the whitespace issues.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Evan Green <evan@rivosinc.com> says:
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
* b4-shazam-merge:
RISC-V: Add hwprobe vDSO function and data
selftests: Test the new RISC-V hwprobe interface
RISC-V: hwprobe: Support probing of misaligned access performance
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: Add a syscall for HW probing
RISC-V: Move struct riscv_cpuinfo to new header
Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the
riscv_hwprobe syscall and answer common queries. We stash a copy of
static answers for the "all CPUs" case in the vDSO data page. This data
is private to the vDSO, so we can decide later to change what's stored
there or under what conditions we defer to the syscall. Currently all
data can be discovered at boot, so the vDSO function answers all queries
when the cpumask is set to the "all CPUs" hint.
There's also a boolean in the data that lets the vDSO function know that
all CPUs are the same. In that case, the vDSO will also answer queries
for arbitrary CPU masks in addition to the "all CPUs" hint.
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This allows userspace to select various routines to use based on the
performance of misaligned access on the target hardware.
Rather than adding DT bindings, this change taps into the alternatives
mechanism used to probe CPU errata. Add a new function pointer alongside
the vendor-specific errata_patch_func() that probes for desirable errata
(otherwise known as "features"). Unlike the errata_patch_func(), this
function is called on each CPU as it comes up, so it can save
feature information per-CPU.
The T-head C906 has fast unaligned access, both as defined by GCC [1],
and in performing a basic benchmark, which determined that byte copies
are >50% slower than a misaligned word copy of the same data size (source
for this test at [2]):
bytecopy size f000 count 50000 offset 0 took 31664899 us
wordcopy size f000 count 50000 offset 0 took 5180919 us
wordcopy size f000 count 50000 offset 1 took 13416949 us
[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353
[2] https://pastebin.com/EPXvDHSW
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We have an implicit set of base behaviors that userspace depends on,
which are mostly defined in various ISA specifications.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We don't have enough space for these all in ELF_HWCAP{,2} and there's no
system call that quite does this, so let's just provide an arch-specific
one to probe for hardware capabilities. This currently just provides
m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
the future.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In preparation for tracking and exposing microarchitectural details to
userspace (like whether or not unaligned accesses are fast), move the
riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
need to be used by more than just cpu.c.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
There are a number of updates for devicetree files for Qualcomm,
Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
contents:
- Wrong GPIO polarity on some boards
- Lower SD card interface speed for better stability
- Incorrect power supply, clock, pmic, cache properties
- Disable broken hbr3 on sc7280-herobrine
- Devicetree warning fixes
The only other changes are:
- A regression fix for the Amlogic performance monitoring unit driver,
along with two related DT changes.
- imx_v6_v7_defconfig enables PCI support again.
- Trivial fixes for tee, optee and psci firmware drivers, addressing
compiler warning and error output
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmQ+sQgACgkQYKtH/8kJ
UiftwQ/9G4zUbL4KZMZXmIrLTE/RYcJVLwnIk1Qh36Kj04tYE+uADRyxBdDcTCE6
hSwgQ98hvM8hDAgGUIEscO7FsoTWFs/ADXVQmJ/kgMzLdfldLkjJYsXiatmZFIlQ
7DW6ZtXV8ePUi/2Dk3al0bRuR+4xBAQf+0u/p9W7dFwttxeTyL87ApLgfQ9eoq9u
I60vwr9QNeUY19QpzHXX+AZRWS2R+uLqfBNEVV2NHl4Fy6iGqaBRc2q1Fby9Tt1h
793vDwNZz3+65xXL/XGDlKxh5OQtRK3FiWHXD9qTHzohUrYu3zMG2/ls8GczF3Vk
HYOtQp5xYNWI37JU2XlLIjWA4tuc0LUInVB2yK5uniIGKXaygnwnRI0IaEQkvWtW
tI89MOsPne7BgQv2boJh0FBA1yXYgL5WYFBx1x11kP71IRFf2LHrgutvuiIKzPF8
UiuxdrakT5FMZVr7pJTDr5Gk52qgR7PXXAGKc/oDj37JhXE1XuqSyNkbGRtujkkc
3x6etgAkuYcUQtuka0VVCLmG6Y/2Otn3dj+y+RYHFH7ljUDN6PfwveUqXNX9o0nz
AdI9gULbYWOf8iUc3hvT1tyJNudSIBeBXiqq/ovIklAme1scpKfjT54vQLyS4HYB
2repRE5v1LtXSyF2A3877EY4m6Tbcm4t+cUK2vl7NKVtYE4P8n8=
=5Uki
-----END PGP SIGNATURE-----
Merge tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of updates for devicetree files for Qualcomm,
Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
contents:
- Wrong GPIO polarity on some boards
- Lower SD card interface speed for better stability
- Incorrect power supply, clock, pmic, cache properties
- Disable broken hbr3 on sc7280-herobrine
- Devicetree warning fixes
The only other changes are:
- A regression fix for the Amlogic performance monitoring unit
driver, along with two related DT changes.
- imx_v6_v7_defconfig enables PCI support again.
- Trivial fixes for tee, optee and psci firmware drivers, addressing
compiler warning and error output"
* tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
firmware/psci: demote suspend-mode warning to info level
arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI
arm64: dts: rockchip: correct panel supplies on some rk3326 boards
arm64: dts: rockchip: use just "port" in panel on RockPro64
arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells
ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells
arm64: dts: imx8mp-verdin: correct off-on-delay
arm64: dts: imx8mm-verdin: correct off-on-delay
arm64: dts: imx8mm-evk: correct pmic clock source
arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
arm64: dts: rockchip: Remove non-existing pwm-delay-us property
arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
tee: Pass a pointer to virt_to_page()
perf/amlogic: adjust register offsets
arm64: dts: meson-g12-common: resolve conflict between canvas & pmu
arm64: dts: meson-g12-common: specify full DMC range
arm64: dts: imx8mp: fix address length for LCDIF2
riscv: dts: canaan: drop invalid spi-max-frequency
...
- Drop debug info from purgatory objects again
- Document that kernel.org provides prebuilt LLVM toolchains
- Give up handling untracked files for source package builds
- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
with a pre-epoch data.
- Change panic_show_mem() to a macro to handle variable-length argument
- Compress tarballs on-the-fly again
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmQ7zuUVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG3/AP/12Y/zmiCY/R2NQmhYAGt05K9I5R
M0mQbdNLlzmXfrXPEAUVWbUcEOZ1jFxqkXJwhf1kYHmJ+2MeS/n1d4A+4x4eGXDE
a2WfB/G8FP5+T6+u/U/LDfMqVpbuAfBkZXu5QeceUlHyTY1Q1BAO9PUvNu+Df2P7
i1SBEDVzaN3WE1ZOc3pjQf0tIQybyM/x9EjGrh4DpBVTQYpmmgxpCdyzVr7yJvQ3
LFHy+OtX2WTqtgULUunH776pp59EGgoKLDSLck+wG7bdmYm5y/13uVGc1iBfGBAX
RUV8qaP1ijB0BHZZnzsppUjZshSeS97sOUVv6hiwRdWgr0ISfZuVYN0E2YlhBcFV
UUidlk1l1VOytM8/EDrYyHnTvmHm+glMp1FxRR48ymZr1PqVUxQcad0lPClylp1b
Xc50C2wkFwa5a8RkY0aIihrVpnbHBSiPVHvaF01kFwNUor+VASpanR/xtTr4b88x
OK9aImRII15CxoOZdWtvut4c0OHw4sbyzmCuXM/nyS6c5+yroM/QZZs+c2ja/QEv
QNlDW54JzU6u+JE4O7W/gH3mqKH8ytL7Z5hmVECiiCYWp77IBCE8B+3dXVGfdGUg
Wy8MgtvZMW0ZAseqBD6VmVXkQIizUAgpJvkJy7R6YTw52P6sT2P2WAcWUTi2FQTD
FpUEf8eMi1UTDQgG
=bFZ1
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Drop debug info from purgatory objects again
- Document that kernel.org provides prebuilt LLVM toolchains
- Give up handling untracked files for source package builds
- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
with a pre-epoch data.
- Change panic_show_mem() to a macro to handle variable-length argument
- Compress tarballs on-the-fly again
* tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: do not create intermediate *.tar for tar packages
kbuild: do not create intermediate *.tar for source tarballs
kbuild: merge cmd_archive_linux and cmd_archive_perf
init/initramfs: Fix argument forwarding to panic() in panic_show_mem()
initramfs: Check negative timestamp to prevent broken cpio archive
kbuild: give up untracked files for source package builds
Documentation/llvm: Add a note about prebuilt kernel.org toolchains
purgatory: fix disabling debug info
* A fix for a missing fence when generating the NOMMU sigreturn
trampoline.
* A set of fixes for early DTB handling of reserved memory nodes.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQ5ZsoTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiUhED/9IsUdJfMdQHzrL7XZL7U9FgHhSCVl4
HgpcpngG6Op+F8lGnVhQV4ivoVYrk85lR2JqJeES9FpChSmZXCptB6K9NFCDLdW+
9+Qnqa6NnmcJ9Aba7Ckr7zwApohtDGucegIKx3bqafsnBqh2SM/l7ljCWEuaxIbZ
5qFEbWGcILVGQrXJheLgkx7uGbBmEvrmLi9T5jZ1Vwltg1QS2STDjOCSR5cfBpx7
kqoWfSa1+fb5RF428KRvDTuIjUzkkKvUEo/yEzJMMp1s3vI2mKT8ytqQHw4GrQ9w
6X/wvabjDtz8iGRWsh4VVg7iJ+xImzpqRO4UXkd4wArCDdCvjSu4BpAs3cOqKVTl
4w+dTbQPjL22PYCdhBVmJH8K4TmJGzSkOPK8wcIlK1mQaRCAjAalSvCLQIN/+ttq
4AwkXLpfwLvJLhn78Y/7WS6dbedYjMbNxdz2Sy6yFm3qZj1dqfgS5yIuHpLVo7UW
AK7nS3FGJRyii8w/lywxDgLTdMeJiZ8/Xy0gJkWps5sGolBIPqSwUcLDhFS68HoM
6x+IzOXDlHnw7i/sj6Q10IRvjFVLieNDYniD7CnOYDKFSgUOvehCMlky7l4naRdO
vfVHsHPX38z5ZTeM/vZ1HKobKeTmi3Hy/C1MWUwMgj2nPkui777c9mAQLHY7ppeZ
dr4Cn/4GNfZz3w==
=1Tw7
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a missing fence when generating the NOMMU sigreturn
trampoline
- A set of fixes for early DTB handling of reserved memory nodes
* tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: No need to relocate the dtb as it lies in the fixmap region
riscv: Do not set initial_boot_params to the linear address of the dtb
riscv: Move early dtb mapping into the fixmap region
riscv: add icache flush for nommu sigreturn trampoline
We used to access the dtb via its linear mapping address but now that the
dtb early mapping was moved in the fixmap region, we can keep using this
address since it is present in swapper_pg_dir, and remove the dtb
relocation.
Note that the relocation was wrong anyway since early_memremap() is
restricted to 256K whereas the maximum fdt size is 2MB.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-4-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv establishes 2 virtual mappings:
- early_pg_dir maps the kernel which allows to discover the system
memory
- swapper_pg_dir installs the final mapping (linear mapping included)
We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.
The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.
So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.
Fixes: 922b0375fc ("riscv: Fix memblock reservation for device tree blob")
Fixes: 8f3a2b4a96 ("RISC-V: Move DT mapping outof fixmap")
Fixes: 50e63dd8ed ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just
implicitly including other includes, and is no longer needed. Adjust the
include files with what was implicitly included by of_device.h (cpu.h and
of.h) and drop including of_device.h.
Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-9-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Removing the include of cpu.h from of_device.h (included by
of_platform.h) causes an error in setup.c:
arch/riscv/kernel/setup.c:313:22: error: arithmetic on a pointer to an incomplete type 'typeof(struct cpu)' (aka 'struct cpu')
The of_platform.h header is not necessary either, so it can be dropped.
Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-8-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
This reverts commit baf7cbd94b.
There are some duplicate cache attributes populations executed
in both ci_leaf_init() and later cache_setup_properties().
Revert the commit baf7cbd94b ("riscv: Set more data to cacheinfo")
to setup only the level and type attributes at this early place.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230308064734.512457-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In a NOMMU kernel, sigreturn trampolines are generated on the user
stack by setup_rt_frame. Currently, these trampolines are not instruction
fenced, thus their visibility to ifetch is not guaranteed.
This patch adds a flush_icache_range in setup_rt_frame to fix this
problem.
Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de>
Fixes: 6bd33e1ece ("riscv: add nommu support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The RISC-V calling convention passes the first argument, and the
return value in the a0 register. For this reason, the a0 register
needs some extra care; When handling syscalls, the a0 register is
saved into regs->orig_a0, so a0 can be properly restored for,
e.g. interrupted syscalls.
This functionality was broken with the introduction of the generic
entry patches. Here, a0 was saved into orig_a0 after calling
syscall_enter_from_user_mode(), which can change regs->a0 for some
paths, incorrectly restoring a0.
This is resolved, by saving a0 prior doing the
syscall_enter_from_user_mode() call.
Fixes: f0bddf5058 ("riscv: entry: Convert to generic entry")
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230403065207.1070974-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Commit 370f696e44 ("dt-bindings: serial: snps-dw-apb-uart: add dma &
dma-names properties") documented dma-names property to handle Allwinner
D1 dtbs_check warnings, but relies on the rx->tx ordering, which is the
reverse of what a bunch of different boards expect.
The initial proposed solution was to allow a flexible dma-names order in
the binding, due to potential ABI breakage concerns after fixing the DTS
files. But luckily the Allwinner boards are not affected, since they are
using a shared DMA channel for rx and tx.
Hence, the first step in fixing the inconsistency was to change
dma-names order in the binding to tx->rx.
Do the same for the snps,dw-apb-uart nodes in the DTS file.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230321215624.78383-7-cristian.ciocaltea@collabora.com
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Since 32ef9e5054, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Cc: stable@vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
If we have specialized interrupt controller (such as AIA IMSIC) which
allows supervisor mode to directly inject IPIs without any assistance
from M-mode or HS-mode then using such specialized interrupt controller,
we can do remote icache flushe directly from supervisor mode instead of
using the SBI RFENCE calls.
This patch extends remote icache flush functions to use supervisor mode
IPIs whenever direct supervisor mode IPIs.are supported by interrupt
controller.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-7-apatel@ventanamicro.com
If we have specialized interrupt controller (such as AIA IMSIC) which
allows supervisor mode to directly inject IPIs without any assistance
from M-mode or HS-mode then using such specialized interrupt controller,
we can do remote TLB flushes directly from supervisor mode instead of
using the SBI RFENCE calls.
This patch extends remote TLB flush functions to use supervisor mode
IPIs whenever direct supervisor mode IPIs.are supported by interrupt
controller.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-6-apatel@ventanamicro.com
To do remote FENCEs (i.e. remote TLB flushes) using IPI calls on the
RISC-V kernel, we need hardware mechanism to directly inject IPI from
the supervisor mode (i.e. RISC-V kernel) instead of using SBI calls.
The upcoming AIA IMSIC devices allow direct IPI injection from the
supervisor mode (i.e. RISC-V kernel). To support this, we extend the
riscv_ipi_set_virq_range() function so that IPI provider (i.e. irqchip
drivers can mark IPIs as suitable for remote FENCEs.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-5-apatel@ventanamicro.com
Currently, the RISC-V kernel provides arch specific hooks (i.e.
struct riscv_ipi_ops) to register IPI handling methods. The stats
gathering of IPIs is also arch specific in the RISC-V kernel.
Other architectures (such as ARM, ARM64, and MIPS) have moved away
from custom arch specific IPI handling methods. Currently, these
architectures have Linux irqchip drivers providing a range of Linux
IRQ numbers to be used as IPIs and IPI triggering is done using
generic IPI APIs. This approach allows architectures to treat IPIs
as normal Linux IRQs and IPI stats gathering is done by the generic
Linux IRQ subsystem.
We extend the RISC-V IPI handling as-per above approach so that arch
specific IPI handling methods (struct riscv_ipi_ops) can be removed
and the IPI handling is done through the Linux IRQ subsystem.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-4-apatel@ventanamicro.com
Various RISC-V drivers (such as SBI IPI, SBI Timer, SBI PMU, and
KVM RISC-V) don't have associated DT node but these drivers need
standard per-CPU (local) interrupts defined by the RISC-V privileged
specification.
We add riscv_get_intc_hwnode() in arch/riscv which allows RISC-V
drivers not having DT node to discover INTC hwnode which in-turn
helps these drivers to map per-CPU (local) interrupts provided
by the INTC driver.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-3-apatel@ventanamicro.com
The software interrupt pending (i.e. [M|S]SIP) bit is writeable for
S-mode but read-only for M-mode so we clear this bit only when using
SBI IPI operations.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-2-apatel@ventanamicro.com
As for now all arches have dma_default_coherent reflecting default
DMA coherency for of devices, so there is no need to have a standalone
config option.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Christoph Hellwig <hch@lst.de>
A solitary fix here from Krzysztof for an invalid property that
should've probably been removed months ago, but was missed due to it
being in a dtb that doesn't build w/ defconfig.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCZC8BcgAKCRB4tDGHoIJi
0lDNAQCrDplQivDH8k8/z5mQFnB+1792ZQwv2cRooCOFkilt8QEA3zV7f4+zxPnW
4yCGEJweZ0qO042Kjt4l1tqBdoeaRgc=
=anjw
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmQvN0QACgkQmmx57+YA
GNmbjw/+P6+fQ5Idr0gmTKS90oZIbZv+pQnJZ7kqoCFCIYKVpFtV3kpAz3WsMqtW
v6V1K8Xr6HGbLxF2MGOuzeBfQGnGIW6uGE/Cq6sd2uZ2FQdi8Ty4DlfODzEeaE8D
9tZpHmILjnDRHjVS9vhQmeU9/Zl76kgr4Zva6ICdurn58uBmed0RUR69TasbepJs
qh/ahTqUqul84Wm0OL79UMHzwH6rG2y+GDGAPrVN15M0vtZImI4VnivTA8A9sr6J
Dlyez29doQSG6JAr5eEsLC68IQok4ekF+YTczLepwONjYut1rAyWTAaTvI7Nb+tZ
LIblhBOAm4BJAxWI+f59M0vt/PffghG/gGPZQYHTztqIps+LbIjM87DEBS5F3w5a
Xc0h0n4CIqOpOruz8YQ835PGgQjZ1EL4Wv0cgWPCbfKia9onedXb5olw2U4VtEQJ
3dH8AinUeH2pzGHBYHLa9VoisRD2w9YvhuRZbdEbP7imzfU4bz1qMakvArNJVmvo
i3V++kQAIuvVUVVzlc+LT5VojEEizO1Gnioz+qZ8syn1b2PmXQVVGFOZ5czU8Eju
InNovGG2SMYIkzT7Ha/9kfYMBL1OLLQLtoXNIZghn7VtzgeFlCuIMsuJBbTZXzPy
e2SRsJgBb4gQP/UOZZ9sTijV4JJikEk6vCvatG+Ik8362sG7VRM=
=5QeF
-----END PGP SIGNATURE-----
Merge tag 'riscv-dt-fixes-for-v6.3-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes
RISC-V Devicetree fixes for v6.3-final
A solitary fix here from Krzysztof for an invalid property that
should've probably been removed months ago, but was missed due to it
being in a dtb that doesn't build w/ defconfig.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v6.3-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: canaan: drop invalid spi-max-frequency
Link: https://lore.kernel.org/r/20230406-negate-octagon-0fc2e47dbde5@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Merge Hal's series adding support for the new StarFive JH7110 SoC.
There's a few bindings here for core components that were not picked up
by the various maintainers for the subsystems (previously Palmer would
pick these up via the RISC-V tree) & the first two commits in the branch
are shared with the clk tree, since the dts depends on defines in the
dt-binding headers.
This is based on -rc2, as the board does not actually boot on -rc1
due to the bug Linus introduced.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.
The following architectures do not select HAS_IOPORT:
* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa
All other architectures select HAS_IOPORT at least conditionally.
The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add a minimal device tree for StarFive JH7110 VisionFive 2 board
which has version A and version B. Support booting and basic
clock/reset/pinctrl/uart drivers.
Tested-by: Tommaso Merciai <tomm.merciai@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Co-developed-by: Jianlong Huang <jianlong.huang@starfivetech.com>
Signed-off-by: Jianlong Huang <jianlong.huang@starfivetech.com>
Co-developed-by: Hal Feng <hal.feng@starfivetech.com>
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements from the various KVM Kconfig files.
Acked-by: Sean Christopherson <seanjc@google.com> (x86)
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <kvm@vger.kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org> (arm64)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Anup Patel <anup@brainfault.org> (riscv)
Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390)
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
* A fix for FPU probing in XIP kernels.
* Always enable the alternative framework for non-XIP kernels.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQm9ToTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYib30D/0ccbYz0wflpuRaH1Rx011HaI+PeqA0
b/yG9Wq0xCS/iIAFgH97sMj8DQMsUb6Gut64kiG0rn7S29jg/t7UMTPHsXWLdhzk
+deyPYQdmOLl30OIVxCLFRCQGUxv8fS0UAAbS0AbrXqiqym/7NEigPXS1ny46hxJ
vB1xJxDc3Z78dlANdyuD9jnTrdagtPlQgsdeKqtGkx/EehrgQiR3ap9emjjCaVQp
jFcRlcDP0G0LSeJusDA58ufZggpjrB77yVeREBD0kZ8ukeGcMtCXsRje1qTH1vkc
ngYzgr0IVasugZK48ErnPirgjy2dOycEprWH7qixR2Kly0E5xxY3gCO1xEj2skbL
kA36S6zMvPUv8h9f2313zDv7oZKr+iSnhKB2OWSzctIYQeoHBKNyzeqi0Hb44Sa9
BtwO6yxQ3KOlX+LUjT8Us2ZoRF+lPIADHvORHJPr5LsJeLR83PoGJc5PZmi++6qA
CHn0ap1Ob6N9jwH5quBr6O3TQjM78i1r/U4Vh7FR7xXAN/peNJ6MFsXnmJo91adn
Q+lzR4gcsxjrWHAcS0d7cAoNdwE6oPhEdgcMrADy8d3g2MJw+oX2e2333fw7CLeh
ZXM/Tq+0CW0QPY4XI/LWvb2aeuUUdGJFiLrASAHCxO+gK0sIhLqjqLk7DJ3+Tz9g
Z79aVHpliwj11A==
=rwtl
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for FPU probing in XIP kernels
- Always enable the alternative framework for non-XIP kernels
* tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Conflicts:
drivers/net/ethernet/mediatek/mtk_ppe.c
3fbe4d8c0e ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
924531326e ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
for-next contains two additional extensions that select
RISCV_ALTERNATIVE. RISCV_ALTERNATIVE no longer needs to be selected by
individual config options as it is now selected for !XIP_KERNEL builds
by the top level RISCV option.
These extensions rely on the alternative framework, so convert the
"select"s to "depends on"s instead.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230324121240.3594777-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley <conor.dooley@microchip.com> says:
Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives frame work for has_fpu() checks, could be enabled without
the alternatives actually being present.
For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a
To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.
I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.
See the thread at [1] for the prior discussion.
1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573
[Palmer: these were also merged into fixes, but there's a cleanup that
depends on the merge so I'm taking it into for-next as well.]
* b4-shazam-merge:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* commit '1ee7fc3f4d0a93831a20d5566f203d5ad6d44de8':
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Conor Dooley <conor.dooley@microchip.com> says:
Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives frame work for has_fpu() checks, could be enabled without
the alternatives actually being present.
For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a
To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.
I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.
See the thread at [1] for the prior discussion.
1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573
[Palmer: merging in the fixes as a branch as there's some features that
depend on it.]
* b4-shazam-merge:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When moving switch_to's has_fpu() over to using
riscv_has_extension_likely() rather than static branches, the FPU code
gained a dependency on the alternatives framework.
That dependency has now been removed, as riscv_has_extension_ikely() now
contains a fallback path, using __riscv_isa_extension_available(), but
if CONFIG_RISCV_ALTERNATIVE isn't selected when CONFIG_FPU is, has_fpu()
checks will not benefit from the "fast path" that the alternatives
framework provides.
We want to ensure that alternatives are available whenever
riscv_has_extension_[un]likely() is used, rather than silently falling
back to the slow path, but rather than rely on selecting
RISCV_ALTERNATIVE in the myriad of locations that may use
riscv_has_extension_[un]likely(), select it (almost) always instead by
adding it to the main RISCV config entry.
xip kernels cannot make use of the alternatives framework, so it is not
enabled for those configurations, although this is the status quo.
All current sites that select RISCV_ALTERNATIVE are converted to
dependencies on the option instead. The explicit dependencies on
!XIP_KERNEL can be dropped, as RISCV_ALTERNATIVE is not user selectable.
Fixes: 702e64550b ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ZBruFRwt3rUVngPu@zx2c4.com/
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The has_fpu() check, which in turn calls riscv_has_extension_likely(),
relies on alternatives to figure out whether the system has an FPU.
As a result, it will malfunction on XIP kernels, as they do not support
the alternatives mechanism.
When alternatives support is not present, fall back to using
__riscv_isa_extension_available() in riscv_has_extension_[un]likely()
instead stead, which handily takes the same argument, so that kernels
that do not support alternatives can accurately report the presence of
FPU support.
Fixes: 702e64550b ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ad445951-3d13-4644-94d9-e0989cda39c3@spud/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-2-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jesse Taube <mr.bossman075@gmail.com> says:
This patch-set aims to add NOMMU support to RV32.
Many people want to build simple emulators or HDL
models of RISC-V this patch makes it possible to
run linux on them.
Yimin Gu is the original author of this set.
Submitted here:
https://lists.buildroot.org/pipermail/buildroot/2022-November/656134.html
Though Jesse T rewrote the Dconf.
* b4-shazam-merge:
riscv: configs: Add nommu PHONY defconfig for RV32
riscv: Kconfig: Allow RV32 to build with no MMU
Link: https://lore.kernel.org/r/20230301002657.352637-1-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
32bit risc-v can be configured to run without MMU. Introduce
rv32_nommu_virt_defconfig .PHONY target, that is based on
nommu_virt_defconfig. This is similar to how rv32_defconfig
is based on "defconfig".
Suggested-by: Conor Dooley <conor@kernel.org>
Signed-off-by: Jesse Taube <Mr.Bossman075@gmail.com>
Cc: Yimin Gu <ustcymgu@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230301002657.352637-4-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Some RISC-V 32bit cores do not have an MMU, and the kernel should be
able to build for them. This patch enables the RV32 to be built with
no MMU support.
Signed-off-by: Yimin Gu <ustcymgu@gmail.com>
CC: Jesse Taube <Mr.Bossman075@gmail.com>
Tested-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: Jesse Taube <Mr.Bossman075@gmail.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230301002657.352637-3-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* Fix VM hang in case of timer delta being zero
ARM:
* Fixes for the MMU:
* Read the MMU notifier seq before dropping the mmap lock to guard
against reading a potentially stale VMA
* Disable interrupts when walking user page tables to protect against
the page table being freed
* Read the MTE permissions for the VMA within the mmap lock critical
section, avoiding the use of a potentally stale VMA pointer
* Fixes for the vPMU:
* Return the sum of the current perf event value and PMC snapshot for
reads from userspace
* Don't save the value of guest writes to PMCR_EL0.{C,P}, which could
otherwise lead to userspace erroneously resetting the vPMU during VM
save/restore
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmQh12wUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMPyAgAhe9P5efFlkg/N/BUoxCdiQjhTEGO
BbNtZFbSgdqsTA0No5sjSsq8+dYuj4IpYHxvTRNXffxpn/6x736x9ff6fH3QwY0P
n65q37m3XrBzbaUv2XlX6l4GJC/KHWpXsawVszmnQpw8wYUYglo9JWnWVvpo/vLV
byWXcG9Rt0MQVnV13bYtjnpdNYCdVuhZuvjDBvbBa2I9BRKShvWyzcRcc48K1RNq
UI5PCEb8c2NUOB/6b9lRkEaNYLssVLOxpI3abGJP+IVSl/WE+hgCbFpz7ZkYwt79
AYm4wnZTSKBMQb5P2IOlgkfasvEyFiZ4Jj7x3RLXcXSWIB6g0J20iNasYg==
=qX9k
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix VM hang in case of timer delta being zero
ARM:
- MMU fixes:
- Read the MMU notifier seq before dropping the mmap lock to guard
against reading a potentially stale VMA
- Disable interrupts when walking user page tables to protect
against the page table being freed
- Read the MTE permissions for the VMA within the mmap lock
critical section, avoiding the use of a potentally stale VMA
pointer
- vPMU fixes:
- Return the sum of the current perf event value and PMC snapshot
for reads from userspace
- Don't save the value of guest writes to PMCR_EL0.{C,P}, which
could otherwise lead to userspace erroneously resetting the vPMU
during VM save/restore"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
riscv/kvm: Fix VM hang in case of timer delta being zero.
KVM: arm64: Check for kvm_vma_mte_allowed in the critical section
KVM: arm64: Disable interrupts while walking userspace PTs
KVM: arm64: Retry fault if vma_lookup() results become invalid
KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU
KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value
After the commit 93d102f094 ("printk: remove safe buffers"),
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT is no longer useful. Remove it.
Signed-off-by: Marc Aurèle La France <tsi@tuyoix.net>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.cz: Cleaned up the commit message.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Fixes: 93d102f094 ("printk: remove safe buffers")
Link: https://lore.kernel.org/r/5c19e248-1b6b-330c-7c4c-a824688daefe@tuyoix.net
The spi-max-frequency is a property of SPI children, not the
controller:
k210_generic.dtb: spi@50240000: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The patches convert riscv to use the generic entry infrastructure from
kernel/entry/*. Some optimization for entry.S with new .macro and merge
ret_from_kernel_thread into ret_from_fork.
* b4-shazam-merge:
riscv: entry: Consolidate general regs saving/restoring
riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
riscv: entry: Convert to generic entry
riscv: entry: Add noinstr to prevent instrumentation inserted
riscv: ptrace: Remove duplicate operation
Link: https://lore.kernel.org/r/20230222033021.983168-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* A fix to match the CSR ASID masking rules when passing ASIDs to
firmware.
* Force GCC to use ISA 2.2, to avoid a host of compatibily issues
between toolchains.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQdtzITHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiV/cD/oDpfixwripxGxnVXaRcQiNbQLrrY3U
Ft2paTcBBlD0pOIRi9NlMWS0GTxYUqOvTrxZmgMUu6yq7lZ/CegOKjhwqeCnzAV2
c0O1ZNQ5ljSJHflDQEeu9cM6VGpb9UWozWOKwS23MLnObGN1DvzhQ9zLR9vKd5bh
ErqSAuFpeWS53UvXtrbAR72U5Hdy+hCz9syYQkubpjCxvKl9yA9n1c0eevcfjO/6
epzABtW+UVRfm7zHRFEarqy9rkLn9e35Oo1nn5+owwglCtCVAikIXQm/eGZcvV2e
al2DoMXazSK92hbP/+R3MNdUgJ0PPv9CDpL++5jYtolmYszj1qPff8O2lhaPzTH/
947aVNdlmfeBpbCwNACTQHAQGw6KEta6a9bnr/FXF4ruSJTy/YTq0aa611MsJ532
jCcGB76HHefk0WHsOFaZCG75TWsacM1uohwZ6wZ73x2xl+2BunWKGKnac7wb0hnQ
JDikhiEOA6J2hAltRutQTHcKoKCdFOzGWpYmh+86AefMOzUbW5UKVEsT/cTi0nBb
03m9M5FiHgDCFD9HJPpJwDYIuntbbmzXVHGZx5afgxCsfnE33Pi3vPI4GOZw9eY1
HBKapZM2jfeC9ptNkePlA8Il51DtAwBi+McL9DpfbcSNLJRRaBJMedx8uZU92CWY
fkbSwdhhEob1rQ==
=13qD
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to match the CSR ASID masking rules when passing ASIDs to
firmware
- Force GCC to use ISA 2.2, to avoid a host of compatibily issues
between toolchains
* tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Handle zicsr/zifencei issues between clang and binutils
riscv: mm: Fix incorrect ASID argument when flushing TLB
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.
Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:
@func_use@
@@
smp_send_reschedule(...);
@include@
@@
#include <trace/events/ipi.h>
@no_include depends on func_use && !include@
@@
#include <...>
+
+ #include <trace/events/ipi.h>
[csky bits]
[riscv bits]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
There are two related issues that appear in certain combinations with
clang and GNU binutils.
The first occurs when a version of clang that supports zicsr or zifencei
via '-march=' [1] (i.e, >= 17.x) is used in combination with a version
of GNU binutils that do not recognize zicsr and zifencei in the
'-march=' value (i.e., < 2.36):
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o
The second occurs when a version of clang that does not support zicsr or
zifencei via '-march=' (i.e., <= 16.x) is used in combination with a
version of GNU as that defaults to a newer ISA base spec, which requires
specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >=
2.38):
../arch/riscv/kernel/kexec_relocate.S: Assembler messages:
../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required
clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
This is the same issue addressed by commit 6df2a016c0 ("riscv: fix
build with binutils 2.38") (see [2] for additional information) but
older versions of clang miss out on it because the cc-option check
fails:
clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
To resolve the first issue, only attempt to add zicsr and zifencei to
the march string when using the GNU assembler 2.38 or newer, which is
when the default ISA spec was updated, requiring these extensions to be
specified explicitly. LLVM implements an older version of the base
specification for all currently released versions, so these instructions
are available as part of the 'i' extension. If LLVM's implementation is
updated in the future, a CONFIG_AS_IS_LLVM condition can be added to
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI.
To resolve the second issue, use version 2.2 of the base ISA spec when
using an older version of clang that does not support zicsr or zifencei
via '-march=', as that is the spec version most compatible with the one
clang/LLVM implements and avoids the need to specify zicsr and zifencei
explicitly due to still being a part of 'i'.
[1]: 22e199e6af
[2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1808
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
the only difference is whether call the fn(arg) or not, this can be
achieved by testing fn is NULL or not, I.E s0 is 0 or not. Many
architectures have done the same thing, it makes entry.S more clean.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Since riscv is converted to generic entry, there's no need for the
extra wrappers of trace_hardirqs_{on,off}.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-6-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
codes more elegant. Here are the changes:
- More clear entry.S with handle_exception and ret_from_exception
- Get rid of complex custom signal implementation
- Move syscall procedure from assembly to C, which is much more
readable.
- Connect ret_from_fork & ret_from_kernel_thread to generic entry.
- Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
- Use the standard preemption code instead of custom
Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Without noinstr the compiler is free to insert instrumentation (think
all the k*SAN, KCov, GCov, ftrace etc..) which can call code we're not
yet ready to run this early in the entry path, for instance it could
rely on RCU which isn't on yet, or expect lockdep state. (by peterz)
Link: https://lore.kernel.org/linux-riscv/YxcQ6NoPf3AH0EXe@hirez.programming.kicks-ass.net/
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-4-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The TIF_SYSCALL_TRACE is controlled by a common code, see
kernel/ptrace.c and include/linux/thread_info.h.
clear_task_syscall_work(child, SYSCALL_TRACE);
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230222033021.983168-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, we pass the CONTEXTID instead of the ASID to the TLB flush
function. We should only take the ASID field to prevent from touching
the reserved bit field.
Fixes: 3f1e782998 ("riscv: add ASID-based tlbflushing methods")
Signed-off-by: Dylan Jhong <dylan@andestech.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Link: https://lore.kernel.org/r/20230313034906.2401730-1-dylan@andestech.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The actual intention is that no dynamic relocation exists in the VDSO. For
this the VDSO build validates that the resulting .so file does not have any
relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture,
which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside
of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative
relocations too.
However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If
a port fails to determine the exact .rel[a].dyn size, the trailing zeros
become R_*_NONE relocations. E.g. ld's powerpc port recently fixed
https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are
generally a no-op in the dynamic loaders. So just ignore them.
Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting
.so file does not contain any R_* relocation entries except R_*_NONE.
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com
* A pair of fixes to the ASID allocator to avoid leaking stale mappings
between tasks.
* A fix to the vmalloc fault handler to tolerate huge pages.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQUf1cTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiafvD/4ixaHUMYFBBsw0Vo2kXaILmBYNZOmz
KoAHykqlg4TRZ0xtOK/iLcSsiDXVVbI91iBeKjrwOiJ2+Sk4gDm01JMhOK6eJh4I
boQAoRNgUBJLiKp7ZlybJ3R8yXw4VkKK0lJKNd9zOko+76Z8cQitsiwliWQwnpJw
jtKpzYZ8Plxki+0jUt7/21FUF0sy1UspgFTQdV6XfBGtIqVuVNgRLK4emjrKxl7s
fpkvQfD9ZPCuCNqg42o9VULK8fQfQSi5jt9POrGVKg7EaEHb7NfxttWxu/VkMBoI
cTa9zNSM4DYfmubOTqPoE4MxxmY294vii2JnoimQPDWlT9gGRD5Puf/rmm420cUE
yhsl4HdurDBRw3608pIfXWl9pTBo/doFImrQfY/IuGlR6Jy632NFFdPXa0vA/RoM
JBpAVJrUGRRo6w5B+GM5XVpxQNiBtMtGSVYNG2185Gtszlw6CebG31Da39kBPr2O
G/QFTVaZJnlHVqEJwOm/7TuYM/8u+uT6eiuYiRBcHImOIleUJPGYnDfG+dav3nln
E4DXBref4ikAZX794rEQnB6Ayt3Hl1E5lZ9HA+sezMNwv2zhT9rYAgF+oM8/A6FV
3JxcBmkNj3lqKzwNK85YOHE7us/5u+PY7HPrUngC7iORvh2wSh+AVfiu7mXdhrWD
e6NwgE4EoZOgqw==
=A1sl
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- fixes to the ASID allocator to avoid leaking stale mappings between
tasks
- fix the vmalloc fault handler to tolerate huge pages
* tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: mm: Support huge page in vmalloc_fault()
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
In case when VCPU is blocked due to WFI, we schedule the timer
from `kvm_riscv_vcpu_timer_blocking()` to keep timer interrupt
ticking.
But in case when delta_ns comes to be zero, we never schedule
the timer and VCPU keeps sleeping indefinitely until any activity
is done with VM console.
This is easily reproduce-able using kvmtool.
./lkvm-static run -c1 --console virtio -p "earlycon root=/dev/vda" \
-k ./Image -d rootfs.ext4
Also, just add a print in kvm_riscv_vcpu_vstimer_expired() to
check the interrupt delivery and run `top` or similar auto-upating
cmd from guest. Within sometime one can notice that print from
timer expiry routine stops and the `top` cmd output will stop
updating.
This change fixes this by making sure we schedule the timer even
with delta_ns being zero to bring the VCPU out of sleep immediately.
Fixes: 8f5cb44b1b ("RISC-V: KVM: Support sstc extension")
Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
All kvm_arch_vm_ioctl() implementations now only deal with "int"
types as return values, so we can change the return type of these
functions to use "int" instead of "long".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20230208140105.655814-7-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mailbox on PolarFire SoC should really have three reg properties,
not two. Without splitting into three sections, the system controller's
QSPI cannot be accessed as it sits inside the current first range. The
driver & binding have been adapted to account for both two & three
ranges, so fix the dts too.
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Andrew Jones <ajones@ventanamicro.com> says:
When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch6's commit message for more details).
For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).
* b4-shazam-merge:
RISC-V: KVM: Expose Zicboz to the guest
RISC-V: KVM: Provide UAPI for Zicboz block size
RISC-V: Use Zicboz in clear_page when available
RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
RISC-V: Add Zicboz detection and block size parsing
dt-bindings: riscv: Document cboz-block-size
RISC-V: Factor out body of riscv_init_cbom_blocksize loop
RISC-V: alternatives: Support patching multiple insns in assembly
Link: https://lore.kernel.org/r/20230224162631.405473-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Guests may use the cbo.zero instruction when the CPU has the Zicboz
extension and the hypervisor sets henvcfg.CBZE.
Add Zicboz support for KVM guests which may be enabled and
disabled from KVM userspace using the ISA extension ONE_REG API.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We're about to allow guests to use the Zicboz extension. KVM
userspace needs to know the cache block size in order to
properly advertise it to the guest. Provide a virtual config
register for userspace to get it with the GET_ONE_REG API, but
setting it cannot be supported, so disallow SET_ONE_REG.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-8-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by using the alternatives' cpufeature
value (the upper 16 bits of patch_id). The value used is the maximum
zicboz block size order accepted at the patch site. This enables us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
cpufeature IDs are consecutive integers starting at 26, so a 32-bit
patch ID allows an aircraft carrier load of feature IDs. Repurposing
the upper 16 bits still leaves a boat load of feature IDs and gains
16 bits which may be used to control patching on a per patch-site
basis.
This will be initially used in Zicboz's application to clear_page(),
as Zicboz's block size must also be considered. In that case, the
upper 16-bit value's role will be to convey the maximum block size
which the Zicboz clear_page() implementation supports.
cpufeature patch sites which need to check for the existence or
absence of other cpufeatures may also be able to make use of this.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Refactor riscv_init_cbom_blocksize() to prepare for it to be used
for both cbom block size and cboz block size.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
As pointed out in commit d374a16539 ("RISC-V: fix compile error
from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
parameters passed to macros within macros to avoid spaces being
interpreted as separators. ALT_NEW_CONTENT was trying to handle
this by defining new_c has a vararg, but this isn't sufficient
for calling ALTERNATIVE() from assembly with multiple instructions
in the new/old sequences. Remove the vararg "hack" and use quotes.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Andrew Jones <ajones@ventanamicro.com> says:
This series has no intended functional change. These cleanups were
found while renaming errata_id to patch_id in order to better
convey that its purpose is larger than errata (it's also for
cpufeatures).
* b4-shazam-merge:
riscv: cpufeature: Drop errata_list.h and other unused includes
riscv: lib: Include hwcap.h directly
riscv: alternatives: Rename errata_id to patch_id
riscv: alternatives: Remove unnecessary define and unused struct
riscv: Rename Kconfig.erratas to Kconfig.errata
riscv: Clarify RISCV_ALTERNATIVE help text
Link: https://lore.kernel.org/r/20230224154601.88163-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Drop errata_list.h, since cpufeature.c includes hwcap.h directly to
get cpufeature IDs. And, while there, prune the rest of the unused
includes too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When using alternatives for cpufeatures we should include hwcap.h
directly, rather than through errata_list.h. Opportunistically drop
an unused include too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alternatives are used for both errata and cpufeatures. Use a more
generic name, 'patch_id', as in "ID of code patching site", to
avoid confusion when alternatives are used for cpufeatures.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
A define and a struct were introduced with commit 6f4eea9046
("riscv: Introduce alternative mechanism to apply errata solution"),
which introduced alternatives to RISC-V. The define is used for
an arbitrary string length, specific to sifive errata, so just use
the number directly there instead. The struct has never been used,
so remove it.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-4-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Errata is already plural for erratum. Rename it to make the
grammar gooder.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Clarify RISCV_ALTERNATIVE's help text by pointing out that code
patching is not only done at boot time, but also module load time.
Also point out that this is the minimal possible overhead.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230224154601.88163-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Since RISC-V supports ioremap() with huge page (pud/pmd) mapping,
However, vmalloc_fault() assumes that the vmalloc range is limited
to pte mappings. To complete the vmalloc_fault() function by adding
huge page support.
Fixes: 310f541a02 ("riscv: Enable HAVE_ARCH_HUGE_VMAP for 64BIT")
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Jhong <dylan@andestech.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230310075021.3919290-1-dylan@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
D1 contains a crypto engine which is supported by the sun8i-ce driver.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20221231220146.646-4-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
* RISC-V architecture-specific ELF attributes have been disabled in the
kernel builds.
* A fix for a locking failure while during errata patching that
manifests on SiFive-based systems.
* A fix for a KASAN failure during stack unwinding.
* A fix for some lockdep failures during text patching.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQLXKsTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiTd6D/9QxHDNOEiyrY7bSLkbeLiRvASXyMyx
/oKT8fmfsaikJngM/ZttQOiCEqPUDdeFfw0Mg3V9WxvrUsRtKDtfRN+pTUirFKpB
yWMAHuQ69QaWUJqaxfRa6JhyjgbnOCp3t/8RM/TywU3O4QG2MWbPg8QENnxtDAUN
X4zNis6xx19PCT2508irZNGysgIqlIRcDRJ/x85LQXIQBRREAnNTojgepwUxfDeZ
E4iPjlPU/i0eFeqpZkd3vZjjpkAZ91VfqCqmJTKv7JGKg0xeKM4z+EGlhNiS8odF
W9YS6Sf1wCa5smhq4vTF4PGEpineK+KrdJ5lPZXLKbbNtv/c1YNVSXBdi88DsWEn
jYnptL5mKWXAAIdOKdP50LjmiwDQP3BCqR+Ck9HYvPEST65QRwYC2pvReszsX5Bu
guiNSFuh0eEszu6VqJCFjrPKxLGODi0Ug2XeJd5NQ2sd7OkkrUP0wuJaYbn+xnUb
RWJNZY5jpKf+euPuJd5VgPiiOiOE+7gfG0X3bOB27f2OJPS4BFT8Z5NTzIx3qUet
I7hwuXHhNCCl+loczljNnwDZO1g4ktAvOjrRm42MZ2ERyAAvso9UiaI+zdciATP7
UgWvxybQc8oe+XAPpsnr5UwBl3Hy9o1ELXBY1avV8b5XQy7tAIP5eQW4R7nNYxM7
DvNBcO8rOEFx0w==
=goS0
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- RISC-V architecture-specific ELF attributes have been disabled in the
kernel builds
- A fix for a locking failure while during errata patching that
manifests on SiFive-based systems
- A fix for a KASAN failure during stack unwinding
- A fix for some lockdep failures during text patching
* tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Don't check text_mutex during stop_machine
riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
RISC-V: fix taking the text_mutex twice during sifive errata patching
RISC-V: Stop emitting attributes
Qinglin Pan <panqinglin00@gmail.com> says:
Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K
page. This patch set is for using Svnapot in hugetlb fs and huge vmap.
This patchset adds a Kconfig item for using Svnapot in
"Platform type"->"SVNAPOT extension support". Its default value is on,
and people can set it off if they don't allow kernel to detect Svnapot
hardware support and leverage it.
Tested on:
- qemu rv64 with "Svnapot support" off and svnapot=true.
- qemu rv64 with "Svnapot support" on and svnapot=true.
- qemu rv64 with "Svnapot support" off and svnapot=false.
- qemu rv64 with "Svnapot support" on and svnapot=false.
* b4-shazam-merge:
riscv: mm: support Svnapot in huge vmap
riscv: mm: support Svnapot in hugetlb page
riscv: mm: modify pte format for Svnapot
Link: https://lore.kernel.org/r/20230209131647.17245-1-panqinglin00@gmail.com
[Palmer: fix up the feature ordering in the merge]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sergey Matyukevich <geomatsi@gmail.com> says:
Some time ago two different patches have been posted to fix stale TLB
entries that caused applications crashes.
The patch [0] suggested 'aggregating' mm_cpumask, i.e. current cpu is not
cleared for the switched-out task in switch_mm function. For additional
explanations see the commit message by Guo Ren. The same approach is
used by arc architecture, so another good comment is for switch_mm
in arch/arc/include/asm/mmu_context.h.
The patch [1] attempted to reduce the number of TLB flushes by deferring
(and possibly avoiding) them for CPUs not running the task.
Patch [1] has been merged. However we already have two bug reports from
different vendors. So apparently something is missing in the approach
suggested in [1]. In both cases the patch [0] fixed the issue.
This patch series reverts [1] and replaces it by [0].
[0] https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
* b4-shazam-merge:
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
Link: https://lore.kernel.org/r/20230226150137.1919750-1-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.
Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
0x0000003fd6d22524 <+4>: auipc s0,0x70
0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>: ld a5,0(s0)
(gdb) i r s0
s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490: 0xd80ac8a8 0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.
When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.
CPU0 CPU1
- switch 'task' in
- read addr (Fill stale mapping
entry into TLB)
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- write addr cause pagefault
do_page_fault() (change to
new addr mapping)
wp_page_copy()
ptep_clear_flush()
ptep_get_and_clear()
& flush_tlb_page()
write new value into addr
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- read addr again (Use stale
mapping entry in TLB)
get wrong value from old phyical
addr, BUG!
The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.
Fixes: 65d4b9c530 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Zong Li <zong.li@sifive.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230226150137.1919750-3-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This reverts the remaining bits of commit 4bd1d80efb ("riscv: mm:
notify remote harts harts about mmu cache updates").
According to bug reports, suggested approach to fix stale TLB entries
is not sufficient. It needs to be replaced by a more robust solution.
Fixes: 4bd1d80efb ("riscv: mm: notify remote harts about mmu cache updates")
Reported-by: Zong Li <zong.li@sifive.com>
Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: stable@vger.kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230226150137.1919750-2-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We're currently using stop_machine() to update ftrace & kprobes, which
means that the thread that takes text_mutex during may not be the same
as the thread that eventually patches the code. This isn't actually a
race because the lock is still held (preventing any other concurrent
accesses) and there is only one thread running during stop_machine(),
but it does trigger a lockdep failure.
This patch just elides the lockdep check during stop_machine.
Fixes: c15ac4fd60 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.
There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).
Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.
This will cause the compiler to complain if an arch-specific
implementation might return. It also improves code generation for both
caller and callee.
Also fixes the following warning:
vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC is supported, we can
implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
for Svnapot to support huge vmap about napot size.
It can be tested by huge vmap used in pci driver. Huge vmalloc with svnapot
can be tested by test_vmalloc with [1] applied, and probe this
module to run fix_size_alloc_test with use_huge true.
[1]https://lore.kernel.org/all/20221212055657.698420-1-panqinglin2020@iscas.ac.cn/
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230209131647.17245-4-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Svnapot can be used to support 64KB hugetlb page, so it can become a new
option when using hugetlbfs. Add a basic implementation of hugetlb page,
and support 64KB as a size in it by using Svnapot.
For test, boot kernel with command line contains "default_hugepagesz=64K
hugepagesz=64K hugepages=20" and run a simple test like this:
tools/testing/selftests/vm/map_hugetlb 1 16
And it should be passed.
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-3-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add one alternative to enable/disable svnapot support, enable this static
key when "svnapot" is in the "riscv,isa" field of fdt and SVNAPOT compile
option is set. It will influence the behavior of has_svnapot. All code
dependent on svnapot should make sure that has_svnapot return true firstly.
Modify PTE definition for Svnapot, and creates some functions in pgtable.h
to mark a PTE as napot and check if it is a Svnapot PTE. Until now, only
64KB napot size is supported in spec, so some macros has only 64KB version.
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-2-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The macb on PolarFire SoC has reset support which the generic compatible
does not use. Add the newly introduced MPFS specific compatible as the
primary compatible to avail of this support & wire up the reset to the
clock controllers devicetree entry.
Reviewed-by: Daire McNamara <daire.mcnamara@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
=EjJQ
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZZ1wAKCRDbK58LschI
g4fcAQDYVsICeBDmhdBdZs7Kb91/s6SrU6B0jy4zs0gOIBBOhgD7B3jt3dMTD2tp
rPLHlv6uUoYS7mbZsrZi/XjVw8UmewM=
=VUnr
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-03-06
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 9 files changed, 64 insertions(+), 18 deletions(-).
The main changes are:
1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
that is, keep resolving such instances instead of bailing out,
from Lorenz Bauer.
2) Fix BPF test framework with regards to xdp_frame info misplacement
in the "live packet" code, from Alexander Lobakin.
3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
from Liu Jian.
4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
from Randy Dunlap.
5) Several BPF doc fixes with either broken links or external instead
of internal doc links, from Bagas Sanjaya.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: check that modifier resolves after pointer
btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
bpf, doc: Link to submitting-patches.rst for general patch submission info
bpf, doc: Do not link to docs.kernel.org for kselftest link
bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
riscv, bpf: Fix patch_text implicit declaration
bpf, docs: Fix link to BTF doc
====================
Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RISC-V ELF attributes don't contain any useful information. New
toolchains ignore them, but they frequently trip up various older/mixed
toolchains. So just turn them off.
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230223224605.6995-1-palmer@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Some of the page fault handlers do not deal with the following case
correctly:
* handle_mm_fault() has returned VM_FAULT_RETRY
* there is a pending fatal signal
* fault had happened in kernel mode
Correct action in such case is not "return unconditionally" - fatal
signals are handled only upon return to userland and something like
copy_to_user() would end up retrying the faulting instruction and
triggering the same fault again and again.
What we need to do in such case is to make the caller to treat that
as failed uaccess attempt - handle exception if there is an exception
handler for faulting instruction or oops if there isn't one.
Over the years some architectures had been fixed and now are handling
that case properly; some still do not. This series should fix the
remaining ones.
Status:
m68k, riscv, hexagon, parisc: tested/acked by maintainers.
alpha, sparc32, sparc64: tested locally - bug has been
reproduced on the unpatched kernel and verified to be fixed by
this series.
ia64, microblaze, nios2, openrisc: build, but otherwise
completely untested.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZAP54AAKCRBZ7Krx/gZQ
6/IjAP92oS8kDuodAtT7WorV+GETWr2HFKfvDiPSmEGiSnXIigD9Hj9svXyeeAgl
/TqGR50Yrvn3IUQ0A0bUDpBAG1qyVwY=
=9+Bd
-----END PGP SIGNATURE-----
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VM_FAULT_RETRY fixes from Al Viro:
"Some of the page fault handlers do not deal with the following case
correctly:
- handle_mm_fault() has returned VM_FAULT_RETRY
- there is a pending fatal signal
- fault had happened in kernel mode
Correct action in such case is not "return unconditionally" - fatal
signals are handled only upon return to userland and something like
copy_to_user() would end up retrying the faulting instruction and
triggering the same fault again and again.
What we need to do in such case is to make the caller to treat that as
failed uaccess attempt - handle exception if there is an exception
handler for faulting instruction or oops if there isn't one.
Over the years some architectures had been fixed and now are handling
that case properly; some still do not. This series should fix the
remaining ones.
Status:
- m68k, riscv, hexagon, parisc: tested/acked by maintainers.
- alpha, sparc32, sparc64: tested locally - bug has been reproduced
on the unpatched kernel and verified to be fixed by this series.
- ia64, microblaze, nios2, openrisc: build, but otherwise completely
untested"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
openrisc: fix livelock in uaccess
nios2: fix livelock in uaccess
microblaze: fix livelock in uaccess
ia64: fix livelock in uaccess
sparc: fix livelock in uaccess
alpha: fix livelock in uaccess
parisc: fix livelock in uaccess
hexagon: fix livelock in uaccess
riscv: fix livelock in uaccess
m68k: fix livelock in uaccess
* Some cleanups and fixes for the Zbb-optimized string routines.
* Support for custom (vendor or implementation defined) perf events.
* COMMAND_LINE_SIZE has been increased to 1024.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQBuDMTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYie2nEACDnj2w35BYDZDrbOiodXw1bA60IpJG
5Q3bZEIWimaoDluD52UxrLhAZgJ3imGHF2nLLevqqsqAWKT2aRxsiEtgqARN0GzW
Am2+ZBdMuo1oCOHhIzCI9nk1xMG1fjva1ZAhGcVK876W3NuLUDvb1+LeDcwZB5bX
8KUu1haILIlngd8xOZE88xJOw83uQaF1Oc1NLqpQabbCiWxFBAIzI3O/7ZyuVk4M
LrdD5zH7YBUpksol0CxWaJ/jI59t23na2421NKNm34zALsvVFSKc2CZ0+r5DFy3D
4bzMdzXmtDjWfSCTGicGk03acrxUwgpAkFH/y2eJzhQl1WWrO/FiUzlI2uKoHou9
MO+fxPHcOamnWH5OF8jzP9+v5fkKIX2eYWJl5E3Jge6KU8D1Y2cPbE/k75cOq70Y
BVF+BUnQ26UdqYH0POFqikizF7dvt6G+rNpUnFrufc6xs+BALxfyIHRQnTw8dQu4
FbZRLX9YNn8/TpwcfnzIzFk1CH35n8QPBvcumIiV/344Gs7nt5vyDO7b//ExS4IL
mrENDNdaQ/u8T6xkhGSWPPMwNAj3OFpLRi00BInYvRXx9/0WArD1Z5N9uMl42dqd
kB060nUH1Ao7aBQJztqZVDOYwyFhREGM+xLRBLE+CiD8EnVvm30Fhovq2FrFfbif
GPHt+ouKJtlkTw==
=68ZK
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Some cleanups and fixes for the Zbb-optimized string routines
- Support for custom (vendor or implementation defined) perf events
- COMMAND_LINE_SIZE has been increased to 1024
* tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Bump COMMAND_LINE_SIZE value to 1024
drivers/perf: RISC-V: Allow programming custom firmware events
riscv, lib: Fix Zbb strncmp
RISC-V: improve string-function assembly
riscv equivalent of 26178ec11e "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Björn Töpel <bjorn@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Increase COMMAND_LINE_SIZE as the current default value is too low
for syzbot kernel command line.
There has been considerable discussion on this patch that has led to a
larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all
ports. That's not quite done yet, but it's gotten far enough we're
confident this is not a uABI change so this is safe.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr
[Palmer: it's not uabi]
Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The Zbb optimized strncmp has two parts; a fast path that does XLEN/8B
per iteration, and a slow that does one byte per iteration.
The idea is to compare aligned XLEN chunks for most of strings, and do
the remainder tail in the slow path.
The Zbb strncmp has two issues in the fast path:
Incorrect remainder handling (wrong compare): Assume that the string
length is 9. On 64b systems, the fast path should do one iteration,
and one iteration in the slow path. Instead, both were done in the
fast path, which lead to incorrect results. An example:
strncmp("/dev/vda", "/dev/", 5);
Correct by changing "bgt" to "bge".
Missing NULL checks in the second string: This could lead to incorrect
results for:
strncmp("/dev/vda", "/dev/vda\0", 8);
Correct by adding an additional check.
Fixes: b6fcdb191e ("RISC-V: add zbb support to string functions")
Suggested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Adapt the suggestions for the assembly string functions that Andrew
suggested but that I didn't manage to include into the series that
got applied.
This includes improvements to two comments, removal of unneeded labels
and moving one instruction slightly higher to contradict an
explanatory comment.
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
bpf_jit_comp64.c uses patch_text(), so add <asm/patch.h> to it
to prevent the build error:
../arch/riscv/net/bpf_jit_comp64.c: In function 'bpf_arch_text_poke':
../arch/riscv/net/bpf_jit_comp64.c:691:23: error: implicit declaration of function 'patch_text'; did you mean 'path_get'? [-Werror=implicit-function-declaration]
691 | ret = patch_text(ip, new_insns, ninsns);
| ^~~~~~~~~~
Fixes: 596f2e6f9c ("riscv, bpf: Add bpf_arch_text_poke support for RV64")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/202302271000.Aj4nMXbZ-lkp@intel.com
Link: https://lore.kernel.org/bpf/20230227072016.14618-1-rdunlap@infradead.org
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place.
- Add support for taking stage-2 access faults in parallel. This was an
accidental omission in the original parallel faults implementation,
but should provide a marginal improvement to machines w/o FEAT_HAFDBS
(such as hardware from the fruit company).
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception handling
and masking unsupported features for nested guests.
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM.
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at reducing
the trap overhead of running nested.
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems.
- Avoid VM-wide stop-the-world operations when a vCPU accesses its own
redistributor.
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
in the host.
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
This also drags in arm64's 'for-next/sme2' branch, because both it and
the PSCI relay changes touch the EL2 initialization code.
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Two patches sorting out confusion between virtual and physical
addresses, which currently are the same on s390.
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world,
some of them affecting architecurally legal but unlikely to
happen in practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at this
point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just
let the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how
to do initialization.
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
the correct hypercall instruction instead of relying on KVM to patch
in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
/WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
=goe1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place
- Add support for taking stage-2 access faults in parallel. This was
an accidental omission in the original parallel faults
implementation, but should provide a marginal improvement to
machines w/o FEAT_HAFDBS (such as hardware from the fruit company)
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception
handling and masking unsupported features for nested guests
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at
reducing the trap overhead of running nested
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems
- Avoid VM-wide stop-the-world operations when a vCPU accesses its
own redistributor
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
exceptions in the host
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the
guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Sort out confusion between virtual and physical addresses, which
currently are the same on s390
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world, some
of them affecting architecurally legal but unlikely to happen in
practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
SVM similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at
this point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the
PMU and MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's
send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't
support EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just let
the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how to
do initialization
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
emit the correct hypercall instruction instead of relying on KVM to
patch in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
KVM: SVM: hyper-v: placate modpost section mismatch error
KVM: x86/mmu: Make tdp_mmu_allowed static
KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
KVM: arm64: nv: Filter out unsupported features from ID regs
KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: nv: Handle SMCs taken from virtual EL2
KVM: arm64: nv: Handle trapped ERET from virtual EL2
KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
KVM: arm64: nv: Support virtual EL2 exceptions
KVM: arm64: nv: Handle HCR_EL2.NV system register traps
KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
KVM: arm64: nv: Add EL2 system registers to vcpu context
KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
KVM: arm64: nv: Introduce nested virtualization VCPU feature
KVM: arm64: Use the S2 MMU context to iterate over S2 table
...
There's a bunch of fixes/cleanups throughout the tree as usual, but we
also have a handful of new features.
* Various improvements to the extension detection and alternative
patching infrastructure.
* Zbb-optimized string routines.
* Support for cpu-capacity in the RISC-V DT bindings.
* Zicbom no longer depends on toolchain support.
* Some performance and code size improvements to ftrace.
* Support for ARCH_WANT_LD_ORPHAN_WARN.
* Oops now contain the faulting instruction.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmP49coTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiTFnEACuQtGJWSwzH+ORswVyItKzqHcBMU3t
rWHTFxQ7MdWgQO8nrWUtSypGY4n0DFTCe9w4H3tQFRDaTXbI+ycFjidEDt3eJCMb
n6WiGuZpdKVS81CQ0Es4dTWQ1i/28fe1851CGK/PkybXdrPPofdCJ9k3Wepxflb/
2UYxRDyjKt3KbJ2OmN2oF8Ek1rrsGhIC/Dhbdb2JsGZhYF10ZYjquaOLs31WbHMG
O+n/N/JfZRAif1MDQ71ygAm9KV0kGqe/wcRtsJGETwJ8U3I/cjn2mAGd8BRdy4iL
9GFmTmi8q27ntUbakikNz3b4aE9xVnLDvRIyOciI3l8rQjrFAsfnQbuRwlaq6BVJ
BF3e6nAjkcLj23FhbROTlfncEOzrklbNZ+uQIuvyffAUjDoePw9x7o0r+qj7FnOY
WMfNecJMeE5OGVBqHSVFEcAMlN6uYu6wqbEipEpc+8sTg+w1LM0bUVNhV86/BrnL
bh+4+7MPYtg45vy2Y8AuPUBFqR2uCekDpbxciCEGsaIzUYRas2zrt9UkWGjKs1VV
q0qeLSNNA1wBq+q6FprTceipFQIqD5KnmI2GMucF6v4YFg5AzeSOpRc6aeqcs7Z2
+ApShSOFPjjntZbcpTgkvhrPExr0Jel0xY7YSazUUqY0xOHUwGNBEh/E4rzsRLxr
qvUpFAIZT60dfQ==
=XgYl
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
"There's a bunch of fixes/cleanups throughout the tree as usual, but we
also have a handful of new features:
- Various improvements to the extension detection and alternative
patching infrastructure
- Zbb-optimized string routines
- Support for cpu-capacity in the RISC-V DT bindings
- Zicbom no longer depends on toolchain support
- Some performance and code size improvements to ftrace
- Support for ARCH_WANT_LD_ORPHAN_WARN
- Oops now contain the faulting instruction"
* tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (67 commits)
RISC-V: add a spin_shadow_stack declaration
riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
riscv: Add header include guards to insn.h
riscv: alternative: proceed one more instruction for auipc/jalr pair
riscv: Avoid enabling interrupts in die()
riscv, mm: Perform BPF exhandler fixup on page fault
RISC-V: take text_mutex during alternative patching
riscv: hwcap: Don't alphabetize ISA extension IDs
RISC-V: fix ordering of Zbb extension
riscv: jump_label: Fixup unaligned arch_static_branch function
RISC-V: Only provide the single-letter extensions in HWCAP
riscv: mm: fix regression due to update_mmu_cache change
scripts/decodecode: Add support for RISC-V
riscv: Add instruction dump to RISC-V splats
riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
riscv: lds: define RUNTIME_DISCARD_EXIT
RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
...
Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work falls
into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be moved
into read-only memory (i.e. const) The recent work with Rust has
pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only making
things safer overall. This is the contuation of that work (started
last release with kobject changes) in moving struct bus_type to be
constant. We didn't quite make it for this release, but the
remaining patches will be finished up for the release after this
one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
6oeFOjD3JDju3cQsfGgd
=Su6W
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work
falls into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be
moved into read-only memory (i.e. const) The recent work with Rust
has pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only
making things safer overall. This is the contuation of that work
(started last release with kobject changes) in moving struct
bus_type to be constant. We didn't quite make it for this release,
but the remaining patches will be finished up for the release after
this one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems"
[ Geert Uytterhoeven points out that that last sentence isn't true, and
that there's a pending report that has a fix that is queued up - Linus ]
* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
OPP: fix error checking in opp_migrate_dentry()
debugfs: update comment of debugfs_rename()
i3c: fix device.h kernel-doc warnings
dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
Revert "driver core: add error handling for devtmpfs_create_node()"
Revert "devtmpfs: add debug info to handle()"
Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
driver core: cpu: don't hand-override the uevent bus_type callback.
devtmpfs: remove return value of devtmpfs_delete_node()
devtmpfs: add debug info to handle()
driver core: add error handling for devtmpfs_create_node()
driver core: bus: update my copyright notice
driver core: bus: add bus_get_dev_root() function
driver core: bus: constify bus_unregister()
driver core: bus: constify some internal functions
driver core: bus: constify bus_get_kset()
driver core: bus: constify bus_register/unregister_notifier()
driver core: remove private pointer from struct bus_type
...
Here is the big set of serial and tty driver updates for 6.3-rc1.
Once again, Jiri and Ilpo have done a number of core vt and tty/serial
layer cleanups that were much needed and appreciated. Other than that,
it's just a bunch of little tty/serial driver updates:
- qcom-geni-serial driver updates
- liteuart driver updates
- hvcs driver cleanups
- n_gsm updates and additions for new features
- more 8250 device support added
- fpga/dfl update and additions
- imx serial driver updates
- fsl_lpuart updates
- other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/itAw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykJbQCfWv/J4ZElO108iHBU5mJCDagUDBgAnAtLLN6A
SEAnnokGCDtA/BNIXeES
=luRi
-----END PGP SIGNATURE-----
Merge tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the big set of serial and tty driver updates for 6.3-rc1.
Once again, Jiri and Ilpo have done a number of core vt and tty/serial
layer cleanups that were much needed and appreciated. Other than that,
it's just a bunch of little tty/serial driver updates:
- qcom-geni-serial driver updates
- liteuart driver updates
- hvcs driver cleanups
- n_gsm updates and additions for new features
- more 8250 device support added
- fpga/dfl update and additions
- imx serial driver updates
- fsl_lpuart updates
- other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (143 commits)
tty: n_gsm: add keep alive support
serial: imx: remove a redundant check
dt-bindings: serial: snps-dw-apb-uart: add dma & dma-names properties
soc: qcom: geni-se: Move qcom-geni-se.h to linux/soc/qcom/geni-se.h
tty: n_gsm: add TIOCMIWAIT support
tty: n_gsm: add RING/CD control support
tty: n_gsm: mark unusable ioctl structure fields accordingly
serial: imx: get rid of registers shadowing
serial: imx: refine local variables in rxint()
serial: imx: stop using USR2 in FIFO reading loop
serial: imx: remove redundant USR2 read from FIFO reading loop
serial: imx: do not break from FIFO reading loop prematurely
serial: imx: do not sysrq broken chars
serial: imx: work-around for hardware RX flood
serial: imx: factor-out common code to imx_uart_soft_reset()
serial: 8250_pci1xxxx: Add power management functions to quad-uart driver
serial: 8250_pci1xxxx: Add RS485 support to quad-uart driver
serial: 8250_pci1xxxx: Add driver for quad-uart support
serial: 8250_pci: Add serial8250_pci_setup_port definition in 8250_pcilib.c
tty: pcn_uart: fix memory leak with using debugfs_lookup()
...
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
- Performance tweaks for efifb earlycon by Andy
- Preparatory refactoring and cleanup work in the efivar layer by Johan,
which is needed to accommodate the Snapdragon arm64 laptops that
expose their EFI variable store via a TEE secure world API.
- Enhancements to the EFI memory map handling so that Xen dom0 can
safely access EFI configuration tables (Demi Marie)
- Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
table, so that firmware that is generated with ENDBR/BTI landing pads
will be mapped with enforcement enabled.
- Clean up how we check and print the EFI revision exposed by the
firmware.
- Incorporate EFI memory attributes protocol definition contributed by
Evgeniy and wire it up in the EFI zboot code. This ensures that these
images can execute under new and stricter rules regarding the default
memory permissions for EFI page allocations. (More work is in progress
here)
- CPER header cleanup by Dan Williams
- Use a raw spinlock to protect the EFI runtime services stack on arm64
to ensure the correct semantics under -rt. (Pierre)
- EFI framebuffer quirk for Lenovo Ideapad by Darrell.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
f/tLec86
=+pOz
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon (Andy)
- Preparatory refactoring and cleanup work in the efivar layer, which
is needed to accommodate the Snapdragon arm64 laptops that expose
their EFI variable store via a TEE secure world API (Johan)
- Enhancements to the EFI memory map handling so that Xen dom0 can
safely access EFI configuration tables (Demi Marie)
- Wire up the newly introduced IBT/BTI flag in the EFI memory
attributes table, so that firmware that is generated with ENDBR/BTI
landing pads will be mapped with enforcement enabled
- Clean up how we check and print the EFI revision exposed by the
firmware
- Incorporate EFI memory attributes protocol definition and wire it
up in the EFI zboot code (Evgeniy)
This ensures that these images can execute under new and stricter
rules regarding the default memory permissions for EFI page
allocations (More work is in progress here)
- CPER header cleanup (Dan Williams)
- Use a raw spinlock to protect the EFI runtime services stack on
arm64 to ensure the correct semantics under -rt (Pierre)
- EFI framebuffer quirk for Lenovo Ideapad (Darrell)"
* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
arm64: efi: Make efi_rt_lock a raw_spinlock
efi: Add mixed-mode thunk recipe for GetMemoryAttributes
efi: x86: Wire up IBT annotation in memory attributes table
efi: arm64: Wire up BTI annotation in memory attributes table
efi: Discover BTI support in runtime services regions
efi/cper, cxl: Remove cxl_err.h
efi: Use standard format for printing the EFI revision
efi: Drop minimum EFI version check at boot
efi: zboot: Use EFI protocol to remap code/data with the right attributes
efi/libstub: Add memory attribute protocol definitions
efi: efivars: prevent double registration
efi: verify that variable services are supported
efivarfs: always register filesystem
efi: efivars: add efivars printk prefix
efi: Warn if trying to reserve memory under Xen
efi: Actually enable the ESRT under Xen
efi: Apply allowlist to EFI configuration tables when running under Xen
efi: xen: Implement memory descriptor lookup based on hypercall
efi: memmap: Disregard bogus entries instead of returning them
...
This patch adds kernel function call support for RV64. Since the offset
from RV64 kernel and module functions to bpf programs is almost within
the range of s32, the current infrastructure of RV64 is already
sufficient for kfunc, so let's turn it on.
Suggested-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230221140656.3480496-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The patchwork automation reported a sparse complaint that
spin_shadow_stack was not declared and should be static:
../arch/riscv/kernel/traps.c:335:15: warning: symbol 'spin_shadow_stack' was not declared. Should it be static?
However, this is used in entry.S and therefore shouldn't be static.
The same applies to the shadow_stack that this pseudo spinlock is
trying to protect, so do like its charge and add a declaration to
thread_info.h
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: 7e1864332f ("riscv: fix race when vmap stack overflow")
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230210185945.915806-1-conor@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Core
----
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used
to describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols
---------
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
path manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF
---
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
to better support decap on GRE tunnel devices not operating
in collect metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk
and bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols
by livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter
---------
- Remove the CLUSTERIP target. It has been marked as obsolete
for years, and we still have WARN splats wrt. races of
the out-of-band /proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to
the existing 'delete' commands, but do not return an error if
the referenced object (set, chain, rule...) did not exist.
Driver API
----------
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into multiple
files, drop some of the unnecessarily granular locks and factor out
common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
interface instead.
- Improve the CAN bit timing configuration. Use extack to return error
messages directly to user space, update the SJW handling, including
the definition of a new default value that will benefit CAN-FD
controllers, by increasing their oscillator tolerance.
New hardware / drivers
----------------------
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers
-------
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- enetc: support XDP_REDIRECT for XDP non-linear buffers
- enetc: improve reconfig, avoid link flap and waiting for idle
- enetc: support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
=xXhC
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Add dedicated kmem_cache for typical/small skb->head, avoid having
to access struct page at kfree time, and improve memory use.
- Introduce sysctl to set default RPS configuration for new netdevs.
- Define Netlink protocol specification format which can be used to
describe messages used by each family and auto-generate parsers.
Add tools for generating kernel data structures and uAPI headers.
- Expose all net/core sysctls inside netns.
- Remove 4s sleep in netpoll if carrier is instantly detected on
boot.
- Add configurable limit of MDB entries per port, and port-vlan.
- Continue populating drop reasons throughout the stack.
- Retire a handful of legacy Qdiscs and classifiers.
Protocols:
- Support IPv4 big TCP (TSO frames larger than 64kB).
- Add IP_LOCAL_PORT_RANGE socket option, to control local port range
on socket by socket basis.
- Track and report in procfs number of MPTCP sockets used.
- Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
manager.
- IPv6: don't check net.ipv6.route.max_size and rely on garbage
collection to free memory (similarly to IPv4).
- Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
- ICMP: add per-rate limit counters.
- Add support for user scanning requests in ieee802154.
- Remove static WEP support.
- Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
reporting.
- WiFi 7 EHT channel puncturing support (client & AP).
BPF:
- Add a rbtree data structure following the "next-gen data structure"
precedent set by recently added linked list, that is, by using
kfunc + kptr instead of adding a new BPF map type.
- Expose XDP hints via kfuncs with initial support for RX hash and
timestamp metadata.
- Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
better support decap on GRE tunnel devices not operating in collect
metadata.
- Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
- Remove the need for trace_printk_lock for bpf_trace_printk and
bpf_trace_vprintk helpers.
- Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case.
- Significantly reduce the search time for module symbols by
livepatch and BPF.
- Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs in
different time intervals.
- Add support for BPF trampoline on s390x and riscv64.
- Add capability to export the XDP features supported by the NIC.
- Add __bpf_kfunc tag for marking kernel functions as kfuncs.
- Add cgroup.memory=nobpf kernel parameter option to disable BPF
memory accounting for container environments.
Netfilter:
- Remove the CLUSTERIP target. It has been marked as obsolete for
years, and we still have WARN splats wrt races of the out-of-band
/proc interface installed by this target.
- Add 'destroy' commands to nf_tables. They are identical to the
existing 'delete' commands, but do not return an error if the
referenced object (set, chain, rule...) did not exist.
Driver API:
- Improve cpumask_local_spread() locality to help NICs set the right
IRQ affinity on AMD platforms.
- Separate C22 and C45 MDIO bus transactions more clearly.
- Introduce new DCB table to control DSCP rewrite on egress.
- Support configuration of Physical Layer Collision Avoidance (PLCA)
Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
shared medium Ethernet.
- Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
preemption of low priority frames by high priority frames.
- Add support for controlling MACSec offload using netlink SET.
- Rework devlink instance refcounts to allow registration and
de-registration under the instance lock. Split the code into
multiple files, drop some of the unnecessarily granular locks and
factor out common parts of netlink operation handling.
- Add TX frame aggregation parameters (for USB drivers).
- Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
messages with notifications for debug.
- Allow offloading of UDP NEW connections via act_ct.
- Add support for per action HW stats in TC.
- Support hardware miss to TC action (continue processing in SW from
a specific point in the action chain).
- Warn if old Wireless Extension user space interface is used with
modern cfg80211/mac80211 drivers. Do not support Wireless
Extensions for Wi-Fi 7 devices at all. Everyone should switch to
using nl80211 interface instead.
- Improve the CAN bit timing configuration. Use extack to return
error messages directly to user space, update the SJW handling,
including the definition of a new default value that will benefit
CAN-FD controllers, by increasing their oscillator tolerance.
New hardware / drivers:
- Ethernet:
- nVidia BlueField-3 support (control traffic driver)
- Ethernet support for imx93 SoCs
- Motorcomm yt8531 gigabit Ethernet PHY
- onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
- Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
- Amlogic gxl MDIO mux
- WiFi:
- RealTek RTL8188EU (rtl8xxxu)
- Qualcomm Wi-Fi 7 devices (ath12k)
- CAN:
- Renesas R-Car V4H
Drivers:
- Bluetooth:
- Set Per Platform Antenna Gain (PPAG) for Intel controllers.
- Ethernet NICs:
- Intel (1G, igc):
- support TSN / Qbv / packet scheduling features of i226 model
- Intel (100G, ice):
- use GNSS subsystem instead of TTY
- multi-buffer XDP support
- extend support for GPIO pins to E823 devices
- nVidia/Mellanox:
- update the shared buffer configuration on PFC commands
- implement PTP adjphase function for HW offset control
- TC support for Geneve and GRE with VF tunnel offload
- more efficient crypto key management method
- multi-port eswitch support
- Netronome/Corigine:
- add DCB IEEE support
- support IPsec offloading for NFP3800
- Freescale/NXP (enetc):
- support XDP_REDIRECT for XDP non-linear buffers
- improve reconfig, avoid link flap and waiting for idle
- support MAC Merge layer
- Other NICs:
- sfc/ef100: add basic devlink support for ef100
- ionic: rx_push mode operation (writing descriptors via MMIO)
- bnxt: use the auxiliary bus abstraction for RDMA
- r8169: disable ASPM and reset bus in case of tx timeout
- cpsw: support QSGMII mode for J721e CPSW9G
- cpts: support pulse-per-second output
- ngbe: add an mdio bus driver
- usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
- r8152: handle devices with FW with NCM support
- amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
- virtio-net: support multi buffer XDP
- virtio/vsock: replace virtio_vsock_pkt with sk_buff
- tsnep: XDP support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add support for latency TLV (in FW control messages)
- Microchip (sparx5):
- separate explicit and implicit traffic forwarding rules, make
the implicit rules always active
- add support for egress DSCP rewrite
- IS0 VCAP support (Ingress Classification)
- IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
etc.)
- ES2 VCAP support (Egress Access Control)
- support for Per-Stream Filtering and Policing (802.1Q,
8.6.5.1)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- add MAB (port auth) offload support
- enable PTP receive for mv88e6390
- NXP (ocelot):
- support MAC Merge layer
- support for the the vsc7512 internal copper phys
- Microchip:
- lan9303: convert to PHYLINK
- lan966x: support TC flower filter statistics
- lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
- lan937x: support Credit Based Shaper configuration
- ksz9477: support Energy Efficient Ethernet
- other:
- qca8k: convert to regmap read/write API, use bulk operations
- rswitch: Improve TX timestamp accuracy
- Intel WiFi (iwlwifi):
- EHT (Wi-Fi 7) rate reporting
- STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware.
- Qualcomm 802.11ax WiFi (ath11k):
- IPQ5018 support
- Fine Timing Measurement (FTM) responder role support
- channel 177 support
- MediaTek WiFi (mt76):
- per-PHY LED support
- mt7996: EHT (Wi-Fi 7) support
- Wireless Ethernet Dispatch (WED) reset support
- switch to using page pool allocator
- RealTek WiFi (rtw89):
- support new version of Bluetooth co-existance
- Mobile:
- rmnet: support TX aggregation"
* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
page_pool: add a comment explaining the fragment counter usage
net: ethtool: fix __ethtool_dev_mm_supported() implementation
ethtool: pse-pd: Fix double word in comments
xsk: add linux/vmalloc.h to xsk.c
sefltests: netdevsim: wait for devlink instance after netns removal
selftest: fib_tests: Always cleanup before exit
net/mlx5e: Align IPsec ASO result memory to be as required by hardware
net/mlx5e: TC, Set CT miss to the specific ct action instance
net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
net/mlx5: Refactor tc miss handling to a single function
net/mlx5: Kconfig: Make tc offload depend on tc skb extension
net/sched: flower: Support hardware miss to tc action
net/sched: flower: Move filter handle initialization earlier
net/sched: cls_api: Support hardware miss to tc action
net/sched: Rename user cookie and act cookie
sfc: fix builds without CONFIG_RTC_LIB
sfc: clean up some inconsistent indentings
net/mlx4_en: Introduce flexible array to silence overflow warning
net: lan966x: Fix possible deadlock inside PTP
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
...
Add HVO support for RISC-V; see commit 6be24bed9d ("mm: hugetlb:
introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is
similar to commit 1e63ac088f ("arm64: mm: hugetlb: enable
HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and riscv's motivation is the
same as arm64. The current riscv was ready to enable HVO after fixup,
ref commit d33deda095 ("riscv/mm: hugepage's PG_dcache_clean flag
is only set in head page").
See Documentation/mm/vmemmap_dedup.rst for more details.
The HugeTLB VmemmapvOptimization (HVO) defaults to off in Kconfig.
Here is the riscv test log:
cat /proc/sys/vm/hugetlb_optimize_vmemmap
echo 8 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mount -t hugetlbfs none test/ -o pagesize=2048k
<Try some simple hugetlb test in test dir, no problem found.>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/linux-riscv/1F5AF29D-708A-483B-A29F-CAEE6F554866@linux.dev/
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230201015259.3222524-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add header include guards to insn.h to prevent repeating declaration of
any identifiers in insn.h.
Fixes: edde5584c7 ("riscv: Add SW single-step support for KDB")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Fixes: c9c1af3f18 ("RISC-V: rename parse_asm.h to insn.h")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230129094242.282620-1-liaochang1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Björn Töpel <bjorn@kernel.org> says:
From: Björn Töpel <bjorn@rivosinc.com>
RISC-V does not dump faulting instructions in the oops handler. This
series adds "Code:" dumps to the oops output together with
scripts/decodecode support.
* b4-shazam-merge:
scripts/decodecode: Add support for RISC-V
riscv: Add instruction dump to RISC-V splats
Link: https://lore.kernel.org/r/20230119074738.708301-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If we patched auipc + jalr pair, we'd better proceed one more
instruction. Andrew pointed out "There's not a problem now, since
we're only adding a fixup for jal, not jalr, but we should
future-proof this and there's no reason to revisit an already fixed-up
instruction anyway."
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230115162811.3146-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang <jszhang@kernel.org> says:
This series tries to improve link time handling of riscv:
patch1 adds the missing RUNTIME_DISCARD_EXIT as suggested by Masahiro.
Similar as other architectures such as x86, arm64 and so on, enable
ARCH_WANT_LD_ORPHAN_WARN to enable linker orphan warnings to prevent
from missing any new sections in future. So the following two patches
are preparation ones, and the last patch finally selects
ARCH_WANT_LD_ORPHAN_WARN
* b4-shazam-merge:
riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
riscv: lds: define RUNTIME_DISCARD_EXIT
Link: https://lore.kernel.org/r/20230119155417.2600-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While working on something else, I noticed that the kernel would start
accepting interrupts again after crashing in an interrupt handler. Since
the kernel is already in inconsistent state, enabling interrupts is
dangerous and opens up risk of kernel state deteriorating further.
Interrupts do get enabled via what looks like an unintended side effect of
spin_unlock_irq, so switch to the more cautious
spin_lock_irqsave/spin_unlock_irqrestore instead.
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Commit 21855cac82 ("riscv/mm: Prevent kernel module to access user
memory without uaccess routines") added early exits/deaths for page
faults stemming from accesses to user-space without using proper
uaccess routines (where sstatus.SUM is set).
Unfortunatly, this is too strict for some BPF programs, which relies
on BPF exhandler fixups. These BPF programs loads "BTF pointers". A
BTF pointers could either be a valid kernel pointer or NULL, but not a
userspace address.
Resolve the problem by calling the fixup handler in the early exit
path.
Fixes: 21855cac82 ("riscv/mm: Prevent kernel module to access user memory without uaccess routines")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230214162515.184827-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
While the comment above the ISA extension ID definitions says
"Entries are sorted alphabetically.", this stopped being good
advice with commit d8a3d8a752 ("riscv: hwcap: make ISA extension
ids can be used in asm"), as we now use macros instead of enums.
Reshuffling defines is error-prone, so, since they don't need to be
in any particular order, change the advice to just adding new
extensions at the bottom. Also, take the opportunity to change
spaces to tabs, merge three comments into one, and move the base
and max defines into more logical locations wrt the ID definitions.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230209123636.123537-1-ajones@ventanamicro.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
As Andrew reported,
Zb* comes after Zi* according 27.11 "Subset Naming Convention"
so fix the ordering accordingly.
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-2-heiko@sntech.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Runtime code patching must be done at a naturally aligned address, or we
may execute on a partial instruction.
We have encountered problems traced back to static jump functions during
the test. We switched the tracer randomly for every 1~5 seconds on a
dual-core QEMU setup and found the kernel sucking at a static branch
where it jumps to itself.
The reason is that the static branch was 2-byte but not 4-byte aligned.
Then, the kernel would patch the instruction, either J or NOP, with two
half-word stores if the machine does not have efficient unaligned
accesses. Thus, moments exist where half of the NOP mixes with the other
half of the J when transitioning the branch. In our particular case, on
a little-endian machine, the upper half of the NOP was mixed with the
lower part of the J when enabling the branch, resulting in a jump that
jumped to itself. Conversely, it would result in a HINT instruction when
disabling the branch, but it might not be observable.
ARM64 does not have this problem since all instructions must be 4-byte
aligned.
Fixes: ebc00dde8a ("riscv: Add jump-label implementation")
Link: https://lore.kernel.org/linux-riscv/20220913094252.3555240-6-andy.chiu@sifive.com/
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230206090440.1255001-1-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The recent refactoring led to us leaking some HWCAP bits to userspace
that didn't make much sense. With any luck we'll have a better scheme
soon, but for now just mask off those bits to avoid polluting userspace.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230202233832.11036-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is a partial revert of the commit 4bd1d80efb ("riscv: mm: notify
remote harts about mmu cache updates"). Original commit included two
loosely related changes serving the same purpose of fixing stale TLB
entries causing user-space application crash:
- introduce deferred per-ASID TLB flush for CPUs not running the task
- switch to per-ASID TLB flush on all CPUs running the task in update_mmu_cache
According to report and discussion in [1], the second part caused a
regression on Renesas RZ/Five SoC. For now restore the old behavior
of the update_mmu_cache.
[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
Fixes: 4bd1d80efb ("riscv: mm: notify remote harts about mmu cache updates")
Reported-by: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Link: trailer, so that it can be parsed with git's trailer functionality?
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230129211818.686557-1-geomatsi@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now, after that all the sections are explicitly described and
declared in vmlinux.lds.S, we can enable ld orphan warnings for
!XIP_KERNEL to prevent from missing any new sections in future.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230119155417.2600-6-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
ld.lld: warning:
./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss)
is being placed in '.init.bss'
Catch the sections so that we can enable linker orphan section warning.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.riscv.attributes' from
`init/main.o' being placed in section `.riscv.attributes'
While I don't see any usage of .riscv.attributes sections' in kernel
now, just catch the sections so that we can enable linker orphan
section warning.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.rela.text' from
`init/main.o' being placed in section `.rela.dyn'
Use the approach similar as ARM64 does and declare it in vmlinux.lds.S
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv discards .exit.* sections at run-time but doesn't define
RUNTIME_DISCARD_EXIT. However, the .exit.* sections are still allocated
and kept even if the generic DISCARDS would discard the sections due
to missing RUNTIME_DISCARD_EXIT, because the DISCARD sits at the end of
the linker script. Add the missing RUNTIME_DISCARD_EXIT define so that
it still works if we move DISCARD up or even at the beginning of the
linker script.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230119155417.2600-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be directed
to the no restart function instead of doing a partial setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a user
space bug, preventing this is a correctness issue and caters to the
least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and RISC-V
variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzV+cTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYlDEACMrjN2F6qeiOW94t4nQ3qP1M9AMSgO
OihC04XuM14/3tEviu/cUOd60wYcUQ/kfI5C+IL35ezeP2w9lnuKqeFpG7aDOa33
5F3isDPamJdXZEZs44CW15brR6dqDlEi5acKee/TtFV9mN6xNhzxM64IaFqecPmW
P+BTwunB8xwquY8RzsHXor/GOGb6mqWQIPoHEPnywTDe/xQYWt0Exzi7ch6HQr5Z
ZzHG6X4h6UTNimjay6L4qsRQWILmPIg4Z5IlycWMQ8qDFM0lbnIJqkG4JwceolI6
aRQyLe3NQFcPYgq3ue+SNm4RckYn4NbAa1zFm0d5VDgKp4xW1sxvtkxOJuxjaOw2
/rLkHkmyuVvCeTMAySfxrwnszAoM505CHC6CEYc1xELbeCkROFUaymtVyNFnnTru
V/Jt/T2Gyx6tOrafX7u+djUjv9figddRpNbskVZvEi3Ztq4MQ069nK3oSUqtP5vO
INApNg4lq6s8aGqVE+Kp9+CKwGqZqI4MdxQMNMAmCRLPon6apActVawbj18qO/wS
qblQ0cbF8a16itlQ3V68qmhcPh6EZOuq8II4etNq6U0ulV9712WfMbat3z53LG94
QNkAmZ3/wui93I+Q2NPxhf5ybJFQZhR0SOtVO6xIdTgOntkODwzzGu9UapfD8mLb
k5BpWnH8CoUgiw==
=I67j
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timekeeping, timers and clockevent/source drivers:
Core:
- Yet another round of improvements to make the clocksource watchdog
more robust:
- Relax the clocksource-watchdog skew criteria to match the NTP
criteria.
- Temporarily skip the watchdog when high memory latencies are
detected which can lead to false-positives.
- Provide an option to enable TSC skew detection even on systems
where TSC is marked as reliable.
Sigh!
- Initialize the restart block in the nanosleep syscalls to be
directed to the no restart function instead of doing a partial
setup on entry.
This prevents an erroneous restart_syscall() invocation from
corrupting user space data. While such a situation is clearly a
user space bug, preventing this is a correctness issue and caters
to the least suprise principle.
- Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
to align it with the nanosleep semantics.
Drivers:
- The obligatory new driver bindings for Mediatek, Rockchip and
RISC-V variants.
- Add support for the C3STOP misfeature to the RISC-V timer to handle
the case where the timer stops in deeper idle state.
- Set up a static key in the RISC-V timer correctly before first use.
- The usual small improvements and fixes all over the place"
* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
clocksource/drivers/em_sti: Mark driver as non-removable
clocksource/drivers/sh_tmu: Mark driver as non-removable
clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
clocksource/drivers/timer-microchip-pit64b: Add delay timer
clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
dt-bindings: timer: mediatek,mtk-timer: add MT8365
clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
clocksource/drivers/sh_cmt: Mark driver as non-removable
clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
clocksource/drivers/riscv: Increase the clock source rating
clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
dt-bindings: timer: Add bindings for the RISC-V timer device
RISC-V: time: initialize hrtimer based broadcast clock event device
dt-bindings: timer: rk-timer: Add rktimer for rv1126
time/debug: Fix memory leak with using debugfs_lookup()
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
clocksource: Verify HPET and PMTMR when TSC unverified
...
- Improve the scalability of the CFS bandwidth unthrottling logic
with large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with
the generic scheduler code. Add __cpuidle methods as noinstr to
objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
to query previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period,
to improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- ... Misc other cleanups, fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
skkQXoONNaM=
=l1nN
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve the scalability of the CFS bandwidth unthrottling logic with
large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with the
generic scheduler code. Add __cpuidle methods as noinstr to objtool's
noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period, to
improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- Misc other cleanups, fixes
* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
sched/rt: pick_next_rt_entity(): check list_entry
sched/deadline: Add more reschedule cases to prio_changed_dl()
sched/fair: sanitize vruntime of entity being placed
sched/fair: Remove capacity inversion detection
sched/fair: unlink misfit task from cpu overutilized
objtool: mem*() are not uaccess safe
cpuidle: Fix poll_idle() noinstr annotation
sched/clock: Make local_clock() noinstr
sched/clock/x86: Mark sched_clock() noinstr
x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
x86/atomics: Always inline arch_atomic64*()
cpuidle: tracing, preempt: Squash _rcuidle tracing
cpuidle: tracing: Warn about !rcu_is_watching()
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
cpuidle: drivers: firmware: psci: Dont instrument suspend code
KVM: selftests: Fix build of rseq test
exit: Detect and fix irq disabled state in oops
cpuidle, arm64: Fix the ARM64 cpuidle logic
cpuidle: mvebu: Fix duplicate flags assignment
sched/fair: Limit sched slice duration
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
=tYU+
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
About a quarter of the changes are for 32-bit arm, mostly filling in
device support for existing machines and adding minor cleanups, mostly
for Qualcomm and Samsung based machines.
Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from
Rockchips that have been around for a while but were lacking kernel
support so far: RV1126 is a Vision SoC with an NPU and is used in the
Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is design for
TV boxes and so far only comes with a dts for its refernece design.
The other 32-bit boards that were added are two ASpeed AST2600 based BMC
boards, the Microchip sam9x60_curiosity development board (Armv5 based!),
the Enclustra PE1 FPGA-SoM baseboard, and a few more boards for i.MX53
and i.MX6ULL.
On the RISC-V side, there are fewer patches, but a total of ten new
single-board computers based on variations of the Allwinner D1/T113 chip,
plus one more board based on Microchip Polarfire.
As usual, arm64 has by far the most changes here, with over 700 non-merge
changesets, among them over 400 alone for Qualcomm. The newly added SoCs
this time are all recent high-end embedded SoCs for various markets,
each on comes with support for its reference board:
- Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones
- Qualcomm QDU1000/QRU1000 5G RAN platform
- Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs
- TI J784S4 for industrial and automotive applications
In total, there are 46 new arm64 machines:
- Reference platforms for each of the five new SoCs
- Three Amlogic based development boards
- Six embedded machines based on NXP i.MX8MM and i.MX8MP
- The Mediatek mt7986a based Banana Pi R3 router
- Six tablets based on Qualcomm MSM8916 (Snapdragon 410),
SM6115 (Snapdragon 662) and SM8250 (Snapdragon 865)
- Two LTE dongles, also based on MSM8916
- Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610),
SDM450 and SDM632
- Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c)
- Nine development boards based on Rockchips RK3588, RK3568,
RK3566 and RK3328.
- Five development machines based on TI K3 (AM642/AM654/AM68/AM69)
The cleanup of dtc warnings continues across all platforms, adding
to the total number of changes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPvpVYACgkQmmx57+YA
GNm3iA/+NgaiEgwxaot1eoBqKImyP6NtC9VHFYRbscVkaBEkdNpm2zeVX92E2/8d
dZuGiOqY5VC+e53Rbig6m0GLrctYJfZTdJ0tYih8cwkB0jVL6bHzFQE1ugZkXkQC
/dXx2ozNQD1XqfgXqi7OC2PeaBqBxOK4UFrhUvjfzR68GuZmWpdC4+1mdIs106D1
252vV3y3biMDKXg1SgTXc4t8nb/ZT69gJpgJdbNuypDcAVrqlLaQZQ1sdEUu2wsh
6XnBZKe8srkFFwN+eR0Tdf9MhneUFJxLQsAajAm4WN1QiGrqtU42mrpJE80b6Uic
wnkvgwfyGVeGivM4/bAkeug5dCiElzCiwQBCKzL95ucf75Z8SfmhFAVAqji/MFBF
yzfetUld975qI0Bw6zh9dJALz6hElZAbbvcGI1imlXjVIsOwINvCoB5r3YPJw7FR
2nhJrsXs8h37VZgkeTlMp5BMu9j0AQKoBL4zbOSdrDr+XuOvuzIez+8ashFLijvu
FO+qlXfHUC7WsR6wktVumCsADnVRPJZN0UeMhSFixceD/njVaRZBk3BOY5Ea9wjm
G0s3KpqnLgEMrjDW3FLBf8xb9qEQPBAyeYUL9d0MHByz8W7iI/dQjEie0UEzmCqI
J+cDdhMCKDNYOF0Xk8d9k2g5/p62/0akmncOBCZRJf9bMHklBWY=
=I8Ga
-----END PGP SIGNATURE-----
Merge tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC DT updates from Arnd Bergmann:
"About a quarter of the changes are for 32-bit arm, mostly filling in
device support for existing machines and adding minor cleanups, mostly
for Qualcomm and Samsung based machines.
Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from
Rockchips that have been around for a while but were lacking kernel
support so far: RV1126 is a Vision SoC with an NPU and is used in the
Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is design
for TV boxes and so far only comes with a dts for its refernece
design.
The other 32-bit boards that were added are two ASpeed AST2600 based
BMC boards, the Microchip sam9x60_curiosity development board (Armv5
based!), the Enclustra PE1 FPGA-SoM baseboard, and a few more boards
for i.MX53 and i.MX6ULL.
On the RISC-V side, there are fewer patches, but a total of ten new
single-board computers based on variations of the Allwinner D1/T113
chip, plus one more board based on Microchip Polarfire.
As usual, arm64 has by far the most changes here, with over 700
non-merge changesets, among them over 400 alone for Qualcomm. The
newly added SoCs this time are all recent high-end embedded SoCs for
various markets, each on comes with support for its reference board:
- Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones
- Qualcomm QDU1000/QRU1000 5G RAN platform
- Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs
- TI J784S4 for industrial and automotive applications
In total, there are 46 new arm64 machines:
- Reference platforms for each of the five new SoCs
- Three Amlogic based development boards
- Six embedded machines based on NXP i.MX8MM and i.MX8MP
- The Mediatek mt7986a based Banana Pi R3 router
- Six tablets based on Qualcomm MSM8916 (Snapdragon 410), SM6115
(Snapdragon 662) and SM8250 (Snapdragon 865)
- Two LTE dongles, also based on MSM8916
- Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610),
SDM450 and SDM632
- Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c)
- Nine development boards based on Rockchips RK3588, RK3568, RK3566
and RK3328.
- Five development machines based on TI K3 (AM642/AM654/AM68/AM69)
The cleanup of dtc warnings continues across all platforms, adding to
the total number of changes"
* tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (1035 commits)
dt-bindings: riscv: correct starfive visionfive 2 compatibles
ARM: dts: socfpga: Add enclustra PE1 devicetree
dt-bindings: altera: Add enclustra mercury PE1
arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings
arm64: dts: qcom: qcs404: align RPM G-Link node with bindings
arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings
arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam
arm64: dts: qcom: sc7280: Adjust zombie PWM frequency
arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly
arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses
arm64: dts: qcom: sm7225-fairphone-fp4: move status property down
arm64: dts: qcom: pmk8350: Use the correct PON compatible
arm64: dts: qcom: sc8280xp-x13s: Enable external display
arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink
arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks
arm64: dts: qcom: sm8350-hdk: enable GPU
arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes
arm64: dts: qcom: sm8350: finish reordering nodes
arm64: dts: qcom: sm8350: move more nodes to correct place
arm64: dts: qcom: sm8350: reorder device nodes
...
As usual, this branch contains all the patches to enable options
for newly added device drivers in the 32-bit and 64-bit defconfig
files.
I have sorted the files according to the changes to Kconfig files,
to make it easier to check what has changed compared to the 'make
savedefconfig' output.
The most notable change this time is a series from Mark Brown
to add a 'virtconfig' target for arm64, which is for the moment
the same as the 'defconfig' target but disables all the top-level
SoC specific options in order to have a smaller and faster
kernel build.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPtOE0ACgkQmmx57+YA
GNk0DQ/+N1/Ga/kGtD8UOHrAOO3IPyGJJjQduYBp8e2mNbBy7uq4kOzXdir6va13
4i0N1+5gGt+OC4hbDry4405k8X064nnz5dpgKPWlfIZpMlM/r93xPVKTHRh2rBJI
r18PH0I6QRvM4tGDBhbOXxs/T3jzYXTL0Vk4Y7RYO4Gqx0CL5QgQGIXyPkHTCk/y
WCl9Ycbb4KAjTsA3lcmsZ+horkKK1uiJuI1KeIiWwKMeHc8rMTJRdSedprURCPaP
SyQ4IHMMf3aST4PE8FLLnjD63F0suwUl/K4JRNktOcHcP+29T8cIqOgo7Tq8WLRk
WHemO2dQl7stA6K03RPEabXFR7QN8VNVobLiWAfAAY0jf73pXC/OGxHilzWKJwPS
Dd8SH2T2BW6p0Iuv95cYarfBXm2yp5Cp7WVmZhwX2/vPGjB9qJhvORiHoObYPIdo
JS3FxPvlV6xKOkZwcTTrwJlooO735xNNFl9AyzUXOvmraVFTA+njZ9S7fGq0h/30
Z4UONXkaOSxAe4AfcD7vMDk9ezKFM7rDsPeT27tU3Ti1pLU+AAAkUlyEeWqwerxz
miThF1LI5p5SWhSL32LjjBTfBPZ5DXZPni77Mbigq27OK/osuW3CJMenU5qD33+8
tmyzbX5CrkrwL0kfXpB9fCLiQKNmuO5VokbaapewwZykrdvX4H4=
=48oI
-----END PGP SIGNATURE-----
Merge tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM defconfigs updates from Arnd Bergmann:
"As usual, this contains all the patches to enable options for newly
added device drivers in the 32-bit and 64-bit defconfig files.
I have sorted the files according to the changes to Kconfig files,
to make it easier to check what has changed compared to the 'make
savedefconfig' output.
The most notable change this time is a series from Mark Brown to add
a 'virtconfig' target for arm64, which is for the moment the same as
the 'defconfig' target but disables all the top-level SoC specific
options in order to have a smaller and faster kernel build"
* tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (39 commits)
arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform
arm64: defconfig: Enable DisplayPort on SC8280XP laptops
arm64: configs: Add virtconfig
kbuild: Provide a version of merge_into_defconfig without override warnings
scripts: merge_config: Add option to suppress warning on overrides
ARM: reorder defconfig files
arm64: reorder defconfig
arm64: defconfig: enable Qualcomm SDAM nvmem driver
arm64: defconfig: enable SM8450 DISPCC clock driver
ARM: defconfig: Add IOSCHED_BFQ to the default configs
ARM: configs: multi_v7: enable NVMEM driver for STM32
ARM: Add wpcm450_defconfig for Nuvoton WPCM450
arm64: defconfig: Enable DMA_RESTRICTED_POOL
arm64: defconfig: Enable missing configs for mt8192-asurada
riscv: defconfig: Enable the Allwinner D1 platform and drivers
ARM: imx_v6_v7_defconfig: Don't enable PROVE_LOCKING
ARM: multi_v7_defconfig: Add GXP Fan and SPI support
ARM: add multi_v7_lpae_defconfig
kbuild: Add config fragment merge functionality
ARM: multi_v7_defconfig: Add options to support TQMLS102xA series
...
BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF program for zero overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure ceses.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com
Implement bpf_arch_text_poke for RV64. For call scenario, to make BPF
trampoline compatible with the kernel and BPF context, we follow the
framework of RV64 ftrace to reserve 4 nops for BPF programs as function
entry, and use auipc+jalr instructions for function call. However, since
auipc+jalr call instruction is non-atomic operation, we need to use
stop-machine to make sure instructions patching in atomic context. Also,
we use auipc+jalr pair and need to patch in stop-machine context for
jump scenario.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-4-pulehui@huaweicloud.com
The current emit_call function is not suitable for kernel function call as
it store return value to bpf R0 register. We can separate it out for common
use. Meanwhile, simplify judgment logic, that is, fixed function address
can use jal or auipc+jalr, while the unfixed can use only auipc+jalr.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com
Extend patch_text for multiple instructions. This is the preparaiton for
multiple instructions text patching in riscv BPF trampoline, and may be
useful for other scenario.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
the kprobes implementation. This got moved into a central header in
commit ec5f908775 ("RISC-V: Move riscv_insn_is_* macros into a common header").
Though it looks like I overlooked two of them, so fix that. FENCE itself is
an instruction defined directly by its own opcode, while the created
riscv_isn_is_system function covers all instructions defined under the SYSTEM
opcode.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113211955.3534431-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
- The most horrible bug is preemption panic which found by Andy [1].
Let's disable preemption for ftrace first, and Andy could continue
the ftrace preemption work.
- The "-fpatchable-function-entry= CFLAG" wasted code size
!RISCV_ISA_C.
- The ftrace detour implementation wasted code size.
- When livepatching, the trampoline (ftrace_regs_caller) would not
return to <func_prolog+12> but would rather jump to the new function.
So, "REG_L ra, -SZREG(sp)" would not run and the original return
address would not be restored. The kernel is likely to hang or crash
as a result. (Found by Evgenii Shatokhin [4])
[Palmer: The first three patches in this series are pretty concrete
fixes, so I'm pulling them ahead of the rest of the series.]
* b4-shazam-merge:
riscv: ftrace: Reduce the detour code size to half
riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
riscv: ftrace: Fixup panic by disabling preemption
Link: https://lore.kernel.org/r/20230112090603.1295340-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Use a temporary register to reduce the size of detour code from 16 bytes to
8 bytes. The previous implementation is from 'commit afc76b8b80 ("riscv:
Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.
Before the patch:
<func_prolog>:
0: REG_S ra, -SZREG(sp)
4: auipc ra, ?
8: jalr ?(ra)
12: REG_L ra, -SZREG(sp)
(func_boddy)
After the patch:
<func_prolog>:
0: auipc t0, ?
4: jalr t0, ?(t0)
(func_boddy)
This patch not just reduces the size of detour code, but also fixes an
important issue:
An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
actually change the instruction pointer, e.g. to "replace" the given
kernel function with a new one, which is needed for livepatching, etc.
In this case, the trampoline (ftrace_regs_caller) would not return to
<func_prolog+12> but would rather jump to the new function. So, "REG_L
ra, -SZREG(sp)" would not run and the original return address would not
be restored. The kernel is likely to hang or crash as a result.
This can be easily demonstrated if one tries to "replace", say,
cmdline_proc_show() with a new function with the same signature using
instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
callback.
Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
Co-developed-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org
Cc: stable@vger.kernel.org
Fixes: 10626c32e3 ("riscv/ftrace: Add basic support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In RISCV, we must use an AUIPC + JALR pair to encode an immediate,
forming a jump that jumps to an address over 4K. This may cause errors
if we want to enable kernel preemption and remove dependency from
patching code with stop_machine(). For example, if a task was switched
out on auipc. And, if we changed the ftrace function before it was
switched back, then it would jump to an address that has updated 11:0
bits mixing with previous XLEN:12 part.
p: patched area performed by dynamic ftrace
ftrace_prologue:
p| REG_S ra, -SZREG(sp)
p| auipc ra, 0x? ------------> preempted
...
change ftrace function
...
p| jalr -?(ra) <------------- switched back
p| REG_L ra, -SZREG(sp)
func:
xxx
ret
Fixes: afc76b8b80 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230112090603.1295340-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
I've yoinked patch 1 from Drew's series adding support for Zicboz &
attached two more patches here that remove the need for, and then drop
the toolchain support checks for Zicbom. The goal is to remove the need
for checking the presence of toolchain Zicbom support in the work being
done to support non instruction based CMOs [1].
I've tested compliation on a number of different configurations with
the Zicbom config option enabled. The important ones to call out I
guess are:
- clang/llvm 14 w/ LLVM=1 which doesn't support Zicbom atm.
- gcc 11 w/ binutils 2.37 which doesn't support Zicbom atm either.
- clang/llvm 15 w/ LLVM=1 BUT with binutils 2.37's ld. This is the
configuration that prompted adding the LD checks as cc/as supports
Zicbom, but ld doesn't [2].
- gcc 12 w/ binutils 2.39 & clang 15 w/ LLVM=1, both of these supported
Zicbom before and still do.
I also checked building the THEAD errata etc with
CONFIG_RISCV_ISA_ZICBOM disabled, and there were no build issues there
either.
* b4-shazam-merge:
RISC-V: remove toolchain version checks for Zicbom
RISC-V: replace cbom instructions with an insn-def
RISC-V: insn-def: Add I-type insn-def
Link: https://lore.kernel.org/r/20230108163356.3063839-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Commit b8c86872d1 ("riscv: fix detection of toolchain Zicbom
support") fixed building on systems where Zicbom was supported by the
compiler/assembler but not by the linker in an easily backportable
manner.
Now that the we have insn-defs for the 3 instructions, toolchain support
is no longer required for Zicbom.
Stop emitting "_zicbom" in -march when Zicbom is enabled & drop the
version checks entirely.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230108163356.3063839-4-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
CBO instructions use the I-type of instruction format where
the immediate is used to identify the CBO instruction type.
Add I-type instruction encoding support to insn-def.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230108163356.3063839-2-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
Ever since RISC-V starting using generic arch topology code, the code
paths for cpu-capacity have been there but there's no binding defined to
actually convey the information. Defining the same property as used on
arm seems to be the only logical thing to do, so do it.
[Palmer: This is on top of the fix required to make it work, which
itself wasn't merged until late in the 6.2 cycle and thus pulls in
various other fixes.]
* b4-shazam-merge:
dt-bindings: riscv: add a capacity-dmips-mhz cpu property
dt-bindings: arm: move cpu-capacity to a shared loation
riscv: Move call to init_cpu_topology() to later initialization stage
riscv/kprobe: Fix instruction simulation of JALR
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
MAINTAINERS: add an IRC entry for RISC-V
RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
dt-bindings: riscv: fix single letter canonical order
dt-bindings: riscv: fix underscore requirement for multi-letter extensions
riscv: uaccess: fix type of 0 variable on error in get_user()
riscv, kprobes: Stricter c.jr/c.jalr decoding
Link: https://lore.kernel.org/r/20230104180513.1379453-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Commit 4bf8860760 ("riscv: cpufeature: extend
riscv_cpufeature_patch_func to all ISA extensions") switched ISA
extension alternatives to use the RISCV_ISA_EXT_* macros instead of
CPUFEATURE_*. This was mismerged when applied on top of the Zbb series,
so the Zbb alternatives referenced the wrong errata ID values.
Fixes: 9daca9a5b9 ("Merge patch series "riscv: improve boot time isa extensions handling"")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230212021534.59121-3-samuel@sholland.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now that the text to patch is located using a relative offset from the
alternative entry, the text address should be computed without applying
the kernel mapping offset, both before and after VM setup.
Fixes: 8d23e94a44 ("riscv: switch to relative alternative entries")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230212021534.59121-2-samuel@sholland.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Selects should be sorted alphanumerically, and were tidied up originally
by Palmer in commit e8c7ef7d58 ("RISC-V: Sort select statements
alphanumerically") since then, things have gotten out of order again.
Fish RMK's original script out of commit b1b3f49ce4 ("ARM: config:
sort select statements alphanumerically") and do some spring cleaning.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221219172836.134709-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Having a clocksource_arch_init() callback always sets vdso_clock_mode to
VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled, this is
required for the riscv-timer.
This works for platforms where just riscv-timer clocksource is present.
On platforms where other clock sources are available we want them to
register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE.
On the Renesas RZ/Five SoC OSTM block can be used as clocksource [0], to
avoid multiple clock sources being registered as VDSO_CLOCKMODE_ARCHTIMER
move setting of vdso_clock_mode in the riscv-timer driver instead of doing
this in clocksource_arch_init() callback as done similarly for ARM/64
architecture.
[0] drivers/clocksource/renesas-ostm.c
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Similarly to commit 022eb8ae8b ("ARM: 8938/1: kernel: initialize
broadcast hrtimer based clock event device"), RISC-V needs to initiate
hrtimer based broadcast clock event device before C3STOP can be used.
Otherwise, the introduction of C3STOP for the RISC-V arch timer in
commit 232ccac1bd ("clocksource/drivers/riscv: Events are stopped
during CPU suspend") leaves us without any broadcast timer registered.
This prevents the kernel from entering oneshot mode, which breaks timer
behaviour, for example clock_nanosleep().
A test app that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250
& C3STOP enabled, the sleep times are rounded up to the next jiffy:
== CPU: 1 == == CPU: 2 == == CPU: 3 == == CPU: 4 ==
Mean: 7.974992 Mean: 7.976534 Mean: 7.962591 Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000 Hi: 10.495000 Hi: 8.864000 Hi: 4.736000
Lo: 6.087000 Lo: 6.380000 Lo: 4.872000 Lo: 3.403000
Samples: 521 Samples: 521 Samples: 521 Samples: 521
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1bd ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Suggested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-2-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
* A fix to avoid partial TLB fences for huge pages, which are disallowed
by the ISA.
* A fix to to avoid missing a frame when dumping stacks.
* A fix to avoid misaligned accesses (and possibly overflows) in
kprobes.
* A fix for a race condition in tracking page dirtiness.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmPmZ50THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiXg6D/0S9sZJWHNHRUIPzTSkmZX7T5awnT+J
G6sQ1c48ukNz6ujHnlW9x05xHi41taRco+OIpZ7rLmEN/sdjTDEk7fhSaYqqpzZ+
KsVjhT3rBfoAf6B/zwHOkLFnvCLHagv4NzPQrT+6Kwc7TAm2XFm0UihBDFaTUXJt
6SXCFhy3YPft4WqFxQX5umo8W+TrIJd39c3/suwUABPFGyHjmRt5RHEaeOsil7SG
rdxq2KlYVeZ7485bYIKshmIRXQNmxwVI75+CGdzQqL0g8z8dT6rnV4FOW2PBP074
xr+nn0jV9DBmDNbZ7fcsjAJeAzNZZVyYzE9PGESF5euu09iO+VEdnRD3Bqx/rVcB
PQjB78PsaL0OV8fdJLssa0W9D2kvV7FF3aZLNgEJ3+lhmOw2kkuCyp7ud0QXMU0t
rYb6TQQ3kr5lKu3pJPbt8oir9/KDXaFCEN2e5Lmr7T58KIvoYns+jZnY/7FctDTq
S3Ev9sxDzfJWttVYNmciNTmUV8NMlTOTllBFcp68IWmTrOlKETURH7XBOLDPl8Pd
xPmu226whK/5s1/vjwMH/uUnSCxQXWlxiljMg8tkYt44+RU68AWMcnhylWy2eu1c
njdQIcpWKtZVy/NTH4gSvaGIuY3bi4h0zsY23x713eDp5ZbEeEWEcEDCrnjjuU1c
EcOgqzpIzc/c8w==
=hyiU
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This is a little bigger that I'd hope for this late in the cycle, but
they're all pretty concrete fixes and the only one that's bigger than
a few lines is pmdp_collapse_flush() (which is almost all
boilerplate/comment). It's also all bug fixes for issues that have
been around for a while.
So I think it's not all that scary, just bad timing.
- avoid partial TLB fences for huge pages, which are disallowed by
the ISA
- avoid missing a frame when dumping stacks
- avoid misaligned accesses (and possibly overflows) in kprobes
- fix a race condition in tracking page dirtiness"
* tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
riscv: kprobe: Fixup misaligned load text
riscv: stacktrace: Fix missing the first frame
riscv: mm: Implement pmdp_collapse_flush for THP
Every architecture that supports FLATMEM memory model defines its own
version of pfn_valid() that essentially compares a pfn to max_mapnr.
Use mips/powerpc version implemented as static inline as a generic
implementation of pfn_valid() and drop its per-architecture definitions.
[rppt@kernel.org: fix the generic pfn_valid()]
Link: https://lkml.kernel.org/r/Y9lg7R1Yd931C+y5@kernel.org
Link: https://lkml.kernel.org/r/20230129124235.209895-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Huacai Chen <chenhuacai@loongson.cn> [LoongArch]
Acked-by: Stafford Horne <shorne@gmail.com> [OpenRISC]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Brian Cain <bcain@quicinc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In commit 588a513d34 ("arm64: Fix race condition on PG_dcache_clean
in __sync_icache_dcache()"), we found RISC-V has the same issue as the
previous arm64. The previous implementation didn't guarantee the correct
sequence of operations, which means flush_icache_all() hasn't been
called when the PG_dcache_clean was set. That would cause a risk of page
synchronization.
Fixes: 08f051eda3 ("RISC-V: Flush I$ when making a dirty page executable")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230127035306.1819561-1-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
SBI PMU extension defines a set of firmware events which can provide
useful information to guests about the number of SBI calls. As
hypervisor implements the SBI PMU extension, these firmware events
correspond to ecall invocations between VS->HS mode. All other firmware
events will always report zero if monitored as KVM doesn't implement them.
This patch adds all the infrastructure required to support firmware
events.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
RISC-V SBI PMU & Sscofpmf ISA extension allows supporting perf in
the virtualization enviornment as well. KVM implementation
relies on SBI PMU extension for the most part while trapping
& emulating the CSRs read for counter access.
This patch doesn't have the event sampling support yet.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
As the KVM guests only see the virtual PMU counters, all hpmcounter
access should trap and KVM emulates the read access on behalf of guests.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Any guest must not get access to any hpmcounter including cycle/instret
without any checks. We achieve that by disabling all the bits except TM
bit in hcounteren.
However, instret and cycle access for guest user space can be enabled
upon explicit request (via ONE REG) or on first trap from VU mode
to maintain ABI requirement in the future. This patch doesn't support
that as ONE REG interface is not settled yet.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The privilege mode filtering feature must be available in the host so
that the host can inhibit the counters while the execution is in HS mode.
Otherwise, the guests may have access to critical guest information.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
SBI PMU extension allows KVM guests to configure/start/stop/query
about the PMU counters in virtualized enviornment as well.
In order to allow that, KVM implements the entire SBI PMU extension.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
This patch only adds barebone structure of perf implementation. Most
of the function returns zero at this point and will be implemented
fully in the future.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently, the SBI extension handle is expected to return Linux error code.
The top SBI layer converts the Linux error code to SBI specific error code
that can be returned to guest invoking the SBI calls. This model works
as long as SBI error codes have 1-to-1 mappings between them.
However, that may not be true always. This patch attempts to disassociate
both these error codes by allowing the SBI extension implementation to
return SBI specific error codes as well.
The extension will continue to return the Linux error specific code which
will indicate any problem *with* the extension emulation while the
SBI specific error will indicate the problem *of* the emulation.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
According to the SBI specification, the stop function can only
return error code SBI_ERR_FAILED. However, currently it returns
-EINVAL which will be mapped SBI_ERR_INVALID_PARAM.
Return an linux error code that maps to SBI_ERR_FAILED i.e doesn't map
to any other SBI error code. While EACCES is not the best error code
to describe the situation, it is close enough and will be replaced
with SBI error codes directly anyways.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Currently the probe function just checks if an SBI extension is
registered or not. However, the extension may not want to advertise
itself depending on some other condition.
An additional extension specific probe function will allow
extensions to decide if they want to be advertised to the caller or
not. Any extension that does not require additional dependency checks
can avoid implementing this function.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
This patch fixes/improve few minor things in SBI PMU extension
definition.
1. Align all the firmware event names.
2. Add macros for bit positions in cache event ID & ops.
The changes were small enough to combine them together instead
of creating 1 liner patches.
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The M-mode redirects an unhandled illegal instruction trap back
to S-mode. However, KVM running in HS-mode terminates the VS-mode
software when it receives illegal instruction trap. Instead, KVM
should redirect the illegal instruction trap back to VS-mode, and
let VS-mode trap handler decide the next step. This futher allows
guest kernel to implement on-demand enabling of vector extension
for a guest user space process upon first-use.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The kvm_riscv_vcpu_trap_redirect() should set guest privilege mode
to supervisor mode because guest traps/interrupts are always handled
in virtual supervisor mode.
Fixes: 9f70132651 ("RISC-V: KVM: Handle MMIO exits for VCPU")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
At the moment, riscv only supports PMD and PUD hugepages. For sv39,
PGDIR_SIZE == PUD_SIZE but not for sv48 and sv57. So fix this by changing
PGDIR_SIZE into PUD_SIZE.
Fixes: 9d05c1fee8 ("RISC-V: KVM: Implement stage2 page table programming")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Add the generic plumbing to detect whether or not the runtime code
regions were constructed with BTI/IBT landing pads by the firmware,
permitting the OS to enable enforcement when mapping these regions into
the OS's address space.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
* A build fix to avoid static branches in cpu_relax(), which greatly
inflates the jump tables and breaks at least
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
* A fix for a kernel panic when probing impossible instruction
positions.
* A fix to disable unwind tables, which are enabled by default for
GCC-13 and result in unhandled relocations in modules.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmPdJvwTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiY3tD/4jeqeeiOnO54xgsN8WVLQpqyAJ7UHY
p3Jv993jTs9Wmo4eUaL/b0PoAvu6V8omGsorPatJUShZnAxQ51Cre6DMJ2NZ5uQG
SJVt8t2/gYDZ8WPqKbN92ZqxIgnGc78lNX1YcrdZRL5oO7MKX1qcnqcSj/1NMAz1
XZGgrXC9eQHNQ72FYevxNoKwSpfMp3vBSHq6oxGKKVRcTYPMiEQp8GVIqYOZSDOG
MLYl+szh9tz3vvMLZ4+R7/Zu5vQUQ7mvq2Sy+MjrpFpZJWOCqUfxPGZ1BaVbRSc3
K98Z/OEeSIxoxPcv0NdWJ/wbu1WKm+tdk0leCmpzj+DdAtf/ysKBdipei6gNZhWY
Q43vwlNwviDNioVW0pIEGpT8nn5k+XNExhwUJtcYgmWleOevzSlKfpPRenLY9rHF
IjNdN8ZcoR5Kh/ydGew+yYTmxjykFoCvAENTdCEj3gehWEF56Emts+uVJpPhxZS7
fBBVnYN64zq3gfbStVB0ci4wMH126vWbP6iEQMfc9fKCoAKymEyjspehQ2NHK/V4
s1Zvb4J0Hay1KQT+vhOJoMOf2vm6zXNRzoXQAtKz8HSn9L6zLq6+oYzHH/55CZ3K
7CN7djPGacIPacqLYE2QodL+xiXNEs7k2xbvU0OsS+uQcRKkYqSPI9hx5jgEG+PS
OXh8tvtxdpVOqQ==
=tZzK
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A build fix to avoid static branches in cpu_relax(), which greatly
inflates the jump tables and breaks at least
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
- A fix for a kernel panic when probing impossible instruction
positions.
- A fix to disable unwind tables, which are enabled by default for
GCC-13 and result in unhandled relocations in modules.
* tag 'riscv-for-linus-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: disable generation of unwind tables
riscv: kprobe: Fixup kernel panic when probing an illegal position
riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that
support swp PTEs, so let's drop it.
Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the
offset. This reduces the maximum swap space per file: on 32bit to 16 GiB
(was 32 GiB).
Note that this bit does not conflict with swap PMDs and could also be used
in swap PMD context later.
While at it, mask the type in __swp_entry().
Link: https://lkml.kernel.org/r/20230113171026.582290-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When running kfence_test, I found some testcases failed like this:
# test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
Expected report_matches(&expect) to be true, but is false
not ok 1 - test_out_of_bounds_read
The corresponding call-trace is:
BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84
Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
The kfence_test using the first frame of call trace to check whether the
testcase is succeed or not. Commit 6a00ef4493 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skip first frame for all
case, which results the kfence_test failed. Indeed, we only need to skip
the first frame for case (task==NULL || task==current).
With this patch, the call-trace will be:
BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e
Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
test_out_of_bounds_read+0x88/0x19e
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
Fixes: 6a00ef4493 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When THP is enabled, 4K pages are collapsed into a single huge
page using the generic pmdp_collapse_flush() which will further
use flush_tlb_range() to shoot-down stale TLB entries. Unfortunately,
the generic pmdp_collapse_flush() only invalidates cached leaf PTEs
using address specific SFENCEs which results in repetitive (or
unpredictable) page faults on RISC-V implementations which cache
non-leaf PTEs.
Provide a RISC-V specific pmdp_collapse_flush() which ensures both
cached leaf and non-leaf PTEs are invalidated by using non-address
specific SFENCEs as recommended by the RISC-V privileged specification.
Fixes: e88b333142 ("riscv: mm: add THP support on 64-bit")
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Link: https://lore.kernel.org/r/20230130074815.1694055-1-mchitale@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In
the kernel, we don't have any use for unwind tables yet, so disable them.
More importantly, the .eh_frame section brings relocations
(R_RISC_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules
that we are not prepared to handle.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Link: https://lore.kernel.org/r/mvmzg9xybqu.fsf@suse.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The kernel would panic when probed for an illegal position. eg:
(CONFIG_RISCV_ISA_C=n)
echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events
echo 1 > events/kprobes/hello/enable
cat trace
Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: __do_sys_newfstatat+0xb8/0xb8
CPU: 0 PID: 111 Comm: sh Not tainted
6.2.0-rc1-00027-g2d398fe49a4d #490
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80007268>] dump_backtrace+0x38/0x48
[<ffffffff80c5e83c>] show_stack+0x50/0x68
[<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84
[<ffffffff80c6da6c>] dump_stack+0x20/0x30
[<ffffffff80c5ecf4>] panic+0x160/0x374
[<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8
[<ffffffff802deeb0>] sys_newstat+0x0/0x30
[<ffffffff800158c0>] sys_clone+0x20/0x30
[<ffffffff800039e8>] ret_from_syscall+0x0/0x4
---[ end Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]---
That is because the kprobe's ebreak instruction broke the kernel's
original code. The user should guarantee the correction of the probe
position, but it couldn't make the kernel panic.
This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an
illegal position (Such as the middle of an instruction).
Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang <jszhang@kernel.org> says:
Generally, riscv ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting, this
chacteristic makes it straightforward to use a static branch to check
a specific ISA extension is supported or not to optimize performance.
However, some ISA extensions such as SVPBMT and ZICBOM are handled
via. the alternative sequences.
Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."
Samuel's findings pointed out one of a few downsides of static branches
usage in C code to handle ISA extensions detected at boot time:
static branch's metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.
I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.
[1]https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/
[2]https://lore.kernel.org/linux-arm-kernel/20220912162210.3626215-8-mark.rutland@arm.com/
[3]https://lore.kernel.org/linux-riscv/20221130225614.1594256-1-heiko@sntech.de/
* b4-shazam-merge:
riscv: remove riscv_isa_ext_keys[] array and related usage
riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
riscv: cpu_relax: switch to riscv_has_extension_likely()
riscv: alternative: patch alternatives in the vDSO
riscv: switch to relative alternative entries
riscv: module: Add ADD16 and SUB16 rela types
riscv: module: move find_section to module.h
riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
riscv: introduce riscv_has_extension_[un]likely()
riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
riscv: hwcap: make ISA extension ids can be used in asm
riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
riscv: move riscv_noncoherent_supported() out of ZICBOM probe
Link: https://lore.kernel.org/r/20230128172856.3814-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
All users have switched to riscv_has_extension_*, remove unused
definitions, vars and related setting code.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-14-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Switch has_svinval() from static branch to the new helper
riscv_has_extension_unlikely().
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Acked-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230128172856.3814-13-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Make it possible to use alternatives in the vDSO, so that better
implementations can be used if possible.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230128172856.3814-11-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Instead of using absolute addresses for both the old instrucions and
the alternative instructions, use offsets relative to the alt_entry
values. So this not only cuts the size of the alternative entry, but
also meets the prerequisite for patching alternatives in the vDSO,
since absolute alternative entries are subject to dynamic relocation,
which is incompatible with the vDSO building.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-10-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To prepare for 16-bit relocation types to be emitted in alternatives
add support for ADD16 and SUB16.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-9-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Move find_section() to module.h so that the implementation can be shared
by the alternatives code. This will allow us to use alternatives in
the vdso.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-8-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Generally, riscv ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting. This
chacteristic makes it straightforward to use a static branch to check
if a specific ISA extension is supported or not to optimize
performance.
However, some ISA extensions such as SVPBMT and ZICBOM are handled
via. the alternative sequences.
Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."
Samuel's findings pointed out one of a few downsides of static branches
usage in C code to handle ISA extensions detected at boot time:
static branch's metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.
I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.
Link: https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/ [1]
Link: https://lore.kernel.org/linux-arm-kernel/20220912162210.3626215-8-mark.rutland@arm.com/ [2]
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-6-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
riscv_cpufeature_patch_func() currently only scans a limited set of
cpufeatures, explicitly defined with macros. Extend it to probe for all
ISA extensions.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
So that ISA extensions can be used in assembly files, convert the
multi-letter RISC-V ISA extension IDs enums to macros.
In order to make them visible, move the #ifndef __ASSEMBLY__ guard
to a later point in the header
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently riscv_cpufeature_patch_func() does nothing at the
RISCV_ALTERNATIVES_EARLY_BOOT stage. Add a check to detect whether we
are in this stage and exit early. This will allow us to use
riscv_cpufeature_patch_func() for scanning of all ISA extensions.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
It's a bit weird to call riscv_noncoherent_supported() each time when
insmoding a module. Move the calling out of feature patch func.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This is a single fix, but it conflicts with some recent features. I'm
merging it on top of the commit it fixes to ease backporting.
* b4-shazam-merge:
riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
Link: https://lore.kernel.org/r/20220922060958.44203-1-samuel@sholland.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
commit 8eb060e101 ("arch/riscv: add Zihintpause support") broke
building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0):
CC arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <command-line>:
./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax':
././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm'
285 | #define asm_volatile_goto(x...) asm goto(x)
| ^~~
./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto'
41 | asm_volatile_goto(
| ^~~~~~~~~~~~~~~~~
make[1]: *** [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2
Having a static branch in cpu_relax() is problematic because that
function is widely inlined, including in some quite complex functions
like in the VDSO. A quick measurement shows this static branch is
responsible by itself for around 40% of the jump table.
Drop the static branch, which ends up being the same number of
instructions anyway. If Zihintpause is supported, we trade the nop from
the static branch for a div. If Zihintpause is unsupported, we trade the
jump from the static branch for (what gets interpreted as) a nop.
Fixes: 8eb060e101 ("arch/riscv: add Zihintpause support")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Heiko Stuebner <heiko@sntech.de> says:
From: Heiko Stuebner <heiko.stuebner@vrull.eu>
This series still tries to allow optimized string functions for specific
extensions. The last approach of using an inline base function to hold
the alternative calls did cause some issues in a number of places
So instead of that we're now just using an alternative j at the beginning
of the generic function to jump to a separate place inside the function
itself.
* b4-shazam-merge:
RISC-V: add zbb support to string functions
RISC-V: add infrastructure to allow different str* implementations
Link: https://lore.kernel.org/r/20230113212301.3534711-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add handling for ZBB extension and add support for using it as a
variant for optimized string functions.
Support for the Zbb-str-variants is limited to the GNU-assembler
for now, as LLVM has not yet acquired the functionality to
selectively change the arch option in assembler code.
This is still under review at
https://reviews.llvm.org/D123515
Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.
This adds basic infrastructure to allow patching the function calls
via alternatives later on.
The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.
The added generic string functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and extend them with optimized variants.
This doesn't override the compiler's use of builtin replacements. So still
first of all the compiler will select if a builtin will be better suitable
i.e. for known strings. For all regular cases we will want to later
select possible optimized variants and in the worst case fall back to the
generic implemention added with this change.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
SDc/Y/c=
=zpaD
-----END PGP SIGNATURE-----
Merge tag 'v6.2-rc6' into sched/core, to pick up fixes
Pick up fixes before merging another batch of cpuidle updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- High Performance mode (1.8 GHz) support for the Cortex-A76 CPU cores
on R-Car V4H,
- GPIO interrupt support for the RZ/G2UL SoC and the RZ/G2UL SMARC EVK
development board,
- USB Function support for the RZ/N1D SoC,
- Generic Sound Card driver examples for the Renesas R-Car Starter Kit
Premier/Pro and Shimafugi Kingfisher development board stack,
- Universal Flash Storage support for the Renesas Spider development
board,
- External Power Sequence Controller (PWC) support for the RZ/V2M SoC
and the RZ/V2M Evaluation Kit 2.0,
- IOMMU support for MMC on the R-Car S4-8 SoC,
- Miscellaneous fixes and improvements.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQ9qaHoIs/1I4cXmEiKwlD9ZEnxcAUCY9Oc1QAKCRCKwlD9ZEnx
cMRKAP0S0VgkOJU9n+qrDdFNYeCAwLkJbcpWMsc4xy/dxt1gIQEAxbnJpGcmdAis
qiY67N7RU8mST1R7QACUEcvygxvqLwI=
=dAM5
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPX+vEACgkQmmx57+YA
GNk6hA//Xbr/6Ey1bGx6jweljSEvLpOZkMCKiprOlJ7aaKM3SXfO2BK447zADtar
D9ooowc/IHBt4DNltVsAol5dm6QgQjcrs4+S04qFsdPtEtEmy59JS7e4m3BWKlTe
AJucF5A8g9HACAYs3zq6iEK8gGnJU2qW1ysiU03ypf9ctfDsir8eWgzCV0P3lgmr
1Rb29ZG0m661ucDNFM5tGOMpJ3yFGgAx1YnNtdn2XMOs3txFZ9hS0lPAOaBd7Wmj
hBDxJPWye7BG3zAo8vCYPDZcvDgl4iGKpmqHJ+p5fA5tD/od/tpyM/ih94pzv9iW
8yqpaIUNpwZ7VD1wITVxWXUOqubNyXo5f5dqWwjcOkp2toc+xn2gJFPD3ffsjghT
YmvRlglaOUHt09Koinb0SYrNBhGnN2qtYYfZzvRWJZmAhT/gzH16/ezuoI9DMy26
cwO8DsXMLA9QKxKVyzq9Y+9gKRXj6dtJhEjUheax/AfeW2q4Bs9fVG4bSw9LLYb4
vSVRIgBhojWvCA/t0vt1tmK+Nw5abm+m3Mfiq8E6rmR5yWqRmoN3mHaUNs4XOllI
sGd4NtO1uqM/SbovLMkccw4NaU0wUdcnYtKsem7dl1Jo5yWUbg8EGh6jRXdtILEt
SLRrc5NkG4kEbGRqSQ9ieINdw3bWUn2kOLbKcWFDwtqzkq6Ucqk=
=YAr2
-----END PGP SIGNATURE-----
Merge tag 'renesas-dts-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/dt
Renesas DT updates for v6.3 (take two)
- High Performance mode (1.8 GHz) support for the Cortex-A76 CPU cores
on R-Car V4H,
- GPIO interrupt support for the RZ/G2UL SoC and the RZ/G2UL SMARC EVK
development board,
- USB Function support for the RZ/N1D SoC,
- Generic Sound Card driver examples for the Renesas R-Car Starter Kit
Premier/Pro and Shimafugi Kingfisher development board stack,
- Universal Flash Storage support for the Renesas Spider development
board,
- External Power Sequence Controller (PWC) support for the RZ/V2M SoC
and the RZ/V2M Evaluation Kit 2.0,
- IOMMU support for MMC on the R-Car S4-8 SoC,
- Miscellaneous fixes and improvements.
* tag 'renesas-dts-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel: (25 commits)
arm64: dts: renesas: r8a779f0: Add iommus to MMC node
arm64: dts: renesas: v2mevk2: Add PWC support
arm64: dts: renesas: r9a09g011: Add PWC support
arm64: dts: renesas: r9a09g011: Reword ethernet status
arm64: dts: renesas: r8a774[be]1-beacon: Sync aliases with RZ/G2M
arm64: dts: renesas: beacon-renesom: Fix audio clock rate
arm64: dts: renesas: beacon-renesom: Update Ethernet PHY ID
arm64: dts: renesas: beacon-renesom: Fix gpio expander reference
arm64: dts: renesas: spider-cpu: Enable UFS device
arm64: dts: renesas: Add ulcb{-kf} Simple Audio Card MIX + TDM Split dtsi
arm64: dts: renesas: Add ulcb{-kf} Audio Graph Card MIX + TDM Split dtsi
arm64: dts: renesas: Add ulcb{-kf} Audio Graph Card2 MIX + TDM Split dtsi
arm64: dts: renesas: Add ulcb{-kf} Simple Audio Card dtsi
arm64: dts: renesas: Add ulcb{-kf} Audio Graph Card2 dtsi
arm64: dts: renesas: Add ulcb{-kf} Audio Graph Card dtsi
arm64: dts: renesas: #sound-dai-cells is used when simple-card
ARM: dts: renesas: #sound-dai-cells is used when simple-card
arm64: dts: renesas: eagle: Add SCIF_CLK support
ARM: dts: r9a06g032: Add the USBF controller node
arm64: dts: renesas: rzg2ul-smarc-som: Add PHY interrupt support for ETH{0/1}
...
Link: https://lore.kernel.org/r/cover.1674815099.git.geert+renesas@glider.be
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Microchip:
A vendor prefix for Aldec and both a binding and Devicetree for the
Aldec TySoM devkit for PolarFire SoC. This Devicetree corresponds to
what they are shipping in the SDK for rev2 boards.
StarFive:
Just the binding for the new StarFive JH7110 SoC and its first-party
SDC the VisionFive 2.
Other:
I was expecting the Devicetree for the aforementioned board to be ready
for this window, as the pinctrl driver had seem some review prior to
v6.2 and both it & the base clock drivers are heavily based on the
existing drivers for the JH7110.
That didn't come to be.. Christmas, the RISC-V Summit in December and
the Lunar New Year all playing a part perhaps.
Because of that, both Palmer and I have the Kconfig.socs work in our
branches, although in hindsight it probably wasn't needed here as I
only added the TySoM Devicetree & the conflict would've been trivial.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRh246EGq/8RLhDjO14tDGHoIJi0gUCY9EPgwAKCRB4tDGHoIJi
0l9kAQDJHyrjfXMooRHSFXRUsJFYeN8MpYvD1CLavdrD+Pu+KQD/c1sApjfZrKjo
ItyEL37F2QJOAFY1rAxdB7d6ppsUEAQ=
=sN4y
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPX9PcACgkQmmx57+YA
GNn4Xg/+MykVFbZuQZbFuzD9DNA3BnHJuJL+5Uz1Ywy9H+Katl4hQGRLlt4WtRzC
fWuteoPmNwvsbxuDRQ3sDC2oqRFVFD45u/q7qQ4ce+dKDbL2qpnbgxdmXVXdJrgL
e9Wb9V01j8vLimNvFspicnaajFwDEC1uGLbVaZ+RKEk6GxxlJEJqydUsa2FVbQIf
QDMEtR967fI3CUZ2wvzplJDDGBBKZbXxuZ/tOaTwEGPKFxbzQ5CBP7CFkbfEEjTZ
kbcHsfLloKLL1/11oBeR6aPhY1WQvecBHO9WZ3ANR+/+2GT9SLdPbCWQN+uWM9ZI
B5/SvtvJhzyP6HG+ErX/Pc8NPjm6AKMoo6j8fLuy8pASJF+H3tV9bn9mj4ld6B1S
Vz8D1/hoh7OIuEaRDdfzW8xHQS6qQF58MRS3J8+I3xTUs4ZP6JifnOdIcrbXYvup
+SFRgtoewV4wMxPshM+5FsVqsE4xSnvdwZeuxIjqpFY/CvlubFJdT9FBx8tx+v+G
rC7MtoggznsQi1BDShGbu0xNAmE6iJcv9mranFm6NQUXC5/1SObIA9RFb0aBBJoQ
DsU1rDqWndh9bILZkoikpPQ0uZWwAOFRpp44CSZ6/YtYPDku99YWozxqCeGGw4gN
V18SjBXgL9YZfQKH3Z0biOdv6m2vLJKoj5hxgzu0KFxbCFlrl+A=
=SCZP
-----END PGP SIGNATURE-----
Merge tag 'riscv-dt-for-v6.3-mw0' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/dt
RISC-V Devicetrees for v6.3-mw0
Microchip:
A vendor prefix for Aldec and both a binding and Devicetree for the
Aldec TySoM devkit for PolarFire SoC. This Devicetree corresponds to
what they are shipping in the SDK for rev2 boards.
StarFive:
Just the binding for the new StarFive JH7110 SoC and its first-party
SDC the VisionFive 2.
Other:
I was expecting the Devicetree for the aforementioned board to be ready
for this window, as the pinctrl driver had seem some review prior to
v6.2 and both it & the base clock drivers are heavily based on the
existing drivers for the JH7110.
That didn't come to be.. Christmas, the RISC-V Summit in December and
the Lunar New Year all playing a part perhaps.
Because of that, both Palmer and I have the Kconfig.socs work in our
branches, although in hindsight it probably wasn't needed here as I
only added the TySoM Devicetree & the conflict would've been trivial.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-for-v6.3-mw0' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: add the Aldec TySoM's devicetree
dt-bindings: riscv: microchip: document the Aldec TySoM
dt-bindings: vendor-prefixes: Add entry for Aldec
RISC-V: stop directly selecting drivers for SOC_CANAAN
RISC-V: stop selecting SiFive clock and serial drivers directly
RISC-V: stop selecting the PolarFire SoC clock driver
RISC-V: kbuild: convert all use of SOC_FOO to ARCH_FOO
RISC-V: kconfig.socs: convert usage of SOC_CANAAN to ARCH_CANAAN
RISC-V: introduce ARCH_FOO kconfig aliases for SOC_FOO symbols
dt-bindings: riscv: Add StarFive JH7110 SoC and VisionFive 2 board
Link: https://lore.kernel.org/r/Y9LP+Za1h0fkBa58@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The Allwinner D1 family of SoCs contain a PPU power domain controller
separate from the PRCM. It can power down the video engine and DSP, and
it contains special logic for hardware-assisted CPU idle.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20230126063419.15971-4-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Now that several D1-based boards are supported, enable the platform in
our defconfig. Build in the drivers which are necessary to boot, such as
the pinctrl, MMC, RTC (which provides critical clocks), SPI (for flash),
and watchdog (which may be left enabled by the bootloader). Other common
onboard peripherals are enabled as modules.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20230126045738.47903-12-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Allwinner manufactures the sunxi family of application processors. This
includes the "sun8i" series of ARMv7 SoCs, the "sun50i" series of ARMv8
SoCs, and now the "sun20i" series of 64-bit RISC-V SoCs.
The first SoC in the sun20i series is D1, containing a single T-HEAD
C906 core. D1s is a low-pin-count variant of D1 with co-packaged DRAM.
Most peripherals are shared across the entire chip family. In fact, the
ARMv7 T113 SoC is pin-compatible and almost entirely register-compatible
with the D1s.
This means many existing device drivers can be reused. To facilitate
this reuse, name the symbol ARCH_SUNXI, since that is what the existing
drivers have as their dependency.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20230126045738.47903-11-samuel@sholland.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>