During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet transferred
pages on the arrival side. Thus, the NMI watchdog may be triggered too
frequently, which increases the risk to hit an NMI interrupt in a bad
place in the kernel, leading to a kernel panic.
Disabling the Hard Lockup Watchdog until the memory transfer could be a
too strong work around, some users would want this timeout to be
eventually triggered if the system is hanging even during an LPM.
Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during an LPM. Just before the CPUs
are stopped for the switchover sequence, the NMI watchdog timer is set
to watchdog_thresh + factor%
A value of 0 has no effect. The default value is 200, meaning that the
NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh
value). Once the memory transfer is achieved, the factor is reset to 0.
Setting this value to a high number is like disabling the NMI watchdog
during an LPM.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com
Introduce a factor which would apply to the NMI watchdog timeout.
This factor is a percentage added to the watchdog_tresh value. The value is
set under the watchdog_mutex protection and lockup_detector_reconfigure()
is called to recompute wd_panic_timeout_tb.
Once the factor is set, it remains until it is set back to 0, which means
no impact.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-4-ldufour@linux.ibm.com
In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.
On PowerPC, this may helpful before and after a LPAR migration (LPM) is
initiated, because it implies some latencies, watchdog, and especially NMI
watchdog is expected to be triggered during this operation. Reconfiguring
the watchdog with a factor, would prevent it to happen too frequently
during LPM.
Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and
create a new function lockup_detector_reconfigure() calling
__lockup_detector_reconfigure() under the protection of watchdog_mutex.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[mpe: Squash in build fix from Laurent, reported by Sachin]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-3-ldufour@linux.ibm.com
In pseries_migration_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit earlier,
allowing callbacks to be run only once the migration is fully completed.
If reading the VASI state is done after the hypervisor has completed the
migration, the HCALL is returning H_PARAMETER. We can safely assume that
the memory transfer is achieved if this happens.
This will also allow to manage the NMI watchdog state in the next commits.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713154729.80789-2-ldufour@linux.ibm.com
Currently the ptrace-gpr test only tests the GET/SET(FP)REGS ptrace
APIs. But there's an alternate (older) API, called PEEK/POKEUSR.
Add some minimal testing of PEEK/POKEUSR of the FPRs. This is sufficient
to detect the bug that was fixed recently in the 32-bit ptrace FPR
handling.
Depends-on: 8e12784444 ("powerpc/32: Fix overread/overwrite of thread_struct via ptrace")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-13-mpe@ellerman.id.au
The ptrace-gpr test uses fixed values to test that registers can be
read/written via ptrace. In particular it sets all GPRs to 1, which
means the test could miss some types of bugs - eg. if the kernel was
only returning the low word.
So generate some random values at startup and use those instead.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-12-mpe@ellerman.id.au
The ptrace-gpr test includes some inline asm to load GPR and FPR
registers. It then goes back to C to wait for the parent to trace it and
then checks register contents.
The split between inline asm and C is fragile, it relies on the compiler
not using any non-volatile GPRs after the inline asm block. It also
requires a very large and unwieldy inline asm block.
So convert the logic to set registers, wait, and store registers to a
single asm function, meaning there's no window for the compiler to
intervene.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-10-mpe@ellerman.id.au
Some of the ptrace tests check the contents of floating pointer
registers. Currently these use float, which is always 4 bytes, but the
ptrace API supports saving/restoring 8 bytes per register, so switch to
using doubles to exercise the code more fully.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-8-mpe@ellerman.id.au
Thare are some asm helpers for creating/popping stack frames in
basic_asm.h. They always save/restore r2 (TOC pointer), but none of the
selftests change r2, so it's unnecessary to save it by default.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-5-mpe@ellerman.id.au
Thare are some asm helpers for creating/popping stack frames in
basic_asm.h. They always save/restore CR, but none of the selftests
tests touch non-volatile CR fields, so it's unnecessary to save them by
default.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-4-mpe@ellerman.id.au
Currently all ptrace tests are built 64-bit and with TM enabled.
Only the TM tests need TM enabled, so split those out into a separate
variable so that can be specified precisely.
Split the rest of the tests into a variable, and add -m64 to CFLAGS for
those tests, so that in a subsequent patch some tests can be made to
build 32-bit.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-3-mpe@ellerman.id.au
Set LOCAL_HDRS so header changes cause rebuilds. The lib.mk logic adds
all the headers in LOCAL_HDRS as dependencies, so there's no need to
also list them explicitly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-2-mpe@ellerman.id.au
The PUSH/POP_BASIC_STACK helpers in basic_asm.h do not ensure that the
stack pointer is always 16-byte aligned, which is required per the ABI.
Fix the macros to do the alignment if the caller fails to.
Currently only one caller passes a non-aligned size, tm_signal_self(),
which hasn't been caught in testing, presumably because it's a leaf
function.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627140239.2464900-1-mpe@ellerman.id.au
Since commit 87c78b612f ("powerpc: Fix all occurences of "the the"")
fixed "the the", there's now a steady stream of patches fixing other
duplicate words.
Just fix them all at once, to save the overhead of dealing with
individual patches for each case.
This leaves a few cases of "that that", which in some contexts is
correct.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220718095158.326606-1-mpe@ellerman.id.au
In do_adb_query() function of drivers/macintosh/adb.c, req->data is copied
form userland. The parameter "req->data[2]" is missing check, the array
size of adb_handler[] is 16, so adb_handler[req->data[2]].original_address and
adb_handler[req->data[2]].handler_id will lead to oob read.
Cc: stable <stable@kernel.org>
Signed-off-by: Ning Qiang <sohu0106@126.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713153734.2248-1-sohu0106@126.com
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits
guest control of one or more virtual watchdog timers. The timers have
millisecond granularity. The guest is terminated when a timer
expires.
This patch adds a watchdog driver for these timers, "pseries-wdt".
pseries_wdt_probe() currently assumes the existence of only one
platform device and always assigns it watchdogNumber 1. If we ever
expose more than one timer to userspace we will need to devise a way
to assign a distinct watchdogNumber to each platform device at device
registration time.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713202335.1217647-5-cheloha@linux.ibm.com
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits
guest control of one or more virtual watchdog timers.
These timers do not conform to PowerPC device conventions. They are
not affixed to any extant bus, nor do they have full representation in
the device tree.
As a workaround we represent them as platform devices.
This patch registers a single platform device, "pseries-wdt", with the
platform bus if the FW_FEATURE_WATCHDOG flag is set.
A driver for this device, "pseries-wdt", will be introduced in a
subsequent patch.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713202335.1217647-4-cheloha@linux.ibm.com
PAPR v2.12 specifies a new optional function set, "hcall-watchdog",
for the /rtas/ibm,hypertas-functions property. The presence of this
function set indicates support for the H_WATCHDOG hypercall.
Check for this function set and, if present, set the new
FW_FEATURE_WATCHDOG flag.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713202335.1217647-3-cheloha@linux.ibm.com
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits
guest control of one or more virtual watchdog timers.
Add the opcode for the H_WATCHDOG hypercall to hvcall.h. While here,
add a definition for H_NOOP, a possible return code for H_WATCHDOG.
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220713202335.1217647-2-cheloha@linux.ibm.com
With GCC 12 allmodconfig prom_init fails to build:
Error: External symbol 'memset' referenced from prom_init.c
make[2]: *** [arch/powerpc/kernel/Makefile:204: arch/powerpc/kernel/prom_init_check] Error 1
The allmodconfig build enables KASAN, so all calls to memset in
prom_init should be converted to __memset by the #ifdefs in
asm/string.h, because prom_init must use the non-KASAN instrumented
versions.
The build failure happens because there's a call to memset that hasn't
been caught by the pre-processor and converted to __memset. Typically
that's because it's a memset generated by the compiler itself, and that
is the case here.
With GCC 12, allmodconfig enables CONFIG_INIT_STACK_ALL_PATTERN, which
causes the compiler to emit memset calls to initialise on-stack
variables with a pattern.
Because prom_init is non-user-facing boot-time only code, as a
workaround just disable stack variable initialisation to unbreak the
build.
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220718134418.354114-1-mpe@ellerman.id.au
Returning an error code (here -EBUSY) from a remove callback doesn't
prevent the driver from being unloaded. The only effect is that an error
message is emitted and the driver is removed anyhow.
So instead drop the remove function (which is equivalent to returning zero)
and set the suppress_bind_attrs property to make it impossible to unload
the driver via sysfs.
This is a preparation for making platform remove callbacks return void.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220612213400.159257-1-u.kleine-koenig@pengutronix.de
Details is added about "caps" attribute group in the ABI documentation.
This is used to expose some of the PMU attributes in "caps"
directory under : /sys/bus/event_source/devices/<dev>/. The dev/caps
will contain information about features that platform specific PMU
supports.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220520084630.15181-2-atrajeev@linux.vnet.ibm.com
Add caps support under "/sys/bus/event_source/devices/<pmu>/"
for powerpc. This directory can be used to expose some of the
specific features that powerpc PMU supports to the user.
Example: pmu_name. The name of PMU registered will depend on
platform, say power9 or power10 or it could be Generic Compat
PMU.
Currently the only way to know which is the registered
PMU is from the dmesg logs. But clearing the dmesg will make it
difficult to know exact PMU backend used. And even extracting
from dmesg will be complicated, as we need to parse the dmesg
logs and add filters for pmu name. Whereas by exposing it via
caps will make it easy as we just need to directly read it from
the sysfs.
Add a caps directory to /sys/bus/event_source/devices/cpu/
for power8, power9, power10 and generic compat PMU in respective
PMU driver code. Update the pmu_name file under caps folder
in core-book3s using "attr_update".
The information exposed currently:
- pmu_name : Underlying PMU name from the driver
Example result with power9 pmu:
# ls /sys/bus/event_source/devices/cpu/caps
pmu_name
# cat /sys/bus/event_source/devices/cpu/caps/pmu_name
POWER9
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220520084630.15181-1-atrajeev@linux.vnet.ibm.com
When booting on a machine that uses the compat pmu driver we see this:
[ 0.071192] GENERIC_COMPAT performance monitor hardware support registered
Which is a bit shouty. Give it a nicer name.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610044006.2095806-1-joel@jms.id.au
Merge our fixes branch. In particular this brings in commit
9864816180 ("powerpc/book3e: Fix PUD allocation size in
map_kernel_page()") which fixes a build failure in next, because commit
2db2008e63 ("powerpc/64e: Rewrite p4d_populate() as a static inline
function") depends on it.
The platform device for the rng must be created much later in boot.
Otherwise it tries to connect to a parent that doesn't yet exist,
resulting in this splat:
[ 0.000478] kobject: '(null)' ((____ptrval____)): is not initialized, yet kobject_get() is being called.
[ 0.002925] [c000000002a0fb30] [c00000000073b0bc] kobject_get+0x8c/0x100 (unreliable)
[ 0.003071] [c000000002a0fba0] [c00000000087e464] device_add+0xf4/0xb00
[ 0.003194] [c000000002a0fc80] [c000000000a7f6e4] of_device_add+0x64/0x80
[ 0.003321] [c000000002a0fcb0] [c000000000a800d0] of_platform_device_create_pdata+0xd0/0x1b0
[ 0.003476] [c000000002a0fd00] [c00000000201fa44] pnv_get_random_long_early+0x240/0x2e4
[ 0.003623] [c000000002a0fe20] [c000000002060c38] random_init+0xc0/0x214
This patch fixes the issue by doing the platform device creation inside
of machine_subsys_initcall.
Fixes: f3eac42665 ("powerpc/powernv: wire up rng during setup_arch")
Cc: stable@vger.kernel.org
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
[mpe: Change "of node" to "platform device" in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220630121654.1939181-1-Jason@zx2c4.com
With commit ffa0b64e3b ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit")
the kernel now validate the addr against high_memory value. This results
in the below BUG_ON with dax pfns.
[ 635.798741][T26531] kernel BUG at mm/page_alloc.c:5521!
1:mon> e
cpu 0x1: Vector: 700 (Program Check) at [c000000007287630]
pc: c00000000055ed48: free_pages.part.0+0x48/0x110
lr: c00000000053ca70: tlb_finish_mmu+0x80/0xd0
sp: c0000000072878d0
msr: 800000000282b033
current = 0xc00000000afabe00
paca = 0xc00000037ffff300 irqmask: 0x03 irq_happened: 0x05
pid = 26531, comm = 50-landscape-sy
kernel BUG at :5521!
Linux version 5.19.0-rc3-14659-g4ec05be7c2e1 (kvaneesh@ltc-boston8) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #625 SMP Thu Jun 23 00:35:43 CDT 2022
1:mon> t
[link register ] c00000000053ca70 tlb_finish_mmu+0x80/0xd0
[c0000000072878d0] c00000000053ca54 tlb_finish_mmu+0x64/0xd0 (unreliable)
[c000000007287900] c000000000539424 exit_mmap+0xe4/0x2a0
[c0000000072879e0] c00000000019fc1c mmput+0xcc/0x210
[c000000007287a20] c000000000629230 begin_new_exec+0x5e0/0xf40
[c000000007287ae0] c00000000070b3cc load_elf_binary+0x3ac/0x1e00
[c000000007287c10] c000000000627af0 bprm_execve+0x3b0/0xaf0
[c000000007287cd0] c000000000628414 do_execveat_common.isra.0+0x1e4/0x310
[c000000007287d80] c00000000062858c sys_execve+0x4c/0x60
[c000000007287db0] c00000000002c1b0 system_call_exception+0x160/0x2c0
[c000000007287e10] c00000000000c53c system_call_common+0xec/0x250
The fix is to make sure we update high_memory on memory hotplug.
This is similar to what x86 does in commit 3072e413e3 ("mm/memory_hotplug: introduce add_pages")
Fixes: ffa0b64e3b ("powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220629050925.31447-1-aneesh.kumar@linux.ibm.com
Trying to build a .c file that includes <linux/bpf_perf_event.h>:
$ cat test_bpf_headers.c
#include <linux/bpf_perf_event.h>
throws the below error:
/usr/include/linux/bpf_perf_event.h:14:28: error: field ‘regs’ has incomplete type
14 | bpf_user_pt_regs_t regs;
| ^~~~
This is because we typedef bpf_user_pt_regs_t to 'struct user_pt_regs'
in arch/powerpc/include/uaps/asm/bpf_perf_event.h, but 'struct
user_pt_regs' is not exposed to userspace.
Powerpc has both pt_regs and user_pt_regs structures. However, unlike
arm64 and s390, we expose user_pt_regs to userspace as just 'pt_regs'.
As such, we should typedef bpf_user_pt_regs_t to 'struct pt_regs' for
userspace.
Within the kernel though, we want to typedef bpf_user_pt_regs_t to
'struct user_pt_regs'.
Remove arch/powerpc/include/uapi/asm/bpf_perf_event.h so that the
uapi/asm-generic version of the header is exposed to userspace.
Introduce arch/powerpc/include/asm/bpf_perf_event.h so that we can
typedef bpf_user_pt_regs_t to 'struct user_pt_regs' for use within the
kernel.
Note that this was not showing up with the bpf selftest build since
tools/include/uapi/asm/bpf_perf_event.h didn't include the powerpc
variant.
Fixes: a6460b03f9 ("powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use typical naming for header include guard]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220627191119.142867-1-naveen.n.rao@linux.vnet.ibm.com
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220520115431.147593-1-juergh@canonical.com
struct perf_event_attr supports exclude counting of idle task.
This is sent to kernel via perf_event_attr.exclude_idle and
in perf tool, user can use ":I" event modifier to enable this
for specific event.
Monitor Mode Control Register 2 (MMCR2) SPR has control bits
for each PMCs to freeze counting based on the Control Register
CTRL[RUN] state. CTRL[RUN] is not set when idle task is
running. Patch adds a check for event attr.exclude_idle to
set MMCR2[FCnWAIT] bit.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210429050208.266619-1-maddy@linux.ibm.com
PowerVM has a stricter policy about allocating TCEs for LPARs and
often there is not enough TCEs for 1:1 mapping, this adds the supported
numbers into dev_info() to help analyzing bugreports.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220601040117.1467710-1-aik@ozlabs.ru
KVM manages emulated TCE tables for guest LIOBNs by a two level table
which maps up to 128TiB with 16MB IOMMU pages (enabled in QEMU by default)
and MAX_ORDER=11 (the kernel's default). Note that the last level of
the table is allocated when actual TCE is updated.
However these tables are created via ioctl() on kvmfd and the userspace
can trigger WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp) in mm/page_alloc.c
and flood dmesg.
This adds __GFP_NOWARN.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220628080228.1508847-1-aik@ozlabs.ru
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc32, both
of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg
operation fundamentally has 3 operands, but we only have two register
fields. Therefore the operand we compare against (the kernel's API
calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
same for BPF too with return value put in BPF_REG_0.
BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg);
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le)
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610155552.25892-6-hbathini@linux.ibm.com
This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc64, both
of which include the BPF_FETCH flag. The kernel's atomic_cmpxchg
operation fundamentally has 3 operands, but we only have two register
fields. Therefore the operand we compare against (the kernel's API
calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
same for BPF too with return value put in BPF_REG_0.
BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg);
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le)
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610155552.25892-4-hbathini@linux.ibm.com
Adding instructions for ppc64 for
atomic[64]_fetch_add
atomic[64]_fetch_and
atomic[64]_fetch_or
atomic[64]_fetch_xor
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le)
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610155552.25892-3-hbathini@linux.ibm.com
Adding instructions for ppc64 for
atomic[64]_and
atomic[64]_or
atomic[64]_xor
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> (ppc64le)
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610155552.25892-2-hbathini@linux.ibm.com
There is no need to read the H_BLOCK_REMOVE characteristics when running in
Radix mode because this hcall is never called.
Furthermore since the commit 387e220a2e ("powerpc/64s: Move hash MMU
support code under CONFIG_PPC_64S_HASH_MMU") define
pseries_lpar_read_hblkrm_characteristics as un empty function if
CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220523164353.26441-1-ldufour@linux.ibm.com
The ppc_inst_as_str() macro tries to make printing variable length,
aka "prefixed", instructions convenient. It mostly succeeds, but it does
hide an on-stack buffer, which triggers stack protector.
More problematically it doesn't compile at all with GCC 12,
with -Wdangling-pointer, due to the fact that it returns the char buffer
declared inside the macro:
arch/powerpc/kernel/trace/ftrace.c: In function '__ftrace_modify_call':
./include/linux/printk.h:475:44: error: using a dangling pointer to '__str' [-Werror=dangling-pointer=]
475 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
...
arch/powerpc/kernel/trace/ftrace.c:567:17: note: in expansion of macro 'pr_err'
567 | pr_err("Not expected bl: opcode is %s\n", ppc_inst_as_str(op));
| ^~~~~~
./arch/powerpc/include/asm/inst.h:156:14: note: '__str' declared here
156 | char __str[PPC_INST_STR_LEN]; \
| ^~~~~
This could be fixed by having the caller declare the buffer, but in some
places there'd need to be two buffers. In all cases where
ppc_inst_as_str() is used the output is not really meant for user
consumption, it's almost always indicative of a kernel bug.
A simpler solution is to just print the value as an unsigned long. For
normal instructions the output is identical. For prefixed instructions
the value is printed as a single 64-bit quantity, whereas previously the
low half was printed first. But that is good enough for debug output,
especially as prefixed instructions will be rare in kernel code in
practice.
Old:
c000000000111170 60420000 ori r2,r2,0
c000000000111174 04100001 e580fb00 .long 0xe580fb0004100001
New:
c00000000010f90c 60420000 ori r2,r2,0
c00000000010f910 e580fb0004100001 .long 0xe580fb0004100001
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220531065936.3674348-1-mpe@ellerman.id.au