Commit Graph

9185 Commits

Author SHA1 Message Date
Alexander Gordeev 2d1b21ecea s390/kdump: remove nodat stack restriction for calling nodat functions
To allow calling of DAT-off code from kernel the stack needs
to be switched to nodat_stack (or other stack mapped as 1:1).

Before call_nodat() macro was introduced that was necessary
to provide the very same memory address for STNSM and STOSM
instructions. If the kernel would stay on a random stack
(e.g. a virtually mapped one) then a virtual address provided
for STNSM instruction could differ from the physical address
needed for the corresponding STOSM instruction.

After call_nodat() macro is introduced the kernel stack does
not need to be mapped 1:1 anymore, since the macro stores the
physical memory address of return PSW in a register before
entering DAT-off mode. This way the return LPSWE instruction
is able to pick the correct memory location and restore the
DAT-on mode. That however might fail in case the 16-byte return
PSW happened to cross page boundary: PSW mask and PSW address
could end up in two separate non-contiguous physical pages.

Align the return PSW on 16-byte boundary so it always fits
into a single physical page. As result any stack (including
the virtually mapped one) could be used for calling DAT-off
code and prior switching to nodat_stack becomes unnecessary.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Alexander Gordeev 82caf7aba1 s390/kdump: rework invocation of DAT-off code
Calling kdump kernel is a two-step process that involves
invocation of the purgatory code: first time - to verify
the new kernel checksum and second time - to call the new
kernel itself.

The purgatory code operates on real addresses and does not
expect any memory protection. Therefore, before the purgatory
code is entered the DAT mode is always turned off. However,
it is only restored upon return from the new kernel checksum
verification. In case the purgatory was called to start the
new kernel and failed the control is returned to the old
kernel, but the DAT mode continues staying off.

The new kernel start failure is unlikely and leads to the
disabled wait state anyway. Still that poses a risk, since
the kernel code in general is not DAT-off safe and even
calling the disabled_wait() function might crash.

Introduce call_nodat() macro that allows entering DAT-off
mode, calling an arbitrary function and restoring DAT mode
back on. Switch all invocations of DAT-off code to that
macro and avoid the above described scenario altogether.

Name the call_nodat() macro in small letters after the
already existing call_on_stack() and put it to the same
header file.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
[hca@linux.ibm.com: some small modifications to call_nodat() macro]
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Alexander Gordeev 39218bcf94 s390/kdump: fix virtual vs physical address confusion
Fix virtual vs physical address confusion (which currently are the same).

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:16 +02:00
Alexander Gordeev 86295cb453 s390/kdump: cleanup do_start_kdump() prototype and usage
Avoid unnecessary run-time and compile-time type
conversions of do_start_kdump() function return
value and parameter.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:15 +02:00
Alexander Gordeev 7a04d491ed s390/kexec: turn DAT mode off immediately before purgatory
The kernel code is not guaranteed DAT-off mode safe.
Turn the DAT mode off immediately before entering the
purgatory.

Further, to avoid subtle side effects reset the system
immediately before turning DAT mode off while making
all necessary preparations in advance.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 17:24:15 +02:00
Thomas Richter 1a33aee1dc s390/cpum_cf: remove function validate_ctr_auth() by inline code
Remove function validate_ctr_auth() and replace this very small
function by its body.

No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 16:48:14 +02:00
Thomas Richter 9ae9b868ae s390/cpum_cf: provide counter number to validate_ctr_version()
Function validate_ctr_version() first parameter is a pointer to
a large structure, but only member hw_perf_event::config is used.
Supply this structure member value in the function invocation.

No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 16:48:14 +02:00
Thomas Richter 46c4d945ea s390/cpum_cf: introduce static CPU counter facility information
The CPU measurement facility counter information instruction qctri()
retrieves information about the available counter sets.
The information varies between machine generations, but is constant
when running on a particular machine.
For example the CPU measurement facility counter first and second
version numbers determine the amount of counters in a counter
set. This information never changes.

The counter sets are identical for all CPUs in the system. It does
not matter which CPU performs the instruction.

Authorization control of the CPU Measurement facility can only
be changed in the activation profile while the LPAR is not running.

Retrieve the CPU measurement counter information at device driver
initialization time and use its constant values.

Function validate_ctr_version() verifies if a user provided
CPU Measurement counter facility counter is valid and defined.
It now uses the newly introduced static CPU counter facility
information.

To avoid repeated recalculation of the counter set sizes (numbers of
counters per set), which never changes on a running machine,
calculate the counter set size once at device driver initialization
and store the result in an array. Functions cpum_cf_make_setsize()
and cpum_cf_read_setsize() are introduced.

Finally remove cpu_cf_events::info member and use the static CPU
counter facility information instead.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19 16:48:14 +02:00
Aneesh Kumar K.V 0b376f1e0f mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to
indicate devdax and hugetlb vmemmap optimization support.  Hence rename
that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP

Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Tarun Sahu <tsahu@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18 16:30:09 -07:00
Ilya Leoshkevich c730fce7c7 s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL
Thomas Richter reported a crash in linux-next with a backtrace similar
to the following one:

	 [<0000000000000000>] 0x0
	([<000000000031a182>] bpf_trace_run4+0xc2/0x218)
	 [<00000000001d59f4>] __bpf_trace_sched_switch+0x1c/0x28
	 [<0000000000c44a3a>] __schedule+0x43a/0x890
	 [<0000000000c44ef8>] schedule+0x68/0x110
	 [<0000000000c4e5ca>] do_nanosleep+0xa2/0x168
	 [<000000000026e7fe>] hrtimer_nanosleep+0xf6/0x1c0
	 [<000000000026eb6e>] __s390x_sys_nanosleep+0xb6/0xf0
	 [<0000000000c3b81c>] __do_syscall+0x1e4/0x208
	 [<0000000000c50510>] system_call+0x70/0x98
	Last Breaking-Event-Address:
	 [<000003ff7fda1814>] bpf_prog_65e887c70a835bbf_on_switch+0x1a4/0x1f0

The problem is that bpf_arch_text_poke() with new_addr == NULL is
susceptible to the following race condition:

	T1                 T2
        -----------------  -------------------
	plt.target = NULL
	                   entry: brcl 0xf,plt
	entry.mask = 0
	                   lgrl %r1,plt.target
	                   br %r1

Fix by setting PLT target to the instruction following `brcl 0xf,plt`
instead of 0. This way T2 will simply resume the execution of the eBPF
program, which is the desired effect of passing new_addr == NULL.

Fixes: f1d5df84cd ("s390/bpf: Implement bpf_arch_text_poke()")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230414154755.184502-1-iii@linux.ibm.com
2023-04-14 18:28:52 +02:00
Josh Poimboeuf 9ea7e6b62c init: Mark [arch_call_]rest_init() __noreturn
In preparation for improving objtool's handling of weak noreturn
functions, mark start_kernel(), arch_call_rest_init(), and rest_init()
__noreturn.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/7194ed8a989a85b98d92e62df660f4a90435a723.1681342859.git.jpoimboe@kernel.org
2023-04-14 17:31:23 +02:00
Ilya Leoshkevich 1cf3bfc60f bpf: Support 64-bit pointers to kfuncs
test_ksyms_module fails to emit a kfunc call targeting a module on
s390x, because the verifier stores the difference between kfunc
address and __bpf_call_base in bpf_insn.imm, which is s32, and modules
are roughly (1 << 42) bytes away from the kernel on s390x.

Fix by keeping BTF id in bpf_insn.imm for BPF_PSEUDO_KFUNC_CALLs,
and storing the absolute address in bpf_kfunc_desc.

Introduce bpf_jit_supports_far_kfunc_call() in order to limit this new
behavior to the s390x JIT. Otherwise other JITs need to be modified,
which is not desired.

Introduce bpf_get_kfunc_addr() instead of exposing both
find_kfunc_desc() and struct bpf_kfunc_desc.

In addition to sorting kfuncs by imm, also sort them by offset, in
order to handle conflicting imms from different modules. Do this on
all architectures in order to simplify code.

Factor out resolving specialized kfuncs (XPD and dynptr) from
fixup_kfunc_call(). This was required in the first place, because
fixup_kfunc_call() uses find_kfunc_desc(), which returns a const
pointer, so it's not possible to modify kfunc addr without stripping
const, which is not nice. It also removes repetition of code like:

	if (bpf_jit_supports_far_kfunc_call())
		desc->addr = func;
	else
		insn->imm = BPF_CALL_IMM(func);

and separates kfunc_desc_tab fixups from kfunc_call fixups.

Suggested-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230412230632.885985-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 21:36:41 -07:00
Heiko Carstens ca1382eafa s390/debug: replace zero-length array with flexible-array member
There are numerous patches which convert zero-length arrays with a
flexible-array member. Convert the remaining s390 occurrences.

Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://github.com/KSPP/linux/issues/78
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Gustavo A. R. Silva 6ca87bc4c8 s390/fcx: replace zero-length array with flexible-array member
Zero-length arrays are deprecated [1] and have to be replaced by C99
flexible-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help to make progress towards globally enabling
-fstrict-flex-arrays=3 [2]

Link: https://github.com/KSPP/linux/issues/78 [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZC7XT5prvoE4Yunm@work
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Gustavo A. R. Silva 3071e9b391 s390/diag: replace zero-length array with flexible-array member
Zero-length arrays are deprecated [1] and have to be replaced by C99
flexible-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy() and help to make progress towards globally enabling
-fstrict-flex-arrays=3 [2]

Link: https://github.com/KSPP/linux/issues/78 [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZC7XGpUtVhqlRLhH@work
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Heiko Carstens 81e8479649 s390/mm: fix direct map accounting
Commit bb1520d581 ("s390/mm: start kernel with DAT enabled") did not
implement direct map accounting in the early page table setup code. In
result the reported values are bogus now:

$cat /proc/meminfo
...
DirectMap4k:        5120 kB
DirectMap1M:    18446744073709546496 kB
DirectMap2G:           0 kB

Fix this by adding the missing accounting. The result looks sane again:

$cat /proc/meminfo
...
DirectMap4k:        6156 kB
DirectMap1M:     2091008 kB
DirectMap2G:     6291456 kB

Fixes: bb1520d581 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Heiko Carstens 07fdd6627f s390/mm: rename POPULATE_ONE2ONE to POPULATE_DIRECT
Architectures generally use the "direct map" wording for mapping the whole
physical memory. Use that wording as well in arch/s390/boot/vmem.c, instead
of "one to one" in order to avoid confusion.

This also matches what is already done in arch/s390/mm/vmem.c.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Marc Hartmayer d24e18ef7e s390/boot: improve install.sh script
Use proper quoting for the variables and explicitly distinguish between
command options and positional arguments.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Thomas Richter c01f2a5fe4 s390/cpum_cf: simplify pr_err() statement in cpumf_pmu_enable/disable
Simplify pr_err() statement into one line and omit return statement.

No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:28 +02:00
Vasily Gorbik b3e0423c4e s390/kaslr: randomize amode31 base address
When the KASLR is enabled, randomize the base address of the amode31 image
within the first 2 GB, similar to the approach taken for the vmlinux
image. This makes it harder to predict the location of amode31 data
and code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Vasily Gorbik 6e259bc5a1 s390/kaslr: generalize and improve random base distribution
Improve the distribution algorithm of random base address to ensure
a uniformity among all suitable addresses. To generate a random value
once, and to build a continuous range in which every value is suitable,
count all the suitable addresses (referred to as positions) that can be
used as a base address. The positions are counted by iterating over the
usable memory ranges. For each range that is big enough to accommodate
the image, count all the suitable addresses where the image can be placed,
while taking reserved memory ranges into consideration.

A new function "iterate_valid_positions()" has dual purpose. Firstly, it
is called to count the positions in a given memory range, and secondly,
to convert a random position back to an address.

"get_random_base()" has been replaced with more generic
"randomize_within_range()" which now could be called for randomizing
base addresses not just for the kernel image.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Vasily Gorbik 898435203c s390/boot: pin amode31 default lma
The special amode31 part of the kernel must always remain below 2Gb. Place
it just under vmlinux.default_lma by default, which makes it easier to
debug amode31 as its default lma is known 0x10000 - 0x3000 (currently,
amode31's size is 3 pages). This location is always available as it is
originally occupied by the vmlinux archive.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Vasily Gorbik a1d2d9cbaf s390/boot: do not change default_lma
The current modification of the default_lma is illogical and should be
avoided. It would be more appropriate to introduce and utilize a new
variable vmlinux_lma instead, so that default_lma remains unchanged and
at its original "default" value of 0x100000.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Thomas Richter 26300860c2 s390/cpum_cf: remove unnecessary copy_from_user call
Struct s390_ctrset_read userdata is filled by ioctl_read operation
using put_user/copy_to_user. However, the ctrset->data value access
is not performed anywhere during the ioctl_read operation.
Remove unnecessary copy_from_user() call.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Thomas Richter 3cdf0269cd s390/cpum_cf: log bad return code of function cfset_all_copy
When function cfset_all_copy() fails, also log the bad return code
in the debug statement (when turned on).
No functional change

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:27 +02:00
Heiko Carstens 1707c11652 s390/module: create module allocations without exec permissions
This is the s390 variant of commit 7dfac3c5f4 ("arm64: module: create
module allocations without exec permissions"):

"The core code manages the executable permissions of code regions of
modules explicitly. It is no longer necessary to create the module vmalloc
regions with RWX permissions. So create them with RW- permissions instead,
which is preferred from a security perspective."

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens 7c7ab788c0 s390/ftrace: do not assume module_alloc() returns executable memory
The ftrace code assumes at two places that module_alloc() returns
executable memory. While this is currently true, this will be changed
with a subsequent patch to follow other architectures which implement
ARCH_HAS_STRICT_MODULE_RWX.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens f9b2d96c4f s390/mm: use set_memory_*() helpers instead of open coding
Given that set_memory_rox() and set_memory_rwnx() exist, it is possible
to get rid of all open coded __set_memory() usages and replace them with
proper helper calls everywhere.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens f0a2a7c527 s390/mm: implement set_memory_rwnx()
Given that set_memory_rox() is implemented, provide also set_memory_rwnx().
This allows to get rid of all open coded __set_memory() usages in s390
architecture code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens 22e99fa564 s390/mm: implement set_memory_rox()
Provide the s390 specific native set_memory_rox() implementation to avoid
frequent set_memory_ro(); set_memory_x() call pairs.

This is the s390 variant of commit 60463628c9 ("x86/mm: Implement native
set_memory_rox()").

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Nico Boehr bac30ea9ef s390/ipl: fix physical-virtual confusion for diag308
Diag 308 subcodes expect a physical address as their parameter.

This currently is not a bug, but in the future physical and virtual
addresses might differ.

Fix the confusion by doing a virtual-to-physical conversion in the
exported diag308() and leave the assembly wrapper __diag308() alone.

Note that several callers pass NULL as addr, so check for the case when
NULL is passed and pass 0 to hardware since virt_to_phys(0) might be
nonzero.

Suggested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens 34644cc2e1 s390/kaslr: randomize module base load address
Randomize the load address of modules in the kernel to make KASLR effective
for modules.
This is the s390 variant of commit e2b32e6785 ("x86, kaslr: randomize
module base load address").

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:26 +02:00
Heiko Carstens bb87190c9d s390/kaslr: provide kaslr_enabled() function
Just like other architectures provide a kaslr_enabled() function, instead
of directly accessing a global variable.

Also pass the renamed __kaslr_enabled variable from the decompressor to the
kernel, so that kalsr_enabled() is available there too. This will be used
by a subsequent patch which randomizes the module base load address.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Heiko Carstens 11018ef90c s390/checksum: remove not needed uaccess.h include
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Heiko Carstens e42ac7789d s390/checksum: always use cksm instruction
Commit dfe843dce7 ("s390/checksum: support GENERIC_CSUM, enable it for
KASAN") switched s390 to use the generic checksum functions, so that KASAN
instrumentation also works checksum functions by avoiding architecture
specific inline assemblies.

There is however the problem that the generic csum_partial() function
returns a 32 bit value with a 16 bit folded checksum, while the original
s390 variant does not fold to 16 bit. This in turn causes that the
ipib_checksum in lowcore contains different values depending on kernel
config options.

The ipib_checksum is used by system dumpers to verify if pointers in
lowcore point to valid data. Verification is done by comparing checksum
values. The system dumpers still use 32 bit checksum values which are not
folded, and therefore the checksum verification fails (incorrectly).

Symptom is that reboot after dump does not work anymore when a KASAN
instrumented kernel is dumped.

Fix this by not using the generic checksum implementation. Instead add an
explicit kasan_check_read() so that KASAN knows about the read access from
within the inline assembly.

Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: dfe843dce7 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN")
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13 17:36:25 +02:00
Stefan Haberland 1cee2975bb s390/dasd: add autoquiesce feature
Add the internal logic to check for autoquiesce triggers and handle
them.

Quiesce and resume are functions that tell Linux to stop/resume
issuing I/Os to a specific DASD.
The DASD driver allows a manual quiesce/resume via ioctl.

Autoquiesce will define an amount of triggers that will lead to
an automatic quiesce if a certain event occurs.
There is no automatic resume.

All events will be reported via DASD Extended Error Reporting (EER)
if configured.

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20230405142017.2446986-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-11 19:53:08 -06:00
Heiko Carstens e06f47a165 s390/mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

This is the s390 variant of "x86/mm: try VMA lock-based page fault handling
first".

Link: https://lkml.kernel.org/r/20230314132808.1266335-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 20:03:02 -07:00
Paul E. McKenney 79cf833be6 kvm: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it.  Therefore, remove the "select SRCU"
Kconfig statements from the various KVM Kconfig files.

Acked-by: Sean Christopherson <seanjc@google.com> (x86)
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <kvm@vger.kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org> (arm64)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Anup Patel <anup@brainfault.org> (riscv)
Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390)
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05 13:47:42 +00:00
Linus Torvalds 76f598ba7d PPC:
* Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
 
 s390:
 
 * Fix handling of external interrupts in protected guests
 
 x86:
 
 * Resample the pending state of IOAPIC interrupts when unmasking them
 
 * Fix usage of Hyper-V "enlightened TLB" on AMD
 
 * Small fixes to real mode exceptions
 
 * Suppress pending MMIO write exits if emulator detects exception
 
 Documentation:
 
 * Fix rST syntax
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmQsXMQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc6Qf/YMDcYxpPuYFZ69t9k4s0ubeGjGpr
 Qdzo6ZgKbfs4+nu89u2F5zvAftRSrEVQRpfLwmTXauM8el3s+EAEPlK/MLH6sSSo
 LG3eWoNxs7MUQ6gBWeQypY6opvz3W9hf8KYd+GY6TV7jwV5/YgH6674Yyd7T/bCm
 0/tUiil5Q7Da0w4GUY5m9gEqlbA9yjDBFySqqTZemo18NzLaPNpesKnm1hvv0QJG
 JRvOMmHwqAZgS0G5cyGEPZKY6Ga0v9QsAXc8mFC/Jtn9GMD2f47PYDRfd/7NSiVz
 ulgsk8zZkZIZ6nMbQ1t10txmGIbFmJ4iPPaATrMII/xHq+idgAlHU7zY1A==
 =QvFj
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "PPC:
   - Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled

  s390:
   - Fix handling of external interrupts in protected guests

  x86:
   - Resample the pending state of IOAPIC interrupts when unmasking them

   - Fix usage of Hyper-V "enlightened TLB" on AMD

   - Small fixes to real mode exceptions

   - Suppress pending MMIO write exits if emulator detects exception

  Documentation:
   - Fix rST syntax"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  docs: kvm: x86: Fix broken field list
  KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
  KVM: s390: pv: fix external interruption loop not always detected
  KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
  KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
  KVM: x86: Suppress pending MMIO write exits if emulator detects exception
  KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
  KVM: irqfd: Make resampler_list an RCU list
  KVM: SVM: Flush Hyper-V TLB when required
2023-04-04 11:29:37 -07:00
Heiko Carstens b94c0ebb1e s390: enable HAVE_ARCH_STACKLEAK
Add support for the stackleak feature. Whenever the kernel returns to user
space the kernel stack is filled with a poison value.

Enabling this feature is quite expensive: e.g. after instrumenting the
getpid() system call function to have a 4kb stack the result is an
increased runtime of the system call by a factor of 3.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens 22ca1e7738 s390: move on_thread_stack() to processor.h
As preparation for the stackleak feature move on_thread_stack() to
processor.h like x86.

Also make it __always_inline, and slightly optimize it by reading
current task's kernel stack pointer from lowcore.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens 60afa6d166 s390: remove arch_early_irq_init()
Allocate early async stack like other early stacks and get rid of
arch_early_irq_init(). This way the async stack is allocated earlier,
and handled like all other stacks.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens 23be82f0de s390/stacktrace: remove call_on_stack_noreturn()
There is no user left of call_on_stack_noreturn() - remove it.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens 944c78376a s390: use init_thread_union aka initial stack for the first process
s390 is the only architecture which switches from the initial stack to a
later on allocated different stack for the first process.
This is (at least) problematic for the stackleak feature, which instruments
functions to save the current stackpointer within the task structure of the
running process.

The stackleak code compares stack pointers of the current process - and
doesn't expect that the kernel stack of a task can change. Even though the
stackleak feature itself will not cause any harm, the assumption about
kernel stacks being consistent is there, and only s390 doesn't follow that.

Therefore switch back to use init_thread_union, just like all other
architectures.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:56 +02:00
Heiko Carstens cfea9bc78b s390/stack: set lowcore kernel stack pointer early
Make sure the lowcore kernel stack pointer reflects the kernel stack of the
current task as early as possible, instead of having a NULL pointer there.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:55 +02:00
Heiko Carstens c2c3258fb5 s390/stack: use STACK_INIT_OFFSET where possible
Make STACK_INIT_OFFSET also available for assembler code, and
use it everywhere instead of open-coding it at several places.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:55 +02:00
Heiko Carstens e6badee940 s390/dumpstack: simplify in stack logic code
The pattern for all in_<type>_stack() functions is the same; especially
also the size of all stacks is the same. Simplify the code by passing only
the stack address to the generic in_stack() helper, which then can assume a
THREAD_SIZE sized stack.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:55 +02:00
Vasily Gorbik b46650d56b s390: make extables read-only
Currently, exception tables are marked as ro_after_init. However,
since they are sorted during compile time using scripts/sorttable,
they can be moved to RO_DATA using the RO_EXCEPTION_TABLE_ALIGN macro,
which is specifically designed for this purpose.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:55 +02:00
Vasily Gorbik 385bf43c48 s390/entry: rely on long-displacement facility
Since commit 4efd417f29 ("s390: raise minimum supported machine
generation to z10"), the long-displacement facility is assumed and
required for the kernel. Clean up a couple of places in the entry code,
where long-displacement could be used directly instead of using a base
register.

However, there are still a few other places where a base register has
to be used to extend short-displacement for the second lowcore page
access. Notably, boot/head.S still has to be built for z900, and in
mcck_int_handler, spt and lbear, which don't have long-displacements,
but need to access save areas at the second lowcore page.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:34:54 +02:00
Vasily Gorbik f62f8b716d Merge branch 'uaccess-inline-asm-cleanup' into features
Heiko Carstens says:

===================
There are a couple of oddities within the s390 uaccess library
functions. Therefore cleanup the whole uaccess.c file.

There is no functional change, only improved readability. The output
of "objdump -Dr" was always compared before/after each patch to make
sure that the generated object file is identical, if that could be
expected. Therefore the series also includes more patches than really
required to cleanup the code.

Furthermore the kunit usercopy tests also still pass.
===================

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:29:28 +02:00
Heiko Carstens 49d6e68f66 s390/uaccess: remove extra blank line
In order to get uaccess.c (nearly) checkpatch warning free remove an
extra blank line:

CHECK: Blank lines aren't necessary before a close brace '}'
+
+}

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens c3bd834328 s390/uaccess: get rid of not needed local variable
Get rid of the not needed val local variable and pass the constant
value directly as operand value. In addition this turns the val
operand into an input operand, since it is not changed within the
inline assemblies.

This in turn requires also to add the earlyclobber contraint modifier
to all output operands, since the (former) val operand is used after
all output variants have been modified.

The usercopy kunit tests still pass after this change.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens 7f65d18329 s390/uaccess: rename tmp1 and tmp2 variables
Rename tmp1 and tmp2 variables to more meaningful val (for value) and rem
(for remainder).

Except for debug sections the output of "objdump -Dr" of the uaccess object
file is identical before/after this change.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens afdcc2ce39 s390/uaccess: sort EX_TABLE list for inline assemblies
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens 4e0b0ad45c s390/uaccess: rename/sort labels in inline assemblies
Rename and sort labels in uaccess inline assemblies to increase
readability. In addition have only one EX_TABLE entry per line - also to
increase readability.

Except for debug sections the output of "objdump -Dr" of the uaccess object
file is identical before/after this change.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens b96adf0d03 s390/uaccess: remove unused label in inline assemblies
Remove an unused label in all three uaccess inline assemblies.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Heiko Carstens 10679e4d98 s390/uaccess: use symbolic names for inline assembly operands
Improve readability of the uaccess inline assemblies by using symbolic
names for all input and output operands.

Except for debug sections the output of "objdump -Dr" of the uaccess object
file is identical before/after this change.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-04 18:27:24 +02:00
Greg Kroah-Hartman cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Alexey Kardashevskiy 52882b9c7a KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.

This moved the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-31 11:19:05 -04:00
Gerald Schaefer 99c2913363 mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()
s390 can do more fine-grained handling of spurious TLB protection faults,
when there also is the PTE pointer available.

Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an
additional parameter.

This will add no functional change to other architectures, but those with
private flush_tlb_fix_spurious_fault() implementations need to be made
aware of the new parameter.

Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:12 -07:00
Nico Boehr 21f27df854 KVM: s390: pv: fix external interruption loop not always detected
To determine whether the guest has caused an external interruption loop
upon code 20 (external interrupt) intercepts, the ext_new_psw needs to
be inspected to see whether external interrupts are enabled.

Under non-PV, ext_new_psw can simply be taken from guest lowcore. Under
PV, KVM can only access the encrypted guest lowcore and hence the
ext_new_psw must not be taken from guest lowcore.

handle_external_interrupt() incorrectly did that and hence was not able
to reliably tell whether an external interruption loop is happening or
not. False negatives cause spurious failures of my kvm-unit-test
for extint loops[1] under PV.

Since code 20 is only caused under PV if and only if the guest's
ext_new_psw is enabled for external interrupts, false positive detection
of a external interruption loop can not happen.

Fix this issue by instead looking at the guest PSW in the state
description. Since the PSW swap for external interrupt is done by the
ultravisor before the intercept is caused, this reliably tells whether
the guest is enabled for external interrupts in the ext_new_psw.

Also update the comments to explain better what is happening.

[1] https://lore.kernel.org/kvm/20220812062151.1980937-4-nrb@linux.ibm.com/

Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: 201ae986ea ("KVM: s390: protvirt: Implement interrupt injection")
Link: https://lore.kernel.org/r/20230213085520.100756-2-nrb@linux.ibm.com
Message-Id: <20230213085520.100756-2-nrb@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-03-28 07:16:37 +00:00
Heiko Carstens 89aba4c26f s390/uaccess: add missing earlyclobber annotations to __clear_user()
Add missing earlyclobber annotation to size, to, and tmp2 operands of the
__clear_user() inline assembly since they are modified or written to before
the last usage of all input operands. This can lead to incorrect register
allocation for the inline assembly.

Fixes: 6c2a9e6df6 ("[S390] Use alternative user-copy operations for new hardware.")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/all/20230321122514.1743889-3-mark.rutland@arm.com/
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:23:08 +02:00
Heiko Carstens f9bbf25e7b s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling
Return -EFAULT if put_user() for the PTRACE_GET_LAST_BREAK
request fails, instead of silently ignoring it.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:23:08 +02:00
Jiri Slaby (SUSE) 7bb2107e63 s390: reintroduce expoline dependence to scripts
Expolines depend on scripts/basic/fixdep. And build of expolines can now
race with the fixdep build:

 make[1]: *** Deleting file 'arch/s390/lib/expoline/expoline.o'
 /bin/sh: line 1: scripts/basic/fixdep: Permission denied
 make[1]: *** [../scripts/Makefile.build:385: arch/s390/lib/expoline/expoline.o] Error 126
 make: *** [../arch/s390/Makefile:166: expoline_prepare] Error 2

The dependence was removed in the below Fixes: commit. So reintroduce
the dependence on scripts.

Fixes: a0b0987a78 ("s390/nospec: remove unneeded header includes")
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: stable@vger.kernel.org
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20230316112809.7903-1-jirislaby@kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:23:08 +02:00
Heiko Carstens e7615c9225 s390: enable ARCH_HAS_MEMBARRIER_SYNC_CORE
s390 trivially supports the ARCH_HAS_MEMBARRIER_SYNC_CORE requirements
since the used lpswe(y) instruction to return from any kernel context to
user space performs CPU serialization. This is very similar to arm, arm64
and powerpc.

See commit 70216e18e5 ("membarrier: Provide core serializing command,
*_SYNC_CORE") for further details.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:19:52 +02:00
Thomas Richter af90d7b69c s390/cpum_sf: remove flag PERF_CPUM_SF_FULL_BLOCKS
This flag is used to process only fully populated sampling buffers
when an sampling event is stopped on a CPU. By default the last sampling
buffer is also scanned for samples even if the sampling block full
indicator is not set in the trailer entry of a sampling buffer page.

This flag can be set via perf_event_attr::config1 field. It was never
used and never documented. It is useless now.

With PERF_CPUM_SF_FULL_BLOCKS:
When a process is scheduled off the CPU, the sampling is stopped and
the samples are copied to the perf ring buffer and marked invalid.
When stopped at the last full sample buffer page (which is
achieved with the PERF_CPUM_SF_FULL_BLOCKS options), the hardware
sampling will resume at the first free sample entry in the current,
partially filled sample buffer.

Without PERF_CPUM_SF_FULL_BLOCKS (default behavior):
The partially filled last sample buffer is scanned and valid samples
are saved to the perf ring buffer. The valid samples are marked invalid.
The sampling is resumed when the process is scheduled on this CPU.
Again the hardware sampling will resume at the first free sample entry in
the current, partially filled sample buffer.

Now the next interrupt handler invocation scans the
full sample block and saves the valid samples to the ring buffer.
It omits the invalid samples at the top of the buffer.
The default behavior is fully sufficient, therefore remove this feature.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-27 17:19:52 +02:00
Valentin Schneider 4c8c3c7f70 treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
    #include <...>
  +
  + #include <trace/events/ipi.h>

[csky bits]
[riscv bits]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
2023-03-24 11:01:28 +01:00
Fangrui Song aff69273af vdso: Improve cmd_vdso_check to check all dynamic relocations
The actual intention is that no dynamic relocation exists in the VDSO. For
this the VDSO build validates that the resulting .so file does not have any
relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture,
which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside
of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative
relocations too.

However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If
a port fails to determine the exact .rel[a].dyn size, the trailing zeros
become R_*_NONE relocations. E.g. ld's powerpc port recently fixed
https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are
generally a no-op in the dynamic loaders. So just ignore them.

Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting
.so file does not contain any R_* relocation entries except R_*_NONE.

Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com
2023-03-21 21:15:34 +01:00
Mark Rutland fee86a4ed5 ftrace: selftest: remove broken trace_direct_tramp
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.

On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.

Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.

On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.

To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.

Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21 13:59:29 -04:00
Heiko Carstens d28d86a07d s390/mm: make use of atomic_fetch_xor()
Make use of atomic_fetch_xor() instead of an atomic_cmpxchg() loop to
implement atomic_xor_bits() (aka atomic_xor_return()). This makes the C
code more readable and in addition generates better code, since for z196
and newer a single lax instruction is generated instead of a cmpxchg()
loop.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger 2d72eaf036 s390/ap: implement SE AP bind, unbind and associate
Implementation of the new functions for SE AP support:
bind, unbind and associate. There are two new sysfs
attributes for this:

/sys/devices/ap/cardxx/xx.yyyy/se_bind
/sys/devices/ap/cardxx/xx.yyyy/se_associate

Writing a 1 into the se_bind attribute triggers the
SE AP bind for this AP queue, writing a 0 into does
an unbind - that's a reset (RAPQ) with the F bit enabled.

The se_associate attribute needs an integer value in
range 0...2^16-1 written in. This is the index into a
secrets table feed into the ultravisor. For more details
please see the Architecture documents.

These both new ap queue attributes are only visible
inside a SE guest with SB (Secure Binding) available.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger c81cf436e4 s390/ap: new low level inline functions ap_bapq() and ap_aapq()
Introduce two new low level functions ap_bapq() (calls
PQAP(BAPQ)) and ap_aapq (calls PQAP(AAPQ)). Both functions
are only meant to be used in SE environment with the SE
AP binding facility available.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger 4bdf3c3956 s390/ap: provide F bit parameter for ap_rapq() and ap_zapq()
Extent the ap inline functions ap_rapq() (calls PQAP(RAPQ))
and ap_zapq() (calls PQAP(ZAPQ)) with a new parameter to
enable the new architectured F bit which forces an
unassociate and/or unbind on a secure execution associated
and/or bound queue.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:49 +01:00
Harald Freudenberger 211c06d845 s390/ap: make tapq gr2 response a struct
This patch introduces a new struct ap_tapq_gr2 which covers
the response in GR2 on TAPQ invocation. This makes it much
easier and less error-prone for the calling functions to
access the right field without shifting and masking.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:48 +01:00
Harald Freudenberger f604704021 s390/ap: exploit new B bit from QCI config info
This patch introduces an update to the ap_config_info
struct which is filled with the QCI subfunction. There
is a new bit apsb (short 'B') showing if the AP secure
bind facility is available. The patch also includes a
simple function ap_sb_available() wrapping this bit test.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:48 +01:00
Harald Freudenberger 8794c59613 s390/zcrypt: rework length information for dqap
The inline ap_dqap function does not return the number of
bytes actually written into the message buffer. The calling
code inspects the AP message header to figure out what kind
of AP message has been received and pulls the length
information from this header. This processing may not work
correctly in cases where only a fragment of the reply is
received.

With this patch the ap_dqap inline function now returns
the number of actually written bytes in the *length parameter.
So the calling function has a chance to compare the number of
received bytes against what the AP message header length
field states. This is especially useful in cases where a
message could only get partially received.

The low level reply processing functions needed some rework
to be able to catch this new length information and compare
it the right way. The rework also deals with some situations
where until now the reply length was not correctly calculated
and/or set.

All this has been heavily tested as the modifications on
the reply length information may affect crypto load.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Harald Freudenberger 003d248fee s390/zcrypt: make psmid unsigned long instead of long long
Since s390 kernel build does not support 32 bit build any
more there is no difference between long and long long.
So this patch reworks all occurrences of psmid (a 64 bit
value) to use unsigned long now.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Heiko Carstens c2272b2d3b s390/vdso: use __ALIGN instead of open coded .align
Use __ALIGN instead of open coded .align statement to make sure that
vdso code follows global kernel function alignment rules.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Heiko Carstens 91a0117dce s390/expoline: use __ALIGN instead of open coded .align
Use __ALIGN instead of open coded .align statement to make sure that
external expoline thunks follow global function alignment rules.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:47 +01:00
Heiko Carstens dfa2a72cdb s390/ftrace: move hotpatch trampolines to mcount.S
Move the ftrace hotpatch trampolines to mcount.S. This allows to make
use of the standard SYM_CODE macros which again makes sure that the
hotpatch trampolines follow the function alignment rules of the rest
of the kernel.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:46 +01:00
Heiko Carstens 6ef55060a1 s390: make use of CONFIG_FUNCTION_ALIGNMENT
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit
d49a062621 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT").

Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default
function alignment. For all other compilers, which is only clang, select
a function alignment of 16 bytes which reflects the default function
alignment for clang.

Also change the __ALIGN define to follow whatever the value of
CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and
assembler functions is the same.

In result everything still uses the default function alignment for both
compilers. However in addition this is now also true for all assembly
functions, so that all functions have a consistent alignment.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:12:46 +01:00
Heiko Carstens e5323477e6 Merge branch 'decompressor-memory-tracking' into features
Vasily Gorbik says:

===================
Combine and generalize all methods for finding unused memory in
decompressor, while decreasing complexity, add memory holes support,
while improving error handling (especially in low-memory conditions)
and debug-ability.
===================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:04:10 +01:00
Vasily Gorbik 557b19709d s390/kasan: move shadow mapping to decompressor
Since regular paging structs are initialized in decompressor already
move KASAN shadow mapping to decompressor as well. This helps to avoid
allocating KASAN required memory in 1 large chunk, de-duplicate paging
structs creation code and start the uncompressed kernel with KASAN
instrumentation right away. This also allows to avoid all pitfalls
accidentally calling KASAN instrumented code during KASAN initialization.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:51 +01:00
Vasily Gorbik e4c31004d3 s390/mm,pageattr: allow KASAN shadow memory
Allow changing page table attributes for KASAN shadow memory ranges.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Vasily Gorbik f913a66004 s390/boot: rework decompressor reserved tracking
Currently several approaches for finding unused memory in decompressor
are utilized. While "safe_addr" grows towards higher addresses, vmem
code allocates paging structures top down. The former requires careful
ordering. In addition to that ipl report handling code verifies potential
intersections with secure boot certificates on its own. Neither of two
approaches are memory holes aware and consistent with each other in low
memory conditions.

To solve that, existing approaches are generalized and combined
together, as well as online memory ranges are now taken into
consideration.

physmem_info has been extended to contain reserved memory ranges. New
set of functions allow to handle reserves and find unused memory.
All reserves and memory allocations are "typed". In case of out of
memory condition decompressor fails with detailed info on current
reserved ranges and usable online memory.

Linux version 6.2.0 ...
Kernel command line: ... mem=100M
Our of memory allocating 100000 bytes 100000 aligned in range 0:5800000
Reserved memory ranges:
0000000000000000 0000000003e33000 DECOMPRESSOR
0000000003f00000 00000000057648a3 INITRD
00000000063e0000 00000000063e8000 VMEM
00000000063eb000 00000000063f4000 VMEM
00000000063f7800 0000000006400000 VMEM
0000000005800000 0000000006300000 KASAN
Usable online memory ranges (info source: sclp read info [3]):
0000000000000000 0000000006400000
Usable online memory total: 6400000 Reserved: 61b10a3 Free: 24ef5d
Call Trace:
(sp:000000000002bd58 [<0000000000012a70>] physmem_alloc_top_down+0x60/0x14c)
 sp:000000000002bdc8 [<0000000000013756>] _pa+0x56/0x6a
 sp:000000000002bdf0 [<0000000000013bcc>] pgtable_populate+0x45c/0x65e
 sp:000000000002be90 [<00000000000140aa>] setup_vmem+0x2da/0x424
 sp:000000000002bec8 [<0000000000011c20>] startup_kernel+0x428/0x8b4
 sp:000000000002bf60 [<00000000000100f4>] startup_normal+0xd4/0xd4

physmem_alloc_range allows to find free memory in specified range. It
should be used for one time allocations only like finding position for
amode31 and vmlinux.
physmem_alloc_top_down can be used just like physmem_alloc_range, but
it also allows multiple allocations per type and tries to merge sequential
allocations together. Which is useful for paging structures allocations.
If sequential allocations cannot be merged together they are "chained",
allowing easy per type reserved ranges enumeration and migration to
memblock later. Extra "struct reserved_range" allocated for chaining are
not tracked or reserved but rely on the fact that both
physmem_alloc_range and physmem_alloc_top_down search for free memory
only below current top down allocator position. All reserved ranges
should be transferred to memblock before memblock allocations are
enabled.

The startup code has been reordered to delay any memory allocations until
online memory ranges are detected and occupied memory ranges are marked as
reserved to be excluded from follow-up allocations.
Ipl report certificates are a special case, ipl report certificates list
is checked together with other memory reserves until certificates are
saved elsewhere.
KASAN required memory for shadow memory allocation and mapping is reserved
as 1 large chunk which is later passed to KASAN early initialization code.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Vasily Gorbik 8c37cb7d4f s390/boot: rename mem_detect to physmem_info
In preparation to extending mem_detect with additional information like
reserved ranges rename it to more generic physmem_info. This new naming
also help to avoid confusion by using more exact terms like "physmem
online ranges", etc.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Vasily Gorbik 53fcc7dbf1 s390/boot: remove non-functioning image bootable check
check_image_bootable() has been introduced with commit 627c9b6205
("s390/boot: block uncompressed vmlinux booting attempts") to make sure
that users don't try to boot uncompressed vmlinux ELF image in qemu. It
used to be possible quite some time ago. That commit prevented confusion
with uncompressed vmlinux image starting to boot and even printing
kernel messages until it crashed. Users might have tried to report the
problem without realizing they are doing something which was not intended.
Since commit f1d3c53237 ("s390/boot: move sclp early buffer from fixed
address in asm to C") check_image_bootable() doesn't function properly
anymore, as well as booting uncompressed vmlinux image in qemu doesn't
really produce any output and crashes. Moving forward it doesn't make
sense to fix check_image_bootable() anymore, so simply remove it.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 11:02:50 +01:00
Ilya Leoshkevich 7229ea86e0 s390/dumpstack: resolve userspace last_break
report_user_fault() currently does not show which library last_break
points to. Call print_vma_addr() to find out; the output now looks
like this:

    Last Breaking-Event-Address:
     [<000003ffaa2a56e4>] libc.so.6[3ffaa180000+251000]

For kernel it's unchanged:

    Last Breaking-Event-Address:
     [<000000000030fd06>] trace_hardirqs_on+0x56/0xc8

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:49 +01:00
Luis Chamberlain 7db1224630 s390: simplify dynamic sysctl registration for appldata_register_ops
The routine appldata_register_ops() allocates a sysctl table
with 4 entries. The firsts one,   ops->ctl_table[0] is the parent directory
with an empty entry following it, ops->ctl_table[1]. The next entry is
for the ops->name and that is     ops->ctl_table[2]. It needs an empty
entry following that, and that is ops->ctl_table[3]. And so hence the
kcalloc(4, sizeof(struct ctl_table), GFP_KERNEL).

We can simplify this considerably since sysctl_register("foo", table)
can create the parent directory for us if it does not exist. So we
can just remove the first two entries and move back the ops->name to
the first entry, and just use kcalloc(2, ...).

[gor@linux.ibm.com: appldata_generic_handler fixup ctl_table index 2->0]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-7-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:49 +01:00
Luis Chamberlain 7ddc873dcb s390: simplify one-level sysctl registration for page_table_sysctl
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-6-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:49 +01:00
Luis Chamberlain 414b2a960e s390: simplify one level sysctl registration for cmm_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-5-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:49 +01:00
Luis Chamberlain 71cb8c00a2 s390: simplify one-level sysctl registration for appldata_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-4-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:48 +01:00
Luis Chamberlain 751e24071c s390: simplify one-level syctl registration for s390dbf_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-3-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:48 +01:00
Luis Chamberlain 0599331c3d s390: simplify one-level sysctl registration for topology_ctl_table
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().

Simplify this registration.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230310234525.3986352-2-mcgrof@kernel.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20 10:56:48 +01:00
Greg Kroah-Hartman 31e7c4cc7d s390/smp: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-19-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:30:03 +01:00
Greg Kroah-Hartman 9493ed19fb s390/topology: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.

Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-18-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:59 +01:00
Thomas Huth d8708b80fa KVM: Change return type of kvm_arch_vm_ioctl() to "int"
All kvm_arch_vm_ioctl() implementations now only deal with "int"
types as return values, so we can change the return type of these
functions to use "int" instead of "long".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Message-Id: <20230208140105.655814-7-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16 10:18:07 -04:00
Thomas Huth 71fb165e23 KVM: s390: Use "int" as return type for kvm_s390_get/set_skeys()
These two functions only return normal integers, so it does not
make sense to declare the return type as "long" here.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230208140105.655814-3-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16 10:18:06 -04:00
Niklas Schnelle 45e5f0c017 s390/pci: clean up left over special treatment for function zero
Prior to commit 960ac36264 ("s390/pci: allow zPCI zbus without
a function zero") enabling and scanning a PCI function had to
potentially be postponed until the function with devfn zero on that bus
was plugged. While the commit removed the waiting itself extra code to
scan all functions on the PCI bus once function zero appeared was
missed. Remove that code and the outdated comments about waiting for
function zero.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230306151014.60913-5-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:43 +01:00
Niklas Schnelle b881208dcd s390/pci: remove redundant pci_bus_add_devices() on new bus
The pci_bus_add_devices() call in zpci_bus_create_pci_bus() is without
function since at this point no device could have been added to the
freshly created PCI bus.

Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230306151014.60913-4-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-03-13 09:16:43 +01:00