- Improve ftrace code patching so that stop_machine is not required anymore.
This requires a small common code patch acked by Steven Rostedt:
https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
- Enable KCSAN for s390. This comes with a small common code change to fix a
compile warning. Acked by Marco Elver:
https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
- Add KFENCE support for s390. This also comes with a minimal x86 patch from
Marco Elver who said also this can be carried via the s390 tree:
https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
- More changes to prepare the decompressor for relocation.
- Enable DAT also for CPU restart path.
- Final set of register asm removal patches; leaving only three locations where
needed and sane.
- Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to
hwcaps flags.
- Cleanup hwcaps implementation.
- Add new instructions to in-kernel disassembler.
- Various QDIO cleanups.
- Add SCLP debug feature.
- Various other cleanups and improvements all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmEs1GsACgkQIg7DeRsp
bsJ9+A/9FApCECNPgu6jOX4Ee+no+LxpCPUF8rvt56TFTLv7+Dhm7fJl0xQ9utsZ
FyLMDAr1/FKdm2wBW23QZH4vEIt1bd6e/03DwwK+6IjHKZHRIfB8eGJMsLj/TDzm
K6/+FI7qXjvpNXxgkCqXf5yESi/y5Dgr+16kTBhPZj5awRiwe5puPamji3uiQ45V
r4MdGCCC9BnTZvtPpUrr8ImnUqHJ4/TMo1YYdykLbZFuAvvYUyZ5YG5kh0pMa8JZ
DGJpfLQfy7ZNscIzdVhZtfzzESVtS6/AOeBzDMO1pbM1CGXtvpJJP0Wjlr/PGwoW
fvuMHpqTlDi+TfNZiPP5lwsFC89xSd6gtZH7vAuI8kFCXgW3RMjABF6h/mzpH1WO
jXyaSEZROc/83gxPMYyOYiqrKyAFPbpZ/Rnav2bvGQGneqx7RvmpF3GgA9WEo1PW
rMDoEbLstJuHk0E2uEV+OnQd5F7MHNonzpYfp/7pyQ+PL8w2GExV9yngVc/f3TqG
HYLC9rc3K6DkxZappcJm0qTb7lDTMFI7xK3g9RiqPQBJE1v1MYE/rai48nW69ypE
bRNL76AjyXKo+zKR2wlhJVMY1I1+DarMopHhZj6fzQT5te1LLsv8OuTU2gkt6dIq
YoSYOYvModf3HbKnJul2tszQG9yl+vpE9MiCyBQSsxIYXCriq/c=
=WDRh
-----END PGP SIGNATURE-----
Merge tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Improve ftrace code patching so that stop_machine is not required
anymore. This requires a small common code patch acked by Steven
Rostedt:
https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
- Enable KCSAN for s390. This comes with a small common code change to
fix a compile warning. Acked by Marco Elver:
https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
- Add KFENCE support for s390. This also comes with a minimal x86 patch
from Marco Elver who said also this can be carried via the s390 tree:
https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
- More changes to prepare the decompressor for relocation.
- Enable DAT also for CPU restart path.
- Final set of register asm removal patches; leaving only three
locations where needed and sane.
- Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO
support to hwcaps flags.
- Cleanup hwcaps implementation.
- Add new instructions to in-kernel disassembler.
- Various QDIO cleanups.
- Add SCLP debug feature.
- Various other cleanups and improvements all over the place.
* tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits)
s390: remove SCHED_CORE from defconfigs
s390/smp: do not use nodat_stack for secondary CPU start
s390/smp: enable DAT before CPU restart callback is called
s390: update defconfigs
s390/ap: fix state machine hang after failure to enable irq
KVM: s390: generate kvm hypercall functions
s390/sclp: add tracing of SCLP interactions
s390/debug: add early tracing support
s390/debug: fix debug area life cycle
s390/debug: keep debug data on resize
s390/diag: make restart_part2 a local label
s390/mm,pageattr: fix walk_pte_level() early exit
s390: fix typo in linker script
s390: remove do_signal() prototype and do_notify_resume() function
s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c
s390/pci: improve DMA translation init and exit
s390/pci: simplify CLP List PCI handling
s390/pci: handle FH state mismatch only on disable
s390/pci: fix misleading rc in clp_set_pci_fn()
s390/boot: factor out offset_vmlinux_info() function
...
While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true, it may
not always be, and we must not rely on this. Reason is that KVM decides
the vcpu_idx, userspace decides the vcpu_id, thus the two might not
match.
Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies
that code like
for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
vcpu = kvm_get_vcpu(kvm, vcpu_id);
do_stuff(vcpu);
}
is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an
vcpu_id. The trouble is, we do actually use kvm->arch.idle_mask like
this. To fix this problem we have two options. Either use
kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id,
or switch to indexing via vcpu_idx. The latter is preferable for obvious
reasons.
Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to
indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the
same index as idle_mask lets make the same change for it as well.
Fixes: 1ee0bc559d ("KVM: s390: get rid of local_int array")
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Christian Bornträger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 3.15+
Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When this feature is enabled the hardware is free to interpret
specification exceptions generated by the guest, instead of causing
program interruption interceptions.
This benefits (test) programs that generate a lot of specification
exceptions (roughly 4x increase in exceptions/sec).
Interceptions will occur as before if ICTL_PINT is set,
i.e. if guest debug is enabled.
There is no indication if this feature is available or not and the
hardware is free to interpret or not. So we can simply set this bit and
if the hardware ignores it we fall back to intercept 8 handling.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/linux-s390/20210706114714.3936825-1-scgl@linux.ibm.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The secondary CPU start C routine uses nodat_stack as a
interim stack before finally switching to kernel_stack.
Such scheme is superfluous, since the assembler restart
interrupt handler (that secondary CPU starter is called
from) does not need to use any stack for switching into
DAT mode. Once DAT is on, any stack including virtually-
mapped one could be used.
Avoid the use of nodat_stack and smp_start_secondary()
helper. Instead, initiate kernel_stack directly from
the restart interrupt handler.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The restart interrupt is triggered whenever a secondary CPU is
brought online, a remote function call dispatched from another
CPU or a manual PSW restart is initiated and causes the system
to kdump. The handling routine is always called with DAT turned
off. It then initializes the stack frame and invokes a callback.
The existing callbacks handle DAT as follows:
* __do_restart() and __machine_kexec() turn in on upon entry;
* __ipl_run(), __reipl_run() and __dump_run() do not turn it
right away, but all of them call diag308() - which turns DAT
on, but only if kasan is enabled;
In addition to the described complexity all callbacks (and the
functions they call) should avoid kasan instrumentation while
DAT is off.
This update enables DAT in the assembler restart handler and
relieves any callbacks (which are mostly C functions) from
dealing with DAT altogether.
There are four types of CPU restart that initialize control
registers in different ways:
1. Start of secondary CPU on boot - control registers are
inherited from the IPL CPU;
2. Restart of online CPU - control registers of the CPU being
restarted are kept;
3. Hotplug of offline CPU - control registers are inherited
from the starting CPU;
4. Start of offline CPU triggered by manual PSW restart -
the control registers are read from the absolute lowcore
and contain the boot time IPL CPU values updated with all
follow-up calls of smp_ctl_set_bit() and smp_ctl_clear_bit()
routines;
In first three cases contents of the control registers is the
most recent. In the latter case control registers are good
enough to facilitate successful completion of kdump operation.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Ingo Franzki reported that our defconfig and debug_config went out of
sync with respect to DM_INTEGRITY. Fix it.
Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Generate kvm hypercall functions with a macro instead of duplicating
the more or less identical code seven times. This also reduces number
of lines of code.
However the main purpose is to get rid of as many as possible open
coded error prone register asm constructs in s390 architecture code.
For the only user of kvm_hypercall identical code is created
before/after this patch (drivers/s390/virtio/virtio_ccw.c).
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210713145713.2815167-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add tracing of interactions between the SCLP base driver, firmware and
other drivers to support problem determination in case of SCLP-related
issues.
For that purpose this patch introduces two new s390dbf debug areas:
- sclp: An abbreviated log of all common interactions
- sclp_err: A full log of failed or abnormal interactions
Tracing of full SCCB contents can be enabled for the sclp area by
setting its debug level to maximum (6).
Overview of added trace events:
* Firmware interaction:
- SRV1: Service call about to be issued
- SRV2: Service call was issued
- INT: Interrupt received
* Driver interaction:
- RQAD: Request was added
- RQOK: Request success
- RQAB: Request aborted
- RQTM: Request timed out
- REG: Event listener registered
- UREG: Event listener unregistered
- EVNT: Event callback
- STCG: State-change callback
* Abnormal events:
- TMO: A timeout occurred
- UNEX: Unexpected SCCB completion
* Other (not traced at default level):
- SYN1: Synchronous wait start
- SYN2: Synchronous wait end
Since the SCLP interface is used by console drivers this patch also
moves s390dbf printks outside the critical section protected by debug
area locks to prevent a potential deadlock that would otherwise be
introduced between console_owner --> sclp_lock --> sclp_debug.lock.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Debug areas can currently only be used after s390dbf initialization
which occurs as a postcore_initcall. This is too late for tracing
earlier code such as that related to console_init().
This patch introduces a macro for defining a statically initialized
debug area that can be used to trace very early code. The macro is made
available for built-in code only because modules are never running
during early boot.
Example usage:
1. Define static debug area:
DEFINE_STATIC_DEBUG_INFO(my_debug, "my_debug", 4, 1, 16,
&debug_hex_ascii_view);
2. Add trace entry:
debug_event(&my_debug, 0, "DATA", 4);
Note: The debug area is automatically registered in debugfs during boot.
A driver must not call any of the debug_register()/_unregister()
functions on a static debug_info_t!
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently allocation and registration of s390dbf debug areas are tied
together. As a result, a debug area cannot be unregistered and
re-registered while any process has an associated debugfs file open.
Fix this by splitting alloc/release from register/unregister.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Any previously recorded s390dbf debug data is reset when a debug area
is resized using the 'pages' sysfs attribute. This can make
live-debugging unnecessarily complex.
Fix this by copying existing debug data to the newly allocated debug
area when resizing.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Avoid that the "restart_part2" label, which is in the middle of a
function, appears in /proc/kallsyms.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In case of splitting to 4k mapping the early exit in walk_pte_level()
must only be taken iff flags is equal to SET_MEMORY_4K.
Currently the early exit is taken if the flag is set, and also others
might be set. This may lead to the situation that a mapping is split
but other changes are not done, like e.g. setting pages to R/W.
There is currently no such caller, but there might be in the future.
Fixes: b3e1a00c8f ("s390/mm: implement set_memory_4k()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Rename amod31 to amode31 like it was supposed to be.
Fixes: c78d0c7484 ("s390: rename dma section to amode31")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Both are no longer used since the conversion to generic
entry, therefore remove them.
Fixes: 56e62a7370 ("s390: convert to generic entry")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently zpci_dma_init_device()/zpci_dma_exit_device() is called as
part of zpci_enable_device()/zpci_disable_device() and errors for
zpci_dma_exit_device() are always ignored even if we could abort.
Improve upon this by moving zpci_dma_exit_device() out of
zpci_disable_device() and check for errors whenever we have a way to
abort the current operation. Note that for example in
zpci_event_hard_deconfigured() the device is expected to be gone so we
really can't abort and proceed even in case of error.
Similarly move the cc == 3 special case out of zpci_unregister_ioat()
and into the callers allowing to abort when finding an already disabled
devices precludes proceeding with the operation.
While we are at it log IOAT register/unregister errors in the s390
debugfs log,
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently clp_get_state() and clp_refresh_fh() awkwardly use the
clp_list_pci() callback mechanism to find the entry for a specific FID
and update its zdev, respectively return its state.
This is both needlessly complex and means we are always going through
the entire PCI function list even if the FID has already been found.
Instead lets introduce a clp_find_pci() function to find a specific
entry and share the CLP List PCI request handling code with
clp_list_pci().
With that in place we can also easily make the function handle a simple
out parameter instead of directly altering the zdev allowing easier
access to the updated function handle by the caller.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of always treating CLP_RC_SETPCIFN_ALRDY as success and blindly
updating the function handle restrict this special handling to the
disable case by moving it into zpci_disable_device() and still treating
it as an error while also updating the function handle such that
a subsequent zpci_disable_device() succeeds or the caller can ignore the
error when aborting is not an option such as for zPCI event 0x304.
Also print this occurrence to the log such that an admin can tell why
a disable operation returned an error.
A mismatch between the state of the underlying device and our view of it
can naturally happen when the device suddenly enters the error state but
we haven't gotten the error notification yet, it must not happen on
enable though.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently clp_set_pci_fn() always returns 0 as long as the CLP request
itself succeeds even if the operation itself returns a response code
other than CLP_RC_OK or CLP_RC_SETPCIFN_ALRDY. This is highly misleading
because calling code assumes that a zero rc means that the operation was
successful.
Fix this by returning the response code or cc on failure with the
exception of the special handling for CLP_RC_SETPCIFN_ALRDY. Also let's
not assume that the returned function handle for CLP_RC_SETPCIFN_ALRDY
is 0, we don't need it anyway.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Move offsetting all of vmlinux_info fields to a separate
function for better readability.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
It is currently possible to initialize a large PMD page when
the address is not aligned on page boundary.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
On failure to register a struct zpci_dev with a struct zpci_bus we left
a dangling pointer in zdev->zbus. As zpci_create_device() bails if
zpci_bus_device_register() fails this is of no consequence but still bad
practice.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
It's currently safe to call zpci_cleanup_bus_resources() even if the
resources were never created but it makes no sense so check
zdev->has_resources before we call zpci_cleanup_bus_resources() in
zpci_release_device().
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
It was pointed out during an unrelated patch review that locks should not
be open coded - i.e., writing the algorithm of a standard lock in a
function instead of using a lock from the standard library. The setting and
testing of a busy flag and sleeping on a wait_event is the same thing
a lock does. The open coded locks are invisible to lockdep, so potential
locking problems are not detected.
This patch removes the open coded locks used during
VFIO_GROUP_NOTIFY_SET_KVM notification. The busy flag
and wait queue were introduced to resolve a possible circular locking
dependency reported by lockdep when starting a secure execution guest
configured with AP adapters and domains. Reversing the order in which
the kvm->lock mutex and matrix_dev->lock mutex are locked resolves the
issue reported by lockdep, thus enabling the removal of the open coded
locks.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20210823212047.1476436-3-akrowiak@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The function pointer to the interception handler for the PQAP instruction
can get changed during the interception process. Let's add a
semaphore to struct kvm_s390_crypto to control read/write access to the
function pointer contained therein.
The semaphore must be locked for write access by the vfio_ap device driver
when notified that the KVM pointer has been set or cleared. It must be
locked for read access by the interception framework when the PQAP
instruction is intercepted.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210823212047.1476436-2-akrowiak@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add new types of KVM stats, linear and logarithmic histogram.
Histogram are very useful for observing the value distribution
of time or size related stats.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>.
stdarg.h is the only userspace header commonly used in the kernel.
GPL 2 version of <stdarg.h> can be extracted from
http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The struct pci_dev uses reference counting but zPCI assumed erroneously
that the last reference would always be the local reference after
calling pci_stop_and_remove_bus_device(). This is usually the case but
not how reference counting works and thus inherently fragile.
In fact one case where this causes a NULL pointer dereference when on an
SRIOV device the function 0 was hot unplugged before another function of
the same multi-function device. In this case the second function's
pdev->sriov->dev reference keeps the struct pci_dev of function 0 alive
even after the unplug. This bug was previously hidden by the fact that
we were leaking the struct pci_dev which in turn means that it always
outlived the struct zpci_dev. This was fixed in commit 0b13525c20
("s390/pci: fix leak of PCI device structure") exposing the broken
behavior.
Fix this by accounting for the long living reference a struct pci_dev
has to its underlying struct zpci_dev via the zbus->function[] array and
only release that in pcibios_release_device() ensuring that the struct
pci_dev is not left with a dangling reference. This is a minimal fix in
the future it would probably better to use fine grained reference
counting for struct zpci_dev.
Fixes: 05bc1be6db ("s390/pci: create zPCI bus")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The memory block occupied by the SCLP early buffer that is allocated
by the decompressor and then handed over to the decompressed kernel,
must be reserved to prevent it from being reused for other purposes.
This is necessary because the SCLP early buffer is still in use
during kernel initialization.
Fixes: f1d3c53237 ("s390/boot: move sclp early buffer from fixed address in asm to C")
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The last user of arch_set_page_states(), arch_set_page_nodat() and
arch_test_page_nodat() was removed in commit 394216275c
("s390: remove broken hibernate / power management support"),
let's remove these functions.
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210806075430.6103-1-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Neither of the two drivers provides any SLIB parameter data, so get rid
of the dead code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The PCI core already ensures that the MSI[-X] state is correct when MSI[-X]
is disabled. For MSI the reset state is all entries unmasked and for MSI-X
all vectors are masked.
S390 masks all MSI entries and masks the already masked MSI-X entries
again. Remove it and let the device in the correct state.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20210729222542.939798136@linutronix.de
Setting the ->dma_address to DMA_MAPPING_ERROR is not part of
the ->map_sg calling convention, so remove it.
Link: https://lore.kernel.org/linux-mips/20210716063241.GC13345@lst.de/
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The .map_sg() op now expects an error code instead of zero on failure.
So propagate the error from __s390_dma_map_sg() up. __s390_dma_map_sg()
returns either -ENOMEM on allocation failure or -EINVAL which is
the same as what's expected by dma_map_sgtable().
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
lru_slot is used to keep track of the index of the most-recently used
memslot. The correct acronym would be "mru" but that is not a common
acronym. So call it last_used_slot which is a bit more obvious.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dma section name is confusing, since the code which resides within
that section has nothing to do with direct memory access. Instead the
limitation is that the code has to run in 31 bit addressing mode, and
therefore has to reside below 2GB. So the name was chosen since
ZONE_DMA is the same region.
To reduce confusion rename the section to amode31, which hopefully
describes better what this is about.
Note: this will also change vmcoreinfo strings
- SDMA=... gets renamed to SAMODE31=...
- EDMA=... gets renamed to EAMODE31=...
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fix virtual vs physical address confusion (which currently are the same).
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-5-bigeasy@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390 allows hotpatching the mask of a conditional jump instruction.
Make use of this feature in order to avoid the expensive stop_machine()
call.
The new trampolines are split in 3 stages:
- A first stage is a 6-byte relative conditional long branch located at
each function's entry point. Its offset always points to the second
stage for the corresponding function, and its mask is either all 0s
(ftrace off) or all 1s (ftrace on). The code for flipping the mask is
borrowed from ftrace_{enable,disable}_ftrace_graph_caller. After
flipping, ftrace_arch_code_modify_post_process() syncs with all the
other CPUs by sending SIGPs.
- Second stages for vmlinux are stored in a separate part of the .text
section reserved by the linker script, and in dynamically allocated
memory for modules. This prevents the icache pollution. The total
size of second stages is about 1.5% of that of the kernel image.
Putting second stages in the .bss section is possible and decreases
the size of the non-compressed vmlinux, but splits the kernel 1:1
mapping, which is a bad tradeoff.
Each second stage contains a call to the third stage, a pointer to
the part of the intercepted function right after the first stage, and
a pointer to an interceptor function (e.g. ftrace_caller).
Second stages are 8-byte aligned for the future direct calls
implementation.
- There are only two copies of the third stage: in the .text section
for vmlinux and in dynamically allocated memory for modules. It can be
an expoline, which is relatively large, so inlining it into each
second stage is prohibitively expensive.
As a result of this organization, phoronix-test-suite with ftrace off
does not show any performance degradation.
Suggested-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20210728212546.128248-3-iii@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
and netfilter trees.
Current release - regressions:
- mac80211: fix starting aggregation sessions on mesh interfaces
Current release - new code bugs:
- sctp: send pmtu probe only if packet loss in Search Complete state
- bnxt_en: add missing periodic PHC overflow check
- devlink: fix phys_port_name of virtual port and merge error
- hns3: change the method of obtaining default ptp cycle
- can: mcba_usb_start(): add missing urb->transfer_dma initialization
Previous releases - regressions:
- set true network header for ECN decapsulation
- mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO
- phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY
- sctp: fix return value check in __sctp_rcv_asconf_lookup
Previous releases - always broken:
- bpf:
- more spectre corner case fixes, introduce a BPF nospec
instruction for mitigating Spectre v4
- fix OOB read when printing XDP link fdinfo
- sockmap: fix cleanup related races
- mac80211: fix enabling 4-address mode on a sta vif after assoc
- can:
- raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
- j1939: j1939_session_deactivate(): clarify lifetime of
session object, avoid UAF
- fix number of identical memory leaks in USB drivers
- tipc:
- do not blindly write skb_shinfo frags when doing decryption
- fix sleeping in tipc accept routine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEEWm8ACgkQMUZtbf5S
Irv84A//V/nn9VRdpDpmodwBWVEc9SA00M/nmziRBLwRyG+fRMtnePY4Ha40TPbh
LL6orth08hZKOjVmMc6Ea4EjZbV5E3iAKtAnaX6wi1HpEXVxKtFYnWxu9ydwTEd9
An1fltDtWYkNi3kiq7il+Tp1/yZAQ+NYv5zQZCWJ47kkN3jkjULdAEBqODA2A6Ul
0PQgS1rKzXukE19PlXDuaNuEekhTiEfaTwzHjdBJZkj1toGJGfHsvdQ/YJjixzB9
44SjE4PfxIaMWP0BVaD6hwzaVQhaZETXhZZufdIDdQd7sDbmd6CPODX6mXfLEq4u
JaWylgobsK+5ScHE6siVI+ZlW7stq9l1Ynm10ADiwsZVzKEoP745484aEFOLO6Z+
Ln/IqDQCP/yJQmnl2i0+TfqVDh6BKYoIfUUK/+nzHw4Otycy0m3kj4P+74aYfjOv
Q+cUgbXUemcrpq6wGUK+zK0NyNHVILvdPDnHPMMypwqPk18y5ZmFvaJAVUPSavD9
N7t9LoLyGwK3i/Ir4l+JJZ1KgAv1+TbmyNBWvY1Yk/r/vHU3nBPIv26s7YarNAwD
094vJEJ0+mqO4h+Xj1Nc7HEBFi46JfpN2L8uYoM7gpwziIRMdmpXVLmpEk43WmFi
UMwWJWqabPEXaozC2UFcFLSk+jS7DiD+G5eG+Fd5HecmKzd7RI0=
=sKPI
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
(mac80211) and netfilter trees.
Current release - regressions:
- mac80211: fix starting aggregation sessions on mesh interfaces
Current release - new code bugs:
- sctp: send pmtu probe only if packet loss in Search Complete state
- bnxt_en: add missing periodic PHC overflow check
- devlink: fix phys_port_name of virtual port and merge error
- hns3: change the method of obtaining default ptp cycle
- can: mcba_usb_start(): add missing urb->transfer_dma initialization
Previous releases - regressions:
- set true network header for ECN decapsulation
- mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
LRO
- phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
PHY
- sctp: fix return value check in __sctp_rcv_asconf_lookup
Previous releases - always broken:
- bpf:
- more spectre corner case fixes, introduce a BPF nospec
instruction for mitigating Spectre v4
- fix OOB read when printing XDP link fdinfo
- sockmap: fix cleanup related races
- mac80211: fix enabling 4-address mode on a sta vif after assoc
- can:
- raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
- j1939: j1939_session_deactivate(): clarify lifetime of session
object, avoid UAF
- fix number of identical memory leaks in USB drivers
- tipc:
- do not blindly write skb_shinfo frags when doing decryption
- fix sleeping in tipc accept routine"
* tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
gve: Update MAINTAINERS list
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
bpf: Fix leakage due to insufficient speculative store bypass mitigation
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
sis900: Fix missing pci_disable_device() in probe and remove
net: let flow have same hash in two directions
nfc: nfcsim: fix use after free during module unload
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
sctp: fix return value check in __sctp_rcv_asconf_lookup
nfc: s3fwrn5: fix undefined parameter values in dev_err()
net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
net/mlx5: Unload device upon firmware fatal error
net/mlx5e: Fix page allocation failure for ptp-RQ over SF
net/mlx5e: Fix page allocation failure for trap-RQ over SF
...
Currently, the install target in arch/s390/Makefile descends into
arch/s390/boot/Makefile to invoke the shell script, but there is no
good reason to do so.
arch/s390/Makefile can run the shell script directly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20210729142338.446002-1-masahiroy@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
zstd decompression uses __builtin_clz() which fails back to __clzdi2()
when the kernel is built for older hardware like z900. This leads to
build failures like the following:
s390x-11.1.0-ld: /devel/src/kernel/arch/s390/boot/compressed/../../../../lib/zstd/bitstream.h:148: undefined reference to `__clzdi2'
Fix that by optionally including lib/clz_ctz.c into the decompressor.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 7b034d9c1b ("s390/boot: add zstd support")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/patch-1.thread-f0f589.git-f0f58936888f.your-ad-here.call-01627564869-ext-2765@work.hours
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Most architectures do not need a custom implementation, and in most
cases the generic implementation is preferred, so change the polariy
on these Kconfig symbols to require architectures to select them when
they provide their own version.
The new name is CONFIG_ARCH_HAS_{STRNCPY_FROM,STRNLEN}_USER.
The remaining architectures at the moment are: ia64, mips, parisc,
um and xtensa. We should probably convert these as well, but
I was not sure how far to take this series. Thomas Bogendoerfer
had some concerns about converting mips but may still do some
more detailed measurements to see which version is better.
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
- Fix MTE shared page detection
- Enable selftest's use of PMU registers when asked to
s390:
- restore 5.13 debugfs names
x86:
- fix sizes for vcpu-id indexed arrays
- fixes for AMD virtualized LAPIC (AVIC)
- other small bugfixes
Generic:
- access tracking performance test
- dirty_log_perf_test command line parsing fix
- Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
- use cpu_relax when halt polling
- fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmECvOwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMjuAf/ZdJx7RKRQxMHG4jHGDtOIQq3qxds
2uJsFZS3MWkphSOJ+mbomdXTOCHvhPbJlr5TXaSxGnasmAAl+mDk2qVT0tH6638m
r6M+fu4X0RYvFz54Qnf96V0/elE6ee8rtteXD8WVKQ/XzE3odk1EOqbe7CBDx7yo
A3SzO8eSBzxamKo22fmE3MR5LVVAcN9wNsCb88XGDTUkTbYl+w597r6zg83rMMlL
gwD4f9+NYX6h88BVVwLUkWotUrD/5rRGpRVVEZk5eZKvFGzpukk15dfv0PA9347O
AOM0i/PgnA+Qw6ZsTetWPjD8eFcXDBurGF1tIkyo4X8VogQG0wFIHxbezQ==
=ZgK/
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix MTE shared page detection
- Enable selftest's use of PMU registers when asked to
s390:
- restore 5.13 debugfs names
x86:
- fix sizes for vcpu-id indexed arrays
- fixes for AMD virtualized LAPIC (AVIC)
- other small bugfixes
Generic:
- access tracking performance test
- dirty_log_perf_test command line parsing fix
- Fix selftest use of obsolete pthread_yield() in favour of
sched_yield()
- use cpu_relax when halt polling
- fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: add missing compat KVM_CLEAR_DIRTY_LOG
KVM: use cpu_relax when halt polling
KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl
KVM: SVM: tweak warning about enabled AVIC on nested entry
KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated
KVM: s390: restore old debugfs names
KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized
KVM: selftests: Introduce access_tracking_perf_test
KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
x86/kvm: fix vcpu-id indexed array sizes
KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
docs: virt: kvm: api.rst: replace some characters
KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name
KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()
KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()
KVM: arm64: selftests: get-reg-list: actually enable pmu regs in pmu sublist
KVM: selftests: change pthread_yield to sched_yield
KVM: arm64: Fix detection of shared VMAs on guest fault
Daniel Borkmann says:
====================
pull-request: bpf 2021-07-29
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 14 day(s) which contain
a total of 20 files changed, 446 insertions(+), 138 deletions(-).
The main changes are:
1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer.
2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann,
Piotr Krysiuk and Benedict Schlueter.
3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of JITs, each of the JIT backends compiles the BPF nospec instruction
/either/ to a machine instruction which emits a speculation barrier /or/ to
/no/ machine instruction in case the underlying architecture is not affected
by Speculative Store Bypass or has different mitigations in place already.
This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
instruction for mitigation. In case of arm64, we rely on the firmware mitigation
as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled,
it works for all of the kernel code with no need to provide any additional
instructions here (hence only comment in arm64 JIT). Other archs can follow
as needed. The BPF nospec instruction is specifically targeting Spectre v4
since i) we don't use a serialization barrier for the Spectre v1 case, and
ii) mitigation instructions for v1 and v4 might be different on some archs.
The BPF nospec is required for a future commit, where the BPF verifier does
annotate intermediate BPF programs with speculation barriers.
Co-developed-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
The s390 variant of strncpy_from_user() is slightly faster than the
generic variant, however convert to the generic variant now to follow
most if not all other architectures.
Converting to the generic variant was already considered a couple of
years ago. See commit f5c8b96010 ("s390/uaccess: use sane length for
__strncpy_from_user()").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
commit bc9e9e672d ("KVM: debugfs: Reuse binary stats descriptors")
did replace the old definitions with the binary ones. While doing that
it missed that some files are names different than the counters. This
is especially important for kvm_stat which does have special handling
for counters named instruction_*.
Fixes: commit bc9e9e672d ("KVM: debugfs: Reuse binary stats descriptors")
CC: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After all the changes to delay.c there are many includes which are not
needed anymore. Get rid of them.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit 7f16d7e787 ("s390: show virtualization support in /proc/cpuinfo")
introduced special handling for sie capability, saying this should not be
exposed via hwcaps, without giving a reason.
However this leads to an inconsistent /proc/cpuinfo features line
where all features except the sie capability are also present in
hwcaps. I really don't see a reason to not add that to hwcaps - it
might be quite pointless, but at least this way it is possible to get
rid of some special handling.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove the not so obvious "(elf_hwcap & (1UL << 2)" which only checks
if stfle is available. This used to be required for old code before
test_facility() was introduced. test_facility() will do the right
thing, regardless if stfle is available or not.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove a leftover from the common 31/64 bit code. z/Architecture mode
is now always active, there is no need to check.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The first six hwcap bits are initialized in a rather odd way: an array
contains the stfl(e) bits which need to be set, so that the
corresponding bit position (= array index) within hwcaps are set.
Better open code it like it is done for all other bits, making it
obvious which bit is set when.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
setup_hwcaps() is a quite large function. Make it smaller by moving
the elf platform setup code into an independent setup function.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Move setup_hwcaps() to processor.c for two reasons:
- make setup.c a bit smaller
- have allmost all of the hwcap code in one file
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add BUILD_BUG_ON() sanity checks to make sure the hwcap string array
contains a string for each hwcap.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Introduce HWCAP bit numbers, making it easier to tell at which bit
number we currently are. Also use these bits with the BIT macro to
define the real HWCAP masks.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove s390 part of all HWCAP defines, just to make them shorter and
easier to handle. The namespace is anyway per architecture.
This is similar to what arm64 has.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In order to support the use of enhanced PCI instructions in both kernel-
and userspace we need both hardware support and proper setup in the
kernel. The latter can be toggled off with the pci=nomio command line
option.
Thus availability of this feature in userspace depends on all of kernel
configuration (CONFIG_PCI), hardware support and the current kernel
command line and can thus not rely solely on a facility bit. Instead
let's introduce a new ELF hardware capability bit HWCAP_S390_PCI_MIO to
tell userspace whether these PCI instructions can be used.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Kernel support for the newer PCI mio instructions can be toggled off
with the pci=nomio command line option which needs to integrate with
common code PCI option parsing. However this option then toggles static
branches which can't be toggled yet in an early_param() call.
Thus commit 9964f396f1 ("s390: fix setting of mio addressing control")
moved toggling the static branches to the PCI init routine.
With this setup however we can't check for mio support outside the PCI
code during early boot, i.e. before switching the static branches, which
we need to be able to export this as an ELF HWCAP.
Improve on this by turning mio availability into a machine flag that
gets initially set based on CONFIG_PCI and the facility bit and gets
toggled off if pci=nomio is found during PCI option parsing allowing
simple access to this machine flag after early init.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add hardware capability bits and feature tags to /proc/cpuinfo
for NNPA and Vector-Packed-Decimal-Enhancement Facility 2.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
These macros haven't seen any use in a long time. Also note that the
queue_irqs_*() ones wouldn't even compile anymore.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Now that all drivers use qdio_inspect_queue() and qdio's internal
queue tasklets are gone, the driver-specified queue handlers are
only called for async error reporting (eg. for an error condition in
the QEBSM code).
So take a moment to clean up the Output Queue handlers (they are
_always_ called with qdio_error != 0), and clarify which error types
can be reported through what interface. As Benjamin already suggested
a while ago, we should turn these into distinct enums at some point.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Both qdio drivers have moved away from using qdio's internal tasklet
and timer mechanisms for Output Queues. Rip out all the leftovers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
There is no useful information within [STARTUP_NORMAL_OFFSET, HEAD_END] now.
But the memory region [0, STARTUP_NORMAL_OFFSET] is used by:
* lowcore
* kdump for swapping memory
* stand-alone zipl dumpers for code, data, stack and heap
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This change simplifies the task of making the decompressor relocatable.
The decompressor's image contains special DMA sections between _sdma and
_edma. This DMA segment is loaded at boot as part of the decompressor and
then simply handed over to the decompressed kernel. The decompressor itself
never uses it in any way. The primary reason for this is the need to keep
the aforementioned DMA segment below 2GB which is required by architecture,
and because the decompressor is always loaded at a fixed low physical
address, it is guaranteed that the DMA region will not cross the 2GB
memory limit. If the DMA region had been placed in the decompressed kernel,
then KASLR would make this guarantee impossible to fulfill or it would
be restricted to the first 2GB of memory address space.
This commit moves all DMA sections between _sdma and _edma from
the decompressor's image to the decompressed kernel's image. The complete
DMA region is placed in the init section of the decompressed kernel and
immediately relocated below 2GB at start-up before it is needed by other
parts of the decompressed kernel. The relocation of the DMA region happens
even if the decompressed kernel is already located below 2GB in order
to keep the first implementation simple. The relocation should not have
any noticeable impact on boot time because the DMA segment is only a couple
of pages.
After relocating the DMA sections, the kernel has to fix all references
which point into it. In order to automate this, place all variables
pointing into the DMA sections in a special .dma.refs section. All such
variables must be defined using the new __dma_ref macro. Only variables
containing addresses within the DMA sections must be placed in the new
.dma.refs section.
Furthermore, move the initialization of control registers from
the decompressor to the decompressed kernel because some control registers
reference tables that must be placed in the DMA data section to
guarantee that their addresses are below 2G. Because the decompressed
kernel relocates the DMA sections at startup, the content of control
registers CR2, CR5 and CR15 must be updated with new addresses after
the relocation. The decompressed kernel initializes all control registers
early at boot and then updates the content of CR2, CR5 and CR15
as soon as the DMA relocation has occurred. This practically reverts
the commit a80313ff91 ("s390/kernel: introduce .dma sections").
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As a preparation for moving the .dma.data section from the decompressor to
the decompressed kernel, the .dma.data section must be made relocatable
by replacing absolute memory addressing with relative one. This is required
in order to be able to relocate the DMA section to a memory address <= 2G
as required by the hardware architecture. The DMA section must be
relocated in case the decompressed kernel was loaded to an address >= 2G
which can occur if KASAN is enabled. By making the whole DMA section
position-independent we avoid applying relocations to it whenever it is
moved to a different address, which becomes possible as soon as it becomes
a part of the decompressed kernel.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Both macros are used only in decompressor's head.S, unnecessary to put
them in a global header used in many places like setup.h is.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To reduce duplication, replace error-prone and hard-coded parameter area
offsets with auto-generated ones.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The macros
* IPL_DEVICE_OFFSET
* INITRD_START_OFFSET
* INITRD_SIZE_OFFSET
* OLDMEM_BASE_OFFSET
* OLDMEM_SIZE_OFFSET
* KERNEL_VERSION_OFFSET
* COMMAND_LINE_OFFSET
are no longer necessary and used only to define another set of macros
with the same names but w/o the suffix _OFFSET. Therefore, drop this
unnecessary indirection.
Drop the macro KERNEL_VERSION_OFFSET w/o renaming it to KERNEL_VERSION
because it is used nowhere.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
These symbolic constants are used only by assembler code now:
* COMMAND_LINE
* IPL_DEVICE
C code of the decompressed kernel should use boot data passed
by the decompressor instead.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The new boot data struct shall replace global variables OLDMEM_BASE and
OLDMEM_SIZE. It is initialized in the decompressor and passed
to the decompressed kernel. In comparison to the old solution, this one
doesn't access data at fixed physical addresses which will become important
when the decompressor becomes relocatable.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The new boot data struct shall replace global variables INITRD_START and
INITRD_SIZE. It is initialized in the decompressor and passed
to the decompressed kernel. In comparison to the old solution, this one
doesn't access data at fixed physical addresses which will become important
when the decompressor becomes relocatable.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To make the decompressor relocatable, the early SCLP buffer with a fixed
address must be replaced with a relocatable C buffer of the according size
and alignment as required by SCLP.
Introduce a new function sclp_early_set_buffer() into the SCLP driver
which enables the decompressor to change the SCLP early buffer at any time.
This will be useful when the decompressor becomes fully relocatable and
might need to change the SCLP early buffer to one with an address < 2G
as required by SCLP because it was loaded at an address >= 2G.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use STARTUP_NORMAL_OFFSET and STARTUP_KDUMP_OFFSET instead of magic numbers.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use system call functions instead of open-coding svc inline
assemblies. This is mostly to get rid of even more register asm
constructs.
Besides that, it makes the code also a bit easier to understand.
The generated code is identical to what is was before.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Provide generic system call functions which should be used whenever a
system call needs to be done from user space. The only in-kernel code
is vdso, which will be converted with a follow on patch.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Using register asm statements has been proven to be very error prone,
especially when using code instrumentation where gcc may add function
calls, which clobbers register contents in an unexpected way.
Therefore get rid of register asm statements in cpacf code, and make
sure this bug class cannot happen.
Reviewed-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
De-duplicate checks for Protected Host Virtualization in decompressor and
kernel.
Set prot_virt_host=0 in the decompressor in *any* of the following cases
and hand it over to the decompressed kernel:
* No explicit prot_virt=1 is given on the kernel command-line
* Protected Guest Virtualization is enabled
* Hardware support not present
* kdump or stand-alone dump
The decompressed kernel needs to use only is_prot_virt_host() instead of
performing again all checks done by the decompressor.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
A dump kernel is neither required nor able to support Secure Execution.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>