The nmi.h header has some constant defines for control register bits.
These definitions should really be located in ctl_reg.h. Move and
rename the defines.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a decoding union for the bits in control registers 2 and use
'union ctlreg0' and 'union ctlreg2' in update_cr_regs to improve
readability.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The smp_send_stop() function can be called from s390_handle_damage
while DAT is off. This happens if a machine check indicates that
kernel gprs or control registers can not be restored. The function
smp_send_stop reenables DAT via __load_psw_mask. That should work
for the case of lost kernel gprs and the system will do the expected
stop of all CPUs. But if control registers are lost, in particular
CR13 with the home space ASCE, interesting secondary crashes may
occur.
Make smp_emergency_stop callable from nmi.c and remove the cpumask
argument. Replace the smp_send_stop call with smp_emergency_stop in
the s390_handle_damage function.
In addition add notrace and NOKPROBE_SYMBOL annotations for all
functions required for the emergency shutdown.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The boot_vdso_data variable is related to the vdso code, the magic of the
initial vdso area for the early boot and the replacement of it in vdso_init
should all be put into vdso.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Enable niai instruction in the spinlock code at run-time for machines
on which facility 49 is available (zEC12 and newer).
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Get rid of unused wrapper macros around debug_sprintf_exception.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event_common memsets the active debug entry with zeros to
prevent stale data leakage. This is overwritten with the actual
debug data in the next step. Only write zeros to that part of the
debug entry that's not used by new debug data.
Micro benchmarks show a 2-10% reduction of cpu cycles with this
approach.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event currently truncates the data if used with a size larger than
the buf_size of the debug feature. For lots of callers of this function,
wrappers have been implemented that loop until all data is handled.
Move that functionality into debug_event_common and get rid of the wrappers.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Before kexec boots to a crash kernel it checks whether the image in memory
changed after load. This is done by the function kdump_csum_valid, which
returns true, i.e. an int != 0, on success and 0 otherwise. In other words
when kdump_csum_valid returns an error code it means that the validation
succeeded. This is not only counterintuitive but also produces the wrong
result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by
making kdump_csum_valid return a bool.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZ5H6IAAoJEN7Pa5PG8C+vkIMP/jOw1YDx0307QoL4m5fLfu2P
sxHbNGiOmoIWdCl1muHUoYzyORAUVOS9q1UPlPXQ2M9EBba12ZXGabzQU3bhWhwO
CFRIlnNK7+D7JswImjwm24AgU/QiSE/XJDtHC8iuzTbDvihFdJG131bVg0bSb6Or
BrpySjNOHDIWe7ySnJlHdoC4noPtst4l33xkzNlY6plhVpT0OQvm2A/UFYV1CRaX
PrTGIw20Uf9Z1IlFwf0yuP/u0r/iRJCi06ym89XqUAUdtzcWpAiwMpydop1uIsmR
A15SYC6dvBYN0n3snqjI6EK+/+XHLlUbmaX/ughUnG3rXx5J4KGJRAeP6qPO7GMV
mYvavLe6FrlRvL5uk9zqAaPRYo3iWKhm43Q7923RFNzr5PLUSKDnN9EFsP22Ppi6
i1lv8khBHh4jx5ST/WjhGb77k6OZ6WbCdB6txOmV7I/VQIoF6XuVmlgpJAR0ZivA
1FtS7H62o+jTUyPV3a4sCjqfMg2PPJ2CKCVtYN1R33IWDjiovqNZsHRaIX7T7AIl
8AJLJOpUoWACplwilFpsIkJ0sjjAvGoTpAAfxiJEtLSYjqPMYyQMqpLPIMMQhFCi
KSNbuHpIpLujrSTCoSpgF5VWy5QgBQy9J3QN8nF/i2PE5izlPUoql3K1sb9dOzZQ
It83jpoK4vFl6a1Ra4SJ
=7y/+
-----END PGP SIGNATURE-----
Merge tag 'vfio-ccw-20171016' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw update from Cornelia Huck:
"Some improvements in data handling for vfio-ccw."
If the count field of a ccw is zero, there is no need to
try to pin page(s) for it. Let's check the count value
before starting pinning operations.
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20171011023822.42948-3-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
We currently return the same error code (-EFAULT) to indicate two
different error cases:
1. a bug in vfio-ccw implementation has been found.
2. a buggy channel program has been detected.
This brings difficulty for userland program (specifically Qemu) to
handle.
Let's use -EFAULT to only indicate the first case. For the second
case, we simply hand over the ccws to lower level for further
handling.
Notice:
Once a bad idaw address is detected, the current behavior is to
suppress the ssch. With this fix, the channel program will be
accepted, and part of the channel program (the part ahead of
the bad idaw) could possibly be executed by the device before
I/O conclusion.
Suggested-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20171011023822.42948-2-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The debug feature code hasn't been touched in ages and the code also
looks like this. Therefore clean up the code so it looks a bit more
like current coding style.
There is no functional change - actually I made also sure that the
generated code with performance_defconfig is identical.
A diff of old vs new with "objdump -d" is empty.
The code is still not checkpatch clean, but that was not the goal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
drivers/s390/crypto/pkey_api.c:128:11-18: WARNING:
kzalloc should be used for cprbmem, instead of kmalloc/memset
Use kzalloc rather than kmalloc followed by memset with 0
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For an unknown reason the s390 kprobes instruction replacement
function modifies the kprobe_status of the current CPU to
KPROBE_SWAP_INST. This was supposed to catch traps that happened
during instruction patching. Such a fault is not supposed to happen,
and silently discarding such a fault is certainly also not what we
want. In fact s390 is the only architecture which has this odd piece
of code.
Just remove this and behave like all other architectures. This was
pointed out by Jens Remus.
Reported-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just some trivial changes like removing the extern keyword from the
header file, renaming arguments to match the man pages, and whitespace
removal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like for the memset16/32/64 variants avoid that subsequent mvc
instructions depend on each other since that might have negative
performance impacts.
This patch is currently hardly relevant since at least gcc 7.1
generates only inline memset code and not a single memset call.
However there is no reason to not provide an optimized version
just in case gcc generates memset calls again, like it did in
the past.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use memset64 instead of the (now) open-coded variant clear_table.
Performance wise there is no difference.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide fast versions of the new memset variants. E.g. the generic
memset64 is ten times slower than the optimized version if used on a
whole page.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a syscall of s390_sthyi to implement STHYI instruction in LPAR
which reuses the implementation for KVM by Janosch Frank -
commit 95ca2cb579 ("KVM: s390: Add sthyi emulation").
STHYI(Store Hypervisor Information) is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.
For the arguments of s390_sthyi, code shall be 0 and flags is reserved for
future use, info is the output argument to store the required hypervisor
info.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
STHYI requires extensive locking in the higher hypervisors and is
very computational/memory expensive. Therefore we cache the retrieved
hypervisor info whose valid period is 1s with mutex to allow concurrent
access. rw semaphore can't benefit here due to cache line bounce.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As we need to support sthyi instruction on LPAR too, move the common code
to kernel part and kvm related code to intercept.c for better reuse.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We never optimized our rwsem inline assemblies to make use of the new
atomic instructions. The generic rwsem implementation implicitly makes
use of the new instructions, since it implements the required rwsem
primitives with atomic operations, which we did optimize.
However even when compiling for old architectures the generic variant
still generates better code. So it's time to simply remove our old
code and switch to the generic implementation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This instruction came with a z/VM extension and not with a specific
machine generation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove a couple of instructions that are listed twice.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Cc: <stable@vger.kernel.org> # v3.18+
Fixes: 3585cb0280 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no recent user space application available anymore which still
supports this old virtio transport. Additionally, commit 3b2fbb3f06
("virtio/s390: deprecate old transport") introduced a deprecation message
in the driver, and apparently nobody complained so far that it is still
required. So let's simply remove it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.
Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces gcm(aes) support into the aes_s390 kernel module.
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.
The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a diffent logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.
The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.
The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:
lhi %r0,0 # 0 indicates an empty lock
l %r1,0x3a0 # CPU number + 1 from lowcore
cs %r0,%r1,<some_lock> # lock operation
jnz call_wait # on failure call wait function
locked:
...
call_wait:
la %r2,<some_lock>
brasl %r14,arch_spin_lock_wait
j locked
A spin_unlock is as simple as before:
lhi %r0,0
sth %r0,2(%r2) # unlock operation
After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.
To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.
While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use setup_timer and mod_timer API instead of structure assignments.
This is done using Coccinelle and semantic patch used
for this as follows:
@@
expression x,y,z,a,b;
@@
-init_timer (&x);
+setup_timer (&x, y, z);
+mod_timer (&a, b);
-x.function = y;
-x.data = z;
-x.expires = b;
-add_timer(&a);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.
Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.
Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The function to prepare MEX type 50 ap messages did
not explicitly check for the data length in case of
data > 512 bytes. Instead the function assumes the
boundary check done in the ioctl function will always
reject requests with invalid data length values.
However, screening just the function code may give the
illusion, that there may be a gap which could be
exploited by userspace for buffer overwrite attacks.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Instead of open coding tod clock to ns conversions use the timex helper.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All (but one) cmf related sysfs attributes have been converted to read
directly from the measurement block using cmf_read. This is not
possible for the avg_utilization attribute since this is an aggregation
of several values for which cmf_read only returns average values.
Move the computation of the utilization value to the cmf_read interface
such that it can use the raw data.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To ensure data consistency when reading from the channel measurement
block we wait for the subchannel to become idle and copy the whole
block. This is unnecessary when all we do is export the individual
values via sysfs. Read the values for sysfs export directly from
the measurement block.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
cmf_copy_block tries to ensure data consistency by copying the
channel measurement block twice and comparing the data. This was
needed on very old machines only. Nowadays we guarantee
consistency by copying the data when the channel is idle.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
No need for refcounting - the data can be on stack. Also change
the locking in this function to only use spin_lock_irq (the
function waits, thus it's never called from IRQ context).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
No need for refcounting - the data can be on stack.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>