This fixes a regression that came with 934b2857cc
("[S390] nohz/sclp: disable timer on synchronous waits.").
If udelay() gets called from a disabled context it sets the clock comparator
to a value where it expects the next interrupt. When the interrupt happens
the clock comparator gets not reset and therefore the interrupt condition
doesn't get cleared. The result is an endless timer interrupt loop.
In addition this patch fixes also the following:
rcutorture reveals that our __udelay implementation is still buggy,
since it might schedule tasklets, but prevents their execution:
NOHZ: local_softirq_pending 42
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 142
NOHZ: local_softirq_pending 02
To fix this we make sure that only the clock comparator interrupt
is enabled when the enabled wait psw is loaded.
Also no code gets called anymore which might schedule tasklets.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When running a 31-bit ptrace, on either an s390 or s390x kernel,
reads and writes into a padding area in struct user_regs_struct32
will result in a kernel panic.
This is also known as CVE-2008-1514.
Test case available here:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/user-area-padding.c?cvsroot=systemtap
Steps to reproduce:
1) wget the above
2) gcc -o user-area-padding-31bit user-area-padding.c -Wall -ggdb2 -D_GNU_SOURCE -m31
3) ./user-area-padding-31bit
<panic>
Test status
-----------
Without patch, both s390 and s390x kernels panic. With patch, the test case,
as well as the gdb testsuite, pass without incident, padding area reads
returning zero, writes ignored.
Nb: original version returned -EINVAL on write attempts, which broke the
gdb test and made the test case slightly unhappy, Jan Kratochvil suggested
the change to return 0 on write attempts.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the now unneeded s390_idle.lock spinlock initialization after
Josef Sipek did it the right way in arch/s390/kernel/process.c.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix these two (false positive) warnings by adding an __init annoation:
WARNING: vmlinux.o(.text+0x7e6a): Section mismatch in reference from the function stp_reset() to the function .init.text:__alloc_bootmem()
The function stp_reset() references
the function __init __alloc_bootmem().
This is often because stp_reset lacks a __init
annotation or the annotation of __alloc_bootmem is wrong.
WARNING: vmlinux.o(.text+0x7ece): Section mismatch in reference from the function stp_reset() to the function .init.text:free_bootmem()
The function stp_reset() references
the function __init free_bootmem().
This is often because stp_reset lacks a __init
annotation or the annotation of free_bootmem is wrong.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The result of the diag 0x260 call is not always what one would expect.
So just remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During startup we check if diag308 works using diag 308 subcode 6,
which stores the actual ipl information. This fails with rc = 0x102, if
the system has been ipled from the HMC using load from CD or load from file.
In the case of rc = 0x102 we have to assume that diag 308 is working,
since it still can be used to ipl from an alternative device.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that it is safe to use get_online_cpus() we can revert
[S390] cpu topology: Fix possible deadlock.
commit: fd781fa25c
and call arch_reinit_sched_domains() directly from topology_work_fn().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enables virtio_console as the default console on kvm for
s390. We currently use the same notify hack as lguest for early
console output. I will try to address this for lguest and s390 later.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add missing module.h include to fix this:
CC arch/s390/kernel/stacktrace.o
arch/s390/kernel/stacktrace.c:84: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:84: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:84: warning: parameter names (without types) in function declaration
arch/s390/kernel/stacktrace.c:97: warning: data definition has no type or storage class
arch/s390/kernel/stacktrace.c:97: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/s390/kernel/stacktrace.c:97: warning: parameter names (without types) in function declaration
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Most likely it is broken anyway because of the changes in memory
detection. Since we can't test it and there are probably better ways
that using a P390 card, remove support for it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move memory detection code to own file and also simplify it.
Also add an interface which can be called at any time to get the
current memory layout. This interface is needed by our kernel
internal system dumper.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As noted by Akinobu Mita in patch b1fceac2b9,
alloc_bootmem and related functions never return NULL and always return a
zeroed region of memory. Thus a NULL test or memset after calls to these
functions is unnecessary.
arch/s390/kernel/topology.c | 2 --
1 file changed, 2 deletions(-)
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now it is possible to specify additional kernel parameters on the IPL
command line using the IPL PARM option.
If the Linux system is already running, the new reipl sysfs attribute
'parm' can be used to change kernel parameters for the next reboot.
Examples:
IPL C PARM dasd=1234 root=/dev/dasda1
IPL 1234 PARM savesys=mylnxnss
echo "init=/bin/bash" > /sys/firmware/reipl/ccw/parm
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The idle notifier chain consists of at most one element. So there's
no point in having a notifier chain. Remove it and directly call the
function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for clock synchronization with the server time protocol.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
In case the initrd is located within the bss section it will be
overwritten when the section is cleared. To prevent this just move
the initrd right behind the bss section if it starts within the
section.
The current code already moves the initrd if the bootmem allocator
bitmap would overwrite it. With this patch we should be safe against
initrd corruptions.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the user_regset definitions for normal and compat processes, replace
the dump_regs core dump cruft with the generic CORE_DUMP_USER_REGSET and
replace binfmt_elf32.c with the generic compat_binfmt_elf.c implementation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Using the ipldelay kernel parameter leads to a crash at IPL time.
Since this is broken since a long time it looks like nobody is using
it anymore. So remove it instead of fixing it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The first argument to __ctl_store() should be the array to store
stuff in, not just the first element of that array. With the
current code in __cpu_up(), mainline GCC dies with an internal
compiler error. I didn't diagnose that further, but just fixed
the kernel bug.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
The correct instruction format of idte is "idte r1,r3,r2" with
r1 at bit 24, r3 at bit 16 and r2 at bit 28.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes the last remaining section mismatch warnings in s390
architecture code. It reveals also a real bug introduced by... me
with git commit 2069e978d5
("[S390] sparsemem vmemmap: initialize memmap.")
Calling the generic vmemmap_alloc_block() function to get initialized
memory is a nice idea, however that function is __meminit annotated
and therefore the function might be gone if we try to call it later.
This can happen if a DCSS segment gets added.
So basically revert the patch and clear the memmap explicitly to fix
the original bug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Surround all the code withing show_interrupts() with
get/put_online_cpus() to prevent strange results wrt cpu hotplug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Both smp_call_function() and __smp_call_function_map() access
cpu_online_map. Both functions run with preemption disabled which
protects for cpus going offline. However new cpus can be added and
therefore the cpu_online_map can change unexpectedly.
So use the call_lock to protect against changes to the cpu_online_map
in start_secondary() and all smp_call_* functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We should use const char * for passing the name of the debug feature
around since it will not be changed.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This removes redundant arch code for generic ptrace requests
already handled by ptrace_request and compat_ptrace_request.
It simplifies things to just have the standard entry points,
and use the generic compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a bug with cpu bound guest on kvm-s390. Sometimes it
was impossible to deliver a signal to a spinning guest. We used
preemption as a circumvention. The preemption notifiers called
vcpu_load, which checked for pending signals and triggered a host
intercept. But even with preemption, a sigkill was not delivered
immediately.
This patch changes the low level host interrupt handler to check for the
SIE instruction, if TIF_WORK is set. In that case we change the
instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
code sees an intercept reason 0 if that happens. This patch adds accounting
for these types of intercept as well.
The advantages:
- works with and without preemption
- signals are delivered immediately
- much better host latencies without preemption
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On return from syscall or interrupt, we have to check if we return to
userspace (likely) and if there is work todo (less likely) to decide
if we handle the work. We can optimize this check: we first check for
the less likely work case and then check for userspace.
This patch is also a preparation for an additional patch, that fixes a bug
in KVM dealing with cpu bound guests.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>