Move memory detection code to own file and also simplify it.
Also add an interface which can be called at any time to get the
current memory layout. This interface is needed by our kernel
internal system dumper.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As noted by Akinobu Mita in patch b1fceac2b9,
alloc_bootmem and related functions never return NULL and always return a
zeroed region of memory. Thus a NULL test or memset after calls to these
functions is unnecessary.
arch/s390/kernel/topology.c | 2 --
1 file changed, 2 deletions(-)
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now it is possible to specify additional kernel parameters on the IPL
command line using the IPL PARM option.
If the Linux system is already running, the new reipl sysfs attribute
'parm' can be used to change kernel parameters for the next reboot.
Examples:
IPL C PARM dasd=1234 root=/dev/dasda1
IPL 1234 PARM savesys=mylnxnss
echo "init=/bin/bash" > /sys/firmware/reipl/ccw/parm
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The idle notifier chain consists of at most one element. So there's
no point in having a notifier chain. Remove it and directly call the
function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds a driver for subchannels of type chsc.
A device /dev/chsc is created which may be used to issue ioctls to:
- obtain information about the machine's I/O configuration
- dynamically change the machine's I/O configuration via
asynchronous chsc commands
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add support for clock synchronization with the server time protocol.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
In case the initrd is located within the bss section it will be
overwritten when the section is cleared. To prevent this just move
the initrd right behind the bss section if it starts within the
section.
The current code already moves the initrd if the bootmem allocator
bitmap would overwrite it. With this patch we should be safe against
initrd corruptions.
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the user_regset definitions for normal and compat processes, replace
the dump_regs core dump cruft with the generic CORE_DUMP_USER_REGSET and
replace binfmt_elf32.c with the generic compat_binfmt_elf.c implementation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Using the ipldelay kernel parameter leads to a crash at IPL time.
Since this is broken since a long time it looks like nobody is using
it anymore. So remove it instead of fixing it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid compile error by using EXPORT_SYMBOL_GPL(si_swapinfo) only if
CONFIG_SWAP is set.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Bevore issuing any s390 crypto operation check whether the
CPACF facility is enabled in the facility list. That way a
virtualization layer can prevent usage of the CPACF facility
regardless of the availability of the crypto instructions.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: MMU: Fix is_empty_shadow_page() check
KVM: MMU: Fix printk() format string
KVM: IOAPIC: only set remote_irr if interrupt was injected
KVM: MMU: reschedule during shadow teardown
KVM: VMX: Clear CR4.VMXE in hardware_disable
KVM: migrate PIT timer
KVM: ppc: Report bad GFNs
KVM: ppc: Use a read lock around MMU operations, and release it on error
KVM: ppc: Remove unmatched kunmap() call
KVM: ppc: add lwzx/stwz emulation
KVM: ppc: Remove duplicate function
KVM: s390: Fix race condition in kvm_s390_handle_wait
KVM: s390: Send program check on access error
KVM: s390: fix interrupt delivery
KVM: s390: handle machine checks when guest is running
KVM: s390: fix locking order problem in enable_sie
KVM: s390: use yield instead of schedule to implement diag 0x44
KVM: x86 emulator: fix hypercall return value on AMD
KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
The first argument to __ctl_store() should be the array to store
stuff in, not just the first element of that array. With the
current code in __cpu_up(), mainline GCC dies with an internal
compiler error. I didn't diagnose that further, but just fixed
the kernel bug.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
If a memory range is supposed to be added to the 1:1 mapping and it
ends just below the maximum supported physical address it won't
succeed. This is because a test doesn't consider that the end address
is 1 smaller than start + size.
Fix the comparison.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of !64BIT kernel we end up with a zero sized mem_section array.
This happens because NR_MEM_SECTIONS is smaller than SECTIONS_PER_ROOT
but we have:
#define NR_SECTION_ROOTS (NR_MEM_SECTIONS / SECTIONS_PER_ROOT)
and
struct mem_section *mem_section[NR_SECTION_ROOTS];
So fix this by selecting SPARSEMEM_STATIC which makes sure
that SECTIONS_PER_ROOT is 1.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The call to add_timer was issued before local_int.lock was taken and before
timer_due was set to 0. If the timer expires before the lock is being taken,
the timer function will set timer_due to 1 and exit before the vcpu falls
asleep. Depending on other external events, the vcpu might sleep forever.
This fix pulls setting timer_due to the beginning of the function before
add_timer, which ensures correct behavior.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If the guest accesses non-existing memory, the sie64a function returns
-EFAULT. We must check the return value and send a program check to the
guest if the sie instruction faulted, otherwise the guest will loop at
the faulting code.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current code delivers pending interrupts before it checks for
need_resched. On a busy host, this can lead to a longer interrupt
latency if the interrupt is injected while the process is scheduled
away. This patch moves delivering the interrupt _after_ schedule(),
which makes more sense.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The low-level interrupt handler on s390 checks for _TIF_WORK_INT and
exits the guest context, if work is pending.
TIF_WORK_INT is defined as_TIF_SIGPENDING | _TIF_NEED_RESCHED |
_TIF_MCCK_PENDING. Currently the sie loop checks for signals and
reschedule, but it does not check for machine checks. That means that
we exit the guest context if a machine check is pending, but we do not
handle the machine check.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
There are potential locking problem in enable_sie. We take the task_lock
and the mmap_sem. As exit_mm uses the same locks vice versa, this triggers
a lockdep warning.
The second problem is that dup_mm and mmput might sleep, so we must not
hold the task_lock at that moment.
The solution is to dup the mm unconditional and use the task_lock before and
afterwards to check if we can use the new mm. dup_mm and mmput are called
outside the task_lock, but we run update_mm while holding the task_lock,
protection us against ptrace.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
diag 0x44 is the common way on s390 to yield the cpu to the hypervisor.
It is called by the guest in cpu_relax and in the spinlock code to
yield to other guest cpus.
This semantic is similar to yield. Lets replace the call to schedule with
yield to make sure that current is really yielding.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The correct instruction format of idte is "idte r1,r3,r2" with
r1 at bit 24, r3 at bit 16 and r2 at bit 28.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert show_mem() so its nearly the same as on x86/powerpc.
Gives us proper locking and we get also rid of the only use of max_mapnr.
Also the number of pages was contained in an int which might not be
sufficient not too far in the future.
Cc: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use get_online_cpus() to prevent cpu hotplug in situations where
for_each_online_cpu() is called.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes the last remaining section mismatch warnings in s390
architecture code. It reveals also a real bug introduced by... me
with git commit 2069e978d5
("[S390] sparsemem vmemmap: initialize memmap.")
Calling the generic vmemmap_alloc_block() function to get initialized
memory is a nice idea, however that function is __meminit annotated
and therefore the function might be gone if we try to call it later.
This can happen if a DCSS segment gets added.
So basically revert the patch and clear the memmap explicitly to fix
the original bug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 make allnoconfig fails with the following build error:
arch/s390/mm/init.c: In function 'show_mem':
arch/s390/mm/init.c:55: error: implicit declaration of function 'pfn_valid'
make[1]: *** [arch/s390/mm/init.o] Error 1
make: *** [arch/s390/mm] Error 2
This problem can by fixed ensuring that ARCH_SELECT_MEMORY_MODEL
is always turned on.
Signed-off-by: Hans-Joachim Picht <hans@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Surround all the code withing show_interrupts() with
get/put_online_cpus() to prevent strange results wrt cpu hotplug.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Both smp_call_function() and __smp_call_function_map() access
cpu_online_map. Both functions run with preemption disabled which
protects for cpus going offline. However new cpus can be added and
therefore the cpu_online_map can change unexpectedly.
So use the call_lock to protect against changes to the cpu_online_map
in start_secondary() and all smp_call_* functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We should use const char * for passing the name of the debug feature
around since it will not be changed.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let's just use the generic vmmemmap_alloc_block() function which
always returns initialized memory.
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the existing arch_alloc_page/arch_free_page callbacks to do
the guest page state transitions between stable and unused.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This removes redundant arch code for generic ptrace requests
already handled by ptrace_request and compat_ptrace_request.
It simplifies things to just have the standard entry points,
and use the generic compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a bug with cpu bound guest on kvm-s390. Sometimes it
was impossible to deliver a signal to a spinning guest. We used
preemption as a circumvention. The preemption notifiers called
vcpu_load, which checked for pending signals and triggered a host
intercept. But even with preemption, a sigkill was not delivered
immediately.
This patch changes the low level host interrupt handler to check for the
SIE instruction, if TIF_WORK is set. In that case we change the
instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
code sees an intercept reason 0 if that happens. This patch adds accounting
for these types of intercept as well.
The advantages:
- works with and without preemption
- signals are delivered immediately
- much better host latencies without preemption
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On return from syscall or interrupt, we have to check if we return to
userspace (likely) and if there is work todo (less likely) to decide
if we handle the work. We can optimize this check: we first check for
the less likely work case and then check for userspace.
This patch is also a preparation for an additional patch, that fixes a bug
in KVM dealing with cpu bound guests.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation. This removes almost 250 lines of duplicated
code.
It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)
I still haven't changed the cris version even though Linus says the BKL
isn't needed. The arch maintainer can easily do it if there are really
no obstacles.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low
bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to
free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the PT_IEEE_IP hack has been removed s390 can now use
the common code sys_ptrace function.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The self referential PT_IEEE_IP ptrace peek & poke calls have been
broken for that last 6 years. For peek the code always returns 0
instead of the last ieee fault and for poke the code does nothing.
Since nobody noticed the code seems to be superfluous. So lets
remove it.
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select
of SPARSEMEM_VMEMMAP since it is configurable. This is because
SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken
include dependencies that I don't want to fix.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Carsten Otte <cotte@de.ibm.com>
This lets us use defines for the magic bits in machine flags instead
of using plain numbers all over the place.
In addition on newer machines features/facilities are indicated by the
result of the stfl instruction. So we use these bits instead of trying
to execute new instructions and check wether we get an exception or
not.
Also the mvpg instruction is always available when in zArch mode,
whereas the idte instruction is only available in zArch mode. This
results in some minor optimizations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Always use clear_table to initialise page tables. The overlapping
memcpy is just a leftover of a previous version that wasn't fully
converted to clear_table.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/lib/uaccess_mvcos.c:166:
warning: 'strnlen_user_mvcos' defined but not used
arch/s390/lib/uaccess_mvcos.c:186:
warning: 'strncpy_from_user_mvcos' defined but not used
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we get a notification that cpu topology changed, we schedule a
work struct which just calls arch_reinit_sched_domains. This function
in turn calls get_online_cpus() which results int the lockdep warning
below.
After all it turnded out that it's not legal to call get_online_cpus()
from the context of a multi-threaded work queue.
It could deadlock this way:
process 0 (events/cpu-x):
-> run_workqueue
-> removes my work_struct from the work queue
-> calls work_struct->fn
-> get_online_cpus()
-> locks on cpu_hotplug.lock since process 1 below is doing cpu hotplug
process 1:
-> cpu_down (for cpu-x)
-> cpu_hotplug_begin (holds cpu_hotplug.lock now)
-> cpu-x dead
-> notifier_call_chain with CPU_DEAD
-> cleanup_workqueue_thread
-> flush_cpu_workqueue (succeeds)
-> kthread_stop for events/cpu-x
-> now kthread_stop waits for my work_struct to complete from within
process 0. -> dead.
A single threaded workqueue wouldn't have such problems, however there is
no such common queue available and it's not worth to create one for the
very rare calls to arch_reinit_sched_domains.
So we just create a kernel thread from our work struct which calls
arch_reinit_sched_domains and are done with it.
Thanks to Oleg Nesterov and Peter Zijlstra for helping me figuring out
that this isn't a false positive lockdep warning:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25-03562-g3dc5063-dirty #12
-------------------------------------------------------
events/3/14 is trying to acquire lock:
(&cpu_hotplug.lock){--..}, at: [<0000000000076094>] get_online_cpus+0x50/0x78
but task is already holding lock:
(topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (topology_work){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<0000000000059d48>] run_workqueue+0x170/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
-> #1 (events){--..}:
[<000000000006fc74>] __lock_acquire+0x1010/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<000000000005a23c>] cleanup_workqueue_thread+0x60/0xa8
[<00000000003b2ab8>] workqueue_cpu_callback+0xbc/0x170
[<00000000003bba80>] notifier_call_chain+0x5c/0xa4
[<00000000000655a2>] __raw_notifier_call_chain+0x26/0x38
[<00000000000655e2>] raw_notifier_call_chain+0x2e/0x40
[<0000000000075e00>] cpu_down+0x228/0x31c
[<00000000003b1dd8>] store_online+0x64/0xb8
[<00000000001e7128>] sysdev_store+0x48/0x58
[<0000000000121cd2>] sysfs_write_file+0x126/0x1c0
[<00000000000c1944>] vfs_write+0xb0/0x15c
[<00000000000c20e6>] sys_write+0x56/0x88
[<0000000000027a68>] sys32_write+0x34/0x4c
[<0000000000023f70>] sysc_noemu+0x10/0x16
[<0000000077f3f186>] 0x77f3f186
-> #0 (&cpu_hotplug.lock){--..}:
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
other info that might help us debug this:
2 locks held by events/3/14:
#0: (events){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
#1: (topology_work){--..}, at: [<0000000000059cde>] run_workqueue+0x106/0x278
stack backtrace:
CPU: 3 Not tainted 2.6.25-03562-g3dc5063-dirty #12
Process events/3 (pid: 14, task: 000000002fb04038, ksp: 000000002fb0bd70)
0400000000000000 000000002fb0ba40 0000000000000002 0000000000000000
000000002fb0bae0 000000002fb0ba58 000000002fb0ba58 0000000000016488
0000000000000000 000000002fb0bd70 0000000000000000 0000000000000000
000000002fb0ba40 000000000000000c 000000002fb0ba40 000000002fb0bab0
00000000003c99e0 0000000000016488 000000002fb0ba40 000000002fb0ba90
Call Trace:
([<00000000000163fc>] show_trace+0x138/0x158)
[<00000000000164e2>] show_stack+0xc6/0xf8
[<0000000000016624>] dump_stack+0xb0/0xc0
[<000000000006cd36>] print_circular_bug_tail+0xa2/0xb4
[<000000000006fa84>] __lock_acquire+0xe20/0x111c
[<000000000006fe40>] lock_acquire+0xc0/0xf8
[<00000000003b701c>] mutex_lock_nested+0xd0/0x364
[<0000000000076094>] get_online_cpus+0x50/0x78
[<000000000003a03e>] arch_reinit_sched_domains+0x26/0x58
[<000000000002700e>] topology_work_fn+0x26/0x34
[<0000000000059d4e>] run_workqueue+0x176/0x278
[<0000000000059edc>] worker_thread+0x8c/0xf0
[<000000000005f5bc>] kthread+0x68/0xa0
[<000000000001a33e>] kernel_thread_starter+0x6/0xc
[<000000000001a338>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On some smp sysfs store attributes get_online_cpus() may block on
cpu_hotplug.lock, but we hold already smp_cpu_state_mutex. Since the
locking order on cpu hotplug via arch_update_cpu_topology is inverse
this might lead to deadlocks.
So make sure locking order is always the same.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is where it should be and we can get rid of some externs
and a static inline function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
New version that does not preserve the marker. Arch maintainers indicate
that the marker functionality is is not needed anymore.
Note you may simplify the s390 asm-offsets.c code further if you use the
OFFSET() macro instead of the DEFINE. See kbuild.h
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 has a strange marker in DEFINE. Undefine the DEFINE from kbuild.h and
define it the way s390 wants it to preserve things as they were.
May be good if the arch maintainer could go over this and check if this
workaround is really necessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for __do_softirq() in include/linux/interrupt.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So userspace can save/restore the mpstate during migration.
[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.
Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:
CPU0 CPU1
if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
schedule()
atomic_inc(pit_timer->pending)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Temporarily rename this function to avoid merge conflicts and/or
dependencies. This function will be removed as soon as git-s390
and kvm.git are finally upstream.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself
In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.
Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
of the remote cpu
- sigp emergency sends an external interrupt to the remove cpu
- sigp stop stops a remove cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
state to the cpus lowcore
- sigp set arch sets the architecture mode of the remote cpu. setting to
ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu
For implementation of this, the stop intercept indication starts to get reused
on purpose: a set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
recognized
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch introduces in-kernel handling of some intercepts for privileged
instructions:
handle_set_prefix() sets the prefix register of the local cpu
handle_store_prefix() stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey() just decrements the instruction address and retries
handle_stsch() delivers condition code 3 "operation not supported"
handle_chsc() same here
handle_stfl() stores the facility list which contains the
capabilities of the cpu
handle_stidp() stores cpu type/model/revision and such
handle_stsi() stores information about the system topology
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).
In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.
This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.
The following interrupts are supported:
SIGP STOP - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
(stopped) remote cpu
INT EMERGENCY - interprocessor interrupt, usually used to signal need_reshed
and for smp_call_function() in the guest.
PROGRAM INT - exception during program execution such as page fault, illegal
instruction and friends
RESTART - interprocessor signal that starts a stopped cpu
INT VIRTIO - floating interrupt for virtio signalisation
INT SERVICE - floating interrupt for signalisations from the system
service processor
struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carrys parameter data for interrupts along with the interrupt
type. Interrupts on s390 usually have a state that represents the current
operation, or identifies which device has caused the interruption on s390.
kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.
[christian: change virtio interrupt to 0x2603]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This path introduces handling of sie intercepts in three flavors: Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(),
or passed to userspace with corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP.
In case of partial execution in kernel with the need of userspace support,
kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
-EREMOTE.
The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
to userland
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
(aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.
The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c similar to arch/x86/kvm/x86.c, this implements all
arch callbacks for kvm. __vcpu_run calls back into
sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S assembler function sie64a, which enters guest
context via SIE, and switches world before and after that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
virtual machines on the mainframe
include/asm-s390/kvm.h defines kvm_regs and friends for user access to
guest register content
arch/s390/kvm/gaccess.h functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h header file for kvm-s390 internals, extended by
later patches
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.
Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.
This patch has a small common code hit, namely making dup_mm non-static.
Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[HWRNG] omap: Minor updates
[CRYPTO] kconfig: Ordering cleanup
[CRYPTO] all: Clean up init()/fini()
[CRYPTO] padlock-aes: Use generic setkey function
[CRYPTO] aes: Export generic setkey
[CRYPTO] api: Make the crypto subsystem fully modular
[CRYPTO] cts: Add CTS mode required for Kerberos AES support
[CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
[CRYPTO] tcrypt: Change the XTEA test vectors
[CRYPTO] tcrypt: Shrink the tcrypt module
[CRYPTO] tcrypt: Change the usage of the test vectors
[CRYPTO] api: Constify function pointer tables
[CRYPTO] aes-x86-32: Remove unused return code
[CRYPTO] tcrypt: Shrink speed templates
[CRYPTO] tcrypt: Group common speed templates
[CRYPTO] sha512: Rename sha512 to sha512_generic
[CRYPTO] sha384: Hardware acceleration for s390
[CRYPTO] sha512: Hardware acceleration for s390
[CRYPTO] s390: Generic sha_update and sha_final
[CRYPTO] api: Switch to proc_create()
Exploit the System z10 hardware acceleration for SHA384.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Exploit the System z10 hardware acceleration for SHA512.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sha_{update|final} functions are similar for every sha variant.
Since that is error-prone and redundant replace these functions by
a shared generic implementation for s390.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Move the function that prints the segment warning messages found in the
monreader driver and the dcssblk driver to the extmem base code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Newer s390 models have a breaking-event-address-recording register.
Each time an instruction causes a break in the sequential instruction
execution, the address is saved in that hardware register. On a program
interrupt the address is copied to the lowcore address 272-279, which
makes it software accessible.
This patch changes the program check handler and the stack overflow
checker to copy the value into the pt_regs argument.
The oops output is enhanced to show the last known breaking address.
It might give additional information if the stack trace is corrupted.
The feature is only available on 64 bit.
The new oops output looks like:
[---------snip----------]
Modules linked in: vmcp sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 2 Not tainted 2.6.24zlive-host #8
Process modprobe (pid: 4788, task: 00000000bf3d8718, ksp: 00000000b2b0b8e0)
Krnl PSW : 0704200180000000 000003e000020028 (vmcp_init+0x28/0xe4 [vmcp])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000004000002 000003e000020000 0000000000000000 0000000000000001
000000000015734c ffffffffffffffff 000003e0000b3b00 0000000000000000
000003e00007ca30 00000000b5bb5d40 00000000b5bb5800 000003e0000b3b00
000003e0000a2000 00000000003ecf50 00000000b2b0bd50 00000000b2b0bcb0
Krnl Code: 000003e000020018: c0c000040ff4 larl %r12,3e0000a2000
000003e00002001e: e3e0f0000024 stg %r14,0(%r15)
000003e000020024: a7f40001 brc 15,3e000020026
>000003e000020028: e310c0100004 lg %r1,16(%r12)
000003e00002002e: c020000413dc larl %r2,3e0000a27e6
000003e000020034: c0a00004aee6 larl %r10,3e0000b5e00
000003e00002003a: a7490001 lghi %r4,1
000003e00002003e: a75900f0 lghi %r5,240
Call Trace:
([<000000000014b300>] blocking_notifier_call_chain+0x2c/0x40)
[<000000000015735c>] sys_init_module+0x19d8/0x1b08
[<0000000000110afc>] sysc_noemu+0x10/0x16
[<000002000011cda2>] 0x2000011cda2
Last Breaking-Event-Address:
[<000003e000020024>] vmcp_init+0x24/0xe4 [vmcp]
[---------snip----------]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The current uaccess page table walk code assumes at a few places that
any access is a user space access. This is not correct if somebody
has issued a set_fs(KERNEL_DS) in advance.
Add code which checks which address space we are in and with this make
sure we access the correct address space. This way we get also rid of
the dirty
if (!currrent-mm)
return -EFAULT;
hack in futex_atomic_cmpxchg_pt.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Most noteable part of this commit is the new local header file entry.h
which contains all the function declarations of functions that get only
called from asm code or are arch internal. That way we can avoid extern
declarations in C files.
This is more or less the same that was done for sparc64.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
variant instead. In addition we get high resolution timers for free.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Remove the program check generating monitor calls and use function
calls instead. Theres is no real advantage in using monitor calls,
but they do make debugging harder, because of all the program checks
it generates.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The new function supports setting of permissions for the debugfs files
created by the debug feature. In addition to that, the function provides
uid and gid as parameters for future use. Currently only root is allowed
for uid and gid.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Not very helpful when code dies in "init".
See also http://lkml.org/lkml/2008/3/26/557 .
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add get_clock_xt to read an 8 byte clock value using store clock
extended (STCKE) and use get_clock_xt for sched_clock. STCKE should
be faster than STCK on newer machines.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If vertical cpu polarization is active then the hypervisor will
dispatch certain cpus for a longer time than other cpus for maximum
performance. For example if a guest would have three virtual cpus,
each of them with a share of 33 percent, then in case of vertical
cpu polarization all of the processing time would be combined to a
single cpu which would run all the time, while the other two cpus
would get nearly no cpu time.
There are three different types of vertical cpus: high, medium and
low. Low cpus hardly get any real cpu time, while high cpus get a
full real cpu. Medium cpus get something in between.
In order to switch between the two possible modes (default is
horizontal) a 0 for horizontal polarization or a 1 for vertical
polarization must be written to the dispatching sysfs attribute:
/sys/devices/system/cpu/dispatching
The polarization of each single cpu can be figured out by the
polarization sysfs attribute of each cpu:
/sys/devices/system/cpu/cpuX/polarization
horizontal, vertical:high, vertical:medium, vertical:low or unknown.
When switching polarization the polarization attribute may contain
the value unknown until the configuration change is done and the
kernel has figured out the new polarization of each cpu.
Note that running a system with different types of vertical cpus may
result in significant performance regressions. If possible only one
type of vertical cpus should be used. All other cpus should be
offlined.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Add s390 backend so we can give the scheduler some hints about the
cpu topology.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Make stfle visible so other code can call this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
sys_sigreturn and sys_rt_sigreturn don't take any arguments. So luckily
this resulted only in unneeded instead of incorrect code.
But still this clearly shows why one should not put extern declarations
in C files (will be fixed with a larger sparse patch).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This is just a port of 83bd01024b
"x86: protect against sigaltstack wraparound".
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
a0c1e9073e "futex: runtime enable pi and
robust functionality" introduces a test wether futex in atomic stuff
works or not.
It does that by writing to address 0 of the kernel address space. This
will crash on older machines where addressing mode switching is enabled
but where the mvcos instruction is not available. Page table walking is
done by hand and therefore the code tries to access current->mm which
is NULL.
Therefore add an extra check, so we survive the early test.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
List of major changes and improvements:
no manipulation of the global ARP constructor
clean code split into core, layer 2 and layer 3 functionality
better exploitation of the ethtool interface
better representation of the various hardware capabilities
fix packet socket support (tcpdump), no fake_ll required
osasnmpd notification via udev events
coding style and beautification
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
/sys/firmware/reipl/nss/name contains the nss name when defsys or
savesys command has been executed. If the defsys or savesys command
fails the kernel_nss_name has to be cleared since a reipl on that
nss name won't be possible.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Normally this should not happen, but it's cleaner to do it that way.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
IPL from NSS didn't work because the memory detection routine omits any
memory sections with a size lower than what MAX_ORDER defines.
This causes the detection routine to skip the first memory segment which
has a size of 1MB. Which later on will let the kernel think that there
is no memory available at all.
Since in addition the z/VM memory increment size is 1MB force MAX_ORDER
to be 9, so we can support 1MB segments.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Compile smp.o with -Wno-nonnull so gcc stops warning about memcpy
being used with a null parameter. Also remove the workaround code
and use a char * cast instead of a void * cast to do computations.
Cc: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a machine check handling is pending when the idle loop is entered
default_idle will be left with timer ticks and virtual timer disabled.
Fix this by "calling" the idle_chain. Also a BUG_ON(!in_interrupt) in
start_hz_timer must be removed since the function now gets called from
non interrupt context as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add missing exception table entry so that the kernel can handle
proctection exceptions as well on the cs instruction. Currently only
specification exceptions are handled correctly.
The missing entry allows user space to crash the kernel.
Cc: stable <stable@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since a5fbb6d106
"KVM: fix !SMP build error" smp_call_function isn't a define anymore
that folds into nothing but a define that calls up_smp_call_function
with all parameters. Hence we cannot #ifdef out the unused code
anymore...
This seems to be the preferred method, so do this for s390 as well.
arch/s390/kernel/time.c: In function 'etr_sync_clock':
arch/s390/kernel/time.c:825: error: 'clock_sync_cpu_start' undeclared
arch/s390/kernel/time.c:862: error: 'clock_sync_cpu_end' undeclared
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just copy the first 512 read-only bytes of the current cpu lowcore if
a new cpu gets onlined. The rest is zeroed out and must be explicitly
initialized. Current code just copies the entire lowcore and
initializes the needed fields.
This should reveal bugs in future enhancements quite early.
Also when the lowcore of the first cpu is replaced this is now done
atomically (no interrupts, no machine checks).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If both NO_IDLE_HZ and VIRT_TIMER are disabled default_idle won't load
an enabled wait psw and busy loop instead. This is because the
idle_chain is empty and the return value of atomic_notifier_call_chain
will be NOTIFY_DONE, which causes default_idle to return instead of
loading an enabled wait psw.
Fix this by calling __atomic_notifier_call_chain instead and add proper
return value handling.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for different number of page table levels dependent
on the highest address used for a process. This will cause a 31 bit
process to use a two level page table instead of the four level page
table that is the default after the pud has been introduced. Likewise
a normal 64 bit process will use three levels instead of four. Only
if a process runs out of the 4 tera bytes which can be addressed with
a three level page table the fourth level is dynamically added. Then
the process can use up to 8 peta byte.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Background: I've implemented 1K/2K page tables for s390. These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM. The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste). The pgstes are only required if the process is using the SIE
instruction. The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).
Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t. For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch. For everybody else it will be a (struct page *). The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor. The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed. pmd_populate will get a pgtable_t instead of a struct page pointer.
To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we possibly lookup the pid in the wrong pid namespace. So
seq_file convert proc_pid_status which ensures the proper pid namespaces is
passed in.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: s390 build fix]
[akpm@linux-foundation.org: fix task_name() output]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morgan <morgan@kernel.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] dcss: Initialize workqueue before using it.
[S390] Remove BUILD_BUG_ON() in vmem code.
[S390] sclp_tty/sclp_vt220: Fix scheduling while atomic
[S390] dasd: fix panic caused by alias device offline
[S390] dasd: add ifcc handling
[S390] latencytop s390 support.
[S390] Implement ext2_find_next_bit.
[S390] Cleanup & optimize bitops.
[S390] Define GENERIC_LOCKBREAK.
[S390] console: allow vt220 console to be the only console
[S390] Fix couple of section mismatches.
[S390] Fix smp_call_function_mask semantics.
[S390] Fix linker script.
[S390] DEBUG_PAGEALLOC support for s390.
[S390] cio: Add shutdown callback for ccwgroup.
[S390] cio: Update documentation.
[S390] cio: Clean up chsc response code handling.
[S390] cio: make sense id procedure work with partial hardware response
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove BUILD_BUG_ON() in vmem code since it causes build failures if
the size of struct page increases. Instead calculate at compile time
the address of the highest physical address that can be added to the
1:1 mapping.
This supposed to fix a build failure with the page owner tracking leak
detector patches as reported by akpm.
page-owner-tracking-leak-detector-broken-on-s390.patch can be removed
from -mm again when this is merged.
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix compile error:
CC arch/s390/kernel/asm-offsets.s
In file included from
arch/s390/kernel/asm-offsets.c:7:
include/linux/sched.h: In function 'spin_needbreak':
include/linux/sched.h:1931: error: implicit declaration of function '__raw_spin_is_contended'
make[2]: *** [arch/s390/kernel/asm-offsets.s] Error 1
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix console detection logic to support configurations in which the
vt220 console is the only available Linux console.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix couple of section mismatches. And since we touch the code
anyway change the IPL code to use C99 initializers.
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make sure func isn't called on the local cpu just like on all other
architectures that implement this function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes this warning:
vmlinux: warning: allocated section `.text' not in segment
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After seeing the filename I'd have expected something about the
implementation of SMP in the Linux kernel - not some notes on kernel
configuration and building trivialities noone would search at this
place.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config KPROBES_SUPPORT
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
- Use HAVE_KPROBES
- Use a select
- Yet another update :
Moving to HAVE_* now.
- Update ARM for kprobes support.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like
depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32
really shouldn't exist in a file like kernel/Kconfig.instrumentation.
It would be much better to do
depends on ARCH_SUPPORTS_KPROBES
in that generic file, and then architectures that do support it would just
have a
bool ARCH_SUPPORTS_KPROBES
default y
in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...
Changelog:
Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use
config ARCH_SUPPORTS_KPROBES
def_bool y
instead, which is a bit denser.
We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...
Changelog :
- Moving to HAVE_*.
- Add AVR32 oprofile.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Git commit 86ef5c9a8e forgot a few
lock_cpu_hotplug/unlock_cpu_hotplug pairs in arch/s390/kernel/smp.c
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In s390's spin_lock_irqsave, interrupts remain disabled while
spinning. In other architectures like x86 and powerpc, interrupts are
re-enabled while spinning if IRQ is not masked before spin_lock_irqsave
is called.
The following patch re-enables interrupts through local_irq_restore
while spinning for a lock acquisition.
This can improve system response.
[heiko.carstens@de.ibm.com: removed saving of pc]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds the s390 variant for smp_call_function_mask(). The
implementation is pretty straight forward using the wrapper
__smp_call_function_map() which already takes a cpumask_t argument.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch removes TOPDIR from arch/s390/kernel/Makefile.
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the NOTES and BUG_TABLE section in the linker script to the
read-only sections right after the text section.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a problem with the following scenario:
1. Linux booted from DASD "A"
2. Reboot from DASD "B" using "/sys/firmware/reipl/ccw/device"
3. Reboot DASD "B"
Without this patch in step 3 on newer s390 systems under LPAR instead of
DASD "B", DASD "A" will be booted. The reason is that in step 2 we use CCW
reipl and in step 3 we use DIAG308 (subcode 3) reipl. DIAG308 does not
notice the CCW reipl and still thinks that it has to reboot DASD "A".
Before applying this fix, ensure to have MCF RJ9967101E or z9 GA3 base driver
installed.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have seen an oops in an OOM situation, where show_mem tried to
access the struct page of a dcss segment. The vmemmap code has
already created the 1:1 mapping but failed allocating the struct
pages. In the OOM case, show_mem now walks the memory. It uses
pfn_valid to detect if it may access the struct page. In the case
described above, the mapping was established and pfn_valid returned
true. As the struct pages were not allocated, the kernel oopsed.
We have to ensure that we have created the struct pages, before we
add a mapping pointing to the pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sclp ipl information has not been initialized. Therefore the ipl loadparm
and the "has_dump" flag have not been set correctly.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
No need to preallocate the per cpu lowcores and stacks.
Savings are 28-32k per offline cpu.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of a kernel panic it is currently possible to specify that a dump
should be created, the system should be rebooted or stopped. Virtual sysfs
files under the directory /sys/firmware/ are used for that configuration.
In addition to that, there are kernel parameters 'vmhalt', 'vmpoff'
and 'vmpanic', which can be used to specify z/VM commands, which are
automatically executed in case of halt, power off or a kernel panic.
This patch combines both functionalities and allows to specify the z/VM CP
commands also via sysfs attributes. In addition to that, it enhances the
existing handling of shutdown triggers (e.g. halt or panic) and associated
shutdown actions (e.g. dump or reipl) and makes it more flexible.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move s390 crypto Kconfig options to drivers/crypto/Kconfig to have all
hardware crypto devices in one place.
This also makes messing up the kernel source tree easier for some people.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It caused only a lot of confusion. From now on cpu hotplug of up to
NR_CPUS will work by default. If somebody wants to limit that then
the possible_cpus parameter can be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Used to contain the address of the holder of the lock. But since the
spinlock code is not inlined anymore all locks contain the same address
anyway. And since in addtition nobody complained about that for ages
its obviously unused. So remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Align everything to MAX_ORDER so we can get rid of the extra checks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Also print PREEMPT and/or SMP if the kernel was configured that way.
Makes s390 look a bit more like other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the vmalloc area starts at a dynamic address depending on
the memory size. There was also an 8MB security hole after the
physical memory to catch out-of-bounds accesses.
We can simplify the code by putting the vmalloc area explicitely at
the top of the kernel mapping and setting the vmalloc size to a fixed
value of 128MB/128GB for 31bit/64bit systems. Part of the vmalloc
area will be used for the vmem_map. This leaves an area of 96MB/1GB
for normal vmalloc allocations.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clear-by-asce operation of the idte instruction gets an asce
(address-space-control-element) as argument to specify which TLBs
need to get flushed. The current code passes a plain pointer to
the start of the pgd without the additional bits which would make
the pointer an asce. The current machines don't mind the difference
but a future model might want to use the designation type control
bits in the asce as a filter for the TLBs to flush.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new interface so that cpus can be put into standby state and
configured state.
Only offline cpus can be put into standby state or configured state.
For that the new percpu sysfs attribute "configure" must be used.
To put a cpu in standby state a "0" must be written to the attribute.
In order to switch it into configured state a "1" must be written to
the attribute.
Only cpus in configured state can be brought online.
In addition this patch introduces a static mapping of physical to
logical cpus. As a result only the sysfs directories of present cpus
will be created. To scan for new cpus the new sysfs attribute "rescan"
must be used.
Writing to /sys/devices/system/cpu/rescan will trigger a rescan of
cpus and will create directories for new cpus.
On IPL only configured cpus will be used. And on reboot/shutdown all
cpus will remain in their current state (configured/standby).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (125 commits)
[CRYPTO] twofish: Merge common glue code
[CRYPTO] hifn_795x: Fixup container_of() usage
[CRYPTO] cast6: inline bloat--
[CRYPTO] api: Set default CRYPTO_MINALIGN to unsigned long long
[CRYPTO] tcrypt: Make xcbc available as a standalone test
[CRYPTO] xcbc: Remove bogus hash/cipher test
[CRYPTO] xcbc: Fix algorithm leak when block size check fails
[CRYPTO] tcrypt: Zero axbuf in the right function
[CRYPTO] padlock: Only reset the key once for each CBC and ECB operation
[CRYPTO] api: Include sched.h for cond_resched in scatterwalk.h
[CRYPTO] salsa20-asm: Remove unnecessary dependency on CRYPTO_SALSA20
[CRYPTO] tcrypt: Add select of AEAD
[CRYPTO] salsa20: Add x86-64 assembly version
[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)
[CRYPTO] gcm: Introduce rfc4106
[CRYPTO] api: Show async type
[CRYPTO] chainiv: Avoid lock spinning where possible
[CRYPTO] seqiv: Add select AEAD in Kconfig
[CRYPTO] scatterwalk: Handle zero nbytes in scatterwalk_map_and_copy
[CRYPTO] null: Allow setkey on digest_null
...
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no firmware "subsystem" it's just a directory in /sys that
other portions of the kernel want to hook into. So make it a kobject
not a kset to help alivate anyone who tries to do some odd kset-like
things with this.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dynamically create the kset instead of declaring it statically.
This makes the kobject attributes now work properly that I broke in the
previous patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Volker Sameske <sameske@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes the code a bit simpler and and gets us one step closer to
deleting the deprecated subsys_attr code.
NOTE, this needs the next patch in the series in order to work properly.
This will build, but the sysfs files will not properly operate.
Thanks to Cornelia for the build fix on this patch.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Volker Sameske <sameske@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.
Thanks to Cornelia for the build fix.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.
We also rename hypervisor_subsys to hypervisor_kset to catch all users
of the variable.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.
This patch is based on a lot of help from Kay Sievers.
Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
crypto_blkcipher_decrypt is wrong because it does not care about
the IV.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some CPUs support only 128 bit keys in HW. This patch adds SW fallback
support for the other keys which may be required. The generic algorithm
(and the block mode) must be availble in case of a fallback.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This three defines are used in all AES related hardware.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In case of TRACE_IRQFLAGS the restore psw masks will not be
initialized if noexec is turned on. This will lead to an
immediate system crash.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit b8e7a54cd0 introduced a compile
error if CONFIG_PREEMPT is not set:
arch/s390/kernel/built-in.o: In function `cleanup_io_leave_insn':
/space/kvm/arch/s390/kernel/entry.S:(.text+0xbfce): undefined reference to `preempt_schedule_irq'
This patch hides preempt_schedule_irq if CONFIG_PREEMPT is not set.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Before we're getting short on memory detection fixes here is the next
one: if neither sclp nor diag260 report the storage size the detection
loop will return immediately without detecting anything. Fix this by
breaking the detection loop only if the memory end is known.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Don't perform a sigp store-status-at-address on smp_send_stop().
It will overwrite the lowcores of other cpus and destroys valueable
debug informations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When returning from IRQ handling and TIF_NEED_RESCHED is set we must
call preempt_schedule_irq() instead of schedule().
Otherwise the BKL might be unlocked in schedule() and therfore
everything that relies on the BKL is broken.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove binary sysctls that never worked due to missing strategy functions.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove binary sysctls that never worked due to missing strategy functions.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Current support for TRACE_IRQFLAGS and lockdep_sys_exit is broken.
IRQ flag tracing is broken for program checks. Even worse is that
the newly introduced calls to lockdep_sys_exit are in the critical
section code which is not supposed to call any C functions. In
addition the checks if locks are still held are also done when
returning to kernel code which is broken as well.
Fix all this by disabling interrupts and machine checks at the
exit paths and then do the appropriate checks and calls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When doing an magic sysrq reboot on s390 the following bug message
appears:
SysRq : Resetting
BUG: sleeping function called from invalid context at include/asm/semaphore.h:61
in_atomic():1, irqs_disabled():0
07000000004002a8 000000000fe6bc48 0000000000000002 0000000000000000
000000000fe6bce8 000000000fe6bc60 000000000fe6bc60 000000000012a79a
0000000000000000 07000000004002a8 0000000000000006 0000000000000000
0000000000000000 000000000fe6bc48 000000000000000d 000000000fe6bcb8
00000000004000c8 0000000000103234 000000000fe6bc48 000000000fe6bc90
Call Trace:
(¬<00000000001031b2>| show_trace+0x12e/0x148)
¬<000000000011ffca>| __might_sleep+0x10a/0x118
¬<0000000000129fba>| acquire_console_sem+0x92/0xf4
¬<000000000012a2ca>| console_unblank+0xc2/0xc8
¬<0000000000107bb4>| machine_restart+0x54/0x6c
¬<000000000028e806>| sysrq_handle_reboot+0x26/0x30
¬<000000000028e52a>| __handle_sysrq+0xa6/0x180
¬<0000000000140134>| run_workqueue+0xcc/0x18c
¬<000000000014029a>| worker_thread+0xa6/0x108
¬<00000000001458e4>| kthread+0x64/0x9c
¬<0000000000106f0e>| kernel_thread_starter+0x6/0xc
¬<0000000000106f08>| kernel_thread_starter+0x0/0xc
The only reason for doing a console_unblank on s390 is to flush the
log buffer. We have to check for in_atomic before doing a
console_unblank as the console is otherwise filled with an unrelated
bug message.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
broken on powerpc, because we end up counting user time twice: once in
timer_interrupt() and once in update_process_times().
This fixes the problem by pulling the code in update_process_times
that updates utime and stime into a separate function called
account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
there is a version of account_process_tick in kernel/timer.c that
simply accounts a whole tick to either utime or stime as before. If
CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
implement account_process_tick.
This also lets us simplify the s390 code a bit; it means that the s390
timer interrupt can now call update_process_times even when
CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
suitable account_process_tick().
account_process_tick() now takes the task_struct * as an argument.
Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Time of Day clock is the standard time source for s390. It is
- monotonic
- allows very fast reading
- architecture guarantees at least microsecond stepping
- available as part of the architecture
We should announce the rate of tod as 400 to be in sync with the
description found in clocksource.h:
"400-499:Perfect The ideal clocksource. A must-use where available."
This change will prefer tod over less reliable clock sources.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Seems that people prefer to have the unit encoded in the attribute
name. Also makes parsing easier.
Now we have:
# cat /sys/devices/system/cpu/cpu0/idle_time_us
131473592
instead of
# cat /sys/devices/system/cpu/cpu0/idle_time
131473592 us
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Yet another patch in the countless series of memory detection fixes:
if the last area of the reported storage size is a hole the detection
loop will loop forever.
Just break chunk detection loop if its end is going to be larger than
reported storage size.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit fae8b22d3e
"[S390] Add per-cpu idle time / idle count sysfs attributes" causes
a link error on !CONFIG_SMP.
Fix this by adding some #ifdef's. Real fix would be to cleanup the
code since we don't register a cpu on !CONFIG_SMP. But that would
be quite a big patch. For the time being this is good enough.
arch/s390/kernel/built-in.o: In function `do_monitor_call':
(.text+0x50d4): undefined reference to `per_cpu__s390_idle'
arch/s390/kernel/built-in.o: In function `cpu_idle':
(.text+0x518c): undefined reference to `per_cpu__s390_idle'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix links to files in Documentation/* in various Kconfig files
Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- De-confuse the defines for the address-space-control-elements
and the segment/region table entries.
- Create out of line functions for page table allocation / freeing.
- Simplify get_shadow_xxx functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current tlb flushing code for page table entries violates the
s390 architecture in a small detail. The relevant section from the
principles of operation (SA22-7832-02 page 3-47):
"A valid table entry must not be changed while it is attached
to any CPU and may be used for translation by that CPU except to
(1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY or
INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page-table
entry, or (3) make a change by means of a COMPARE AND SWAP AND
PURGE instruction that purges the TLB."
That means if one thread of a multithreaded applciation uses a vma
while another thread does an unmap on it, the page table entries of
that vma needs to get removed with IPTE, IDTE or CSP. In some strange
and rare situations a cpu could check-stop (die) because a entry has
been pushed out of the TLB that is still needed to complete a
(milli-coded) instruction. I've never seen it happen with the current
code on any of the supported machines, so right now this is a
theoretical problem. But I want to fix it nevertheless, to avoid
headaches in the futures.
To get this implemented correctly without changing common code the
primitives ptep_get_and_clear, ptep_get_and_clear_full and
ptep_set_wrprotect need to use the IPTE instruction to invalidate the
pte before the new pte value gets stored. If IPTE is always used for
the three primitives three important operations will have a performace
hit: fork, mprotect and exit_mmap. Time for some workarounds:
* 1: ptep_get_and_clear_full is used in unmap_vmas to remove page
tables entries in a batched tlb gather operation. If the mmu_gather
context passed to unmap_vmas has been started with full_mm_flush==1
or if only one cpu is online or if the only user of a mm_struct is the
current process then the fullmm indication in the mmu_gather context is
set to one. All TLBs for mm_struct are flushed by the tlb_gather_mmu
call. No new TLBs can be created while the unmap is in progress. In
this case ptep_get_and_clear_full clears the ptes with a simple store.
* 2: ptep_get_and_clear is used in change_protection to clear the
ptes from the page tables before they are reentered with the new
access flags. At the end of the update flush_tlb_range clears the
remaining TLBs. In general the ptep_get_and_clear has to issue IPTE
for each pte and flush_tlb_range is a nop. But if there is only one
user of the mm_struct then ptep_get_and_clear uses simple stores
to do the update and flush_tlb_range will flush the TLBs.
* 3: Similar to 2, ptep_set_wrprotect is used in copy_page_range
for a fork to make all ptes of a cow mapping read-only. At the end of
of copy_page_range dup_mmap will flush the TLBs with a call to
flush_tlb_mm. Check for mm->mm_users and if there is only one user
avoid using IPTE in ptep_set_wrprotect and let flush_tlb_mm clear the
TLBs.
Overall for single threaded programs the tlb flush code now performs
better, for multi threaded programs it is slightly worse. In particular
exit_mmap() now does a single IDTE for the mm and then just frees every
page cache reference and every page table page directly without a delay
over the mmu_gather structure.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the ccw method is used to ipl the DASD dump record under LPAR.
This mechanism is not reliable, which can cause dump failures. This fix
now uses the diag 308 ipl method for all machines, which have diag308
subcode 5 and 4 support.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add two new sysfs entries per cpu: idle_count and idle_time.
idle_count contains the number of times a cpu went into idle state.
idle_time contains the time a cpu spent in idle state in microseconds.
This can be used e.g. by powertop to tell how often idle state is
entered and left.
# cat /sys/devices/system/cpu/cpu0/idle_count
504
# cat /sys/devices/system/cpu/cpu0/idle_time
469734037 us
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Quoting Randy:
"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times. Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.
However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All asm/ipc.h files do only #include <asm-generic/ipc.h>.
This patch therefore removes all include/asm-*/ipc.h files and moves the
contents of include/asm-generic/ipc.h to include/linux/ipc.h.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the OOM killer's extern function prototypes to include/linux/oom.h and
include it where necessary.
[clg@fr.ibm.com: build fix]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: introduce ccflags-y, asflags-y and ldflags-y
kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
kbuild: enable use of AFLAGS and CFLAGS on commandline
kbuild: enable 'make AFLAGS=...' to add additional options to AS
kbuild: fix AFLAGS use in h8300 and m68knommu
kbuild: check for wrong use of CFLAGS
kbuild: enable 'make CFLAGS=...' to add additional options to CC
kbuild: fix up CFLAGS usage
kbuild: make modpost detect unterminated device id lists
kbuild: call export_report from the Makefile
kbuild: move Kai Germaschewski to CREDITS
kconfig/menuconfig: distinguish between selected-by-another options and comments
kconfig: tristate choices with mixed tristate and boolean values
include/linux/Kbuild: remove duplicate entries
kbuild: kill backward compatibility checks
kbuild: kill EXTRA_ARFLAGS
kbuild: fix documentation in makefiles.txt
kbuild: call make once for all targets when O=.. is used
kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
kbuild: update _shipped files for kconfig syntax cleanup
...
Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.
Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or be otherwise handled, and makes it very obvious
that something has gone wrong.
This change allows the entire process group to be taken down, rather
than just the one thread.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical handlers of PTRACE_DETACH go into ptrace_request().
Not touching compat code.
Not touching archs that don't call ptrace_request.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable AFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of AFLAGS with KBUILD_AFLAGS all over
the tree.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k, s390
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
The variable CFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
tree and enabling one to use:
make CFLAGS=...
to specify additional gcc commandline options.
One usecase is when trying to find gcc bugs but other
use cases has been requested too.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k
Test was simple to do a defconfig build, apply the patch and check
that nothing got rebuild.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Run the lockdep_sys_exit hook before returning to user space.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make sure parameter list of the pfault token function is eight byte
aligned. Otherwise we can get specification exceptions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace the hardcoded 4096 value with the PAGE_SIZE macro.
Converted a few decimal numbers to readable hex numbers.
Use of PAGE_SIZE required a small change to page.h
to allow PAGE_SIZE to be used from assembler/linker scripts.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a consistent style in vmlinux.lds.
This style is gradually being introduced for all archs.
A few lables were moved inside the section definition so
they are assigned the correct value of gcc decide to align
the content to another address than the one . has.
In the past this has fixed several bugs but for s390 it
will not impact due to all the alignmnet already introduced.
Stabs definitions are consolidated in asm-generic/vmlinux.lds.h
This patch also introduce support for DWARF - without knowing
if this makes sense for s390.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After assigning values to specific registers memset was called. This
may clobber the contents of the used registers.
To solve this extract the two used inline assemblies into small
functions that don't call any functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If we use the CLEAR ipl option, reipl is faster, since then VM can release
the memory, which has been paged out.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Also removes a bunch of ^L in drivers/s390/cio/cmf.c
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no need to assign "0" to "hops" twice. Remove one assigment.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The termination condition of the loop that prints the operands of
an instruction doesn't stop after the maximum of 6 operands.
It continues with the operands of the next instruction format
instead which create really long lines.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/Kconfig tells us that CONFIG_APPLDATA_BASE is bool and hence can
never be built modular. Given this, defining appldata_exit() function is
pointless (and wasteful, actually). Remove all that.
Previous patch annotated appldata_offline_cpu() as __cpuexit, but now with the
__exit function appldata_exit() gone, the only callsite that references it is
__cpuinit, so this function can also be __cpuinit, thereby saving space when
HOTPLUG_CPU=n.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
appldata_offline_cpu() is only called from __cpuinit-marked hotplug
notifier callback and from the __exit-marked module_exit function,
therefore candidate for __cpuexit.
BTW the __exit module_exit function appldata_exit() of this driver fails to
unregister_hotcpu_notifier() the notifier_block that was registered by
appldata_init() during module startup. This will lead to oops if hotplug
notification comes after module has been unloaded. Let's fix this by
unregistering the notifier appropriately (before appldata_offline_cpu()'ing
the CPUs).
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There are currently several SHA implementations that all define their own
initialization vectors and size values. Since this values are idential
move them to a header file under include/crypto.
Signed-off-by: Jan Glauber <jang@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Loading the crypto algorithm by the alias instead of by module directly
has the advantage that all possible implementations of this algorithm
are loaded automatically and the crypto API can choose the best one
depending on its priority.
Additionally it ensures that the generic implementation as well as the
HW driver (if available) is loaded in case the HW driver needs the
generic version as fallback in corner cases.
Also remove the probe for sha1 in padlock's init code.
Quote from Herbert:
The probe is actually pointless since we can always probe when
the algorithm is actually used which does not lead to dead-locks
like this.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Loading the crypto algorithm by the alias instead of by module directly
has the advantage that all possible implementations of this algorithm
are loaded automatically and the crypto API can choose the best one
depending on its priority.
Additionally it ensures that the generic implementation as well as the
HW driver (if available) is loaded in case the HW driver needs the
generic version as fallback in corner cases.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted by Christoph Hellwig, pktgen was the only user so
it can now be removed.
[ Add missing cases caught by Adrian Bunk. -DaveM ]
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Placing a kprobe on "bc" instruction (s390/s390x) can cause an oops.
The instruction length is encoded into the first two bits of the s390
instruction. Kprobe is incorrectly computing the instruction length.
The instruction length is used for determining what type of "fix-up" is
needed for conditional branch instruction. The problem can bee seen by
placing a kprobe on a "bc" instruction that will not branch. The
results is that Kprobe incorrectly computes the new instruction
pointer (psw.addr) after single stepping the instruction. The problem
is corrected with this patch.
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
hypfs removes the whole hypfs directory tree and creates a new one, when a
process triggers an update by writing to the "update" attribute. When removing
and creating files, it is necessary to lock the inode of the parent directory
where the files live. Currently hypfs does not lock the parent inode, which
can lead to inode corruption. This patch:
* Introduces correct locking
* Fixes i_nlink reference counting for inodes, when creating directories
* Adds info printk, when hypfs filesystem has been mounted
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The instruction table for b2 opcodes was missing an opfrag value
for the cpya instruction. All instructions specified after cpya
were not considered by the disassembler. The fix is simple and
obvious - add the opfrag field to the cpya instruction.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
There are several s390 diagnose calls, which must be executed below the
2GB memory boundary. In order to enforce this, those diagnoses must be
compiled into the kernel. Currently diag 14 can be called within the
vmur kernel module from addresses above 2GB. This leads to specification
exceptions. This patch moves diag10, diag14 and diag210 into the new
diag.c file.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If a machine check is pending and the external or I/O interrupt handler
returns to userspace io_mcck_pending is going to call s390_handle_mcck.
Before this happens a call to TRACE_IRQS_ON was already made since we
know that we are going back to userspace and hence interrupts will be
enabled. So there was an indication that interrupts are enabled while
in reality they are still disabled.
s390_handle_mcck will do a local_irq_save/restore pair and confuse
lockdep which later complains about inconsistent irq tracing.
To solve this just call trace_hardirqs_off before calling
s390_handle_mcck and trace_hardirqs_on afterwards.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch implements support of fallocate system call on s390(x)
platform. A wrapper is added to address the issue which s390 ABI has with
the arguments of this system call.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no need to disable bottom halves when holding call_lock. Also
this could imply that it is legal to call smp_call_function* from
bh context, which it is not.
Also test if func will be executed locally before disabling
and aterwards enabling interrupts again. It's not necessary to disable
and enable interrupts each time __smp_call_function_map gets called.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
smp_call_function_single now has the same semantics as s390's
smp_call_function_on. Therefore convert to the *single variant
and get rid of some architecture specific code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This changes the s390 linker script to use the asm-generic NOTES macro so that
ELF note sections with SHF_ALLOC set are linked into the kernel image along
with other read-only data. The PT_NOTE also points to their location.
This paves the way for putting useful build-time information into ELF notes
that can be found easily later in a kernel memory dump.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.
AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparse gives us a few of these:
stacktrace.c:69:38: warning: incorrect type in argument 2
(different signedness)
stacktrace.c:69:38: expected unsigned int *skip
Just get rid of the 'skip' argument since it is contained in the
struct stack_trace that gets passed anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
z/VM Unit record character device driver to access VM reader, punch,
and printer.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The instructions with format RX_URRD and SI_URD and instructions
with a PC relative operand are not disassembled correctly.
For RX_URRD and SI_URD instructions find_insn sets opfrag to code[0].
The mask byte of these two formats is 0x00. table->opfrag will never
be identical to (opfrag & opmask) and no matching instruction will
be found. Set the mask byte to 0xff to actually check byte 0 against
the table.
For PC relative instructions the (unsigned) offset value needs to be
casted to an signed integer so that negative branch offsets are
handled correctly.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit. Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning. This will give more debug informations like register contents,
etc... In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning. E.g. on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():
[<000000000001517a>] show_trace+0x92/0xe8)
[<0000000000015270>] show_stack+0xa0/0xd0
[<00000000000152ce>] dump_stack+0x2e/0x3c
[<0000000000195450>] report_bug+0x98/0xf8
[<0000000000016cc8>] illegal_op+0x1fc/0x21c
[<00000000000227d6>] sysc_return+0x0/0x10
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 is the only 32bit with unsigned long for size_t (usual for those
is unsigned int). Tell sparse...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.
This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sched-cfs-v2.6.22-git-v18.patch introduces CPU_IDLE in sched.h.
This conflict with the already existing define in
include/asm-s390/processor.h
Just rename the s390 defines, since they will go away as soon as
we support CONFIG_NO_HZ instead of our own CONFIG_NO_IDLE_HZ.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After the in-kernel system call has been remove the system call path
can be optimized. The problem state bit of the old psw is always set
between system_call and sysc_do_svc. SAVE_ALL_SVC uses this information
to avoid two instructions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The bogomips calculation triggered via reading from /proc/cpuinfo
can return incorrect values if the qrnnd assembly is called with a
pointer in %r2 with any of the upper 32 bits set.
Fix this by using 64 bit division / remainder operation provided by
gcc instead of calling the assembly.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge smp_count_cpus() and smp_get_save_areas() so we save a loop over
all potentially present cpus.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Check if a command is available before executing. Saves some
superfluous service calls that won't succeed anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce some new interfaces so that random subsystems don't have to
mess around with sclp internal structures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is mainly to switch off all potentially debugging stuff that
won't report anything useful after an oops happened.
Besided that setting pause_on_oops will work too, but doesn't make
too much sense on s390.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Print list of modules on die() like a lot of other architectures do.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When appending the 'cio_ignore' kernel parameter to the command line, a blank
has to be inserted in order to separate 'cio_ignore' from the preceding kernel
parameters.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the __cpuinit instead of __devinit section annotations for code
that deals with cpu hotplug. In addition add some more annotations on
functions that have been left out so far.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To be able to run with the diagnose 224 switched off, a potential
specification exception has to be handled.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fix:
mm/slab: fix section mismatch warning
mm: fix section mismatch warnings
init/main: use __init_refok to fix section mismatch
kbuild: introduce __init_refok/__initdata_refok to supress section mismatch warnings
all-archs: consolidate .data section definition in asm-generic
all-archs: consolidate .text section definition in asm-generic
kbuild: add "Section mismatch" warning whitelist for powerpc
kbuild: make better section mismatch reports on i386, arm and mips
kbuild: make modpost section warnings clearer
kconfig: search harder for curses library in check-lxdialog.sh
kbuild: include limits.h in sumversion.c for PATH_MAX
powerpc: Fix the MODALIAS generation in modpost for of devices
s390 change for git commit 0f95b7fc83.
That is print kprobes debug data before BUG().
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Wire up sys_utimensat, reserve syscall number for sys_fallocate and
add a couple of syscalls to the ignore list to get rid of warings.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When auditing syscalls that send signals, log the pid and security
context for each target process. Optimize the data collection by
adding a counter for signal-related rules, and avoiding allocating an
aux struct unless we have more than one target process. For process
groups, collect pid/context data in blocks of 16. Move the
audit_signal_info() hook up in check_kill_permission() so we audit
attempts where permission is denied.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Disband drivers/s390/Kconfig, use the common Kconfig files. The s390
specific config options from drivers/s390/Kconfig are moved to the
respective common Kconfig files.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes compilation on s390 after the removal of
struct subsystem.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
This finally renames the thread_info field in task structure to stack, so that
the assumptions about this field are gone and archs have more freedom about
placing the thread_info structure.
Nonbroken archs which have a proper thread pointer can do the access to both
current thread and task structure via a single pointer.
It'll allow for a few more cleanups of the fork code, from which e.g. ia64
could benefit.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
[akpm@linux-foundation.org: build fix]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Include the new linux/kdebug.h instead of asm/kdebug.h.
Simply remove the asm/kdebug.h include if both had been included.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides a debugfs knob to turn kprobes on/off
o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
also update the doc to make it current.
We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.
[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- consolidate duplicate code in all arch_prepare_kretprobe instances
into common code
- replace various odd helpers that use hlist_for_each_entry to get
the first elemenet of a list with either a hlist_for_each_entry_save
or an opencoded access to the first element in the caller
- inline add_rp_inst into it's only remaining caller
- use kretprobe_inst_table_head instead of opencoding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Architectures that don't support DMA can say so by adding a config NO_DMA
to their Kconfig file. This will prevent compilation of some dma specific
driver code. Also dma-mapping-broken.h isn't needed anymore on at least
s390. This avoids compilation and linking of otherwise dead/broken code.
Other architectures that include dma-mapping-broken.h are arm26, h8300,
m68k, m68knommu and v850. If these could be converted as well we could get
rid of the header file.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
"John W. Linville" <linville@tuxdriver.com>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: <James.Bottomley@SteelEye.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <geert@linux-m68k.org>
Cc: <zippel@linux-m68k.org>
Cc: <spyro@f2s.com>
Cc: <uclinux-v850@lsi.nec.co.jp>
Cc: <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...
Fix up trivial conflict in include/linux/highmem.h manually.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits)
[SCTP]: Set assoc_id correctly during INIT collision.
[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.
[SCTP]: Verify all destination ports in sctp_connectx.
[XFRM] SPD info TLV aggregation
[XFRM] SAD info TLV aggregationx
[AF_RXRPC]: Sort out MTU handling.
[AF_IUCV/IUCV] : Add missing section annotations
[AF_IUCV]: Implementation of a skb backlog queue
[NETLINK]: Remove bogus BUG_ON
[IPV6]: Some cleanups in include/net/ipv6.h
[TCP]: zero out rx_opt in tcp_disconnect()
[BNX2]: Fix TSO problem with small MSS.
[NET]: Rework dev_base via list_head (v3)
[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
[BNX2]: Update version and reldate.
[BNX2]: Print bus information for PCIE devices.
[BNX2]: Add 1-shot MSI handler for 5709.
[BNX2]: Restructure PHY event handling.
[BNX2]: Add indirect spinlock.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
remove "struct subsystem" as it is no longer needed
sysfs: printk format warning
DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt
platform: reorder platform_device_del
Driver core: fix show_uevent from taking up way too much stack
Commit c1821c2e97 introduced the
uaccess structure that is used to select the correct set of user
copy functions for the different execution modes (standard vs.
noexec vs. z9 optimized user copy). The uaccess symbol is exported
with EXPORT_SYMBOL_GPL. This breaks all non-gpl modules that use
user copy. To make them work again change the export to
EXPORT_SYMBOL.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Register aes-s390 algorithms with the actual supported max keylen size
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
And here's a port of the powerpc patch to get rid of the notifier
chain completely to s390. It's ontop of Martins patch as that one
is in mainline already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes. The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.
Thanks to Kay for fixing the bugs in this patch.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Let's allow page-alignment in general for per-cpu data (wanted by Xen, and
Ingo suggested KVM as well).
Because larger alignments can use more room, we increase the max per-cpu
memory to 64k rather than 32k: it's getting a little tight.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Herbert Xu conviced me that a new flag was overkill; every driver
currently overrides get_stats, so we might as well make the internal
one the default. If someone did fail to set get_stats, they would now
get all 0 stats instead of "No statistics available".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/s390/appldata/appldata_base.c has some confusing debugging code left
over to allow compiling it as a module. In practice, it cannot be configured
as module and there is no need to keep that code.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clock synchronization of the ETR code requires an smp_call_function
to synchronize all cpus. Calling smp_call_function from a tasklet is
illegal. Replace the tasklet with a job on the global workqueue.
ETR work is rare and can be postponed to a be done by a kernel thread.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Generate uevents for all cpus if cpu capability changes. This can
happen e.g. because the cpus are overheating. The cpu capability can
be read via /sys/devices/system/cpu/cpuN/capability.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
s390 machines provide hardware support for creating Linux dumps on SCSI
disks. For creating a dump a special purpose dump Linux is used. The first
32 MB of memory are saved by the hardware before the dump Linux is
booted. Via an SCLP interface, the saved memory can be accessed from
Linux. This patch exports memory and registers of the crashed Linux to
userspace via a debugfs file. For more information refer to
Documentation/s390/zfcpdump.txt, which is included in this patch.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Recent cvs versions of gcc have support for an improved stack overflow
checking that calculates the size of the guard size for each function.
If the compiler accepts -mstack-size without -mstack-guard then the
new stack check is available. We always want to use the new stack
checker.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Simplify the signal_return function that checks for the two special
system calls sigreturn and rt_sigreturn. No need to do a page table
walk, a call to copy_from_user while disabled page faults will work
as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The minor fault path has grown a lot in terms of cycles. In particular
the kprobes hook is very costly. Optimize the path to save a couple of
cycles. If kprobes is enabled more than 300 cycles can be avoided if
kprobes_running() is false.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Generic bug implementation for s390. Will increase the value of the
console output on BUG() statements since registers r0-r5,r14 will
not be clobbered by a printk() call that was previously done before
the illegal instruction of BUG() was hit.
Also implements an architecture specific WARN_ON(). Output of that
could be increased but requires common code change.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This patch adds two improvements to the oops output. First it adds an
additional line after the PSW which decodes the different fields of it.
Second a disassembler is added that decodes the instructions surrounding
the faulting PSW. The output of a test oops now looks like this:
kernel BUG at init/main.c:419
illegal operation: 0001 [#1]
CPU: 0 Not tainted
Process swapper (pid: 0, task: 0000000000464968, ksp: 00000000004be000)
Krnl PSW : 0700000180000000 00000000000120b6 (rest_init+0x36/0x38)
R:0 T:1 IO:1 EX:1 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 EA:3
Krnl GPRS: 0000000000000003 00000000004ba017 0000000000000022 0000000000000001
000000000003a5f6 0000000000000000 00000000004be6a8 0000000000000000
0000000000000000 00000000004b8200 0000000000003a50 0000000000008000
0000000000516368 000000000033d008 00000000000120b2 00000000004bdee0
Krnl Code: 00000000000120a6: e3e0f0980024 stg %r14,152(%r15)
00000000000120ac: c0e500014296 brasl %r14,3a5d8
00000000000120b2: a7f40001 brc 15,120b4
>00000000000120b6: 0707 bcr 0,%r7
00000000000120b8: eb7ff0500024 stmg %r7,%r15,80(%r15)
00000000000120be: c0d000195825 larl %r13,33d108
00000000000120c4: a7f13f00 tmll %r15,16128
00000000000120c8: a7840001 brc 8,120ca
Call Trace:
([<00000000000120b2>] rest_init+0x32/0x38)
[<00000000004be614>] start_kernel+0x37c/0x410
[<0000000000012020>] _ehead+0x20/0x80
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Remove system call glue for sys_clone, sys_fork, sys_vfork, sys_execve,
sys_sigreturn, sys_rt_sigreturn and sys_sigaltstack. Call do_execve from
kernel_execve directly, move pt_regs to the right place and branch to
sysc_return to start the user space program. This removes the last
in-kernel system call.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Clean interface between cio and ipl code, so Peter stops complaining.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If both sclp and diag memory detection don't work stop at the first
memory hole. Otherwise the code might loop forever...
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Allow s390 to properly override the generic
__div64_32() implementation by:
1) Using obj-y for div64.o in s390's makefile instead
of lib-y
2) Adding the weak attribute to the generic implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them. It would be nice if
this were done by default.
1) Add a new "stats" field to "struct net_device".
2) Add a new feature field to say "this driver uses the internal one"
3) Have a default "get_stats" which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
->get_stats is set.
This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.
Lightly tested: works for a modified lguest network driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running a probe on s390 with a probe address that is not 4 byte aligned
results in a Kernel BUG. The problem is that the stura instruction used
by swap_instruction requires the destination address to be 4 byte aligned.
As stura only writes 4 bytes, aligning to the next 4 byte aligned address
results in the breakpoint instruction being stored past the probe address.
The fix is to align the address backward (to the previous 4 byte aligned
address) and writing the two byte breakpoint instruction in the appropriate
bytes.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We used wrong length values for ipl and dump hardware structures.
Since z/VM checks the ipl parameters more accurately than LPAR,
the operations fail there.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
strlcpy already accounts for the trailing zero in its length
computation, so there is no need to substract one to the buffer size.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
diag 260 returns the address of the last addressable byte and not the
size of memory. Since we want the size we have to add 1 to the return
value.
Disable diag 260 for non z/Arch mode since it doesn't work there
anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
PGALLOC_DMA is defined only if we have CONFIG_ZONE_DMA
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replaced check_user_space() + __check_access_register with the new
check_space(). The old functions made wrong assumptions about kernel
and user space when the kernel and user address spaces are switched
(kernel in home space, user in primary/secondary space).
Secondly the user process can switch to the accress register mode if
it is running in primary or secondary mode. In addition it can load
an arbitrary value to the access registers. If any other value than
0 for primary space or 1 for secondary space is loaded and memory
is accessed using the base register related to the access register,
the program should be terminated with a SIGSEGV. To achieve that the
DUALD pointer in the DUCT and the PSALD pointer in the PASTE need
to point to an array of 8 invalid access-list entries to get a
ALEN-translation exception if an invalid alet is used.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
nss and kexec don't work together since kexec wants to write to the
read-only text section of the shared kernel image.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reipl doesn't work on older machines were s390_reset_machine() gets
called. The reason is that the text section is read-only but the
variable dump_prefix_page is there. Since s390_reset_machine() writes
to it we get a protection exception.
Therefore move dump_prefix_page to the bss section.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid sprinkling a _lot_ of preempt_disable/preempt_enable pairs.
This would be necessary for e.g. the iucv driver. Also this way we
are more consistent with other architectures which disable
preemption at least for smp_call_function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The illegal operation handler calls the die notifier with DIE_BPT to
let kprobes pick up its breakpoint. If kprobes does not find its
breakpoint it returns NOTIFY_STOP instead of NOTIFY_DONE.
Since we use stop_machine_run on s390 to arm/disarm the kprobes
breakpoints the race that kprobe_handler tries to solve by checking
for the kprobes breakpoints does not exist. Removing the check makes
BUG_ON working again.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since $(ARCH) is always "s390" we can replace it with "s390".
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_SHARED_KERNEL the kernel text segment that might be in a
read only memory sections starts at 1MB. Memory between 0x12000 and
0x100000 is unused then. Free this, so we have appr. an extra MB
of memory available.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Clear only memory from __bss_start to __bss_stop when clearing the bss
section. Not until _end, which currently happens to be the same.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To avoid ugly warings for older gccs, we replace
BUG() with "return NULL", which is just as well.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Setup.h has been misused for ipl related stuff in the past. We now move
everything, which has to do with ipl and reipl to a new header file named
"ipl.h".
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace two stidp inline assemblies with one global implementation.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Force reading of *in_sync in while loop. Loops where the content that
is checked for is changed by a different cpu always should have some
sort of barrier() semantics.
Otherwise this might lead to very subtle bugs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Disable ZONE_DMA on 31-bit. All memory is addressable by all
devices and we do not need any special memory pool.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce __smp_call_function_map which calls a function on all cpus
given with a cpumask_t. Use it to implement smp_call_function and
smp_call_function_on. Replace smp_ext_bitcall_others with smp_ext_bitcall
and a for_each_cpu_mask loop. Use a cpumask_t instead of an atomic_t for
cpu counting and print a warning if preempt is on in
__smp_call_function_map().
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The new delay implementation uses the clock comparator and an external
interrupt even if it is called disabled for interrupts. To do this
all external interrupt source except clock comparator are switched of
before enabling external interrupts. The external interrupt at the
end of the delay period may not execute softirqs or we can end up in a
dead-lock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to have the the definition of all top level sysctl directories
registers in sysctl.h so we don't conflict by accident and cause abi problems.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] remove __io_virt and mmiowb.
[S390] cio: use ARRAY_SIZE in device_id.c
[S390] cio: Fixup interface for setting options on ccw devices.
[S390] smp_call_function/smp_call_function_on locking.
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
smp_call_function and smp_call_function_on share the same lock and
smp_call_function_on disables softirq's so it can be called from
softirq context as well. Hence smp_call_function muss disable
softirqs as well to avoid deadlocks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky wrote:
"s390 does not even need (in|out)b(_p|). I wondered what else from
io.h do we not need. The answer is: almost nothing. With the devres
patch from Al and the dma-mapping patch from Heiko we can get rid of
iomem and all associated definitions."
So we'll just need to replace NO_IOPORT with NO_IOMEM in Kconfig and
kill arch/s390/mm/ioremap.c.
BTW, there's an annoying bit of junk in there - IO_SPACE_LIMIT. We
only need it for /proc/ioports, which AFAICS shouldn't even be there
on s390 (or uml). OTOH, removing that thing would mean a user-visible
change - we go from "empty file in /proc" to "no such file in /proc"...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Split the implementation-agnostic stuff in separate files.
* Make sure that targets using non-default request_irq() pull
kernel/irq/devres.o
* Introduce new symbols (HAS_IOPORT and HAS_IOMEM) defaulting to positive;
allow architectures to turn them off (we needed these symbols anyway for
dependencies of quite a few drivers).
* protect the ioport-related parts of lib/devres.o with CONFIG_HAS_IOPORT.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.
Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock. ia64 also loses timerlist_lock.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that almost all architectures implemented exactly the same
sys32_sysinfo... except parisc, where a bug was to be found in handling of
the uptime. So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.
This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch simply defines CONFIG_ZONE_DMA for all arches. We later do special
things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
without ZONE_DMA.
CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
handles ISA DMA.
First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
needs ZONE_DMA because ISA DMA devices are supported. We can catch this in
mm/Kconfig and do not need to modify arch code.
Second, arches may use ZONE_DMA in an unknown way. We set CONFIG_ZONE_DMA for
all arches that do not set CONFIG_GENERIC_ISA_DMA in order to insure backwards
compatibility. The arches may later undefine ZONE_DMA if their arch code has
been verified to not depend on ZONE_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
From: Jennifer Hunt <jenhunt@us.ibm.com>
This patch adds AF_IUCV socket support.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rewritten IUCV base code to net/iucv.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the old IUCV code from drivers/s390/net
Remove approprirate IUCV entries from drivers/s390/net/Makefile,
drivers/s390/net/Kconfig and arch/s390/defconfig
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set read-only flag in the page table entries for the kernel image text
section. This will catch all instruction caused corruptions withing the
text section.
Instruction replacement via kprobes still works, since it bypasses now
dynamic address translation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hopefully this will make it more maintainable and less error prone.
Code makes use of search_exception_tables(). Since it calls this
function before the kernel exeception table is sorted, there is an
early call to sort_main_extable().
This way it's easy to use the already present infrastructure of fixup
sections. Also this would allows to easily convert the rest of
head[31|64].S into C code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Preset the bogomips number to the cpu capacity value reported by
store system information in SYSIB 1.2.2. This value is constant
for a particular machine model and can be used to determine
relative performance differences between machines.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is an extension of the already existing hypfs for LPAR (DIAG 204).
Data returned by DIAG 2fc is exported using the s390_hypfs when Linux
is running under z/VM. Information about cpus and memory is provided.
Data is put into different virtual files which can be accessed from user
space. All values are represented as ASCII strings
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support to boot from a named saved segment (NSS).
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Starting with the z9 the CPU Cryptographic Assist Facility comes with
an integrated Pseudo Random Number Generator. The generator creates
random numbers by an algorithm similar to the ANSI X9.17 standard.
The pseudo-random numbers can be accessed via a character device driver
node called /dev/prandom. Similar to /dev/urandom any amount of bytes
can be read from the device without blocking.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds support for clock synchronization to an external time
reference (ETR). The external time reference sends an oscillator
signal and a synchronization signal every 2^20 microseconds to keep
the TOD clocks of all connected servers in sync. For availability
two ETR units can be connected to a machine. If the clock deviates
for more than the sync-check tolerance all cpus get a machine check
that indicates that the clock is out of sync. For the lovely details
how to get the clock back in sync see the code below.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This provides a noexec protection on s390 hardware. Our hardware does
not have any bits left in the pte for a hw noexec bit, so this is a
different approach using shadow page tables and a special addressing
mode that allows separate address spaces for code and data.
As a special feature of our "secondary-space" addressing mode, separate
page tables can be specified for the translation of data addresses
(storage operands) and instruction addresses. The shadow page table is
used for the instruction addresses and the standard page table for the
data addresses.
The shadow page table is linked to the standard page table by a pointer
in page->lru.next of the struct page corresponding to the page that
contains the standard page table (since page->private is not really
private with the pte_lock and the page table pages are not in the LRU
list).
Depending on the software bits of a pte, it is either inserted into
both page tables or just into the standard (data) page table. Pages of
a vma that does not have the VM_EXEC bit set get mapped only in the
data address space. Any try to execute code on such a page will cause a
page translation exception. The standard reaction to this is a SIGSEGV
with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn)
and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the
kernel to the signal stack frame. Unfortunately, the signal return
mechanism cannot be modified to use an SA_RESTORER because the
exception unwinding code depends on the system call opcode stored
behind the signal stack frame.
This feature requires that user space is executed in secondary-space
mode and the kernel in home-space mode, which means that the addressing
modes need to be switched and that the noexec protection only works
for user space.
After switching the addressing modes, we cannot use the mvcp/mvcs
instructions anymore to copy between kernel and user space. A new
mvcos instruction has been added to the z9 EC/BC hardware which allows
to copy between arbitrary address spaces, but on older hardware the
page tables need to be walked manually.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch moves the config options for the s390 crypto instructions
to the standard "Hardware crypto devices" menu. In addition some
cleanup has been done: use a flag for supported keylengths, add a
warning about machien limitation, return ENOTSUPP in case the
hardware has no support, remove superfluous printks and update
email addresses.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
kretprobe_trampoline_holder() is in kprobes section but used to
register a kprobe in arch_init_kprobes(). Hence register_kprobe()
and therefore arch_init_kprobes() will fail.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of an illegal op the die notifier gets called with DIE_TRAP
instead of DIE_BPT first.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently loaded DCSS segments are now listed in /proc/iomem with
their name followed by a trailing "(DCSS)".
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
FCP dump feature detection works only if the sclp command in head.S
was succesful. Since the sclp command is skipped if diag260 works,
we don't have any dump feature detection anymore.
Bug was introduced with d57de5a367.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the bounce buffer logic of cpcmd. diag8 needs _real_ memory below
2GB. Therefore vmalloced data does not work. As the data might cross a
page boundary, we cannot use virt_to_page either. The solution is to use
virt_to_page only in the check for a bounce buffer.
There was a redundant check for response==NULL. response < 2GB contains
this check as well.
I also removed the rlen==0 check, since rlen=0 and response!=NULL would
be a caller bug and response==NULL is already checked.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix an oops experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime. It alters the call paths to make sure
that the callers explicitly say whether the call is being made on behalf of
a hotplug even, or happening at boot-time.
It has been compile tested on ppc64, ia64, s390, i386 and x86_64.
Acked-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are several places in the futex code where a spin_lock is held
and still uaccesses happen. Deadlocks are avoided by increasing the
preempt count. The pagefault handler will then not take any locks
but will immediately search the fixup tables.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
setup_memory_end() uses VMALLOC_END instead of VMALLOC_END_INIT to
calculate the maximum supported size of physical memory. Since
VMALLOC_END is zero, this will cause a crash on 31 bit systems.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
72486f1f8f inverts the logic if an
'online' attribute in /sys/devices/system/cpu/cpuX should appear.
So we end up with no hotpluggable cpus at all...
Set the hotpluggable value to one to make sure the online
attribute appears again.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix a memory leak problem in the memory detection routines. A memory leak
of 128k occurs when we have a contiguous memory with mixed access-mode
(read or write) ranges.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dump tools expect that the saved prefix register points to the
lowcore of the dump cpu. Since we set the prefix register to 0 during
reipl/dump, we have to save the original prefix register. Before we
start the dump program, we copy the original prefix register to the
designated location in the lowcore.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We use printks after shutting down all other cpus. This is not allowed
and can lead to deadlocks. Therefore the printks have to be removed.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reboot hangs on LPARs without diag308 support. The reason for this is,
that before the reboot is done, the channel subsystem is shut down.
During the reset on each possible subchannel a "store subchannel" is
done. This operation can end in a program check interruption, if the
specified subchannel set is not implemented by the hardware. During
the reset, currently we do not have a program check handler, which
leads to the described kernel bug. We install now a new program check
handler for the reboot code to fix this problem.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Correct typo to make hypfs work on systems that support only diag204
subcode 4 and fix error handling in hypfs_diag_init.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Run this:
#!/bin/sh
for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
echo "De-casting $f..."
perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
done
And then go through and reinstate those cases where code is casting pointers
to non-pointers.
And then drop a few hunks which conflicted with outstanding work.
Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] Poison init section before freeing it.
[S390] Use add_active_range() and free_area_init_nodes().
[S390] Virtual memmap for s390.
[S390] Update documentation for dynamic subchannel mapping.
[S390] Use dev->groups for adding/removing the subchannel attribute group.
[S390] Support for disconnected devices reappearing on another subchannel.
[S390] subchannel lock conversion.
[S390] Some preparations for the dynamic subchannel mapping patch.
[S390] runtime switch for qdio performance statistics
[S390] New DASD feature for ERP related logging
[S390] add reset call handler to the ap bus.
[S390] more workqueue fixes.
[S390] workqueue fixes.
[S390] uaccess_pt: add missing down_read() and convert to is_init().
This facility provides three entry points:
ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64
These facilities can either be used inside functions on dynamic data:
int do_something(long q)
{
...;
y = ilog2(x)
...;
}
Or can be used to statically initialise global variables with constant values:
unsigned n = ilog2(27);
When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.
When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The data patterns should allow us to easily tell if somebody accesses
initdata/code after it was freed. Same code as on various other
architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Size zones and holes in an architecture independent manner for s390.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Virtual memmap support for s390. Inspired by the ia64 implementation.
Unlike ia64 we need a mechanism which allows us to dynamically attach
shared memory regions.
These memory regions are accessed via the dcss device driver. dcss
implements the 'direct_access' operation, which requires struct pages
for every single shared page.
Therefore this implementation provides an interface to attach/detach
shared memory:
int add_shared_memory(unsigned long start, unsigned long size);
int remove_shared_memory(unsigned long start, unsigned long size);
The purpose of the add_shared_memory function is to add the given
memory range to the 1:1 mapping and to make sure that the
corresponding range in the vmemmap is backed with physical pages.
It also initialises the new struct pages.
remove_shared_memory in turn only invalidates the page table
entries in the 1:1 mapping. The page tables and the memory used for
struct pages in the vmemmap are currently not freed. They will be
reused when the next segment will be attached.
Given that the maximum size of a shared memory region is 2GB and
in addition all regions must reside below 2GB this is not too much of
a restriction, but there is room for improvement.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove CONFIG_QETH_PERF_STATS and use a sysfs attribute instead.
We want to have the ability to turn the statistics on/off at runtime.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Doesn't seem to be a good idea to duplicate code :)
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer. The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag. If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.
However, the processing of this check routine takes a long time. So, this
patch introduces the garbage collection mechanism of insn_slot. It also
introduces the "dirty" flag to free_insn_slot because of efficiency.
The "clean" instruction slots (dirty flag is cleared) are released
immediately. But the "dirty" slots which are used by boosted kprobes, are
marked as garbages. collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.
Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption, this might go away in the future.
(NOTE: the extra barrier() in pagefault_disable might fix some holes on
machines which have too many registers for their own good)
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
The lock dependency validator adds a bunch of extra stack frames to
the stack, which can cause stack overflows. Especially seen on 31 bit
where the small stack is only 4k.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
VMALLOC_END on 31bit should be 0x8000000UL instead of 0x7fffffffL.
The page mask which is used to make sure memory_end is on 4MB/2MB
boundary is wrong and not needed. Therefore remove it.
Make sure a vmalloc area does also exist and work on (future)
machines with 4TB and more memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There's no need to have a spin_lock here, but need sleepable context
for vmem_map. Therefore convert the spin_lock into a mutex.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Set KBUILD_IMAGE to a sane value. This enables "make rpm"
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Follow i386/x86_64:
lockdep can be used to print held locks when printing a
backtrace. This can be useful when debugging things like
'scheduling while atomic' asserts.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use a wrapper for copy_to/from_user to chose the best usercopy method.
The mvcos instruction is better for sizes greater than 256 bytes, if
mvcos is not available a page table walk is better for sizes greater
than 1024 bytes. Also removed the redundant copy_to/from_user_std_small
functions.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid the tprot loop if diag260 works and reports that there are no
holes in memory. The tprot instruction can lead to a significant delay
in the ipl process if the virtual guest has a lot of memory and the
host is under memory pressure.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Need this at yet another file and don't want to add yet another
extern...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the memory detection code would ever reach the point where it would
load the wait psw, it would generate a specification exception and the
system would crash at ipl time. This is because of a misaligned wait
psw. It needs to be on a double word boundary instead of a word
boundary.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let one master cpu kill all other cpus instead of sending an external
interrupt to all other cpus so they can kill themselves.
Simplifies reipl/shutdown functions a lot.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of reipl cpcmd gets called when all other cpus are not running
anymore. To prevent deadlocks change __cpcmd so that it doesn't take
any locks and call cpcmd or __cpcmd, whatever is correct in the current
context.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of re-IPL and diag308 doesn't work we have to reset all devices
manually and wait synchronously that each reset finished.
This patch adds the necessary infrastucture and the first exploiter of it.
Subsystems that need to add a function that needs to be called at re-IPL
may register/unregister this function via
struct reset_call {
struct reset_call *next;
void (*fn)(void);
};
void register_reset_call(struct reset_call *reset);
void unregister_reset_call(struct reset_call *reset);
When the registered function get called the context is:
- all cpus beside the current one are stopped
- all machine checks and interrupts are disabled
- prefixing is disabled
- a default machine check handler is available for use
The registered functions may not take any locks are sleep.
For the common I/O layer part of this patch:
Introduce a reset_call css_reset that does the following:
- clear all subchannels
- perform a rchp on all channel paths and wait for the resulting
machine checks
This replaces the calls to clear_all_subchannels() and
cio_reset_channel_paths() for kexec and ccw reipl. reipl_ccw_dev() now
uses reipl_find_schid() to determine the subchannel id for a given
device id.
Also remove cio_reset_channel_paths() and friends since they are not
needed anymore.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
segment save will exit with a lock held if the passed segment doesn't
exist. Any subsequent call to segment_save will lead to a deadlock.
Fix this and give up the lock before returning.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since the diag 308 reipl method is superior to the ccw method, we should
use it whenever it is possible. We can do that, if the user has not
specified a new reipl ccw device and the system has been ipled from
a ccw device.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If reboot fails (e.g. because wrong devno has been specified by the user),
we should just stop all cpus, but should not trigger a kernel panic.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If multiple kernel images are installed on one DASD, the loadparm can be used
to select the boot configuration. This patch introduces the following two new
sysfs attributes:
/sys/firmware/ipl/loadparm: shows loadparm of current system (ro)
/sys/firmware/reipl/ccw/loadparm: loadparm used for next reboot (rw)
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The SALIPL entry point has an needless memory detection routine as we
later check the memory size again. The SALIPL code also uses diagnose
0x060 if we are running under VM, but this diagnose is not compatible
with the 64 bit addressing mode. The solution is to get rid of this
code and rely on the memory detection in the startup code.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
setup_lowcore() calls ctl_set_bit() which returns withs interrupts
enabled. The setup arch code is not supposed to enable interrupts that
early. Therefore use the __ctl_set_bit() variant.
This fixes the not working lock dependency validator on non 64 bit
systems.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 7676bef9c1 breaks DCSS support on
s390. DCSS needs initialized struct pages to work. With the usage of
add_active_range() only the struct pages for physically present pages
are initialized.
This could be fixed if the DCSS driver would initiliaze the struct pages
itself, but this doesn't work too. This is because the mem_map array
does not include holes after the last present memory area and therefore
there is nothing that could be initialized.
To fix this and to avoid some dirty hacks revert this patch for now.
Will be added later when we move to a virtual mem_map.
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] cio: Make ccw_device_register() static.
[S390] Improve AP bus device removal.
[S390] uaccess error handling.
[S390] cio: css_probe_device() must be called enabled.
[S390] Initialize interval value to 0.
[S390] sys_getcpu compat wrapper.
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consider return values for all user space access function and
return -EFAULT on error.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sscanf() could leave the interval value unchanged in which case it
would be used uninitialized.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Looking at the new syscall additions, I noticed that
sys_getcpu_wrapper wraps in to sys_tee, in what appears to be
a copy and paste error. Switch it to point to sys_getcpu..
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix the following compile error:
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/s390/kernel/built-in.o(.text+0xdba4): In function `sys32_ipc':
: undefined reference to `compat_sys_semtimedop'
arch/s390/kernel/built-in.o(.text+0xdbee): In function `sys32_ipc':
: undefined reference to `compat_sys_semctl'
arch/s390/kernel/built-in.o(.text+0xdc08): In function `sys32_ipc':
: undefined reference to `compat_sys_msgsnd'
arch/s390/kernel/built-in.o(.text+0xdc30): In function `sys32_ipc':
: undefined reference to `compat_sys_msgrcv'
arch/s390/kernel/built-in.o(.text+0xdc58): In function `sys32_ipc':
: undefined reference to `compat_sys_msgctl'
arch/s390/kernel/built-in.o(.text+0xdc76): In function `sys32_ipc':
: undefined reference to `compat_sys_shmat'
arch/s390/kernel/built-in.o(.text+0xdcb0): In function `sys32_ipc':
: undefined reference to `compat_sys_shmctl'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The latest kernel 2.6.19-rc1 triggers a bug in the s390 specific stack
trace code when compiled with gcc 3.4.
This patch fixes the latest lock dependency validator code (2.6.19-rc1)
on s390 gcc 3.4. The variable sp was fixed to r15 (which is the stack
pointer in the s390 abi) and assigned new values to r15. Therefore,
gcc 3.4 assigns a new value to r15 and does not restore it on exit (r15
is supposed to be call save) - the kernel stack is broken. Avoid trouble
by not assigning any new value to sp (r15).
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the last few places where a pointer to pt_regs gets passed.
Also make sure we call set_irq_regs() before irq_enter() and after
irq_exit(). This doesn't fix anything but makes sure s390 looks the
same like all other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix too slow clock by using CONFIG_GENERIC_TIME and adding a
clock source for the s390 time-of-day clock. As added benefit
we get rid of the s390 specific definition of do_gettimeofday
and do_settimeofday.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix new restore_sigregs function. It copies the user space copy of the
old psw without correcting the psw.mask and the psw.addr high order bit.
While we are at it, simplify save_sigregs a bit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system. They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example. The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.
Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.
This patch:
Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.
The stat functions then returns the full 64-bit inode number where
available and where possible. If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.
Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.
Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.
Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.
It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.
[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the new kernel_execve function on all architectures that were using
_syscall3() to implement execve.
The implementation uses code from the _syscall3 macros provided in the
unistd.h header file. I don't have cross-compilers for any of these
architectures, so the patch is untested with the exception of i386.
Most architectures can probably implement this in a nicer way in assembly or
by combining it with the sys_execve implementation itself, but this should do
it for now.
[bunk@stusta.de: m68knommu build fix]
[markh@osdl.org: build fix]
[bero@arklinux.org: build fix]
[ralf@linux-mips.org: mips fix]
[schwidefsky@de.ibm.com: s390 fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock. This patch moves kfree function out scope of
kretprobe_lock.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I just remove that condition only instead of
kill the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the new diagnose 0x9c in the spinlock implementation for s390. It
yields the remaining timeslice of the virtual cpu that tries to acquire a
lock to the virtual cpu that is the current holder of the lock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.
Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
get rid of this redundant calculation. Also there are another redundancy
pointed out by Martin Schwidefsky.
This cleanup make a barrier added by
5aee405c66 needless. So this patch removes
it.
As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies. (This patch does not really
remove wall_jiffies. It would be another cleanup patch)
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel test
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert s390 page handling macros to functions. In particular this fixes a
problem with s390's SetPageUptodate macro which uses its input parameter
twice which again can cause subtle bugs.
[akpm@osdl.org: build fix]
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Major cleanup of all s390 inline assemblies. They now have a common
coding style. Quite a few have been shortened, mainly by using register
asm variables. Use of the EX_TABLE macro helps as well. The atomic ops,
bit ops and locking inlines new use the Q-constraint if a newer gcc
is used. That results in slightly better code.
Thanks to Christian Borntraeger for proof reading the changes.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A user space program can read uninitialised kernel memory
by appending to a file from a bad address and then reading
the result back. The cause is the copy_from_user function
that does not clear the remaining bytes of the kernel
buffer after it got a fault on the user space address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a kernel config option for the IBM System z9. This will produce
faster code on newer compilers using the -march=z9-109 option.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clocksource infrastructure introduced with commit
ad596171ed broke 31 bit s390.
The reason is that the do_div() primitive for 31 bit always
had a restriction: it could only divide an unsigned 64 bit
integer by an unsigned 31 bit integer. The clocksource code
now uses do_div() with a base value that has the most
significant bit set. The result is that clock->cycle_interval
has a funny value which causes the linux time to jump around
like mad.
The solution is "obvious": implement a proper __div64_32
function for 31 bit s390.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sparse complains, if we use bitwise operations on enums. Cast enum to
long in order to fix that problem!
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Don't use static initialization for struct members containing
variables because gcc would generate more code and use double space
on stack.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Lock for mmap_sem is missing on page fault retry for init task
when it fails due to out of memory.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since sys_sysctl is deprecated start allow it to be compiled out. This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.
[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
Convert cmm's usage of kernel_thread to kthread_run. Also create the
cmmthread at module load time, so it is possible to check if creation of
the thread fails.
In addition the cmmthread now gets terminated when the module gets unloaded
instead of leaving a stale kernel thread. Also check the return values of
other registration functions at module load and handle their return values
appropriately.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a notifer chain to the out of memory killer. If one of the registered
callbacks could release some memory, do not kill the process but return and
retry the allocation that forced the oom killer to run.
The purpose of the notifier is to add a safety net in the presence of
memory ballooners. If the resource manager inflated the balloon to a size
where memory allocations can not be satisfied anymore, it is better to
deflate the balloon a bit instead of killing processes.
The implementation for the s390 ballooner is included.
[akpm@osdl.org: cleanups]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
eventcounters: Do not display counters for zones that are not available on an
arch
Do not define or display counters for the DMA32 and the HIGHMEM zone if such
zones were not configured.
[akpm@osdl.org: s390 fix]
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (64 commits)
[BLOCK] dm-crypt: trivial comment improvements
[CRYPTO] api: Deprecate crypto_digest_* and crypto_alg_available
[CRYPTO] padlock: Convert padlock-sha to use crypto_hash
[CRYPTO] users: Use crypto_comp and crypto_has_*
[CRYPTO] api: Add crypto_comp and crypto_has_*
[CRYPTO] users: Use crypto_hash interface instead of crypto_digest
[SCSI] iscsi: Use crypto_hash interface instead of crypto_digest
[CRYPTO] digest: Remove old HMAC implementation
[CRYPTO] doc: Update documentation for hash and me
[SCTP]: Use HMAC template and hash interface
[IPSEC]: Use HMAC template and hash interface
[CRYPTO] tcrypt: Use HMAC template and hash interface
[CRYPTO] hmac: Add crypto template implementation
[CRYPTO] digest: Added user API for new hash type
[CRYPTO] api: Mark parts of cipher interface as deprecated
[PATCH] scatterlist: Add const to sg_set_buf/sg_init_one pointer argument
[CRYPTO] drivers: Remove obsolete block cipher operations
[CRYPTO] users: Use block ciphers where applicable
[SUNRPC] GSS: Use block ciphers where applicable
[IPSEC] ESP: Use block ciphers where applicable
...
This patch removes obsolete block operations of the simple cipher type
from drivers. These were preserved so that existing users can make a
smooth transition. Now that the transition is complete, they are no
longer needed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds block cipher algorithms for S390. Once all users of the
old cipher type have been converted the existing CBC/ECB non-block cipher
operations will be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Accelerated versions of crypto algorithms must carry a distinct driver name
and priority in order to distinguish themselves from their generic counter-
part.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the tfm is passed directly to setkey instead of the ctx, we no
longer need to pass the &tfm->crt_flags pointer.
This patch also gets rid of a few unnecessary checks on the key length
for ciphers as the cipher layer guarantees that the key length is within
the bounds specified by the algorithm.
Rather than testing dia_setkey every time, this patch does it only once
during crypto_alloc_tfm. The redundant check from crypto_digest_setkey
is also removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When an invalid mount option is specified, no root inode is created
for hypfs, hypfs_fill_super() returns with -EINVAL and then
hypfs_kill_super() is called. hypfs_kill_super() does not check if
the root inode has been initialized. This patch adds this check.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This introduces new user-copy operations which are optimized for
copying more than 256 Bytes on new hardware.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduces a struct uaccess_ops which allows setting user-copy
operations at run-time.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Changed and simplified some page table related #defines and code.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch delivers a new Linux API in the form of a misc char
device that is useable from user space and allows write access
to the z/VM APPLDATA Monitor Records collected by the *MONITOR
System Service of z/VM.
Signed-off-by: Melissa Howland <melissah@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce asm header that contains the appldata data structures and
the diag inline assembly.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Substract the size of the initial stack frame from the correct
register. Otherwise we will end up in a program check loop.
Fix the offset into the save area as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert GET_IPL_DEVICE assembler macro to C function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix compile warning with some configurations:
arch/s390/mm/cmm.c:58: warning: 'cmm_strtoul' defined but not used
Originally cmm_strtoul was introduced because simple_strtoul couldn't
handle strings with hexadecimal numbers that contained a capital 'X'.
Since this is no longer true cmm_strtoul can be removed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If do_signal() gets called several times before returning to user space
and no signal is pending (e.g. cancelled by a debugger) syscall restart
handling could be done several times. This would change the user space
PSW to an address prior to the syscall instruction.
Fix this by making sure that syscall restart handling is only done once.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is now possible to specify a ccw/fcp dump device which is used to
automatically create a system dump in case of a kernel panic. The dump
device can be configured under /sys/firmware/dump.
In addition it is now possible to specify a ccw/fcp device which is used
for the next reboot of Linux. The reipl device can be configured under
/sys/firmware/reipl.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Correct some comments in the hypervisor filesystem.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move initrd if the bitmap of the bootmem allocator would overwrite it.
In addition this patch sets the default size and address of the initrd to 0.
Therefore all boot loaders must set the initrd size and address correctly.
This is especially relevant for ftp boot via HMC/SE, where this change
requires a special patch file entry in the .ins file which sets these two
values contained at address 0x10408 and 0x10410.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Grundy <grundym@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Take default arch/*/kernel/audit.c to lib/, have those with special
needs (== biarch) define AUDIT_ARCH in their Kconfig.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The copy_in_user primitive does not work as advertised. If the source
and target area are available copy_in_user copies one byte too much.
If one of the memory areas is not available it does not copy as much
data as it can, but up to 257 bytes less.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use __cpuinit for CPU hotplug notifier function.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use hotplug version of register_cpu_notifier in late init functions.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take return values of sysfs_create_group & friends into account.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SLES9 binutils don't like .align 4096 statements in head.S. Work around this
by using .org statements.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
1. Multipath devices for which SetPGID is not supported are not handled well.
Use NOP ccws for path verification (sans path grouping) when SetPGID is not
supported.
2. Check for PGIDs already set with SensePGID on _all_ paths (not just the
first one) and try to find a common one. Moan if no common PGID can be
found (and use NOP verification). If no PGIDs have been set, use the css
global PGID (as before). (Rationale: SetPGID will get a command reject if
the PGID it tries to set does not match the already set PGID.)
3. Immediately before reboot, issue RESET CHANNEL PATH (rcp) on all chpids. This
will remove the old PGIDs. rcp will generate solicited CRWs which can be
savely ignored by the machine check handler (all other actions create
unsolicited CRWs).
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove BINFMT_ELF32 config option. Support should be always compiled in if
CONFIG_COMPAT is set.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently most architectures either always build binfmt_elf32 in the kernel
image or make it a boolean option. Only sparc64 and s390 allow to build it
modularly. This patch turns the option into a boolean aswell because elf
requires various symbols that shouldn't be available to modules. The most
urgent one is tasklist_lock whos export this patch series kills, but there
are others like force_sgi aswell.
Note that sparc doesn't allow a modular 32bit a.out handler either, and
that would be the more useful case as only few people want 32bit sunos
compatibility and 99.9% of all sparc64 users need 32bit linux native elf
support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At the moment, powerpc and s390 have their own versions of do_softirq which
include local_bh_disable() and __local_bh_enable() calls. They end up
calling __do_softirq (in kernel/softirq.c) which also does
local_bh_disable/enable.
Apparently the two levels of disable/enable trigger a warning from some
validation code that Ingo is working on, and he would like to see the outer
level removed. But to do that, we have to move the account_system_vtime
calls that are currently in the arch do_softirq() implementations for
powerpc and s390 into the generic __do_softirq() (this is a no-op for other
archs because account_system_vtime is defined to be an empty inline
function on all other archs). This patch does that.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
irqtrace support for s390.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
stacktrace interface for s390 as needed by lock validator.
[clg@fr.ibm.com: build fix]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CONFIG_FRAME_POINTER support for s390.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Put s390's syscall tables into .rodata section and write protect this
section to prevent misuse of it. Suggested by Arjan van de Ven
<arjan@infradead.org>.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat. They have no
essential function for the VM.
We use a simple increment of per cpu variables. In order to avoid the most
severe races we disable preempt. Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter. However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.
In the non preempt case this results in a simple increment for each
counter. For many architectures this will be reduced by the compiler to a
single instruction. This single instruction is atomic for i386 and x86_64.
And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.
The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.
The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).
Benefits:
- VM event counter operations usually reduce to a single inline instruction
on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.
References:
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently a single atomic variable is used to establish the size of the page
cache in the whole machine. The zoned VM counters have the same method of
implementation as the nr_pagecache code but also allow the determination of
the pagecache size per zone.
Remove the special implementation for nr_pagecache and make it a zoned counter
named NR_FILE_PAGES.
Updates of the page cache counters are always performed with interrupts off.
We can therefore use the __ variant here.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add CPU ID and steal time, and make OS record size variable.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implementation of new kernel parameter vmpanic that provides a means to
perform a z/VM CP command after a kernel panic occurred.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The two macros NEW_TO_OLD_UID and NEW_TO_OLD_GID in binfmt_elf32.c
are not used by any code. Remove them.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
overflowuid and overflowgid were exported twice. Remove the export
from s390_ksyms.c
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove EXPORT_SYMBOL_GPL(avenrun) from appdata_base.c, since it is
already exported in kernel/timer.c
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is almost no room left for any new code between 0x10000
and 0x10480. Move the code from 0x10000 to 0x11000.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a machine checks interrupts the external or the i/o interrupt
handler before they have completed the cpu time calculations, the
accounting goes wrong. After the cpu returned from the machine check
handler to the interrupted interrupt handler, a negative cpu time delta
can occur. If the accumulated cpu time in lowcore is small enough
this value can get negative as well. The next jiffy interrupt will pick
up that negative value, shift it by 12 and add the now huge positive
value to the cpu time of the process.
To solve this the machine check handler is modified not to change any
of the timestamps in the lowcore if the machine check interrupted kernel
context.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The software watchdog calls machine_restart from a timer function.
The s390 machine_restart calls console_unblank to flush the console
output. This is needed for panic to get the panic message printed.
If console_unblank is called in interrupt a BUG is triggered in
acquire_console_sem. That makes the software watchdog panic instead
of restarting the machine. To get around this problem the call to
console_unblank is made conditionally on !in_interrupt() ||
oops_in_progress.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The wrong base register is used to read a value from the sclp data
structure. The value is used to calculate the memory size.
Use correct register %r4.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
show_stack() passes a pointer to the current stack frame to show_trace().
Because of tail call optimization the pointer doesn't point to the original
stack frame anymory and therefore traces are wrong. Don't pass the pointer
of the current stack frame to show_trace(). Instead let show_trace()
calculate the pointer on its own.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With Goto-san's patch, we can add new pgdat/node at runtime. I'm now
considering node-hot-add with cpu + memory on ACPI.
I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.
In most part, cpu-hot-add doesn't depend on node hot add. But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu(). When a node is onlined, its pgdat should be
there.
This patch-set holds off creating symbolic link from node to cpu
until node is onlined.
This removes node arguments from register_cpu().
Now, register_cpu() requires 'struct node' as its argument. But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch). We can get struct node in generic way. So, this argument is not
necessary now.
This patch also guarantees add cpu under node only when node is onlined. It
is necessary for node-hot-add vs. cpu-hot-add patch following this.
Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument. This patch removes it.
Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.
[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
typo fixes
Clean up 'inline is not at beginning' warnings for usb storage
Storage class should be first
i386: Trivial typo fixes
ixj: make ixj_set_tone_off() static
spelling fixes
fix paniced->panicked typos
Spelling fixes for Documentation/atomic_ops.txt
move acknowledgment for Mark Adler to CREDITS
remove the bouncing email address of David Campbell
Up until now algorithms have been happy to get a context pointer since
they know everything that's in the tfm already (e.g., alignment, block
size).
However, once we have parameterised algorithms, such information will
be specific to each tfm. So the algorithm API needs to be changed to
pass the tfm structure instead of the context pointer.
This patch is basically a text substitution. The only tricky bit is
the assembly routines that need to get the context pointer offset
through asm-offsets.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Various digest algorithms operate one block at a time and therefore
keep a temporary buffer of partial blocks. This buffer does not need
to be initialised since there is a counter which indicates what is and
isn't valid in it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cleanup & fix 31 bit compilation:
CC arch/s390/kernel/setup.o
arch/s390/kernel/setup.c:83: error: initializer element is not computable at
load time
arch/s390/kernel/setup.c:83: error: (near initialization for
'code_resource.start')
Not sure which patch in the -mm tree breaks this, but since this can be
considered a cleanup it can be merged anyway.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update hypfs for dhowells API changes.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On zSeries machines there exists an interface which allows the operating
system to retrieve LPAR hypervisor accounting data. For example, it is
possible to get usage data for physical and virtual cpus. In order to
provide this information to user space programs, I implemented a new
virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs
framework. The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'.
All the accounting information is put into different virtual files which
can be accessed from user space. All data is represented as ASCII strings.
When the file system is mounted the accounting information is retrieved and
a file system tree is created with the attribute files containing the cpu
information. The content of the files remains unchanged until a new update
is made. An update can be triggered from user space through writing
'something' into a special purpose update file.
We create the following directory structure:
<mount-point>/
update
cpus/
<cpu-id>
type
mgmtime
<cpu-id>
...
hyp/
type
systems/
<lpar-name>
cpus/
<cpu-id>
type
mgmtime
cputime
onlinetime
<cpu-id>
...
<lpar-name>
cpus/
...
- update: File to trigger update
- cpus/: Directory for all physical cpus
- cpus/<cpu-id>/: Directory for one physical cpu.
- cpus/<cpu-id>/type: Type name of physical zSeries cpu.
- cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds.
- hyp/: Directory for hypervisor information
- hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor')
- systems/: Directory for all LPARs
- systems/<lpar-name>/: Directory for one LPAR.
- systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus
- systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu.
- systems/<lpar-name>/cpus/<cpu-id>/mgmtime:
Accumulated number of microseconds during which a physical
CPU was assigned to the logical cpu and the cpu time was
consumed by the hypervisor and was not provided to
the LPAR (LPAR overhead).
- systems/<lpar-name>/cpus/<cpu-id>/cputime:
Accumulated number of microseconds during which a physical CPU
was assigned to the logical cpu and the cpu time was consumed
by the LPAR.
- systems/<lpar-name>/cpus/<cpu-id>/onlinetime:
Accumulated number of microseconds during which the logical CPU
has been online.
As mount point for the filesystem /sys/hypervisor/s390 is created.
The update process is triggered when writing 'something' into the
'update' file at the top level hypfs directory. You can do this e.g.
with 'echo 1 > update'. During the update the whole directory structure
is deleted and built up again.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Joern Engel <joern@wohnheim.fh-wedel.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add missing parentheses for type cast to u64.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 32 bit unsigned substraction (next - jiffies) in stop_hz_timer can
overflow if jiffies gets advanced between next_timer_interrupt and the read
under the xtime lock. The cast to a u64 then results in a large value
which causes the cpu to wait too long. Fix this by casting next and
jiffies independently to u64 before subtracting them.
(Spotted by Zachary Amsden <zach@vmware.com>)
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add new vmsplice system call and add missing __NR_xxx defines for
sys_set_robust_list, sys_get_robust_list, sys_splice, sys_sync_file_range
and sys_tee.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Exploit rcu_needs_cpu() interface to keep the cpu 'ticking' if necessary.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Audit Filter Performance
[PATCH] Rework of IPC auditing
[PATCH] More user space subject labels
[PATCH] Reworked patch for labels on user space messages
[PATCH] change lspp ipc auditing
[PATCH] audit inode patch
[PATCH] support for context based audit filtering, part 2
[PATCH] support for context based audit filtering
[PATCH] no need to wank with task_lock() and pinning task down in audit_syscall_exit()
[PATCH] drop task argument of audit_syscall_{entry,exit}
[PATCH] drop gfp_mask in audit_log_exit()
[PATCH] move call of audit_free() into do_exit()
[PATCH] sockaddr patch
[PATCH] deal with deadlocks in audit_free()
Consider return value of __put_user() when setting up a signal frame
instead of ignoring it.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a read_mostly section and define __read_mostly to prevent cache line
pollution due to writes for mostly read variables. In addition fix the
incorrect alignment of the cache_line_aligned data section. s390 has a
cacheline size of 256 bytes.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Print a warning with the z/VM error code if segment_load, segment_type or
segment_save fail to ease the problem determination.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a signal handler has been established with the SA_ONSTACK option but no
alternate stack is provided with sigaltstack(), the kernel still tries to
install the alternate stack. Also when setting an alternate stack with
sigalstack() and the SS_DISABLE flag, the kernel tries to install the
alternate stack on signal delivery. Use the correct conditions sas_ss_flags()
to check if the alternate stack has to be used.
Signed-off-by: Laurent Meyer <meyerlau@fr.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Few of the notifier_chain_register() callers use __devinitdata in the
definition of notifier_block data structure. It is incorrect as the
data structure should be available after the initializations (they do
not unregister them during initializations).
This was leading to an oops when notifier_chain_register() call is
invoked for those callback chains after initialization.
This patch fixes all such usages to _not_ have the notifier_block data
structure in the init data section.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-finline-limit might have been required for older compilers, but nowadays
it does no longer make sense.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert all kmalloc + memset sequences in arch/s390 to kzalloc usage.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Include connector config in the s390 arch Kconfig to get support for
connectors.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Retry starting of new cpu if sigp restart returns condition code 2 (busy).
Signed-off-by: Michael Ryan <ryan@funsoft.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set permissoin of /proc/sys/vm/cmm_* files to 0644.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use common code parser for early parameters instead of our own.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently the code tries up to spin_retry times to grab a lock using the cs
instruction. The cs instruction has exclusive access to a memory region
and therefore invalidates the appropiate cache line of all other cpus. If
there is contention on a lock this leads to cache line trashing. This can
be avoided if we first check wether a cs instruction is likely to succeed
before the instruction gets actually executed.
Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
strnlen_user is supposed to return then length count + 1 if no terminating \0
is found, and it should return 0 on exception. Found by David Howells
<dhowells@redhat.com>.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm currently at the POSIX meeting and one thing covered was the
incompatibility of Linux's link() with the POSIX definition. The name.
Linux does not follow symlinks, POSIX requires it does.
Even if somebody thinks this is a good default behavior we cannot change this
because it would break the ABI. But the fact remains that some application
might want this behavior.
We have one chance to help implementing this without breaking the behavior.
For this we could use the new linkat interface which would need a new
flags parameter. If the new parameter is AT_SYMLINK_FOLLOW the new
behavior could be invoked.
I do not want to introduce such a patch now. But we could add the
parameter now, just don't use it. The patch below would do this. Can we
get this late patch applied before the release more or less fixes the
syscall API?
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just rename the compat system call to keep the name consistent with all the
other *64 compat system calls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last changes that introduced the additional_cpus command line parameter
also introduced a regression regarding smp initialization speed. In
smp_setup_cpu_possible_map() cpu_present_map is set to the same value as
cpu_possible_map. Especially that means that bits in the present map will be
set for cpus that are not present. This will cause a slow down in the initial
cpu_up() loop in smp_init() since trying to take cpus online that aren't
present takes a while.
Fix this by setting only bits for present cpus in cpu_present_map and set
cpu_present_map to cpu_possible_map in smp_cpus_done().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce possible_cpus command line option. Hard sets the number of bits set
in cpu_possible_map. Unlike the additional_cpus parameter this one guarantees
that num_possible_cpus() will stay constant even if the system gets rebooted
and a different number of cpus are present at startup.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce additional_cpus command line option. By default no additional cpu
can be attached to the system anymore. Only the cpus present at IPL time can
be switched on/off. If it is desired that additional cpus can be attached to
the system the maximum number of additional cpus needs to be specified with
this option.
This change is necessary in order to limit the waste of per_cpu data
structures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set preempt_count of idle_thread to zero before switching off cpu. Otherwise
the preempt_count will be wrong if the cpu is switched on again since the
thread will be reused.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
The boot sequence on s390 sometimes takes ages and we spend a very long
time (up to one or two minutes) in calibrate_migration_costs. The time
spent there differs from boot to boot. Also the calculated costs differ
a lot. I've seen differences by up to a factor of 15 (yes, factor not
percent). Also I doubt that making these measurements make much sense on
a completely virtualized architecture where you cannot tell how much cpu
time you will get anyway.
So introduce the CONFIG_DEFAULT_MIGRATION_COST method for an architecture
to set the scheduler migration costs. This turns off automatic detection
of migration costs. Makes sense on virtual platforms, where migration
costs are hard to measure accurately.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix __delay implementation. Called with an argument "1" or "0" it would
loop nearly forever (since (1/2)-1 = 0xffffffff).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add fstatat64 support to s390 in order to follow changes with
commit cff2b76009 .
Also fixes compilation for 31 bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for unshare system call.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add missing smp_cpu_not_running define to avoid build warnings in the non smp
case.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initiliazing of cpu_possible_map was done in smp_prepare_cpus which is way too
late. Therefore assign a static value to cpu_possible_map, since we don't
have access to max_cpus in setup_arch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/s390/kernel/compat_signal.c:199: error: conflicting types for 'do_sigaction'
include/linux/sched.h:1115: error: previous declaration of 'do_sigaction' was here
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch 9ad11ab48b changes the type of the first
argument of some compat syscalls from int to unsigned int. Add these changes
to the s390 compat wrapper as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the new *at, pselect6 and ppoll system calls. This includes
adding required support for TIF_RESTORE_SIGMASK.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add monotonic_clock interface, used by the hangcheck-timer. On s390 this is
the same as sched_clock().
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The least significant bit of the TOD clock value returned by get_clock
is the 4096th part of a microsecond. To get to nanoseconds the value
needs to be divided by 4096 and multiplied with 1000.
The current method multiplies first and then shifts the value to make the
result as precise as possible. The disadvantage is that the multiplication
with 1000 will overflow shortly after 52 days. sched_clock is used by the
scheduler for time stamp deltas, if an overflow occurs between two time stamps
the scheduler will get confused.
With the patch the problem occurs only after approx. one year, so the chance
to run into this overflow is extremly low.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Remove all CVS generated information like e.g. revision IDs from
drivers/s390 and include/asm-s390 (none present in arch/s390).
- Add newline at end of arch/s390/lib/Makefile to avoid diff message.
Acked-by: Andreas Herrmann <aherrman@de.ibm.com>
Acked-by: Frank Pavlic <pavlic@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
finish_arch_switch needs to update the user cpu time as well, not just the
system cpu time. Otherwise the partial user cpu time of a process that is
stored in the lowcore will be (mis-)accounted to the next process.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define a dummy pm_power_off pointer to make sys_reboot happy.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove useless spin_retry_counter and fix compilation for UP kernels.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The show_task function walks the kernel stack backchain of processes assuming
that the processes are not running. Since this assumption is not correct
walking the backchain can lead to an addressing exception and therefore to a
kernel hang. So prevent the kernel hang (you still get incorrect results)
verity that all read accesses are within the bounds of the kernel stack before
performing them.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix processing of messages larger than 2 * SHA256_BLOCK_SIZE.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Call KM[C] only with a multiple of block size. Check return value of KM[C]
instructions and complain about erros
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide ECB and CBC encrypt / decrypt functions to crypto API to speed up our
hardware accelerated DES implementation. This new functions allow the crypto
API to call ECB / CBC directly with large blocks in difference to the old
functions that were calles with algorithm block size (8 bytes for DES).
This is up to factor 10 faster than our old hardware implementation :)
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Beautify the s390 in-kernel-crypto des code.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These days ioctl32.h is only used for communication of fs/compat.c and
fs/compat_ioctl.c and doesn't contain anything of interest to drivers.
Remove inclusion in various drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures. Also remove some
superflous includes in fs/compat_ioctl.c
Tested on ppc64.
[1] parisc still had it's PPP handler left, which is not fully correct
for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
kick in for all netdevice users. We can introduce a proper handler
in one of the next patch series by adding a compat_ioctl method to
struct net_device but for now let's just kill it - parisc doesn't
compile in mainline anyway and I don't want this to block this
patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a compat_ioctl method to the dasd driver so the last entries in
arch/s390/kernel/compat_ioctl.c can go away. Unlike the previous attempt this
one does not replace the ioctl method with an unlocked_ioctl method so that
the ioctl_by_bdev calls in s390 partition code continue to work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.
This basically moves the x86_64 version to common code and removes all the
others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only own ioctl, TAPE390_DISPLAY, is compat_clean, everything else is
routed through common translation code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These ioctls are definitely not compat clean, but we already have a proper
handler in common code, over-riding it in architecture code is
counter-productive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Again easy because all ioctls are compat clean.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- In case of system crash, current state of cpu registers is saved in memory
in elf note format. So far memory for storing elf notes was being allocated
statically for NR_CPUS.
- This patch introduces dynamic allocation of memory for storing elf notes.
It uses alloc_percpu() interface. This should lead to better memory usage.
- Introduced based on Andi Kleen's and Eric W. Biederman's suggestions.
- This patch also moves memory allocation for elf notes from architecture
dependent portion to architecture independent portion. Now crash_notes is
architecture independent. The whole idea is that size of memory to be
allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and
allocation of this memory can be architecture independent.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Adrian Bunk <bunk@stusta.de>
- create one common dump_thread() prototype in kernel.h
- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
removed on all architectures where CONFIG_BINFMT_AOUT is not
available
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems the "make UID16 support optional" patch was checked when it
edited the -tiny tree some time ago, but it wasn't checked whether it
still matches the current situation when it was submitted for inclusion
in -mm. This patch fixes the following bugs:
- ARCH_S390X does no longer exist, nowadays this has to be expressed
through (S390 && 64BIT)
- in five architecture specific Kconfig files the UID16 options
weren't removed
Additionally, it changes the fragile negative dependencies of UID16 to
positive dependencies (new architectures are more likely to not require
UID16 support).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.
Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface. We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns. It's a bit more code in the callers, but we have two sane routines
that do one thing well now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
S390, 64BIT and COMPAT.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New feature V=V qdio pass-through.
QDIO and HiperSockets processing in z/VM V=V guest environments (as well as
V=R with z/VM running in LPAR mode) requires shadowing of all QDIO
architecture queue elements. Especially the shadowing of SBALs and SLSBs
structures in the hypervisor, and the need to issue SIGA SYNC operations to
observe state changes, eventually causes significant CPU processing overhead
in the hypervisor.
The QDIO pass-through support for V=V guests avoids the shadowing of SBALs and
SLSBs. This significantly reduces the hypervisor overhead for QDIO based I/O.
Signed-off-by: Frank Pavlic <pavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the hardware accelerated AES crypto algorithm.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the hardware accelerated sha256 crypto algorithm.
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all references to z990 by s390 in the in-kernel crypto files in
arch/s390/crypto. The code is not specific to a particular machine (z990) but
to the s390 platform. Big diff, does nothing..
Signed-off-by: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are some more places where the use of cputime_t instead of an integer
type and the associated macros is necessary for the virtual cputime accounting
on s390. Affected are the s390 specific appldata code and BSD process
accounting.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check return code of do_sigaltstack and force a SIGSEGV if it is -EFAULT.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert __access_ok to an inline C function and change __get_user primitive to
avoid uaccess compiler warnings.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins <hugh@veritas.com>
Fix the broken atomic_cmpxchg primitive. Add atomic_sub_and_test,
atomic64_sub_return, atomic64_sub_and_test, atomic64_cmpxchg,
atomic64_add_unless and atomic64_inc_not_zero. Replace old style
atomic_compare_and_swap by atomic_cmpxchg. Shorten the whole header by
defining most primitives with the two inline functions atomic_add_return and
atomic_sub_return.
In addition this patch contains the s390 related fixes of Hugh's "mm: fill
arch atomic64 gaps" patch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You could open the /proc/sys/net/ipv4/conf/<if>/<whatever> file, then
wait for interface to go away, try to grab as much memory as possible in
hope to hit the (kfreed) ctl_table. Then fill it with pointers to your
function. Then do read from file you've opened and if you are lucky,
you'll get it called as ->proc_handler() in kernel mode.
So this is at least an Oops and possibly more. It does depend on an
interface going away though, so less of a security risk than it would
otherwise be.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the arch/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in arch/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Merge common parts of head.S and head64.S into head.S and move architecture
specific parts to head31.S and head64.S respectively. Saves us ~500 lines
of duplicated assembly code.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove pagex pseudo page fault code. It does not work together with the
system call speedup that makes the complete system call path enabled for
interrupts. To make pagex and the syscall speedup code work together we would
have to add code to the program check handler to do a critical section cleanup
like the asynchronous interrupt code. This would make program checks slower.
Not what we want.
Newer versions of z/VM have the improved pfault pseudo page fault interface.
This replaces the old pagex interface and does not have the problem. So its
better to just rip out the pagex code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't switch back to 24 bit addressing mode when waiting for an external
interrupt and set the correct bit in wait PSW (external mask instead of I/O
mask).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The calculation of the value return by next_timer_interrupt from jiffies to
jiffies_64 is racy against xtime updates. We need to protect the calculation
with read_seqbegin/read_seqretry.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Always create all signal frames for pending signals before returning to
userspace, not just a single one.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>