The TOD epoch extension adds 8 epoch bits to the TOD clock to provide
a continuous clock after 2042/09/17. The store-clock-extended (STCKE)
instruction will store the epoch index in the first byte of the
16 bytes stored by the instruction. The read_boot_clock64 and the
read_presistent_clock64 functions need to take the additional bits
into account to give the correct result after 2042/09/17.
The clock-comparator register will stay 64 bit wide. The comparison
of the clock-comparator with the TOD clock is limited to bytes
1 to 8 of the extended TOD format. To deal with the overflow problem
due to an epoch change the clock-comparator sign control in CR0 can
be used to switch the comparison of the 64-bit TOD clock with the
clock-comparator to a signed comparison.
The decision between the signed vs. unsigned clock-comparator
comparisons is done at boot time. Only if the TOD clock is in the
second half of a 142 year epoch the signed comparison is used.
This solves the epoch overflow issue as long as the machine is
booted at least once in an epoch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.
1) Old kernels without the LPP instruction. The guest program parameter
is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
The guest program parameter is non-zero if the guest executes in a
process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
non-zero even for the idle task. The guest program parameter is non-zero
if the guest is running.
All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:
1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)
The load-control instruction to load the pid into CR4 is expensive and the
goal is to remove it. To distinguish the host CR4 from the guest pid for
the idle process the maximum value 0xffff for the PASN is used.
This adds a fourth case for a guest OS with an updated kernel:
4) CR4==0xffff, gpp=((1UL << 31) | pid)
The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff)
to identify guest samples. This works nicely with all 4 cases, the only
possible issue would be a guest with an old kernel (gpp==0) and a process
pid of 0xffff. Well, don't do that..
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have the s390 specific THREAD_ORDER define and the THREAD_SIZE_ORDER
define which is also used in common code. Both have exactly the same
semantics. Therefore get rid of THREAD_ORDER and always use
THREAD_SIZE_ORDER instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is the s390 variant of commit 15f4eae70d ("x86: Move
thread_info into task_struct").
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
we have to check bit 40 of the facility list before issuing LPP
and not bit 48. Otherwise a guest running on a system with
"The decimal-floating-point zoned-conversion facility" and without
the "The set-program-parameters facility" might crash on an lpp
instruction.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: e22cf8ca6f ("s390/cpumf: rework program parameter setting to detect guest samples")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
head.s contains an stfle instruction which stores it result at the
storage location that is assigned to the stfl instruction.
This is currently no problem, since we only care about one double
word. However if the number of double words in the ALS bitfield grows
the current code is not very stable.
E.g. before issuing the stfle command the memory to which it stores
must be cleared, since the instruction may or may not clear memory
contents where no bits are set.
In order to simplify the code a bit always use the storage location
that we reserved for the stfle result.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The program parameter can be used to mark hardware samples with
some token. Previously, it was used to mark guest samples only.
Improve the program parameter doubleword by combining two parts,
the leftmost LPP part and the rightmost PID part. Set the PID
part for processes by using the task PID.
To distinguish host and guest samples for the kernel (PID part
is zero), the guest must always set the program paramater to a
non-zero value. Use the leftmost bit in the LPP part of the
program parameter to be able to detect guest kernel samples.
[brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
__LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
And updated the commit message accordingly.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The kernel currently crashes with a low-address-protection exception
if a user space process executes an instruction that tries to use the
linkage stack. Set the base-ASTE origin and the subspace-ASTE origin
of the dispatchable-unit-control-table to point to a dummy ASTE.
Set up control register 15 to point to an empty linkage stack with no
room left.
A user space process with a linkage stack instruction will still crash
but with a different exception which is correctly translated to a
segmentation fault instead of a kernel oops.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let the kernel text section always begin at 1MB. This allows to always have
a large frame in the identity mapping of the kernel image for beginning of
the text section, if the machine has EDAT1 support.
Moving the beginning from 64K to 1MB doesn't cost any memory, since we make
the memory between 64K and 1MB available for the page allocator.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the file name from the comment at top of many files. In most
cases the file name was wrong anyway, so it's rather pointless.
Also unify the IBM copyright statement. We did have a lot of sightly
different statements and wanted to change them one after another
whenever a file gets touched. However that never happened. Instead
people start to take the old/"wrong" statements to use as a template
for new files.
So unify all of them in one go.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Use a sigp sense running to decide which signal processor order to use
for an ipi. If the target cpu is running use external call, if the target
cpu is not running use emergency signal.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Do not set the cr0 enablement bit for iucv by default in head[31|64].S,
move the enablement to iucv_init in the iucv base layer.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The alignment is missing for various global symbols in s390 assembly code.
With a recent gcc and an instruction like stgrl this can lead to a
specification exception if the instruction uses such a mis-aligned address.
Specify the alignment explicitely and while add it define __ALIGN for s390
and use the ENTRY define to save some lines of code.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As of git commit 1844c9bc0b head64.S/head31.S
are not included in head.S anymore but build as an extra object. This breaks
shared kernel support because the .org statement in head64.S/head31.S for
CONFIG_SHARED_KERNEL=y will have a different effect. The end address of the
head.text section in head.o will be added to the .org value, to compensate
for this subtract 0x11000 to get the required value of 0x100000 again.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix two bugs with the kernel image compression:
1) reset the bss section of the compressed vmlinux
2) clear the high half of the registers for 64 bit early enough
for the decompression step
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the "bzImage" compile target and the necessary code to generate
compressed kernel images. The old style uncompressed "image" target
is preserved, a simple make will build them both.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove support to be able to dump 31 bit systems with a 64 bit dumper.
This is mostly useless since no distro ships 31 bit kernels together
with a 64 bit dumper.
We also get rid of a bit of hacky code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the kernel is IPLed without the CLEAR option and switches
to 64-bit, the high-order half of the registers might contain
random values. This can cause addressing exceptions and the
kernel enters an interrupt loop.
Initialize the high-order half of the general purpose registers
with zeros after switching to 64-bit mode.
Cc: <stable@kernel.org>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Saves us more than 65k pointless IPIs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
"lockdep: Fix backtraces" reveales a bug in early setup code: when
lockdep tries to save a stack backtrace before setup_arch has been
called the lowcore pointer for the current thread info pointer isn't
initialized yet.
However our save stack backtrace code relies on it. If the pointer
isn't initialized the saved backtrace will have zero entries.
lockdep however relies (correctly) on the fact that that cannot
happen.
A write access to some random memory region is the result.
Fix this by initializing the thread info pointer early.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The extract cpu time instruction (ectg) instruction allows the user
process to get the current thread cputime without calling into the
kernel. The code that uses the instruction needs to switch to the
access registers mode to get access to the per-cpu info page that
contains the two base values that are needed to calculate the current
cputime from the CPU timer with the ectg instruction.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Early init code clears the backchain of the initial kernel stack frame.
This is not necessary since it is pre initialized with zeros. Plus it
was broken on 64 bit since it cleared only four of eight bytes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds the code generation option for IBM System z10 and
adds a check in head[31,64].S to prevents the execution of a kernel
compiled for a new processor type on an old machine.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
From: Heiko Carstens <heiko.carstens@de.ibm.com>
From: Carsten Otte <cotte@de.ibm.com>
This lets us use defines for the magic bits in machine flags instead
of using plain numbers all over the place.
In addition on newer machines features/facilities are indicated by the
result of the stfl instruction. So we use these bits instead of trying
to execute new instructions and check wether we get an exception or
not.
Also the mvpg instruction is always available when in zArch mode,
whereas the idte instruction is only available in zArch mode. This
results in some minor optimizations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The clear-by-asce operation of the idte instruction gets an asce
(address-space-control-element) as argument to specify which TLBs
need to get flushed. The current code passes a plain pointer to
the start of the pgd without the additional bits which would make
the pointer an asce. The current machines don't mind the difference
but a future model might want to use the designation type control
bits in the asce as a filter for the TLBs to flush.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390 machines provide hardware support for creating Linux dumps on SCSI
disks. For creating a dump a special purpose dump Linux is used. The first
32 MB of memory are saved by the hardware before the dump Linux is
booted. Via an SCLP interface, the saved memory can be accessed from
Linux. This patch exports memory and registers of the crashed Linux to
userspace via a debugfs file. For more information refer to
Documentation/s390/zfcpdump.txt, which is included in this patch.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Replaced check_user_space() + __check_access_register with the new
check_space(). The old functions made wrong assumptions about kernel
and user space when the kernel and user address spaces are switched
(kernel in home space, user in primary/secondary space).
Secondly the user process can switch to the accress register mode if
it is running in primary or secondary mode. In addition it can load
an arbitrary value to the access registers. If any other value than
0 for primary space or 1 for secondary space is loaded and memory
is accessed using the base register related to the access register,
the program should be terminated with a SIGSEGV. To achieve that the
DUALD pointer in the DUCT and the PSALD pointer in the PASTE need
to point to an array of 8 invalid access-list entries to get a
ALEN-translation exception if an invalid alet is used.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_SHARED_KERNEL the kernel text segment that might be in a
read only memory sections starts at 1MB. Memory between 0x12000 and
0x100000 is unused then. Free this, so we have appr. an extra MB
of memory available.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hopefully this will make it more maintainable and less error prone.
Code makes use of search_exception_tables(). Since it calls this
function before the kernel exeception table is sorted, there is an
early call to sort_main_extable().
This way it's easy to use the already present infrastructure of fixup
sections. Also this would allows to easily convert the rest of
head[31|64].S into C code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support to boot from a named saved segment (NSS).
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
FCP dump feature detection works only if the sclp command in head.S
was succesful. Since the sclp command is skipped if diag260 works,
we don't have any dump feature detection anymore.
Bug was introduced with d57de5a367.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix a memory leak problem in the memory detection routines. A memory leak
of 128k occurs when we have a contiguous memory with mixed access-mode
(read or write) ranges.
Signed-off-by: Hongjie Yang <hongjie@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
VMALLOC_END on 31bit should be 0x8000000UL instead of 0x7fffffffL.
The page mask which is used to make sure memory_end is on 4MB/2MB
boundary is wrong and not needed. Therefore remove it.
Make sure a vmalloc area does also exist and work on (future)
machines with 4TB and more memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid the tprot loop if diag260 works and reports that there are no
holes in memory. The tprot instruction can lead to a significant delay
in the ipl process if the virtual guest has a lot of memory and the
host is under memory pressure.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the new diagnose 0x9c in the spinlock implementation for s390. It
yields the remaining timeslice of the virtual cpu that tries to acquire a
lock to the virtual cpu that is the current holder of the lock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces new user-copy operations which are optimized for
copying more than 256 Bytes on new hardware.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert GET_IPL_DEVICE assembler macro to C function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is now possible to specify a ccw/fcp dump device which is used to
automatically create a system dump in case of a kernel panic. The dump
device can be configured under /sys/firmware/dump.
In addition it is now possible to specify a ccw/fcp device which is used
for the next reboot of Linux. The reipl device can be configured under
/sys/firmware/reipl.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move initrd if the bitmap of the bootmem allocator would overwrite it.
In addition this patch sets the default size and address of the initrd to 0.
Therefore all boot loaders must set the initrd size and address correctly.
This is especially relevant for ftp boot via HMC/SE, where this change
requires a special patch file entry in the .ins file which sets these two
values contained at address 0x10408 and 0x10410.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SLES9 binutils don't like .align 4096 statements in head.S. Work around this
by using .org statements.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is almost no room left for any new code between 0x10000
and 0x10480. Move the code from 0x10000 to 0x11000.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The wrong base register is used to read a value from the sclp data
structure. The value is used to calculate the memory size.
Use correct register %r4.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge common parts of head.S and head64.S into head.S and move architecture
specific parts to head31.S and head64.S respectively. Saves us ~500 lines
of duplicated assembly code.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't switch back to 24 bit addressing mode when waiting for an external
interrupt and set the correct bit in wait PSW (external mask instead of I/O
mask).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>