Commit Graph

231 Commits

Author SHA1 Message Date
Heiko Carstens 0cd9b7230c s390: add separate program check exit path
System call and program check handler both use the system call exit
path when returning to previous context. However the program check
handler jumps right to the end of the system call exit path if the
previous context is kernel context.

This lead to the quite odd double disabling of interrupts in the
system call exit path introduced with commit ce9dfafe29 ("s390:
fix system call exit path").

To avoid that have a separate program check handler exit path if the
previous context is kernel context.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-20 19:17:24 +01:00
Jens Axboe 75309018a2 s390: add support for TIF_NOTIFY_SIGNAL
Wire up TIF_NOTIFY_SIGNAL handling for s390.

Cc: linux-s390@vger.kernel.org
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-09 08:16:55 -07:00
Heiko Carstens ce9dfafe29 s390: fix system call exit path
The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled bits can be set already again, due to interrupt handling.

This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with next exit to user space.

Fix this by checking the corresponding bits only when interrupts are
disabled.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09 11:16:11 +01:00
Sven Schnelle 4bff8cb545 s390: convert to GENERIC_VDSO
Convert s390 to generic vDSO. There are a few special things on s390:

- vDSO can be called without a stack frame - glibc did this in the past.
  So we need to allocate a stackframe on our own.

- The former assembly code used stcke to get the TOD clock and applied
  time steering to it. We need to do the same in the new code. This is done
  in the architecture specific __arch_get_hw_counter function. The steering
  information is stored in an architecure specific area in the vDSO data.

- CPUCLOCK_VIRT is now handled with a syscall fallback, which might
  be slower/less accurate than the old implementation.

The getcpu() function stays as an assembly function because there is no
generic implementation and the code is just a few lines.

Performance number from my system do 100 mio gettimeofday() calls:

Plain syscall: 8.6s
Generic VDSO:  1.3s
old ASM VDSO:  1s

So it's a bit slower but still much faster than syscalls.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-08-26 18:47:21 +02:00
Christian Borntraeger 7b7735c5be s390: fix comment regarding interrupts in svc
With the removal of the critical section cleanup, we now enter the svc
interrupt handler with interrupts disabled.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-07-10 15:08:23 +02:00
Sven Schnelle e64a1618af s390: fix system call single stepping
When single stepping an svc instruction on s390, the kernel is entered
with a PER program check interruption. The program check handler than
jumps to the system call handler by reloading the PSW. The code didn't
set GPR13 to the thread pointer in struct task_struct. This made the
kernel access invalid memory while trying to fetch the syscall function
address. Fix this by always assigned GPR13 after .Lsysc_per.

Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2020-06-23 14:05:45 +02:00
Sven Schnelle 00332c16b1 s390/ptrace: pass invalid syscall numbers to tracing
tracing expects to see invalid syscalls, so pass it through.
The syscall path in entry.S checks the syscall number before
looking up the handler, so it is still safe.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-16 13:44:04 +02:00
Sven Schnelle 0b0ed657fe s390: remove critical section cleanup from entry.S
The current code is rather complex and caused a lot of subtle
and hard to debug bugs in the past. Simplify the code by calling
the system_call handler with interrupts disabled, save
machine state, and re-enable them later.

This requires significant changes to the machine check handling code
as well. When the machine check interrupt arrived while being in kernel
mode the new code will signal pending machine checks with a SIGP external
call. When userspace was interrupted, the handler will switch to the
kernel stack and directly execute s390_handle_mcck().

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-28 12:21:54 +02:00
Sven Schnelle 0b38b5e1d0 s390: prevent leaking kernel address in BEAR
When userspace executes a syscall or gets interrupted,
BEAR contains a kernel address when returning to userspace.
This make it pretty easy to figure out where the kernel is
mapped even with KASLR enabled. To fix this, add lpswe to
lowcore and always execute it there, so userspace sees only
the lowcore address of lpswe. For this we have to extend
both critical_cleanup and the SWITCH_ASYNC macro to also check
for lpswe addresses in lowcore.

Fixes: b2d24b97b2 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10 15:16:25 +01:00
Thomas Gleixner fa68645305 sched/rt, s390: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the preemption and entry code over to use CONFIG_PREEMPTION. Add
PREEMPT_RT output to die().

[bigeasy: +Kconfig, dumpstack.c]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Link: https://lore.kernel.org/r/20191015191821.11479-18-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:35 +01:00
Heiko Carstens 67626fadd2 s390: enforce CONFIG_SMP
There never have been distributions that shiped with CONFIG_SMP=n for
s390. In addition the kernel currently doesn't even compile with
CONFIG_SMP=n for s390. Most likely it wouldn't even work, even if we
fix the compile error, since nobody tests it, since there is no use
case that I can think of.
Therefore simply enforce CONFIG_SMP and get rid of some more or
less unused code.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-07 10:09:37 +02:00
Martin Schwidefsky 26a374ae7a s390: add missing ENDPROC statements to assembler functions
The assembler code in arch/s390 misses proper ENDPROC statements
to properly end functions in a few places. Add them.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02 13:54:11 +02:00
Gerald Schaefer ff4a742dde s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
With a relocatable kernel that could reside at any place in memory, the
storage size for the SYSCALL and PGM_CHECK handlers needs to be
increased from .long to .quad.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-04-29 10:47:10 +02:00
Arnd Bergmann aa0d6e70d3 s390: autogenerate compat syscall wrappers
Any system call that takes a pointer argument on s390 requires
a wrapper function to do a 31-to-64 zero-extension, these are
currently generated in arch/s390/kernel/compat_wrapper.c.

On arm64 and x86, we already generate similar wrappers for all
system calls in the place of their definition, just for a different
purpose (they load the arguments from pt_regs).

We can do the same thing here, by adding an asm/syscall_wrapper.h
file with a copy of all the relevant macros to override the generic
version. Besides the addition of the compat entry point, these also
rename the entry points with a __s390_ or __s390x_ prefix, similar
to what we do on arm64 and x86. This in turn requires renaming
a few things, and adding a proper ni_syscall() entry point.

In order to still compile system call definitions that pass an
loff_t argument, the __SC_COMPAT_CAST() macro checks for that
and forces an -ENOSYS error, which was the best I could come up
with. Those functions must obviously not get called from user
space, but instead require hand-written compat_sys_*() handlers,
which fortunately already exist.

Link: https://lore.kernel.org/lkml/20190116131527.2071570-5-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-01-18 09:33:19 +01:00
Vasily Gorbik 9fed920e68 s390/kasan: increase instrumented stack size to 64k
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seems to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-02 08:31:57 +01:00
Martin Schwidefsky ce3dc44749 s390: add support for virtually mapped kernel stacks
With virtually mapped kernel stacks the kernel stack overflow detection
is now fault based, every stack has a guard page in the vmalloc space.
The panic_stack is renamed to nodat_stack and is used for all function
that need to run without DAT, e.g. memcpy_real or do_start_kdump.

The main effect is a reduction in the kernel image size as with vmap
stacks the old style overflow checking that adds two instructions per
function is not needed anymore. Result from bloat-o-meter:

add/remove: 20/1 grow/shrink: 13/26854 up/down: 2198/-216240 (-214042)

In regard to performance the micro-benchmark for fork has a hit of a
few microseconds, allocating 4 pages in vmalloc space is more expensive
compare to an order-2 page allocation. But with real workload I could
not find a noticeable difference.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:57 +02:00
Heiko Carstens 9d6d99e3ac s390: wire up rseq system call
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-04 08:35:18 +02:00
Christian Borntraeger 891f6a726c s390: Correct register corruption in critical section cleanup
In the critical section cleanup we must not mess with r1.  For march=z9
or older, larl + ex (instead of exrl) are used with r1 as a temporary
register. This can clobber r1 in several interrupt handlers. Fix this by
using r11 as a temp register.  r11 is being saved by all callers of
cleanup_critical.

Fixes: 6dd85fbb87 ("s390: move expoline assembler macros to a header")
Cc: stable@vger.kernel.org #v4.16
Reported-by: Oliver Kurz <okurz@suse.com>
Reported-by: Petr Tesařík <ptesarik@suse.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-06-25 10:07:12 +02:00
Martin Schwidefsky 6dd85fbb87 s390: move expoline assembler macros to a header
To be able to use the expoline branches in different assembler
files move the associated macros from entry.S to a new header
nospec-insn.h.

While we are at it make the macros a bit nicer to use.

Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-05-07 09:07:32 +02:00
Linus Torvalds becdce1c66 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - Improvements for the spectre defense:
    * The spectre related code is consolidated to a single file
      nospec-branch.c
    * Automatic enable/disable for the spectre v2 defenses (expoline vs.
      nobp)
    * Syslog messages for specve v2 are added
    * Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
      functions for spectre v1 and v2

 - Add helper macros for assembler alternatives and use them to shorten
   the code in entry.S.

 - Add support for persistent configuration data via the SCLP Store Data
   interface. The H/W interface requires a page table that uses 4K pages
   only, the code to setup such an address space is added as well.

 - Enable virtio GPU emulation in QEMU. To do this the depends
   statements for a few common Kconfig options are modified.

 - Add support for format-3 channel path descriptors and add a binary
   sysfs interface to export the associated utility strings.

 - Add a sysfs attribute to control the IFCC handling in case of
   constant channel errors.

 - The vfio-ccw changes from Cornelia.

 - Bug fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
  s390/kvm: improve stack frame constants in entry.S
  s390/lpp: use assembler alternatives for the LPP instruction
  s390/entry.S: use assembler alternatives
  s390: add assembler macros for CPU alternatives
  s390: add sysfs attributes for spectre
  s390: report spectre mitigation via syslog
  s390: add automatic detection of the spectre defense
  s390: move nobp parameter functions to nospec-branch.c
  s390/cio: add util_string sysfs attribute
  s390/chsc: query utility strings via fmt3 channel path descriptor
  s390/cio: rename struct channel_path_desc
  s390/cio: fix unbind of io_subchannel_driver
  s390/qdio: split up CCQ handling for EQBS / SQBS
  s390/qdio: don't retry EQBS after CCQ 96
  s390/qdio: restrict buffer merging to eligible devices
  s390/qdio: don't merge ERROR output buffers
  s390/qdio: simplify math in get_*_buffer_frontier()
  s390/decompressor: trim uncompressed image head during the build
  s390/crypto: Fix kernel crash on aes_s390 module remove.
  s390/defkeymap: fix global init to zero
  ...
2018-04-09 09:04:10 -07:00
Martin Schwidefsky 92fa7a13c8 s390/kvm: improve stack frame constants in entry.S
The code in sie64a uses the stack frame passed to the function to store
some temporary data in the empty1 array (see struct stack_frame in
asm/processor.h.

Replace the __SF_EMPTY+x constants with a properly defined offset:
s/__SF_EMPTY/__SF_SIE_CONTROL/, s/__SF_EMPTY+8/__SF_SIE_SAVEAREA/,
s/__SF_EMPTY+16/__SF_SIE_REASON/, s/__SF_EMPTY+24/__SF_SIE_FLAGS/.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-03-28 08:38:29 +02:00
Martin Schwidefsky e5b98199de s390/lpp: use assembler alternatives for the LPP instruction
With the new macros for CPU alternatives the MACHINE_FLAG_LPP check
around the LPP instruction can be optimized. After this is done the
flag can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-03-28 08:38:28 +02:00
Martin Schwidefsky b058661a99 s390/entry.S: use assembler alternatives
Replace the open coded alternatives for the BPOFF, BPON, BPENTER,
and BPEXIT macros with the new magic from asm/alternatives-asm.h
to make the code easier to read.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-03-28 08:38:28 +02:00
Christian Borntraeger d3f468963c s390/entry.S: fix spurious zeroing of r0
when a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:

.Lcleanup_system_call:
[...]
0:      # update accounting time stamp
        mvc     __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
        # set up saved register r11
        lg      %r15,__LC_KERNEL_STACK
        la      %r9,STACK_FRAME_OVERHEAD(%r15)
        stg     %r9,24(%r11)            # r11 pt_regs pointer
        # fill pt_regs
        mvc     __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
--->    stmg    %r0,%r7,__PT_R0(%r9)

The problem is now, that we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.

Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-03-06 09:19:35 +01:00
Martin Schwidefsky d5feec04fe s390: do not bypass BPENTER for interrupt system calls
The system call path can be interrupted before the switch back to the
standard branch prediction with BPENTER has been done. The critical
section cleanup code skips forward to .Lsysc_do_svc and bypasses the
BPENTER. In this case the kernel and all subsequent code will run with
the limited branch prediction.

Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-22 14:08:51 +01:00
Hendrik Brueckner dc24b7b49a s390/clean-up: use CFI_* macros in entry.S
Commit f19fbd5ed6 ("s390: introduce execute-trampolines for
branches") introduces .cfi_* assembler directives.  Instead of
using the directives directly, use the macros from asm/dwarf.h.
This also ensures that the dwarf debug information are created
in the .debug_frame section.

Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-22 10:09:20 +01:00
Martin Schwidefsky f19fbd5ed6 s390: introduce execute-trampolines for branches
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

	basr	%r14,%r1

is replaced with a PC relative call to a new thunk

	brasl	%r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
	exrl	0,0f
	j	.
0:	br	%r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-07 15:57:02 +01:00
Martin Schwidefsky 6b73044b2b s390: run user space and KVM guests with modified branch prediction
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.

To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-05 14:48:50 +01:00
Martin Schwidefsky d768bd892f s390: add options to change branch prediction behaviour for the kernel
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-05 13:49:17 +01:00
Martin Schwidefsky 7041d28115 s390: scrub registers on kernel entry and KVM exit
Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-05 07:34:54 +01:00
Hendrik Brueckner 4381f9f12e s390/syscalls: use generated syscall_table.h and unistd.h header files
Update the uapi/asm/unistd.h to include the generated compat and
64-bit version of the unistd.h and, as well as, the unistd_nr.h
header file.  Also remove the arch/s390/kernel/syscalls.S file
and use the generated system call table, syscall_table.h, instead.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-01-23 07:36:52 +01:00
Heiko Carstens 3241d3eb7d s390: rework __switch_to() to allow larger task_struct offsets
If GCC_PLUGIN_RANDSTRUCT is enabled the members of task_struct will be
shuffled around. The offsets of the "pid" and "stack" members within
task_struct may not necessarily fit into 12 bits anymore, which causes
compile errors within __switch_to, since instructions are used, which
only have a 12 bit displacement field.

Therefore rework __switch_to, to allow for larger offsets.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-20 08:51:01 +01:00
Martin Schwidefsky 0aaba41b58 s390: remove all code using the access register mode
The vdso code for the getcpu() and the clock_gettime() call use the access
register mode to access the per-CPU vdso data page with the current code.

An alternative to the complicated AR mode is to use the secondary space
mode. This makes the vdso faster and quite a bit simpler. The downside is
that the uaccess code has to be changed quite a bit.

Which instructions are used depends on the machine and what kind of uaccess
operation is requested. The instruction dictates which ASCE value needs
to be loaded into %cr1 and %cr7.

The different cases:

* User copy with MVCOS for z10 and newer machines
  The MVCOS instruction can copy between the primary space (aka user) and
  the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
  ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
  loaded in %cr1.

* User copy with MVCP/MVCS for older machines
  To be able to execute the MVCP/MVCS instructions the kernel needs to
  switch to primary mode. The control register %cr1 has to be set to the
  kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
  on set_fs(KERNEL_DS) vs set_fs(USER_DS).

* Data access in the user address space for strnlen / futex
  To use "normal" instruction with data from the user address space the
  secondary space mode is used. The kernel needs to switch to primary mode,
  %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
  kernel ASCE, dependent on set_fs.

To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
tries to be lazy about it. E.g. for multiple user copies in a row with
MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
done only once. On return to user space a CPU bit is checked that loads the
vdso ASCE again.

To enable and disable the data access via the secondary space two new
functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
that a context is in secondary space uaccess mode is stored in the
mm_segment_t value for the task. The code of an interrupt may use set_fs
as long as it returns to the previous state it got with get_fs with another
call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
set_fs with the current mm_segment_t value for the task.

For CPUs with MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

For CPUs without MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

The lines with "lazy" refer to the state after a copy via the secondary
space with a delayed reload of %cr1 and %cr7.

There are three hardware address spaces that can cause a DAT exception,
primary, secondary and home space. The exception can be related to
four different fault types: user space fault, vdso fault, kernel fault,
and the gmap faults.

Dependent on the set_fs state and normal vs. sacf mode there are a number
of fault combinations:

1) user address space fault via the primary ASCE
2) gmap address space fault via the primary ASCE
3) kernel address space fault via the primary ASCE for machines with
   MVCOS and set_fs(KERNEL_DS)
4) vdso address space faults via the secondary ASCE with an invalid
   address while running in secondary space in problem state
5) user address space fault via the secondary ASCE for user-copy
   based on the secondary space mode, e.g. futex_ops or strnlen_user
6) kernel address space fault via the secondary ASCE for user-copy
   with secondary space mode with set_fs(KERNEL_DS)
7) kernel address space fault via the primary ASCE for user-copy
   with secondary space mode with set_fs(USER_DS) on machines without
   MVCOS.
8) kernel address space fault via the home space ASCE

Replace user_space_fault() with a new function get_fault_type() that
can distinguish all four different fault types.

With these changes the futex atomic ops from the kernel and the
strnlen_user will get a little bit slower, as well as the old style
uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
fast as before. On the positive side, the user space vdso code is a
lot faster and Linux ceases to use the complicated AR mode.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 11:01:47 +01:00
Martin Schwidefsky c771320e93 s390/mm,kvm: improve detection of KVM guest faults
The identification of guest fault currently relies on the PF_VCPU flag.
This is set in guest_entry_irqoff and cleared in guest_exit_irqoff.
Both functions are called by __vcpu_run, the PF_VCPU flag is set for
quite a lot of kernel code outside of the guest execution.

Replace the PF_VCPU scheme with the PIF_GUEST_FAULT in the pt_regs and
make the program check handler code in entry.S set the bit only for
exception that occurred between the .Lsie_gmap and .Lsie_done labels.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 11:01:43 +01:00
Linus Torvalds d60a540ac5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
 "Since Martin is on vacation you get the s390 pull request for the
  v4.15 merge window this time from me.

  Besides a lot of cleanups and bug fixes these are the most important
  changes:

   - a new regset for runtime instrumentation registers

   - hardware accelerated AES-GCM support for the aes_s390 module

   - support for the new CEX6S crypto cards

   - support for FORTIFY_SOURCE

   - addition of missing z13 and new z14 instructions to the in-kernel
     disassembler

   - generate opcode tables for the in-kernel disassembler out of a
     simple text file instead of having to manually maintain those
     tables

   - fast memset16, memset32 and memset64 implementations

   - removal of named saved segment support

   - hardware counter support for z14

   - queued spinlocks and queued rwlocks implementations for s390

   - use the stack_depth tracking feature for s390 BPF JIT

   - a new s390_sthyi system call which emulates the sthyi (store
     hypervisor information) instruction

   - removal of the old KVM virtio transport

   - an s390 specific CPU alternatives implementation which is used in
     the new spinlock code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits)
  MAINTAINERS: add virtio-ccw.h to virtio/s390 section
  s390/noexec: execute kexec datamover without DAT
  s390: fix transactional execution control register handling
  s390/bpf: take advantage of stack_depth tracking
  s390: simplify transactional execution elf hwcap handling
  s390/zcrypt: Rework struct ap_qact_ap_info.
  s390/virtio: remove unused header file kvm_virtio.h
  s390: avoid undefined behaviour
  s390/disassembler: generate opcode tables from text file
  s390/disassembler: remove insn_to_mnemonic()
  s390/dasd: avoid calling do_gettimeofday()
  s390: vfio-ccw: Do not attempt to free no-op, test and tic cda.
  s390: remove named saved segment support
  s390/archrandom: Reconsider s390 arch random implementation
  s390/pci: do not require AIS facility
  s390/qdio: sanitize put_indicator
  s390/qdio: use atomic_cmpxchg
  s390/nmi: avoid using long-displacement facility
  s390: pass endianness info to sparse
  s390/decompressor: remove informational messages
  ...
2017-11-13 11:47:01 -08:00
Linus Torvalds ead751507d License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
 makes it harder for compliance tools to determine the correct license.
 
 By default all files without license information are under the default
 license of the kernel, which is GPL version 2.
 
 Update the files which contain no license information with the 'GPL-2.0'
 SPDX license identifier.  The SPDX identifier is a legally binding
 shorthand, which can be used instead of the full boiler plate text.
 
 This patch is based on work done by Thomas Gleixner and Kate Stewart and
 Philippe Ombredanne.
 
 How this work was done:
 
 Patches were generated and checked against linux-4.14-rc6 for a subset of
 the use cases:
  - file had no licensing information it it.
  - file was a */uapi/* one with no licensing information in it,
  - file was a */uapi/* one with existing licensing information,
 
 Further patches will be generated in subsequent months to fix up cases
 where non-standard license headers were used, and references to license
 had to be inferred by heuristics based on keywords.
 
 The analysis to determine which SPDX License Identifier to be applied to
 a file was done in a spreadsheet of side by side results from of the
 output of two independent scanners (ScanCode & Windriver) producing SPDX
 tag:value files created by Philippe Ombredanne.  Philippe prepared the
 base worksheet, and did an initial spot review of a few 1000 files.
 
 The 4.13 kernel was the starting point of the analysis with 60,537 files
 assessed.  Kate Stewart did a file by file comparison of the scanner
 results in the spreadsheet to determine which SPDX license identifier(s)
 to be applied to the file. She confirmed any determination that was not
 immediately clear with lawyers working with the Linux Foundation.
 
 Criteria used to select files for SPDX license identifier tagging was:
  - Files considered eligible had to be source code files.
  - Make and config files were included as candidates if they contained >5
    lines of source
  - File already had some variant of a license header in it (even if <5
    lines).
 
 All documentation files were explicitly excluded.
 
 The following heuristics were used to determine which SPDX license
 identifiers to apply.
 
  - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.
 
    For non */uapi/* files that summary was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0                                              11139
 
    and resulted in the first patch in this series.
 
    If that file was a */uapi/* path one, it was "GPL-2.0 WITH
    Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                        930
 
    and resulted in the second patch in this series.
 
  - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point).  Results summary:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                       270
    GPL-2.0+ WITH Linux-syscall-note                      169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
    LGPL-2.1+ WITH Linux-syscall-note                      15
    GPL-1.0+ WITH Linux-syscall-note                       14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
    LGPL-2.0+ WITH Linux-syscall-note                       4
    LGPL-2.1 WITH Linux-syscall-note                        3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
 
    and that resulted in the third patch in this series.
 
  - when the two scanners agreed on the detected license(s), that became
    the concluded license(s).
 
  - when there was disagreement between the two scanners (one detected a
    license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.
 
  - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply (and
    which scanner probably needed to revisit its heuristics).
 
  - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.
 
  - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.
 
 In total, over 70 hours of logged manual review was done on the
 spreadsheet to determine the SPDX license identifiers to apply to the
 source files by Kate, Philippe, Thomas and, in some cases, confirmation
 by lawyers working with the Linux Foundation.
 
 Kate also obtained a third independent scan of the 4.13 code base from
 FOSSology, and compared selected files where the other two scanners
 disagreed against that SPDX file, to see if there was new insights.  The
 Windriver scanner is based on an older version of FOSSology in part, so
 they are related.
 
 Thomas did random spot checks in about 500 files from the spreadsheets
 for the uapi headers and agreed with SPDX license identifier in the
 files he inspected. For the non-uapi files Thomas did random spot checks
 in about 15000 files.
 
 In initial set of patches against 4.14-rc6, 3 files were found to have
 copy/paste license identifier errors, and have been fixed to reflect the
 correct identifier.
 
 Additionally Philippe spent 10 hours this week doing a detailed manual
 inspection and review of the 12,461 patched files from the initial patch
 version early this week with:
  - a full scancode scan run, collecting the matched texts, detected
    license ids and scores
  - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct
  - reviewing anything where there was no detection but the patch license
    was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
    SPDX license was correct
 
 This produced a worksheet with 20 files needing minor correction.  This
 worksheet was then exported into 3 different .csv files for the
 different types of files to be modified.
 
 These .csv files were then reviewed by Greg.  Thomas wrote a script to
 parse the csv files and add the proper SPDX tag to the file, in the
 format that the file expected.  This script was further refined by Greg
 based on the output to detect more types of files automatically and to
 distinguish between header and source .c files (which need different
 comment types.)  Finally Greg ran the script using the .csv files to
 generate the patches.
 
 Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
 Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
 Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWfswbQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykvEwCfXU1MuYFQGgMdDmAZXEc+xFXZvqgAoKEcHDNA
 6dVh26uchcEQLN/XqUDt
 =x306
 -----END PGP SIGNATURE-----

Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull initial SPDX identifiers from Greg KH:
 "License cleanup: add SPDX license identifiers to some files

  Many source files in the tree are missing licensing information, which
  makes it harder for compliance tools to determine the correct license.

  By default all files without license information are under the default
  license of the kernel, which is GPL version 2.

  Update the files which contain no license information with the
  'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
  binding shorthand, which can be used instead of the full boiler plate
  text.

  This patch is based on work done by Thomas Gleixner and Kate Stewart
  and Philippe Ombredanne.

  How this work was done:

  Patches were generated and checked against linux-4.14-rc6 for a subset
  of the use cases:

   - file had no licensing information it it.

   - file was a */uapi/* one with no licensing information in it,

   - file was a */uapi/* one with existing licensing information,

  Further patches will be generated in subsequent months to fix up cases
  where non-standard license headers were used, and references to
  license had to be inferred by heuristics based on keywords.

  The analysis to determine which SPDX License Identifier to be applied
  to a file was done in a spreadsheet of side by side results from of
  the output of two independent scanners (ScanCode & Windriver)
  producing SPDX tag:value files created by Philippe Ombredanne.
  Philippe prepared the base worksheet, and did an initial spot review
  of a few 1000 files.

  The 4.13 kernel was the starting point of the analysis with 60,537
  files assessed. Kate Stewart did a file by file comparison of the
  scanner results in the spreadsheet to determine which SPDX license
  identifier(s) to be applied to the file. She confirmed any
  determination that was not immediately clear with lawyers working with
  the Linux Foundation.

  Criteria used to select files for SPDX license identifier tagging was:

   - Files considered eligible had to be source code files.

   - Make and config files were included as candidates if they contained
     >5 lines of source

   - File already had some variant of a license header in it (even if <5
     lines).

  All documentation files were explicitly excluded.

  The following heuristics were used to determine which SPDX license
  identifiers to apply.

   - when both scanners couldn't find any license traces, file was
     considered to have no license information in it, and the top level
     COPYING file license applied.

     For non */uapi/* files that summary was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0                                              11139

     and resulted in the first patch in this series.

     If that file was a */uapi/* path one, it was "GPL-2.0 WITH
     Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
     was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0 WITH Linux-syscall-note                        930

     and resulted in the second patch in this series.

   - if a file had some form of licensing information in it, and was one
     of the */uapi/* ones, it was denoted with the Linux-syscall-note if
     any GPL family license was found in the file or had no licensing in
     it (per prior point). Results summary:

       SPDX license identifier                            # files
       ---------------------------------------------------|------
       GPL-2.0 WITH Linux-syscall-note                       270
       GPL-2.0+ WITH Linux-syscall-note                      169
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
       LGPL-2.1+ WITH Linux-syscall-note                      15
       GPL-1.0+ WITH Linux-syscall-note                       14
       ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
       LGPL-2.0+ WITH Linux-syscall-note                       4
       LGPL-2.1 WITH Linux-syscall-note                        3
       ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
       ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

     and that resulted in the third patch in this series.

   - when the two scanners agreed on the detected license(s), that
     became the concluded license(s).

   - when there was disagreement between the two scanners (one detected
     a license but the other didn't, or they both detected different
     licenses) a manual inspection of the file occurred.

   - In most cases a manual inspection of the information in the file
     resulted in a clear resolution of the license that should apply
     (and which scanner probably needed to revisit its heuristics).

   - When it was not immediately clear, the license identifier was
     confirmed with lawyers working with the Linux Foundation.

   - If there was any question as to the appropriate license identifier,
     the file was flagged for further research and to be revisited later
     in time.

  In total, over 70 hours of logged manual review was done on the
  spreadsheet to determine the SPDX license identifiers to apply to the
  source files by Kate, Philippe, Thomas and, in some cases,
  confirmation by lawyers working with the Linux Foundation.

  Kate also obtained a third independent scan of the 4.13 code base from
  FOSSology, and compared selected files where the other two scanners
  disagreed against that SPDX file, to see if there was new insights.
  The Windriver scanner is based on an older version of FOSSology in
  part, so they are related.

  Thomas did random spot checks in about 500 files from the spreadsheets
  for the uapi headers and agreed with SPDX license identifier in the
  files he inspected. For the non-uapi files Thomas did random spot
  checks in about 15000 files.

  In initial set of patches against 4.14-rc6, 3 files were found to have
  copy/paste license identifier errors, and have been fixed to reflect
  the correct identifier.

  Additionally Philippe spent 10 hours this week doing a detailed manual
  inspection and review of the 12,461 patched files from the initial
  patch version early this week with:

   - a full scancode scan run, collecting the matched texts, detected
     license ids and scores

   - reviewing anything where there was a license detected (about 500+
     files) to ensure that the applied SPDX license was correct

   - reviewing anything where there was no detection but the patch
     license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
     applied SPDX license was correct

  This produced a worksheet with 20 files needing minor correction. This
  worksheet was then exported into 3 different .csv files for the
  different types of files to be modified.

  These .csv files were then reviewed by Greg. Thomas wrote a script to
  parse the csv files and add the proper SPDX tag to the file, in the
  format that the file expected. This script was further refined by Greg
  based on the output to detect more types of files automatically and to
  distinguish between header and source .c files (which need different
  comment types.) Finally Greg ran the script using the .csv files to
  generate the patches.

  Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
  Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
  Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  License cleanup: add SPDX license identifier to uapi header files with a license
  License cleanup: add SPDX license identifier to uapi header files with no license
  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02 10:04:46 -07:00
Vasily Gorbik 2a2d7befd4 s390/nmi: avoid using long-displacement facility
__LC_MCESAD is currently 4528 /* offsetof(struct lowcore, mcesad) */
that would require long-displacement facility for lg, which we don't
have on z900.

Fixes: 3037a52f98 ("s390/nmi: do register validation as early as possible")
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-02 12:32:46 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Martin Schwidefsky 0a5e2ec264 s390/kvm: fix detection of guest machine checks
The new detection code for guest machine checks added a check based
on %r11 to .Lcleanup_sie to distinguish between normal asynchronous
interrupts and machine checks. But the funtion is called from the
program check handler as well with an undefined value in %r11.

The effect is that all program exceptions pointing to the SIE instruction
will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the
 next machine check comes in which will incorrectly be interpreted as a
guest machine check.

The simplest fix is to stop using .Lcleanup_sie in the program check
handler and duplicate a few instructions.

Fixes: c929500d7a ("s390/nmi: s390: New low level handling for machine check happening in guest")
Cc: <stable@vger.kernel.org> # v4.13+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-25 07:59:30 +02:00
Martin Schwidefsky 3037a52f98 s390/nmi: do register validation as early as possible
The validation of the CPU registers in the machine check handler is
currently split into two parts. The first part is done at the start
of the low level mcck_int_handler function, this includes the CPU
timer register and the general purpose registers.
The second part is done a bit later in s390_do_machine_check for all
the other registers, including the control registers, floating pointer
control, vector or floating pointer registers, the access registers,
the guarded storage registers, the TOD programmable registers and the
clock comparator.

This is working fine to far but in theory a future extensions could
cause the C code to use registers that are not validated yet. A better
approach is to validate all CPU registers in "safe" assembler code
before any C function is called.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-19 17:07:40 +02:00
Martin Schwidefsky 9e293b5a70 s390,kvm: provide plumbing for machines checks when running guests
This provides the basic plumbing for handling machine checks when
 running guests
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJZU4QPAAoJEBF7vIC1phx8GZsP/2P4nxWXBj0NS/dNq54/u7HU
 Va/zHIG7nUX81WZi8OCkPRlvb1RlcgNpIdw3Ar+BueFE6/qwVWBSdstVJCg6JSn4
 L8T1srSeV6yQEPq1/I9S8ERYtbC8bOC3dDF6g+KyaKYnICjq5yC01+86MKSVfLTI
 vFMPWY/PPCgECtXHjGpWBW6HjofRH3/H+XQbxaoTUyHKwWKdtvWer9K2V7Mc/Cf8
 XsyLY2Xq0Y5MBsJs+71Qw8+0R041Et5I3H7Od9lIc3SFYNoenQpk5oTtsujMtDG1
 ccMPZKErYI4wHE3Hy1ozK+MdFNbepUk3RBI3oXU25tpFPG3OPuksnOqCVN/iZmm+
 le9RuUi9WOOsuygPj2dsnx5v+aheedEcYWqvQ/qrNlP3pXNcpl+8waM6eke8HyCK
 1JKcqqGKBNX5wKNE9b5sRTHINWK12EVCQyVrgLlZaXoXLa40NpJPjtV27vr3ttVl
 WmGYgwMUTo15Rdr0NSJlXl8iCgIFtWMHvuRhIgp8pBuWWb28zr6aX4w++JPwOOMZ
 e4rzn55giCBDnjjDFQK2Knv5XxwnMKafYMxZXfC8gLr5ELjnI6vZDN+1zhT5L2S9
 uXd8l6rLN2qik57RzPV6YEDS0iybZnx5HF/ZPrNoFigJpdD7/0jFS5K5N0i+AhV5
 UQmGhSGnI7Teguc45mHT
 =CTzL
 -----END PGP SIGNATURE-----

Merge tag 'nmiforkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features

Pull kvm patches from Christian Borntraeger:
"s390,kvm: provide plumbing for machines checks when running guests"

This provides the basic plumbing for handling machine checks when
running guests
2017-06-28 12:57:47 +02:00
QingFeng Hao c929500d7a s390/nmi: s390: New low level handling for machine check happening in guest
Add the logic to check if the machine check happens when the guest is
running. If yes, set the exit reason -EINTR in the machine check's
interrupt handler. Refactor s390_do_machine_check to avoid panicing
the host for some kinds of machine checks which happen
when guest is running.
Reinject the instruction processing damage's machine checks including
Delayed Access Exception instead of damaging the host if it happens
in the guest because it could be caused by improper update on TLB entry
or other software case and impacts the guest only.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-27 16:05:27 +02:00
Martin Schwidefsky f044f4c588 s390/fpu: export save_fpu_regs for all configs
The save_fpu_regs function is a general API that is supposed to be
usable for modules as well. Remove the #ifdef that hides the symbol
for CONFIG_KVM=n.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-13 13:03:43 +02:00
Martin Schwidefsky 23fefe119c s390/kvm: avoid global config of vm.alloc_pgste=1
The system control vm.alloc_pgste is used to control the size of the
page tables, either 2K or 4K. The idea is that a KVM host sets the
vm.alloc_pgste control to 1 which causes *all* new processes to run
with 4K page tables. For a non-kvm system the control should stay off
to save on memory used for page tables.

Trouble is that distributions choose to set the control globally to
be able to run KVM guests. This wastes memory on non-KVM systems.

Introduce the PT_S390_PGSTE ELF segment type to "mark" the qemu
executable with it. All executables with this (empty) segment in
its ELF phdr array will be started with 4K page tables. Any executable
without PT_S390_PGSTE will run with the default 2K page tables.

This removes the need to set vm.alloc_pgste=1 for a KVM host and
minimizes the waste of memory for page tables.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-13 13:03:41 +02:00
Christian Borntraeger c0e7bb38c0 s390/kvm: do not rely on the ILC on kvm host protection fauls
For most cases a protection exception in the host (e.g. copy
on write or dirty tracking) on the sie instruction will indicate
an instruction length of 4. Turns out that there are some corner
cases (e.g. runtime instrumentation) where this is not necessarily
true and the ILC is unpredictable.

Let's replace our 4 byte rewind_pad with 3 byte nops to prepare for
all possible ILCs.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-05-17 12:34:03 +02:00
Martin Schwidefsky 07a63cbe8b s390/cputime: fix incorrect system time
git commit c5328901aa "[S390] entry[64].S improvements" removed
the update of the exit_timer lowcore field from the critical section
cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
blocks. If the PSW is updated by the critical section cleanup to point to
user space again, the interrupt entry code will do a vtime calculation
after the cleanup completed with an exit_timer value which has *not* been
updated. Due to this incorrect system time deltas are calculated.

If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done
or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry
time of the interrupt.

Cc: stable@vger.kernel.org # 3.3+
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-05-03 09:08:57 +02:00
Linus Torvalds 76f1948a79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatch updates from Jiri Kosina:

 - a per-task consistency model is being added for architectures that
   support reliable stack dumping (extending this, currently rather
   trivial set, is currently in the works).

   This extends the nature of the types of patches that can be applied
   by live patching infrastructure. The code stems from the design
   proposal made [1] back in November 2014. It's a hybrid of SUSE's
   kGraft and RH's kpatch, combining advantages of both: it uses
   kGraft's per-task consistency and syscall barrier switching combined
   with kpatch's stack trace switching. There are also a number of
   fallback options which make it quite flexible.

   Most of the heavy lifting done by Josh Poimboeuf with help from
   Miroslav Benes and Petr Mladek

   [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

 - module load time patch optimization from Zhou Chengming

 - a few assorted small fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add missing printk newlines
  livepatch: Cancel transition a safe way for immediate patches
  livepatch: Reduce the time of finding module symbols
  livepatch: make klp_mutex proper part of API
  livepatch: allow removal of a disabled patch
  livepatch: add /proc/<pid>/patch_state
  livepatch: change to a per-task consistency model
  livepatch: store function sizes
  livepatch: use kstrtobool() in enabled_store()
  livepatch: move patching functions into patch.c
  livepatch: remove unnecessary object loaded check
  livepatch: separate enabled and patched states
  livepatch/s390: add TIF_PATCH_PENDING thread flag
  livepatch/s390: reorganize TIF thread flag bits
  livepatch/powerpc: add TIF_PATCH_PENDING thread flag
  livepatch/x86: add TIF_PATCH_PENDING thread flag
  livepatch: create temporary klp_update_patch_state() stub
  x86/entry: define _TIF_ALLWORK_MASK flags explicitly
  stacktrace/x86: add function for detecting reliable stack traces
2017-05-02 18:24:16 -07:00
Martin Schwidefsky df26c2e87e s390/cpumf: simplify detection of guest samples
There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.

1) Old kernels without the LPP instruction. The guest program parameter
   is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
   The guest program parameter is non-zero if the guest executes in a
   process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
   non-zero even for the idle task. The guest program parameter is non-zero
   if the guest is running.

All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:

1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)

The load-control instruction to load the pid into CR4 is expensive and the
goal is to remove it. To distinguish the host CR4 from the guest pid for
the idle process the maximum value 0xffff for the PASN is used.
This adds a fourth case for a guest OS with an updated kernel:

4) CR4==0xffff, gpp=((1UL << 31) | pid)

The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff)
to identify guest samples. This works nicely with all 4 cases, the only
possible issue would be a guest with an old kernel (gpp==0) and a process
pid of 0xffff. Well, don't do that..

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-05 10:11:38 +02:00
Martin Schwidefsky cab36c262e s390: use 64-bit lctlg to load task pid to cr4 on context switch
The 32-bit lctl instruction is quite a bit slower than the 64-bit
counter part lctlg. Use the faster instruction.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-05 07:35:14 +02:00
Martin Schwidefsky 916cda1aa1 s390: add a system call for guarded storage
This adds a new system call to enable the use of guarded storage for
user space processes. The system call takes two arguments, a command
and pointer to a guarded storage control block:

    s390_guarded_storage(int command, struct gs_cb *gs_cb);

The second argument is relevant only for the GS_SET_BC_CB command.

The commands in detail:

0 - GS_ENABLE
    Enable the guarded storage facility for the current task. The
    initial content of the guarded storage control block will be
    all zeros. After the enablement the user space code can use
    load-guarded-storage-controls instruction (LGSC) to load an
    arbitrary control block. While a task is enabled the kernel
    will save and restore the current content of the guarded
    storage registers on context switch.
1 - GS_DISABLE
    Disables the use of the guarded storage facility for the current
    task. The kernel will cease to save and restore the content of
    the guarded storage registers, the task specific content of
    these registers is lost.
2 - GS_SET_BC_CB
    Set a broadcast guarded storage control block. This is called
    per thread and stores a specific guarded storage control block
    in the task struct of the current task. This control block will
    be used for the broadcast event GS_BROADCAST.
3 - GS_CLEAR_BC_CB
    Clears the broadcast guarded storage control block. The guarded-
    storage control block is removed from the task struct that was
    established by GS_SET_BC_CB.
4 - GS_BROADCAST
    Sends a broadcast to all thread siblings of the current task.
    Every sibling that has established a broadcast guarded storage
    control block will load this control block and will be enabled
    for guarded storage. The broadcast guarded storage control block
    is used up, a second broadcast without a refresh of the stored
    control block with GS_SET_BC_CB will not have any effect.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-22 08:14:25 +01:00
Miroslav Benes 2f09ca60a5 livepatch/s390: add TIF_PATCH_PENDING thread flag
Update a task's patch state when returning from a system call or user
space interrupt, or after handling a signal.

This greatly increases the chances of a patch operation succeeding.  If
a task is I/O bound, it can be patched when returning from a system
call.  If a task is CPU bound, it can be patched when returning from an
interrupt.  If a task is sleeping on a to-be-patched function, the user
can send SIGSTOP and SIGCONT to force it to switch.

Since there are two ways the syscall can be restarted on return from a
signal handling process, it is important to clear the flag before
do_signal() is called. Otherwise we could miss the migration if we used
SIGSTOP/SIGCONT procedure or fake signal to migrate patching blocking
tasks. If we place our hook to sysc_work label in entry before
TIF_SIGPENDING is evaluated we kill two birds with one stone. The task
is correctly migrated in all return paths from a syscall.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-08 09:22:40 +01:00
Martin Schwidefsky d9fcf2a1cb s390: fix in-kernel program checks
A program check inside the kernel takes a slightly different path in
entry.S compare to a normal user fault. A recent change moved the store
of the breaking event address into the path taken for in-kernel program
checks as well, but %r14 has not been setup to point to the correct
location. A wild store is the consequence.

Move the store of the breaking event address to the code path for
user space faults.

Fixes: 34525e1f7e ("s390: store breaking event address only for program checks")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-01 09:59:27 +01:00
Heiko Carstens b5a882fcf1 s390: restore address space when returning to user space
Unbalanced set_fs usages (e.g. early exit from a function and a
forgotten set_fs(USER_DS) call) may lead to a situation where the
secondary asce is the kernel space asce when returning to user
space. This would allow user space to modify kernel space at will.

This would only be possible with the above mentioned kernel bug,
however we can detect this and fix the secondary asce before returning
to user space.

Therefore a new TIF_ASCE_SECONDARY which is used within set_fs. When
returning to user space check if TIF_ASCE_SECONDARY is set, which
would indicate a bug. If it is set print a message to the console,
fixup the secondary asce, and then return to user space.

This is similar to what is being discussed for x86 and arm:
"[RFC] syscalls: Restore address limit after a syscall".

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23 10:06:38 +01:00
Heiko Carstens 606aa4aa0b s390: rename CIF_ASCE to CIF_ASCE_PRIMARY
This is just a preparation patch in order to keep the "restore address
space after syscall" patch small.
Rename CIF_ASCE to CIF_ASCE_PRIMARY to be unique and specific when
introducing a second CIF_ASCE_SECONDARY CIF flag.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23 10:06:38 +01:00
Martin Schwidefsky d24b98e3a9 s390/syscall: fix single stepped system calls
Fix PER tracing of system calls after git commit 34525e1f7e
"s390: store breaking event address only for program checks"
broke it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-20 12:38:01 +01:00
Martin Schwidefsky 57d7f939e7 s390: add no-execute support
Bit 0x100 of a page table, segment table of region table entry
can be used to disallow code execution for the virtual addresses
associated with the entry.

There is one tricky bit, the system call to return from a signal
is part of the signal frame written to the user stack. With a
non-executable stack this would stop working. To avoid breaking
things the protection fault handler checks the opcode that caused
the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn)
and injects a system call. This is preferable to the alternative
solution with a stub function in the vdso because it works for
vdso=off and statically linked binaries as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-08 14:13:25 +01:00
Martin Schwidefsky 34525e1f7e s390: store breaking event address only for program checks
The principles of operations specifies that the breaking event address
is stored to the address 0x110 in the prefix page only for program checks.
The last branch in user space is lost as soon as a branch in kernel space
is executed after e.g. an svc. This makes it impossible to accurately
maintain the breaking event address for a user space process.

Simplify the code, just copy the current breaking event address from
0x110 to the task structure for program checks from user space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-31 10:46:53 +01:00
Heiko Carstens 7df1160459 s390: remove unused labels from entry.S
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-12 12:04:26 +01:00
Martin Schwidefsky ce4dda3f02 s390: fix machine check panic stack switch
For system damage machine checks or machine checks due to invalid PSW
fields the system will be stopped. In order to get an oops message out
before killing the system the machine check handler branches to
.Lmcck_panic, switches to the panic stack and then does the usual
machine check handling.

The switch to the panic stack is incomplete, the stack pointer in %r15
is replaced, but the pt_regs pointer in %r11 is not. The result is
a program check which will kill the system in a slightly different way.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-07 07:22:13 +01:00
Martin Schwidefsky 61aaef51cc s390: fix kernel oops for CONFIG_MARCH_Z900=y builds
The LAST_BREAK macro in entry.S uses a different instruction sequence
for CONFIG_MARCH_Z900 builds. The branch target offset to skip the
store of the last breaking event address needs to take the different
length of the code block into account.

Fixes: f8fc82b471 ("s390: move sys_call_table and last_break from thread_info to thread_struct")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-25 10:07:55 +01:00
Heiko Carstens 3a890380e4 s390/thread_info: get rid of THREAD_ORDER define
We have the s390 specific THREAD_ORDER define and the THREAD_SIZE_ORDER
define which is also used in common code. Both have exactly the same
semantics. Therefore get rid of THREAD_ORDER and always use
THREAD_SIZE_ORDER instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-23 16:02:21 +01:00
Martin Schwidefsky ef280c859f s390: move sys_call_table and last_break from thread_info to thread_struct
Move the last two architecture specific fields from the thread_info
structure to the thread_struct. All that is left in thread_info is
the flags field.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-15 16:48:20 +01:00
Heiko Carstens d5c352cdd0 s390: move thread_info into task_struct
This is the s390 variant of commit 15f4eae70d ("x86: Move
thread_info into task_struct").

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-11 16:37:41 +01:00
Martin Schwidefsky c360192bf4 s390/preempt: move preempt_count to the lowcore
Convert s390 to use a field in the struct lowcore for the CPU
preemption count. It is a bit cheaper to access a lowcore field
compared to a thread_info variable and it removes the depencency
on a task related structure.

bloat-o-meter on the vmlinux image for the default configuration
(CONFIG_PREEMPT_NONE=y) reports a small reduction in text size:

add/remove: 0/0 grow/shrink: 18/578 up/down: 228/-5448 (-5220)

A larger improvement is achieved with the default configuration
but with CONFIG_PREEMPT=y and CONFIG_DEBUG_PREEMPT=n:

add/remove: 2/6 grow/shrink: 59/4477 up/down: 1618/-228762 (-227144)

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-11-11 16:37:40 +01:00
Al Viro 711f5df7bf s390: move exports to definitions
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-07 23:47:20 -04:00
Heiko Carstens 46210c440c s390: have unique symbol for __switch_to address
After linking there are several symbols for the same address that the
__switch_to symbol points to. E.g.:

000000000089b9c0 T __kprobes_text_start
000000000089b9c0 T __lock_text_end
000000000089b9c0 T __lock_text_start
000000000089b9c0 T __sched_text_end
000000000089b9c0 T __switch_to

When disassembling with "objdump -d" this results in a missing
__switch_to function. It would be named __kprobes_text_start
instead. To unconfuse objdump add a nop in front of the kprobes text
section. That way __switch_to appears again.

Obviously this solution is sort of a hack, since it also depends on
link order if this works or not. However it is the best I can come up
with for now.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-04 09:25:22 +02:00
Heiko Carstens 43799597dc s390: remove pointless load within __switch_to
Remove a leftover from the code that transferred a couple of TIF bits
from the previous task to the next task.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28 09:32:40 +02:00
Martin Schwidefsky e370e47694 s390: fix floating pointer register corruption (again)
There is a tricky interaction between the machine check handler
and the critical sections of load_fpu_regs and save_fpu_regs
functions. If the machine check interrupts one of the two
functions the critical section cleanup will complete the function
before the machine check handler s390_do_machine_check is called.
Trouble is that the machine check handler needs to validate the
floating point registers *before* and not *after* the completion
of load_fpu_regs/save_fpu_regs.

The simplest solution is to rewind the PSW to the start of the
load_fpu_regs/save_fpu_regs and retry the function after the
return from the machine check handler.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-10 14:35:42 +01:00
Christian Borntraeger b1685ab9bd s390/cpumf: Improve guest detection heuristics
commit e22cf8ca6f ("s390/cpumf: rework program parameter setting
to detect guest samples") requires guest changes to get proper
guest/host. We can do better: We can use the primary asn value,
which is set on all Linux variants to compare this with the host
pp value.
We now have the following cases:
1. Guest using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp != 0 --> guest
2. Guest not using PP
host sample:  gpp == 0, asn == hpp --> host
guest sample: gpp == 0, asn != hpp --> guest

As soon as the host no longer sets CR4, we must back out
this heuristics - let's add a comment in switch_to.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:28 -06:00
Martin Schwidefsky 419123f900 s390/spinlock: do not yield to a CPU in udelay/mdelay
It does not make sense to try to relinquish the time slice with diag 0x9c
to a CPU in a state that does not allow to schedule the CPU. The scenario
where this can happen is a CPU waiting in udelay/mdelay while holding a
spin-lock.

Add a CIF bit to tag a CPU in enabled wait and use it to detect that the
yield of a CPU will not be successful and skip the diagnose call.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:18 +01:00
Heiko Carstens db7e007fd6 s390/udelay: make udelay have busy loop semantics
When using systemtap it was observed that our udelay implementation is
rather suboptimal if being called from a kprobe handler installed by
systemtap.

The problem observed when a kprobe was installed on lock_acquired().
When the probe was hit the kprobe handler did call udelay, which set
up an (internal) timer and reenabled interrupts (only the clock comparator
interrupt) and waited for the interrupt.
This is an optimization to avoid that the cpu is busy looping while waiting
that enough time passes. The problem is that the interrupt handler still
does call irq_enter()/irq_exit() which then again can lead to a deadlock,
since some accounting functions may take locks as well.

If one of these locks is the same, which caused lock_acquired() to be
called, we have a nice deadlock.

This patch reworks the udelay code for the interrupts disabled case to
immediately leave the low level interrupt handler when the clock
comparator interrupt happens. That way no C code is being called and the
deadlock cannot happen anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:13 +02:00
Christian Borntraeger e22cf8ca6f s390/cpumf: rework program parameter setting to detect guest samples
The program parameter can be used to mark hardware samples with
some token.  Previously, it was used to mark guest samples only.

Improve the program parameter doubleword by combining two parts,
the leftmost LPP part and the rightmost PID part.  Set the PID
part for processes by using the task PID.
To distinguish host and guest samples for the kernel (PID part
is zero), the guest must always set the program paramater to a
non-zero value.  Use the leftmost bit in the LPP part of the
program parameter to be able to detect guest kernel samples.

[brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
__LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
And updated the commit message accordingly.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:12 +02:00
Hendrik Brueckner 83abeffbd5 s390/entry: add assembler macro to conveniently tests under mask
Various functions in entry.S perform test-under-mask instructions
to test for particular bits in memory.  Because test-under-mask uses
a mask value of one byte, the mask value and the offset into the
memory must be calculated manually.  This easily introduces errors
and is hard to review and read.

Introduce the TSTMSK assembler macro to specify a mask constant and
let the macro calculate the offset and the byte mask to generate a
test-under-mask instruction.  The benefit is that existing symbolic
constants can now be used for tests.  Also the macro checks for
zero mask values and mask values that consist of multiple bytes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:09 +02:00
Hendrik Brueckner 0ac277790e s390/fpu: add static FPU save area for init_task
Previously, the init task did not have an allocated FPU save area and
saving an FPU state was not possible.  Now if the vector extension is
always enabled, provide a static FPU save area to save FPU states of
vector instructions that can be executed quite early.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Hendrik Brueckner b5510d9b68 s390/fpu: always enable the vector facility if it is available
If the kernel detects that the s390 hardware supports the vector
facility, it is enabled by default at an early stage.  To force
it off, use the novx kernel parameter.  Note that there is a small
time window, where the vector facility is enabled before it is
forced to be off.

With enabling the vector facility by default, the FPU save and
restore functions can be improved.  They do not longer require
to manage expensive control register updates to enable or disable
the vector enablement control for particular processes.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:08 +02:00
Martin Schwidefsky 72d38b1978 s390/vtime: correct scaled cputime of partially idle CPUs
The calculation for the SMT scaling factor for a hardware thread
which has been partially idle needs to disregard the cycles spent
by the other threads of the core while the thread is idle.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-30 16:22:38 +02:00
Heiko Carstens 9380cf5a88 s390: fix floating point register corruption
The critical section cleanup code misses to add the offset of the
thread_struct to the task address.
Therefore, if the critical section code gets executed, it may corrupt
the task struct or restore the contents of the floating point registers
from the wrong memory location.
Fixes d0164ee20d "s390/kernel: remove save_fpu_regs() parameter and use
__LC_CURRENT instead".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-09-17 13:43:41 +02:00
Christian Borntraeger 888d5e9804 KVM: s390: use pid of cpu thread for sampling tagging
Right now we use the address of the sie control block as tag for
the sampling data. This is hard to get for users. Let's just use
the PID of the cpu thread to mark the hardware samples.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03 10:04:59 +02:00
Hendrik Brueckner d0164ee20d s390/kernel: remove save_fpu_regs() parameter and use __LC_CURRENT instead
All calls to save_fpu_regs() specify the fpu structure of the current task
pointer as parameter.  The task pointer of the current task can also be
retrieved from the CPU lowcore directly.  Remove the parameter definition,
load the __LC_CURRENT task pointer from the CPU lowcore, and rebase the FPU
structure onto the task structure.  Apply the same approach for the
load_fpu_regs() function.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-03 10:04:37 +02:00
Martin Schwidefsky 2acb94f431 s390/nmi: use the normal asynchronous stack for machine checks
If a machine checks is received while the CPU is in the kernel, only
the s390_do_machine_check function will be called. The call to
s390_handle_mcck is postponed until the CPU returns to user space.
Because of this it is safe to use the asynchronous stack for machine
checks even if the CPU is already handling an interrupt.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:04 +02:00
Martin Schwidefsky a359bb1190 s390/kernel: squeeze a few more cycles out of the system call handler
Reorder the instructions of UPDATE_VTIME to improve superscalar execution,
remove duplicate checks for problem-state from the asynchronous interrupt
handlers, and move the check for problem-state from the synchronous
exit path to the program check path as it is only needed for program
checks inside the kernel.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:04 +02:00
Martin Schwidefsky d0fc41071a s390/kvm: integrate HANDLE_SIE_INTERCEPT into cleanup_critical
Currently there are two mechanisms to deal with cleanup work due to
interrupts. The HANDLE_SIE_INTERCEPT macro is used to undo the changes
required to enter SIE in sie64a. If the SIE instruction causes a program
check, or an asynchronous interrupt is received the HANDLE_SIE_INTERCEPT
code forwards the program execution to sie_exit.

All the other critical sections in entry.S are handled by the code in
cleanup_critical that is called by the SWITCH_ASYNC macro.

Move the sie64a function to the beginning of the critical section and
add the code from HANDLE_SIE_INTERCEPT to cleanup_critical. Add a special
case for the sie64a cleanup to the program check handler.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:03 +02:00
Martin Schwidefsky dcd2a9aaa0 s390/kvm: fix interrupt race with HANDLE_SIE_INTERCEPT
The HANDLE_SIE_INTERCEPT macro is used in the interrupt handlers
and the program check handler to undo a few changes done by sie64a.
Among them are guest vs host LPP, the gmap ASCE vs kernel ASCE and
the bit that indicates that SIE is currently running on the CPU.

There is a race of a voluntary SIE exit vs asynchronous interrupts.
If the CPU completed the SIE instruction and the TM instruction of
the LPP macro at the time it receives an interrupt, the interrupt
handler will run while the LPP, the ASCE and the SIE bit are still
set up for guest execution. This might result in wrong sampling data,
but it will not cause data corruption or lockups.

The critical section in sie64a needs to be enlarged to include all
instructions that undo the changes required for guest execution.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:03 +02:00
Hendrik Brueckner 9977e886cb s390/kernel: lazy restore fpu registers
Improve the save and restore behavior of FPU register contents to use the
vector extension within the kernel.

The kernel does not use floating-point or vector registers and, therefore,
saving and restoring the FPU register contents are performed for handling
signals or switching processes only.  To prepare for using vector
instructions and vector registers within the kernel, enhance the save
behavior and implement a lazy restore at return to user space from a
system call or interrupt.

To implement the lazy restore, the save_fpu_regs() sets a CPU information
flag, CIF_FPU, to indicate that the FPU registers must be restored.
Saving and setting CIF_FPU is performed in an atomic fashion to be
interrupt-safe.  When the kernel wants to use the vector extension or
wants to change the FPU register state for a task during signal handling,
the save_fpu_regs() must be called first.  The CIF_FPU flag is also set at
process switch.  At return to user space, the FPU state is restored.  In
particular, the FPU state includes the floating-point or vector register
contents, as well as, vector-enablement and floating-point control.  The
FPU state restore and clearing CIF_FPU is also performed in an atomic
fashion.

For KVM, the restore of the FPU register state is performed when restoring
the general-purpose guest registers before the SIE instructions is started.
Because the path towards the SIE instruction is interruptible, the CIF_FPU
flag must be checked again right before going into SIE.  If set, the guest
registers must be reloaded again by re-entering the outer SIE loop.  This
is the same behavior as if the SIE critical section is interrupted.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-22 09:58:01 +02:00
Martin Schwidefsky 3827ec3d8f s390: adapt entry.S to the move of thread_struct
git commit 0c8c0f03e3
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
moved the thread_struct to the end of the task_struct.

This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this  use aghi to create a
dedicated pointer for the thread_struct.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-07-20 13:22:18 +02:00
Christian Borntraeger 8e23654687 KVM: s390: make exit_sie_sync more robust
exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at
any point in time. This is used to reload the prefix pages and to
set the IBS stuff in a way that guarantees that after this function
returns we are no longer in SIE. All current users trigger KVM requests.

The request must be set before we block the CPUs to avoid races. Let's
make this implicit by adding the request into a new function
kvm_s390_sync_requests that replaces exit_sie_sync and split out
s390_vcpu_block and s390_vcpu_unblock, that can be used to keep
CPUs out of SIE independent of requests.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-05-08 15:51:14 +02:00
Heiko Carstens a876cb3f6b s390: remove 31 bit syscalls
Remove the 31 bit syscalls from the syscall table. This is a separate patch
just in case I screwed something up so it can be easily reverted.
However the conversion was done with a script, so everything should be ok.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:35 +01:00
Heiko Carstens 4bfc86ce94 s390: remove "64" suffix from a couple of files
Rename a couple of files to get rid of the "64" suffix.
"git blame" will still work.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:34 +01:00
Heiko Carstens 5a79859ae0 s390: remove 31 bit support
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.

The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.

Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25 11:49:33 +01:00
Martin Schwidefsky 86ed42f401 s390: use local symbol names in entry[64].S
To improve the output of the perf tool hide most of the symbols
from entry[64].S by using the '.L' prefix.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:38 +01:00
Linus Torvalds b05d59dfce At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM.  Changes include:
 
 - a lot of s390 changes: optimizations, support for migration,
   GDB support and more
 
 - ARM changes are pretty small: support for the PSCI 0.2 hypercall
   interface on both the guest and the host (the latter acked by Catalin)
 
 - initial POWER8 and little-endian host support
 
 - support for running u-boot on embedded POWER targets
 
 - pretty large changes to MIPS too, completing the userspace interface
   and improving the handling of virtualized timer hardware
 
 - for x86, a larger set of changes is scheduled for 3.17.  Still,
   we have a few emulator bugfixes and support for running nested
   fully-virtualized Xen guests (para-virtualized Xen guests have
   always worked).  And some optimizations too.
 
 The only missing architecture here is ia64.  It's not a coincidence
 that support for KVM on ia64 is scheduled for removal in 3.17.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX
 BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1
 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8
 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ
 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH
 CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA
 nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB
 TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m
 ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og
 VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl
 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz
 bAknaZpPiOrW11Et1htY
 =j5Od
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next

Pull KVM updates from Paolo Bonzini:
 "At over 200 commits, covering almost all supported architectures, this
  was a pretty active cycle for KVM.  Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more

   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by
     Catalin)

   - initial POWER8 and little-endian host support

   - support for running u-boot on embedded POWER targets

   - pretty large changes to MIPS too, completing the userspace
     interface and improving the handling of virtualized timer hardware

   - for x86, a larger set of changes is scheduled for 3.17.  Still, we
     have a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have
     always worked).  And some optimizations too.

  The only missing architecture here is ia64.  It's not a coincidence
  that support for KVM on ia64 is scheduled for removal in 3.17"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
  KVM: add missing cleanup_srcu_struct
  KVM: PPC: Book3S PR: Rework SLB switching code
  KVM: PPC: Book3S PR: Use SLB entry 0
  KVM: PPC: Book3S HV: Fix machine check delivery to guest
  KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  KVM: PPC: Book3S HV: Fix dirty map for hugepages
  KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  KVM: PPC: Book3S: Add ONE_REG register names that were missed
  KVM: PPC: Add CAP to indicate hcall fixes
  KVM: PPC: MPIC: Reset IRQ source private members
  KVM: PPC: Graciously fail broken LE hypercalls
  PPC: ePAPR: Fix hypercall on LE guest
  KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
  KVM: PPC: BOOK3S: Always use the saved DAR value
  PPC: KVM: Make NX bit available with magic page
  KVM: PPC: Disable NX for old magic page using guests
  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
  ...
2014-06-04 08:47:12 -07:00
Martin Schwidefsky d3a73acbc2 s390: split TIF bits into CIF, PIF and TIF bits
The oi and ni instructions used in entry[64].S to set and clear bits
in the thread-flags are not guaranteed to be atomic in regard to other
CPUs. Split the TIF bits into CPU, pt_regs and thread-info specific
bits. Updates on the TIF bits are done with atomic instructions,
updates on CPU and pt_regs bits are done with non-atomic instructions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:47 +02:00
Martin Schwidefsky beef560b4c s390/uaccess: simplify control register updates
Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-05-20 08:58:46 +02:00
Jens Freimann 21ee7ffd17 s390: rename and split lowcore field per_perc_atmid
per_perc_atmid is currently a two-byte field that combines two
fields, the PER code and the PER Addressing-and-Translation-Mode
Identification (ATMID)

Let's make them accessible indepently and also rename per_cause to
per_code.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Heiko Carstens 457f218095 s390/uaccess: rework uaccess code - fix locking issues
The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.

However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.

Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.

So the solution is a different approach where we change address spaces:

User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.

The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.

So we end up with:

User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce

Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
  -> the kernel asce is loaded when a uaccess requires primary or
     secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce

In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
  e.g. the mvcp instruction to access user space. However the kernel
  will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
  instructions come from primary address space and data from secondary
  space

In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.

A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.

In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.

The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.

The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03 14:31:04 +02:00
Martin Schwidefsky 53e857f308 s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries
Git commit 050eef364a "[S390] fix tlb flushing vs. concurrent
/proc accesses" introduced the attach counter to avoid using the
mm_users value to decide between IPTE for every PTE and lazy TLB
flushing with IDTE. That fixed the problem with mm_users but it
introduced another subtle race, fortunately one that is very hard
to hit.
The background is the requirement of the architecture that a valid
PTE may not be changed while it can be used concurrently by another
cpu. The decision between IPTE and lazy TLB flushing needs to be
done while the PTE is still valid. Now if the virtual cpu is
temporarily stopped after the decision to use lazy TLB flushing but
before the invalid bit of the PTE has been set, another cpu can attach
the mm, find that flush_mm is set, do the IDTE, return to userspace,
and recreate a TLB that uses the PTE in question. When the first,
stopped cpu continues it will change the PTE while it is attached on
another cpu. The first cpu will do another IDTE shortly after the
modification of the PTE which makes the race window quite short.

To fix this race the CPU that wants to attach the address space of a
user space thread needs to wait for the end of the PTE modification.
The number of concurrent TLB flushers for an mm is tracked in the
upper 16 bits of the attach_count and finish_arch_post_lock_switch
is used to wait for the end of the flush operation if required.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:18 +01:00
Martin Schwidefsky dbbfe487e5 s390: fix system call restart after inferior call
Git commit 616498813b "s390: system call path micro optimization"
introduced a regression in regard to system call restarting and inferior
function calls via the ptrace interface. The pointer to the system call
table needs to be loaded in sysc_sigpending if do_signal returns with
TIF_SYSCALl set after it restored a system call context.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-30 13:04:40 +02:00
Martin Schwidefsky 0587d409ec s390/time: return with irqs disabled from psw_idle
Modify the psw_idle waiting logic in entry[64].S to return with
interrupts disabled. This avoids potential issues with udelay
and interrupt loops as interrupts are not reenabled after
clock comparator interrupts.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-28 09:19:23 +02:00
Martin Schwidefsky 1f44a22577 s390: convert interrupt handling to use generic hardirq
With the introduction of PCI it became apparent that s390 should
convert to generic hardirqs as too many drivers do not have the
correct dependency for GENERIC_HARDIRQS. On the architecture
level s390 does not have irq lines. It has external interrupts,
I/O interrupts and adapter interrupts. This patch hard-codes all
external interrupts as irq #1, all I/O interrupts as irq #2 and
all adapter interrupts as irq #3. The additional information from
the lowcore associated with the interrupt is stored in the
pt_regs of the interrupt frame, where the interrupt handler can
pick it up. For PCI/MSI interrupts the adapter interrupt handler
scans the relevant bit fields and calls generic_handle_irq with
the virtual irq number for the MSI interrupt.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22 12:20:04 +02:00
Martin Schwidefsky 48f6b00c6e s390/irq: store interrupt information in pt_regs
Copy the interrupt parameters from the lowcore to the pt_regs structure
in entry[64].S and reduce the arguments of the low level interrupt handler
to the pt_regs pointer only. In addition move the test-pending-interrupt
loop from do_IRQ to entry[64].S to make sure that interrupt information
is always delivered via pt_regs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-26 21:10:23 +02:00