Commit Graph

148 Commits

Author SHA1 Message Date
Sudeep Holla db8f7066dc powerpc/64: drop redundant defination of spin_until_cond
linux/processor.h has exactly same defination for spin_until_cond.
Drop the redundant defination in asm/processor.h

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fff2054e5dfc00329804dbd3f2a91667c9a8aff.1623438544.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:11 +10:00
Christophe Leroy 16132529ce powerpc/32s: Rework Kernel Userspace Access Protection
On book3s/32, KUAP is provided by toggling Ks bit in segment registers.
One segment register addresses 256M of virtual memory.

At the time being, KUAP implements a complex logic to apply the
unlock/lock on the exact number of segments covering the user range
to access, with saving the boundaries of the range of segments in
a member of thread struct.

But most if not all user accesses are within a single segment.

Rework KUAP with a different approach:
- Open only one segment, the one corresponding to the starting
address of the range to be accessed.
- If a second segment is involved, it will generate a page fault. The
segment will then be open by the page fault handler.

The kuap member of thread struct will now contain:
- The start address of the current on going user access, that will be
used to know which segment to lock at the end of the user access.
- ~0 when no user access is open
- ~1 when additionnal segments are opened by a page fault.

Then, at lock time
- When only one segment is open, close it.
- When several segments are open, close all user segments.

Almost 100% of the time, only one segment will be involved.

In interrupts, inline the function that unlock/lock all segments,
because not inlining them implies a lot of register save/restore.

With the patch, writing value 128 in userspace in perf_copy_attr() is
done with 16 instructions:

    3890:	93 82 04 dc 	stw     r28,1244(r2)
    3894:	7d 20 e5 26 	mfsrin  r9,r28
    3898:	55 29 00 80 	rlwinm  r9,r9,0,2,0
    389c:	7d 20 e1 e4 	mtsrin  r9,r28
    38a0:	4c 00 01 2c 	isync

    38a4:	39 20 00 80 	li      r9,128
    38a8:	91 3c 00 00 	stw     r9,0(r28)

    38ac:	81 42 04 dc 	lwz     r10,1244(r2)
    38b0:	39 00 ff ff 	li      r8,-1
    38b4:	91 02 04 dc 	stw     r8,1244(r2)
    38b8:	2c 0a ff fe 	cmpwi   r10,-2
    38bc:	41 82 00 88 	beq     3944 <perf_copy_attr+0x36c>
    38c0:	7d 20 55 26 	mfsrin  r9,r10
    38c4:	65 29 40 00 	oris    r9,r9,16384
    38c8:	7d 20 51 e4 	mtsrin  r9,r10
    38cc:	4c 00 01 2c 	isync
...
    3944:	48 00 00 01 	bl      3944 <perf_copy_attr+0x36c>
			3944: R_PPC_REL24	kuap_lock_all_ool

Before the patch it was 118 instructions. In reality only 42 are
executed in most cases, but GCC is not able to see that a properly
aligned user access cannot involve more than one segment.

    5060:	39 1d 00 04 	addi    r8,r29,4
    5064:	3d 20 b0 00 	lis     r9,-20480
    5068:	7c 08 48 40 	cmplw   r8,r9
    506c:	40 81 00 08 	ble     5074 <perf_copy_attr+0x2cc>
    5070:	3d 00 b0 00 	lis     r8,-20480
    5074:	39 28 ff ff 	addi    r9,r8,-1
    5078:	57 aa 00 06 	rlwinm  r10,r29,0,0,3
    507c:	55 29 27 3e 	rlwinm  r9,r9,4,28,31
    5080:	39 29 00 01 	addi    r9,r9,1
    5084:	7d 29 53 78 	or      r9,r9,r10
    5088:	91 22 04 dc 	stw     r9,1244(r2)
    508c:	7d 20 ed 26 	mfsrin  r9,r29
    5090:	55 29 00 80 	rlwinm  r9,r9,0,2,0
    5094:	7c 08 50 40 	cmplw   r8,r10
    5098:	40 81 00 c0 	ble     5158 <perf_copy_attr+0x3b0>
    509c:	7d 46 50 f8 	not     r6,r10
    50a0:	7c c6 42 14 	add     r6,r6,r8
    50a4:	54 c6 27 be 	rlwinm  r6,r6,4,30,31
    50a8:	7d 20 51 e4 	mtsrin  r9,r10
    50ac:	3c ea 10 00 	addis   r7,r10,4096
    50b0:	39 29 01 11 	addi    r9,r9,273
    50b4:	7f 88 38 40 	cmplw   cr7,r8,r7
    50b8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50bc:	40 9d 00 9c 	ble     cr7,5158 <perf_copy_attr+0x3b0>

    50c0:	2f 86 00 00 	cmpwi   cr7,r6,0
    50c4:	41 9e 00 4c 	beq     cr7,5110 <perf_copy_attr+0x368>
    50c8:	2f 86 00 01 	cmpwi   cr7,r6,1
    50cc:	41 9e 00 2c 	beq     cr7,50f8 <perf_copy_attr+0x350>
    50d0:	2f 86 00 02 	cmpwi   cr7,r6,2
    50d4:	41 9e 00 14 	beq     cr7,50e8 <perf_copy_attr+0x340>
    50d8:	7d 20 39 e4 	mtsrin  r9,r7
    50dc:	39 29 01 11 	addi    r9,r9,273
    50e0:	3c e7 10 00 	addis   r7,r7,4096
    50e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50e8:	7d 20 39 e4 	mtsrin  r9,r7
    50ec:	39 29 01 11 	addi    r9,r9,273
    50f0:	3c e7 10 00 	addis   r7,r7,4096
    50f4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50f8:	7d 20 39 e4 	mtsrin  r9,r7
    50fc:	3c e7 10 00 	addis   r7,r7,4096
    5100:	39 29 01 11 	addi    r9,r9,273
    5104:	7f 88 38 40 	cmplw   cr7,r8,r7
    5108:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    510c:	40 9d 00 4c 	ble     cr7,5158 <perf_copy_attr+0x3b0>
    5110:	7d 20 39 e4 	mtsrin  r9,r7
    5114:	39 29 01 11 	addi    r9,r9,273
    5118:	3c c7 10 00 	addis   r6,r7,4096
    511c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5120:	7d 20 31 e4 	mtsrin  r9,r6
    5124:	39 29 01 11 	addi    r9,r9,273
    5128:	3c c6 10 00 	addis   r6,r6,4096
    512c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5130:	7d 20 31 e4 	mtsrin  r9,r6
    5134:	39 29 01 11 	addi    r9,r9,273
    5138:	3c c7 30 00 	addis   r6,r7,12288
    513c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5140:	7d 20 31 e4 	mtsrin  r9,r6
    5144:	3c e7 40 00 	addis   r7,r7,16384
    5148:	39 29 01 11 	addi    r9,r9,273
    514c:	7f 88 38 40 	cmplw   cr7,r8,r7
    5150:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5154:	41 9d ff bc 	bgt     cr7,5110 <perf_copy_attr+0x368>

    5158:	4c 00 01 2c 	isync
    515c:	39 20 00 80 	li      r9,128
    5160:	91 3d 00 00 	stw     r9,0(r29)

    5164:	38 e0 00 00 	li      r7,0
    5168:	90 e2 04 dc 	stw     r7,1244(r2)
    516c:	7d 20 ed 26 	mfsrin  r9,r29
    5170:	65 29 40 00 	oris    r9,r9,16384
    5174:	40 81 00 c0 	ble     5234 <perf_copy_attr+0x48c>
    5178:	7d 47 50 f8 	not     r7,r10
    517c:	7c e7 42 14 	add     r7,r7,r8
    5180:	54 e7 27 be 	rlwinm  r7,r7,4,30,31
    5184:	7d 20 51 e4 	mtsrin  r9,r10
    5188:	3d 4a 10 00 	addis   r10,r10,4096
    518c:	39 29 01 11 	addi    r9,r9,273
    5190:	7c 08 50 40 	cmplw   r8,r10
    5194:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5198:	40 81 00 9c 	ble     5234 <perf_copy_attr+0x48c>

    519c:	2c 07 00 00 	cmpwi   r7,0
    51a0:	41 82 00 4c 	beq     51ec <perf_copy_attr+0x444>
    51a4:	2c 07 00 01 	cmpwi   r7,1
    51a8:	41 82 00 2c 	beq     51d4 <perf_copy_attr+0x42c>
    51ac:	2c 07 00 02 	cmpwi   r7,2
    51b0:	41 82 00 14 	beq     51c4 <perf_copy_attr+0x41c>
    51b4:	7d 20 51 e4 	mtsrin  r9,r10
    51b8:	39 29 01 11 	addi    r9,r9,273
    51bc:	3d 4a 10 00 	addis   r10,r10,4096
    51c0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51c4:	7d 20 51 e4 	mtsrin  r9,r10
    51c8:	39 29 01 11 	addi    r9,r9,273
    51cc:	3d 4a 10 00 	addis   r10,r10,4096
    51d0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51d4:	7d 20 51 e4 	mtsrin  r9,r10
    51d8:	3d 4a 10 00 	addis   r10,r10,4096
    51dc:	39 29 01 11 	addi    r9,r9,273
    51e0:	7c 08 50 40 	cmplw   r8,r10
    51e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51e8:	40 81 00 4c 	ble     5234 <perf_copy_attr+0x48c>
    51ec:	7d 20 51 e4 	mtsrin  r9,r10
    51f0:	39 29 01 11 	addi    r9,r9,273
    51f4:	3c ea 10 00 	addis   r7,r10,4096
    51f8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51fc:	7d 20 39 e4 	mtsrin  r9,r7
    5200:	39 29 01 11 	addi    r9,r9,273
    5204:	3c e7 10 00 	addis   r7,r7,4096
    5208:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    520c:	7d 20 39 e4 	mtsrin  r9,r7
    5210:	39 29 01 11 	addi    r9,r9,273
    5214:	3c ea 30 00 	addis   r7,r10,12288
    5218:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    521c:	7d 20 39 e4 	mtsrin  r9,r7
    5220:	3d 4a 40 00 	addis   r10,r10,16384
    5224:	39 29 01 11 	addi    r9,r9,273
    5228:	7c 08 50 40 	cmplw   r8,r10
    522c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5230:	41 81 ff bc 	bgt     51ec <perf_copy_attr+0x444>

    5234:	4c 00 01 2c 	isync

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Export the ool handlers to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Nicholas Piggin 98db179a78 powerpc/64s: power4 nap fixup in C
There is no need for this to be in asm, use the new intrrupt entry wrapper.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210406025508.821718-1-npiggin@gmail.com
2021-04-08 21:17:45 +10:00
Christophe Leroy e448e1e774 powerpc/math: Fix missing __user qualifier for get_user() and other sparse warnings
Sparse reports the following problems:

arch/powerpc/math-emu/math.c:228:21: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:31: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:41: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:51: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:237:13: warning: incorrect type in initializer (different address spaces)
arch/powerpc/math-emu/math.c:237:13:    expected unsigned int [noderef] __user *_gu_addr
arch/powerpc/math-emu/math.c:237:13:    got unsigned int [usertype] *
arch/powerpc/math-emu/math.c:226:1: warning: symbol 'do_mathemu' was not declared. Should it be static?

Add missing __user qualifier when casting pointer used in get_user()

Use NULL instead of 0 to initialise opX local variables.

Add a prototype for do_mathemu() (Added in processor.h like sparc)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e4d1aae7604d89c98a52dfd8ce8443462e595670.1615809591.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:12 +11:00
Christophe Leroy 7aa8dd67f1 powerpc/32: Always enable data translation in exception prolog
If the code can use a stack in vm area, it can also use a
stack in linear space.

Simplify code by removing old non VMAP stack code on PPC32.

That means the data translation is now re-enabled early in
exception prolog in all cases, not only when using VMAP stacks.

While we are touching EXCEPTION_PROLOG macros, remove the
unused for_rtas parameter in EXCEPTION_PROLOG_1.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7cd6440c60a7e8f4f035b245c57720f51e225aae.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Christophe Leroy 5747230645 powerpc/32: Remove ksp_limit
ksp_limit is there to help detect stack overflows.
That is specific to ppc32 as it was removed from ppc64 in
commit cbc9565ee8 ("powerpc: Remove ksp_limit on ppc64").

There are other means for detecting stack overflows.

As ppc64 has proven to not need it, ppc32 should be able to do
without it too.

Lets remove it and simplify exception handling.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d789c3385b22e07bedc997613c0d26074cb513e7.1615552866.git.christophe.leroy@csgroup.eu
2021-03-29 13:22:05 +11:00
Aneesh Kumar K.V 48a8ab4eeb powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.
Now that kernel correctly store/restore userspace AMR/IAMR values, avoid
manipulating AMR and IAMR from the kernel on behalf of userspace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-15-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:26 +11:00
Christophe Leroy 0ecbc6ad18 powerpc/signal: Remove get_clean_sp()
get_clean_sp() is only used once in kernel/signal.c .

GCC is smart enough to see that x & 0xffffffff is a nop
calculation on PPC32, no need of a special PPC32 trivial version.

Include the logic from the PPC64 version of get_clean_sp() directly
in get_sigframe().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/13ef6510ce30a4867e043157b93af5bb8c67fb3b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-04 01:01:12 +11:00
Christophe Leroy b6254ced4d powerpc/signal: Don't manage floating point regs when no FPU
There is no point in copying floating point regs when there
is no FPU and MATH_EMULATION is not selected.

Create a new CONFIG_PPC_FPU_REGS bool that is selected by
CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to
opt out everything related to fp_state in thread_struct.

The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU
as fpu.S build is conditionnal to CONFIG_PPC_FPU.

The following app spends approx 8.1 seconds system time on an 8xx
without the patch, and 7.0 seconds with the patch (13.5% reduction).

On an 832x, it spends approx 2.6 seconds system time without
the patch and 2.1 seconds with the patch (19% reduction).

	void sigusr1(int sig) { }

	int main(int argc, char **argv)
	{
		int i = 100000;

		signal(SIGUSR1, sigusr1);
		for (;i--;)
			raise(SIGUSR1);
		exit(0);
	}

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
2020-12-04 01:01:11 +11:00
Christophe Leroy 8f8cffd9df powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
cpu_relax() need to be in asm/vdso/processor.h to be used by
the C VDSO generic library.

Move it there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-2-mpe@ellerman.id.au
2020-12-04 01:01:09 +11:00
Linus Torvalds f56e65dff6 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull initial set_fs() removal from Al Viro:
 "Christoph's set_fs base series + fixups"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Allow a NULL pos pointer to __kernel_read
  fs: Allow a NULL pos pointer to __kernel_write
  powerpc: remove address space overrides using set_fs()
  powerpc: use non-set_fs based maccess routines
  x86: remove address space overrides using set_fs()
  x86: make TASK_SIZE_MAX usable from assembly code
  x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
  lkdtm: remove set_fs-based tests
  test_bitmap: remove user bitmap tests
  uaccess: add infrastructure for kernel builds with set_fs()
  fs: don't allow splice read/write without explicit ops
  fs: don't allow kernel reads and writes without iter ops
  sysctl: Convert to iter interfaces
  proc: add a read_iter method to proc proc_ops
  proc: cleanup the compat vs no compat file ops
  proc: remove a level of indentation in proc_get_inode
2020-10-22 09:59:21 -07:00
Gustavo Romero d0ffdee8ff powerpc/tm: Save and restore AMR on treclaim and trechkpt
Althought AMR is stashed in the checkpoint area, currently we don't save
it to the per thread checkpoint struct after a treclaim and so we don't
restore it either from that struct when we trechkpt. As a consequence when
the transaction is later rolled back the kernel space AMR value when the
trechkpt was done appears in userspace.

That commit saves and restores AMR accordingly on treclaim and trechkpt.
Since AMR value is also used in kernel space in other functions, it also
takes care of stashing kernel live AMR into the stack before treclaim and
before trechkpt, restoring it later, just before returning from tm_reclaim
and __tm_recheckpoint.

Is also fixes two nonrelated comments about CR and MSR.

Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200919150025.9609-1-gromero@linux.ibm.com
2020-10-06 23:22:25 +11:00
Nicholas Piggin ffd2961bb4 powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.

The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Pratik Rajesh Sampat<psampat@linux.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Reviewed-by: Pratik Rajesh Sampat<psampat@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com
2020-09-15 22:13:38 +10:00
Christoph Hellwig 5ae4998b5d powerpc: remove address space overrides using set_fs()
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-08 22:21:37 -04:00
Christophe Leroy f663f33120 powerpc: Move flush_instruction_cache() prototype in asm/cacheflush.h
flush_instruction_cache() belongs to the cache flushing function
family.

Move its prototype in asm/cacheflush.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/993445b5227e8ca2f0e38bcc9ea3dfea6e865920.1597384512.git.christophe.leroy@csgroup.eu
2020-09-02 11:00:21 +10:00
Christophe Leroy e53281bc21 powerpc: Drop _nmask_and_or_msr()
_nmask_and_or_msr() is only used at two places to set MSR_IP.

The SYNC is unnecessary as the users are not PowerPC 601.

Can be easily writen in C.

Do it, and drop _nmask_and_or_msr()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2d2b8dfb8dd677026b26dffc8d31070c38a6b89.1597388079.git.christophe.leroy@csgroup.eu
2020-09-02 11:00:20 +10:00
Christophe Leroy 82eb179242 powerpc: drop hard_reset_now() and poweroff_now() declaration
Those function have never existed. Drop their declaration.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/edcdd72a36495d25213c0256c8022367458e0d19.1596716418.git.christophe.leroy@csgroup.eu
2020-09-02 11:00:20 +10:00
Christophe Leroy 63442de430 powerpc/fpu: Drop cvt_fd() and cvt_df()
Those two functions have been unused since commit identified below.
Drop them.

Fixes: 31bfdb036f ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5641ada199b8dd2af16ad00a66084cf974f2704.1596716418.git.christophe.leroy@csgroup.eu
2020-09-02 11:00:19 +10:00
Madhavan Srinivasan c718547e4a powerpc/perf: Add support for ISA3.1 PMU SPRs
PowerISA v3.1 includes new performance monitoring unit(PMU)
special purpose registers (SPRs). They are

Monitor Mode Control Register 3 (MMCR3)
Sampled Instruction Event Register 2 (SIER2)
Sampled Instruction Event Register 3 (SIER3)

MMCR3 is added for further sampling related configuration
control. SIER2/SIER3 are added to provide additional
information about the sampled instruction.

Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of
these new SPRs, updates the struct thread_struct to include these new
SPRs, include MMCR3 in struct mmcr_regs. This is needed to support
programming of MMCR3 SPR during event_enable/disable. Patch also adds
the sysfs support for the MMCR3 SPR along with SPRN_ macros for these
new pmu SPRs.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Rename to PPMU_ARCH_31 as noted by jpn]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22 21:56:41 +10:00
Aneesh Kumar K.V e0d8e991be powerpc/book3s64/kuap: Move UAMOR setup to key init function
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.

Move UAMOR initialization to key initialization code and remove
thread_struct.uamor because it is not used anymore.

Before commit: 4a4a5e5d2a ("powerpc/pkeys: key allocation/deallocation must not change pkey registers")
we used to update uamor based on key allocation and free.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
2020-07-20 22:57:59 +10:00
Michael Ellerman c887ef5707 powerpc/64s: Don't set FSCR bits in INIT_THREAD
Since the previous commit that saves the value of FSCR configured at
boot into init_task.thread.fscr, the static initialisation in
INIT_THREAD now no longer has any effect.

So remove it.

For non DT CPU features, the end result is the same, because
__init_FSCR() is called on all CPUs that have an FSCR (Power8,
Power9), and it sets FSCR_TAR & FSCR_EBB.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200527145843.2761782-4-mpe@ellerman.id.au
2020-06-02 20:59:18 +10:00
Ravi Bangoria 74c6881019 powerpc/watchpoint: Prepare handler to handle more than one watchpoint
Currently we assume that we have only one watchpoint supported by hw.
Get rid of that assumption and use dynamic loop instead. This should
make supporting more watchpoints very easy.

With more than one watchpoint, exception handler needs to know which
DAWR caused the exception, and hw currently does not provide it. So
we need sw logic for the same. To figure out which DAWR caused the
exception, check all different combinations of user specified range,
DAWR address range, actual access range and DAWRX constrains. For ex,
if user specified range and actual access range overlaps but DAWRX is
configured for readonly watchpoint and the instruction is store, this
DAWR must not have caused exception.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
[mpe: Unsplit multi-line printk() strings, fix some sparse warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200514111741.97993-14-ravi.bangoria@linux.ibm.com
2020-05-19 00:14:37 +10:00
Ravi Bangoria 303e6a9ddc powerpc/watchpoint: Convert thread_struct->hw_brk to an array
So far powerpc hw supported only one watchpoint. But Power10 is
introducing 2nd DAWR. Convert thread_struct->hw_brk into an array.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Link: https://lore.kernel.org/r/20200514111741.97993-10-ravi.bangoria@linux.ibm.com
2020-05-19 00:11:05 +10:00
Ravi Bangoria a6ba44e879 powerpc/watchpoint: Introduce function to get nr watchpoints dynamically
So far we had only one watchpoint, so we have hardcoded HBP_NUM to 1.
But Power10 is introducing 2nd DAWR and thus kernel should be able to
dynamically find actual number of watchpoints supported by hw it's
running on. Introduce function for the same. Also convert HBP_NUM macro
to HBP_NUM_MAX, which will now represent maximum number of watchpoints
supported by Powerpc.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Link: https://lore.kernel.org/r/20200514111741.97993-4-ravi.bangoria@linux.ibm.com
2020-05-19 00:11:04 +10:00
Michael Ellerman 24ac99e97f powerpc: Drop unneeded cast in task_pt_regs()
There's no need to cast in task_pt_regs() as tsk->thread.regs should
already be a struct pt_regs. If someone's using task_pt_regs() on
something that's not a task but happens to have a thread.regs then
we'll deal with them later.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200428123152.73566-1-mpe@ellerman.id.au
2020-05-15 11:58:55 +10:00
Michael Ellerman 7ffa8b7dc1 powerpc/64: Don't initialise init_task->thread.regs
Aneesh increased the size of struct pt_regs by 16 bytes and started
seeing this WARN_ON:

  smp: Bringing up secondary CPUs ...
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318
  NIP:  c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000
  REGS: c0000000026d3980 TRAP: 0700   Not tainted  (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+)
  MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48048224  XER: 00000000
  CFAR: c000000000019cc8 IRQMASK: 1
  GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033
  GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164
  GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030
  GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066
  GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890
  GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080
  GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90
  GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200
  NIP [c00000000001a2b4] giveup_all+0xb4/0x110
  LR [c00000000001a29c] giveup_all+0x9c/0x110
  Call Trace:
  [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable)
  [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480
  [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970
  [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70
  [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0
  [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40
  [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8
  [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0
  [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400

Which was unexpected. The warning is checking the thread.regs->msr
value of the task we are switching from:

  usermsr = tsk->thread.regs->msr;
  ...
  WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));

ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set.

Dumping tsk->thread.regs->msr we see that it's: 0x1db60000

Which is not a normal looking MSR, in fact the only valid bit is
MSR_VSX, all the other bits are reserved in the current definition of
the MSR.

We can see from the oops that it was swapper/0 that we were switching
from when we hit the warning, ie. init_task. So its thread.regs points
to the base (high addresses) in init_stack.

Dumping the content of init_task->thread.regs, with the members of
pt_regs annotated (the 16 bytes larger version), we see:

  0000000000000000 c000000002780080    gpr[0]     gpr[1]
  0000000000000000 c000000002666008    gpr[2]     gpr[3]
  c0000000026d3ed0 0000000000000078    gpr[4]     gpr[5]
  c000000000011b68 c000000002780080    gpr[6]     gpr[7]
  0000000000000000 0000000000000000    gpr[8]     gpr[9]
  c0000000026d3f90 0000800000002200    gpr[10]    gpr[11]
  c000000002004820 c0000000026d7200    gpr[12]    gpr[13]
  000000001db60000 c0000000010aabe8    gpr[14]    gpr[15]
  c0000000010aabe8 c0000000010aabe8    gpr[16]    gpr[17]
  c00000000294d598 0000000000000000    gpr[18]    gpr[19]
  0000000000000000 0000000000001ff8    gpr[20]    gpr[21]
  0000000000000000 c00000000206d608    gpr[22]    gpr[23]
  c00000000278e0cc 0000000000000000    gpr[24]    gpr[25]
  000000002fff0000 c000000000000000    gpr[26]    gpr[27]
  0000000002000000 0000000000000028    gpr[28]    gpr[29]
  000000001db60000 0000000004750000    gpr[30]    gpr[31]
  0000000002000000 000000001db60000    nip        msr
  0000000000000000 0000000000000000    orig_r3    ctr
  c00000000000c49c 0000000000000000    link       xer
  0000000000000000 0000000000000000    ccr        softe
  0000000000000000 0000000000000000    trap       dar
  0000000000000000 0000000000000000    dsisr      result
  0000000000000000 0000000000000000    ppr        kuap
  0000000000000000 0000000000000000    pad[2]     pad[3]

This looks suspiciously like stack frames, not a pt_regs. If we look
closely we can see return addresses from the stack trace above,
c000000002004820 (start_kernel) and c00000000000c49c (start_here_common).

init_task->thread.regs is setup at build time in processor.h:

  #define INIT_THREAD  { \
  	.ksp = INIT_SP, \
  	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \

The early boot code where we setup the initial stack is:

  LOAD_REG_ADDR(r3,init_thread_union)

  /* set up a stack pointer */
  LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
  add	r1,r3,r1
  li	r0,0
  stdu	r0,-STACK_FRAME_OVERHEAD(r1)

Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD).
Which is far too small to contain a pt_regs.

So the result is init_task->thread.regs is pointing at some stack
frames on the init stack, not at a pt_regs.

We have gotten away with this for so long because with pt_regs at its
current size the MSR happens to point into the first frame, at a
location that is not written to by the early asm. With the 16 byte
expansion the MSR falls into the second frame, which is used by the
compiler, and collides with a saved register that tends to be
non-zero.

As far as I can see this has been wrong since the original merge of
64-bit ppc support, back in 2002.

Conceptually swapper should have no regs, it never entered from
userspace, and in fact that's what we do on 32-bit. It's also
presumably what the "bogus" comment is referring to.

So I think the right fix is to just not-initialise regs at all. I'm
slightly worried this will break some code that isn't prepared for a
NULL regs, but we'll have to see.

Remove the comment in head_64.S which refers to us setting up the
regs (even though we never did), and is otherwise not really accurate
any more.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au
2020-05-15 11:58:54 +10:00
Haren Myneni c420644c0a powerpc: Use mm_context vas_windows counter to issue CP_ABORT
set_thread_uses_vas() sets used_vas flag for a process that opened VAS
window and issue CP_ABORT during context switch for only that process.
In multi-thread application, windows can be shared. For example Thread
A can open a window and Thread B can run COPY/PASTE instructions to
send NX request which may cause corruption or snooping or a covert
channel Also once this flag is set, continue to run CP_ABORT even the
VAS window is closed.

So define vas-windows counter in process mm_context, increment this
counter for each window open and decrement it for window close. If
vas-windows is set, issue CP_ABORT during context switch. It means
clear the foreign real address mapping only if the process / thread
uses COPY/PASTE. Then disable it for that process if windows are not
open.

Moved set_thread_uses_vas() code to vas_tx_win_open() as this
functionality is needed only for userspace open windows. We are adding
VAS userspace support along with this fix. So no need to include this
fix in stable releases.

Fixes: 9d2a4d7133 ("powerpc: Define set_thread_uses_vas()")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Milton Miller <miltonm@us.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017291.2275.1077.camel@hbabu-laptop
2020-04-20 16:53:01 +10:00
Christophe Leroy 232ca1eeca powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
hash_page() needs to read page tables from kernel memory. When entire
kernel memory is mapped by BATs, which is normally the case when
CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
the page table is not referenced in the MMU hash table.

However, if the page where the page table resides is not covered by
a BAT, a DSI fault can be encountered from hash_page(), and it loops
forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
and the alignment of the different regions is too small to allow
covering the entire memory with BATs. This also happens when
CONFIG_DEBUG_PAGEALLOC is selected or when booting with 'nobats'
flag.

Also, if the page containing the kernel stack is not present in the
MMU hash table, registers cannot be saved and a recursive DSI fault
is encountered.

To allow hash_page() to properly do its job at all time and load the
MMU hash table whenever needed, it must run with data MMU disabled.
This means it must be called before re-enabling data MMU. To allow
this, registers clobbered by hash_page() and create_hpte() have to
be saved in the thread struct together with SRR0, SSR1, DAR and DSISR.
It is also necessary to ensure that DSI prolog doesn't overwrite
regs saved by prolog of the current running exception. That means:
- DSI can only use SPRN_SPRG_SCRATCH0
- Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.

This also fixes the Oops reported by Erhard when create_hpte() is
called by add_hash_page().

Due to prolog size increase, a few more exceptions had to get split
in two parts.

Fixes: cd08f109e2 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Erhard F. <erhard_f@mailbox.org>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
Link: https://lore.kernel.org/r/64a4aa44686e9fd4b01333401367029771d9b231.1581761633.git.christophe.leroy@c-s.fr
2020-02-18 21:31:11 +11:00
Christophe Leroy 028474876f powerpc/32: prepare for CONFIG_VMAP_STACK
To support CONFIG_VMAP_STACK, the kernel has to activate Data MMU
Translation for accessing the stack. Before doing that it must save
SRR0, SRR1 and also DAR and DSISR when relevant, in order to not
loose them in case there is a Data TLB Miss once the translation is
reactivated.

This patch adds fields in thread struct for saving those registers.
It prepares entry_32.S to handle exception entry with
Data MMU Translation enabled and alters EXCEPTION_PROLOG macros to
save SRR0, SRR1, DAR and DSISR then reenables Data MMU.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a775a1fea60f190e0f63503463fb775310a2009b.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Nicholas Piggin ed0bc98f8c powerpc/64s: Reimplement power4_idle code in C
This implements the tricky tracing and soft irq handling bits in C,
leaving the low level bit to asm.

A functional difference is that this redirects the interrupt exit to
a return stub to execute blr, rather than the lr address itself. This
is probably barely measurable on real hardware, but it keeps the link
stack balanced.

Tested with QEMU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move power4_fixup_nap back into exceptions-64s.S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-1-npiggin@gmail.com
2020-01-16 14:59:37 +10:00
Heiko Carstens 7928260539 processor: remove spin_cpu_yield
spin_cpu_yield is unused, therefore remove it.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-15 12:25:49 +02:00
Thomas Gleixner 2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Michael Ellerman bdc7c970bc Merge branch 'topic/ppc-kvm' into next
Merge our topic branch shared with KVM. In particular this includes the
rewrite of the idle code into C.
2019-04-30 22:52:03 +10:00
Nicholas Piggin 10d91611f4 powerpc/64s: Reimplement book3s idle code in C
Reimplement Book3S idle code in C, moving POWER7/8/9 implementation
speific HV idle code to the powernv platform code.

Book3S assembly stubs are kept in common code and used only to save
the stack frame and non-volatile GPRs before executing architected
idle instructions, and restoring the stack and reloading GPRs then
returning to C after waking from idle.

The complex logic dealing with threads and subcores, locking, SPRs,
HMIs, timebase resync, etc., is all done in C which makes it more
maintainable.

This is not a strict translation to C code, there are some
significant differences:

- Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
  but saves and restores them itself.

- The optimisation where EC=ESL=0 idle modes did not have to save GPRs
  or change MSR is restored, because it's now simple to do. ESL=1
  sleeps that do not lose GPRs can use this optimization too.

- KVM secondary entry and cede is now more of a call/return style
  rather than branchy. nap_state_lost is not required because KVM
  always returns via NVGPR restoring path.

- KVM secondary wakeup from offline sequence is moved entirely into
  the offline wakeup, which avoids a hwsync in the normal idle wakeup
  path.

Performance measured with context switch ping-pong on different
threads or cores, is possibly improved a small amount, 1-3% depending
on stop state and core vs thread test for shallow states. Deep states
it's in the noise compared with other latencies.

KVM improvements:

- Idle sleepers now always return to caller rather than branch out
  to KVM first.

- This allows optimisations like very fast return to caller when no
  state has been lost.

- KVM no longer requires nap_state_lost because it controls NVGPR
  save/restore itself on the way in and out.

- The heavy idle wakeup KVM request check can be moved out of the
  normal host idle code and into the not-performance-critical offline
  code.

- KVM nap code now returns from where it is called, which makes the
  flow a bit easier to follow.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash the KVM changes in]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-30 22:37:48 +10:00
Christophe Leroy a68c31fc01 powerpc/32s: Implement Kernel Userspace Access Protection
This patch implements Kernel Userspace Access Protection for
book3s/32.

Due to limitations of the processor page protection capabilities,
the protection is only against writing. read protection cannot be
achieved using page protection.

The previous patch modifies the page protection so that RW user
pages are RW for Key 0 and RO for Key 1, and it sets Key 0 for
both user and kernel.

This patch changes userspace segment registers are set to Ku 0
and Ks 1. When kernel needs to write to RW pages, the associated
segment register is then changed to Ks 0 in order to allow write
access to the kernel.

In order to avoid having the read all segment registers when
locking/unlocking the access, some data is kept in the thread_struct
and saved on stack on exceptions. The field identifies both the
first unlocked segment and the first segment following the last
unlocked one. When no segment is unlocked, it contains value 0.

As the hash_page() function is not able to easily determine if a
protfault is due to a bad kernel access to userspace, protfaults
need to be handled by handle_page_fault when KUAP is set.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Drop allow_read/write_to/from_user() as they're now in kup.h,
      and adapt allow_user_access() to do nothing when to == NULL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-04-21 23:11:47 +10:00
Christophe Leroy a7916a1de5 powerpc: regain entire stack space
thread_info is not anymore in the stack, so the entire stack
can now be used.

There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow so the patch removes the test.

When doing this, an explicit test for NULL stack pointer is
needed in validate_sp() as it is not anymore implicitely covered
by the sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks
are not anymore pointers to thread_info so this patch changes them
to void*

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 22:31:40 +11:00
Christophe Leroy 3733304048 powerpc: Use linux/thread_info.h in processor.h
When we enable THREAD_INFO_IN_TASK we will remove our definition of
current_thread_info(). Instead it will come from linux/thread_info.h

So switch processor.h to include the latter, so that it can continue
to find current_thread_info().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 22:31:40 +11:00
Christophe Leroy 5497c2536f powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT
Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol
won't exist when we enable THREAD_INFO_IN_TASK. So just use the sizeof
the type which is the same value but will continue to work.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 22:31:40 +11:00
Christophe Leroy 92ab45c5f2 powerpc: Avoid circular header inclusion in mmu-hash.h
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes
asm/current.h. This generates a circular dependency. To avoid that,
asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves into a new header called
asm/task_size_64/32.h all the TASK_SIZE related constants, which can
then be included in mmu-hash.h directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out all the TASK_SIZE constants not just 64-bit ones]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 22:31:39 +11:00
Christophe Leroy 0df977eafc powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS
When calling RTAS, the stack pointer is stored in SPRN_SPRG2
in order to be able to restore it in case of machine check in RTAS.

As machine check is not a perfomance critical path, this patch
frees SPRN_SPRG2 by using a field in thread struct instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 00:10:16 +11:00
Nick Desaulniers de0d22e50c treewide: remove current_text_addr
Prefer _THIS_IP_ defined in linux/kernel.h.

Most definitions of current_text_addr were the same as _THIS_IP_, but
a few archs had inline assembly instead.

This patch removes the final call site of current_text_addr, making all
of the definitions dead code.

[akpm@linux-foundation.org: fix arch/csky/include/asm/processor.h]
Link: http://lkml.kernel.org/r/20180911182413.180715-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:12 -07:00
Nicholas Piggin 5434ae7462 powerpc/64s/hash: Add a SLB preload cache
When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security masures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%), it's unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin 4c2de74cc8 powerpc/64: Interrupts save PPR on stack rather than thread_struct
PPR is the odd register out when it comes to interrupt handling, it is
saved in current->thread.ppr while all others are saved on the stack.

The difficulty with this is that accessing thread.ppr can cause a SLB
fault, but the SLB fault handler implementation in C change had
assumed the normal exception entry handlers would not cause an SLB
fault.

Fix this by allocating room in the interrupt stack to save PPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Ellerman 54be0b9c7c Revert "convert SLB miss handlers to C" and subsequent commits
This reverts commits:
  5e46e29e6a ("powerpc/64s/hash: convert SLB miss handlers to C")
  8fed04d0f6 ("powerpc/64s/hash: remove user SLB data from the paca")
  655deecf67 ("powerpc/64s/hash: SLB allocation status bitmaps")
  2e1626744e ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup")
  89ca4e126a ("powerpc/64s/hash: Add a SLB preload cache")

This series had a few bugs, and the fixes are not all trivial. So
revert most of it for now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:32:49 +10:00
Nicholas Piggin 89ca4e126a powerpc/64s/hash: Add a SLB preload cache
When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security masures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%), it's unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 22:01:56 +10:00
Christophe Leroy 62b8426578 powerpc: fix includes in asm/processor.h
Remove superflous includes and add missing ones

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Michael Ellerman ba0635fcbe powerpc: Rename thread_struct.fs to addr_limit
It's called 'fs' for historical reasons, it's named after the x86 'FS'
register. But we don't have to use that name for the member of
thread_struct, and in fact arch/x86 doesn't even call it 'fs' anymore.

So rename it to 'addr_limit', which better reflects what it's used
for, and is also the name used on other arches.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:42 +10:00
Nicholas Piggin 36d632ea83 powerpc/64: remove start_tb and accum_tb from thread_struct
These fields are only written to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin 3d4fbffdd7 powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug
Implement a new function to invoke stop, power9_offline_stop, which is
like power9_idle_stop but used by the cpu hotplug code.

Move KVM secondary state manipulation code to the offline case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01 00:47:46 +11:00
Aneesh Kumar K.V c2b4d8b741 powerpc/mm/hash64: Increase the VA range
This patch increases the max virtual (effective) address value to 4PB.
With 4K page size config we continue to limit ourself to 64TB.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the H_PGTABLE_RANGE test, update it to work]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00