Commit Graph

4 Commits

Author SHA1 Message Date
David S. Miller 44922150d8 sparc64: Fix userspace FPU register corruptions.
If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:

ETRAP
	ETRAP
		VIS_ENTRY(fprs=0x4)
		VIS_EXIT
		RTRAP (kernel FPU restore with fpu_saved=0x4)
	RTRAP

We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations.  Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state.  Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 19:13:25 -07:00
David S. Miller f4da3628dc sparc64: Fix FPU register corruption with AES crypto offload.
The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.

There are intervening blkcipher*() calls between the crypt operation
calls.  And those might perform memcpy() and thus also try to use the
FPU.

The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.

There has to be a trap between the two FPU using threads of control.

The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time.  Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot.  Then at trap return time we
notice this and restore the pre-trap FPU state.

Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.

All sparc64 chips that support the crypto instructiosn also are using
the Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.

We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.

Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 19:37:58 -07:00
David S. Miller 42a4172b6e sparc64: Fix trailing whitespace in NG4 memcpy.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 13:08:22 -07:00
David S. Miller ae2c6ca641 sparc64: Add SPARC-T4 optimized memcpy.
Before		After
		--------------	--------------
bw_tcp:         1288.53 MB/sec	1637.77 MB/sec
bw_pipe:        1517.18 MB/sec	2107.61 MB/sec
bw_unix:        1838.38 MB/sec	2640.91 MB/sec

make -s -j128
allmodconfig	5min 49sec	5min 31sec

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27 00:35:11 -07:00