OpenCloudOS-Kernel/arch
Eric Biggers 5172d322d3 crypto: arm/blake2s - add ARM scalar optimized BLAKE2s
Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help.  Each NEON instruction would depend on the
previous one, resulting in poor performance.

With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation that runs much faster than the C implementation.
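
To illustrate where the rotations come from, here is a minimal portable C
sketch of the BLAKE2s G mixing function as specified in RFC 7693. It is not
the code from this commit; the names ror32() and blake2s_g() are chosen for
illustration only. Each of the four right-rotations (by 16, 12, 8, and 7
bits) is applied to the result of an XOR. On 32-bit ARM, a data-processing
instruction can take a rotated register as its second operand (for example
"eor r0, r1, r2, ror #16"), which is what makes such rotations nearly free
in a scalar implementation.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits; n must be in 1..31. */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/*
 * BLAKE2s G function (RFC 7693, section 3.1) applied to four state
 * words v[0..3] = (a, b, c, d) with two message words mx and my.
 * Illustrative sketch only; a real implementation works on the full
 * 16-word state and selects mx/my through the sigma permutation table.
 */
static void blake2s_g(uint32_t v[4], uint32_t mx, uint32_t my)
{
	uint32_t a = v[0], b = v[1], c = v[2], d = v[3];

	a = a + b + mx;
	d = ror32(d ^ a, 16);
	c = c + d;
	b = ror32(b ^ c, 12);
	a = a + b + my;
	d = ror32(d ^ a, 8);
	c = c + d;
	b = ror32(b ^ c, 7);

	v[0] = a;
	v[1] = b;
	v[2] = c;
	v[3] = d;
}
```

Every step depends on the result of the previous one, so the work within a
single G call is inherently serial.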

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API as well as
optionally registering the algorithms with the shash API.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:39 +11:00
alpha A treewide cleanup of interrupt descriptor (ab)use with all sorts of racy 2020-12-24 13:50:23 -08:00
arc tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
arm crypto: arm/blake2s - add ARM scalar optimized BLAKE2s 2021-01-03 08:41:39 +11:00
arm64 crypto: arm64/aes-ctr - improve tail handling 2021-01-03 08:41:37 +11:00
c6x tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
csky Tracing updates for 5.11 2020-12-17 13:22:17 -08:00
h8300 tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
hexagon tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
ia64 Kbuild updates for v5.11 2020-12-22 14:02:39 -08:00
m68k Fixes include: 2020-12-21 10:35:11 -08:00
microblaze epoll: wire up syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
mips epoll: fix compat syscall wire up of epoll_pwait2 2020-12-20 10:01:38 -08:00
nds32 Tracing updates for 5.11 2020-12-17 13:22:17 -08:00
nios2 tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
openrisc OpenRISC updates for 5.11 2020-12-17 13:41:27 -08:00
parisc A treewide cleanup of interrupt descriptor (ab)use with all sorts of racy 2020-12-24 13:50:23 -08:00
powerpc powerpc fixes for 5.11 #2 2020-12-24 14:02:00 -08:00
riscv RISC-V Fixes for 5.11-rc1 2020-12-24 14:05:05 -08:00
s390 crypto: remove cipher routines from public crypto API 2021-01-03 08:41:35 +11:00
sh The core framework got some nice improvements this time around. We gained the 2020-12-21 10:39:37 -08:00
sparc epoll: fix compat syscall wire up of epoll_pwait2 2020-12-20 10:01:38 -08:00
um This pull request contains the following changes for UML: 2020-12-17 17:56:44 -08:00
x86 crypto: blake2s - share the "shash" API boilerplate code 2021-01-03 08:41:38 +11:00
xtensa The core framework got some nice improvements this time around. We gained the 2020-12-21 10:39:37 -08:00
.gitignore
Kconfig kasan: allow VMAP_STACK for HW_TAGS mode 2020-12-22 12:55:08 -08:00