linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Ard Biesheuvel	2fffee536c	crypto: arm64/crct10dif - implement non-Crypto Extensions alternative The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient. So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Ard Biesheuvel	6c1b0da13e	crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version Reorganize the CRC-T10DIF asm routine so we can easily instantiate an alternative version based on 8x8 polynomial multiplication in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Ard Biesheuvel	5b3da65177	crypto: arm64/crct10dif-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:11 +08:00
Ard Biesheuvel	325f562d8f	crypto: arm64/crct10dif - move literal data to .rodata section Move the CRC-T10DIF literal data to the .rodata section where it is safe from being exploited by speculative execution. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-01-18 23:00:31 +11:00
Ard Biesheuvel	6ef5737f39	crypto: arm64/crct10dif - port x86 SSE implementation to arm64 This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-07 20:01:17 +08:00

5 Commits