- Standardize the crypto asm code so that it looks like compiler-generated
code to objtool - so that it can understand it. This enables unwinding
from crypto asm code - and also fixes the last known remaining objtool
warnings for LTO and more.
- x86 decoder fixes: clean up and fix the decoder, and also extend it a bit
- Misc fixes and cleanups
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJEOQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jN0A//dZIR9GPW1cHkPD3Na+lxb+RWgCVnFUgw
meLEOum389zWr7S8YmcFpKWLy94f3l24i/e7ufKn6/RMdaQuT6pZUa6teiNPqDKN
Qq1v6EX/LT49Q1zh/zCCnKdlmF1t7wDyA1/+HBLnB4YfYEtteEt+p2Apyv4xIHOl
xWqaTMFcVR/El9FXSyqRWRR4zuqY0Uatz0fmfo5jmi2xq460k53fQlTLA/0w5Jw0
V3omyA3AYMUW6YlW5TGUINOhyDeAJm4PWl3siSUnSd6t8A/TVs5zpZX15BtseCle
0FRp2SbxOoVkiyo3N3XmkfYYns9+4wK7cr9qja9U9MsSBZJZwaBm2LO/t2WFrAhq
5dkOsoPmpIsjutsQnIhQgtVT9I/A4/u5m5Zi3trlXsBS0XAt/q+2GPfEngFmgb3q
nae4rhGUsQ3NTGBiqNuMHQF4yeEvQZ8DCf3ytTz7DjBeiQ9nAtwzbUUGQjYl2mj1
ZPOnl7Xmq/Nyw+AmdpffFPiEUJxqEg9HWjDo7DQATXb3Hw2VJ3cU8jwPRqDDlO10
OB81vysXNGTmhOngHXexxncpmU9gDOIC1imZZpw5lNx4W9Qn20AlGaGAIbqzlfx0
p5VuhkIWCySe1bOZx03xuk7Gq7GBIPPy/a2m204Ftipetlo1HBYwT3KB/wVpHmh7
CSjWgdiW3+k=
=poAZ
-----END PGP SIGNATURE-----
Merge tag 'objtool-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Standardize the crypto asm code so that it looks like compiler-
generated code to objtool - so that it can understand it. This
enables unwinding from crypto asm code - and also fixes the last
known remaining objtool warnings for LTO and more.
- x86 decoder fixes: clean up and fix the decoder, and also extend it a
bit
- Misc fixes and cleanups
* tag 'objtool-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/crypto: Enable objtool in crypto code
x86/crypto/sha512-ssse3: Standardize stack alignment prologue
x86/crypto/sha512-avx2: Standardize stack alignment prologue
x86/crypto/sha512-avx: Standardize stack alignment prologue
x86/crypto/sha256-avx2: Standardize stack alignment prologue
x86/crypto/sha1_avx2: Standardize stack alignment prologue
x86/crypto/sha_ni: Standardize stack alignment prologue
x86/crypto/crc32c-pcl-intel: Standardize jump table
x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack buffer
x86/crypto/aesni-intel_avx: Standardize stack alignment prologue
x86/crypto/aesni-intel_avx: Fix register usage comments
x86/crypto/aesni-intel_avx: Remove unused macros
objtool: Support asm jump tables
objtool: Parse options from OBJTOOL_ARGS
objtool: Collate parse_options() users
objtool: Add --backup
objtool,x86: More ModRM sugar
objtool,x86: Rewrite ADD/SUB/AND
objtool,x86: Support %riz encodings
objtool,x86: Simplify register decode
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
+HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
/uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
=bO1i
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Borislav Petkov:
"Trivial cleanups and fixes all over the place"
* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Remove me from IDE/ATAPI section
x86/pat: Do not compile stubbed functions when X86_PAT is off
x86/asm: Ensure asm/proto.h can be included stand-alone
x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
x86/msr: Make locally used functions static
x86/cacheinfo: Remove unneeded dead-store initialization
x86/process/64: Move cpu_current_top_of_stack out of TSS
tools/turbostat: Unmark non-kernel-doc comment
x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
x86/fpu/math-emu: Fix function cast warning
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
x86: Fix various typos in comments, take #2
x86: Remove unusual Unicode characters from comments
x86/kaslr: Return boolean values from a function returning bool
x86: Fix various typos in comments
x86/setup: Remove unused RESERVE_BRK_ARRAY()
stacktrace: Move documentation for arch_stack_walk_reliable() to header
x86: Remove duplicate TSC DEADLINE MSR definitions
Now that all the stack alignment prologues have been cleaned up in the
crypto code, enable objtool. Among other benefits, this will allow ORC
unwinding to work.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/fc2a1918c50e33e46ef0e9a5de02743f2f6e3639.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/6ecaaac9f3828fbb903513bf90c34a08380a8e35.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/b1a7b29fcfc65d60a3b6e77ef75f4762a5b8488d.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/d36e9ea1c819d87fa89b3df3fa83e2a1ede18146.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/8048e7444c49a8137f05265262b83dc50f8fb7f3.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/fdaaf8670ed1f52f55ba9a6bbac98c1afddc1af6.1614182415.git.jpoimboe@redhat.com
Use a more standard prologue for saving the stack pointer before
realigning the stack.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/5033e1a79867dff1b18e1b4d0783c38897d3f223.1614182415.git.jpoimboe@redhat.com
Simplify the jump table code so that it resembles a compiler-generated
table.
This enables ORC unwinding by allowing objtool to follow all the
potential code paths.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/5357a039def90b8ef6b5874ef12cda008ecf18ba.1614182415.git.jpoimboe@redhat.com
A conditional stack allocation violates traditional unwinding
requirements when a single instruction can have differing stack layouts.
There's no benefit in allocating the stack buffer conditionally. Just
do it unconditionally.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/85ac96613ee5784b6239c18d3f68b1f3c509caa3.1614182415.git.jpoimboe@redhat.com
Use RBP instead of R14 for saving the old stack pointer before
realignment. This resembles what compilers normally do.
This enables ORC unwinding by allowing objtool to understand the stack
realignment.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/02d00a0903a0959f4787e186e2a07d271e1f63d4.1614182415.git.jpoimboe@redhat.com
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():
lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
| ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);
This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.
Change the declaration to match the actual function definition.
The poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.
Fixes: 1c08a10436 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
Taking ownership of the FPU in kernel mode disables preemption, and
this may result in excessive scheduling blackouts if the size of the
data being processed on the FPU is unbounded.
Given that taking and releasing the FPU is cheap these days on x86, we
can limit the impact of this issue easily for skcipher implementations,
by moving the FPU begin/end calls inside the skcipher walk processing
loop. Considering that skcipher walks operate on at most one page at a
time, doing so fully mitigates this issue.
This also permits the skcipher walk logic to use non-atomic kmalloc()
calls etc so we can change the 'atomic' bool argument in the calls to
skcipher_walk_virt() to false as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Indirect calls are very expensive on x86, so use a static call to set
the system-wide AES-NI/CTR asm helper.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Camellia, Serpent and Twofish related header files only contain
declarations that are shared between different implementations of the
respective algorithms residing under arch/x86/crypto, and none of their
contents should be used elsewhere. So move the header files into the
same location, and use local #includes instead.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
All dependencies on the x86 glue helper module have been replaced by
local instantiations of the new ECB/CBC preprocessor helper macros, so
the glue helper module can be retired.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The x86 glue helper module is starting to show its age:
- It relies heavily on function pointers to invoke asm helper functions that
operate on fixed input sizes that are relatively small. This means the
performance is severely impacted by retpolines.
- It goes to great lengths to amortize the cost of kernel_fpu_begin()/end()
over as much work as possible, which is no longer necessary now that FPU
save/restore is done lazily, and doing so may cause unbounded scheduling
blackouts due to the fact that enabling the FPU in kernel mode disables
preemption.
- The CBC mode decryption helper makes backward strides through the input, in
order to avoid a single block size memcpy() between chunks. Consuming the
input in this manner is highly likely to defeat any hardware prefetchers,
so it is better to go through the data linearly, and perform the extra
memcpy() where needed (which is turned into direct loads and stores by the
compiler anyway). Note that benchmarks won't show this effect, given that
the memory they use is always cache hot.
- It implements blockwise XOR in terms of le128 pointers, which imply an
alignment that is not guaranteed by the API, violating the C standard.
GCC does not seem to be smart enough to elide the indirect calls when the
function pointers are passed as arguments to static inline helper routines
modeled after the existing ones. So instead, let's create some CPP macros
that encapsulate the core of the ECB and CBC processing, so we can wire
them up for existing users of the glue helper module, i.e., Camellia,
Serpent, Twofish and CAST6.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Blowfish in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
DES or Triple DES in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The glue helper's CTR routines are no longer used, so drop them.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Twofish in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CAST6 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
CAST5 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Serpent in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Camellia in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The glue helper's XTS routines are no longer used, so drop them.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Twofish in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Serpent in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement CAST6 in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Camellia in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace the function pointers in the GCM implementation with static branches,
which are based on code patching, which occurs only at module load time.
This avoids the severe performance penalty caused by the use of retpolines.
In order to retain the ability to switch between different versions of the
implementation based on the input size on cores that support AVX and AVX2,
use static branches instead of static calls.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the gcm(aes-ni) driver open codes the scatterlist handling
that is encapsulated by the skcipher walk API. So let's switch to that
instead.
Also, move the handling at the end of gcmaes_crypt_by_sg() that is
dependent on whether we are encrypting or decrypting into the callers,
which always do one or the other.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The gcm(aes-ni) driver is only built for x86_64, which does not make
use of highmem. So testing for PageHighMem is pointless and can be
omitted.
While at it, replace GFP_ATOMIC with the appropriate runtime decided
value based on the context.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Drop some prototypes that are declared but never called.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The GCM mode driver uses 16 byte aligned buffers on the stack to pass
the IV to the asm helpers, but unfortunately, the x86 port does not
guarantee that the stack pointer is 16 byte aligned upon entry in the
first place. Since the compiler is not aware of this, it will not emit
the additional stack realignment sequence that is needed, and so the
alignment is not guaranteed to be more than 8 bytes.
So instead, allocate some padding on the stack, and realign the IV
pointer by hand.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES-NI driver implements XTS via the glue helper, which consumes
a struct with sets of function pointers which are invoked on chunks
of input data of the appropriate size, as annotated in the struct.
Let's get rid of this indirection, so that we can perform direct calls
to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
provide a helper which can consume inputs of any size, provided that the
penultimate, full block is passed via the last call if ciphertext stealing
needs to be applied.
This also allows us to enable the XTS mode for i386.
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.
Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.
As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU)
Fixes: 9697fa39ef ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add helper functions for shash implementations of BLAKE2s to
include/crypto/internal/blake2s.h, taking advantage of
__blake2s_update() and __blake2s_final() that were added by the previous
patch to share more code between the library and shash implementations.
crypto_blake2s_setkey() and crypto_blake2s_init() are usable as
shash_alg::setkey and shash_alg::init directly, while
crypto_blake2s_update() and crypto_blake2s_final() take an extra
'blake2s_compress_t' function pointer parameter. This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.
The new functions are inline functions (similar to those in sha1_base.h,
sha256_base.h, and sm3_base.h) because this avoids needing to add a new
module blake2s_helpers.ko, they aren't *too* long, and this avoids
indirect calls which are expensive these days. Note that they can't go
in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S
from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.
Finally, use these new helper functions in the x86 implementation of
BLAKE2s. (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size. So, avoid
code duplication by using a macro to define these structs.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Follow the same approach as the arm64 driver for implementing a version
of AES-NI in CBC mode that supports ciphertext stealing. This results in
a ~2x speed increase for relatively short inputs (less than 256 bytes),
which is relevant given that AES-CBC with ciphertext stealing is used
for filename encryption in the fscrypt layer. For larger inputs, the
speedup is still significant (~25% on decryption, ~6% on encryption)
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>