Commit Graph

730 Commits

Josh Poimboeuf 7d3d10e0e8 x86/crypto: Enable objtool in crypto code
Now that all the stack alignment prologues have been cleaned up in the
crypto code, enable objtool.  Among other benefits, this will allow ORC
unwinding to work.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/fc2a1918c50e33e46ef0e9a5de02743f2f6e3639.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:37 -05:00
Josh Poimboeuf 27d26793f2 x86/crypto/sha512-ssse3: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/6ecaaac9f3828fbb903513bf90c34a08380a8e35.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:37 -05:00
Josh Poimboeuf ec063e090b x86/crypto/sha512-avx2: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/b1a7b29fcfc65d60a3b6e77ef75f4762a5b8488d.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:36 -05:00
Josh Poimboeuf d61684b56e x86/crypto/sha512-avx: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/d36e9ea1c819d87fa89b3df3fa83e2a1ede18146.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:36 -05:00
Josh Poimboeuf ce58466680 x86/crypto/sha256-avx2: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/8048e7444c49a8137f05265262b83dc50f8fb7f3.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:36 -05:00
Josh Poimboeuf 20114c899c x86/crypto/sha1_avx2: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/fdaaf8670ed1f52f55ba9a6bbac98c1afddc1af6.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:35 -05:00
Josh Poimboeuf 35a0067d2c x86/crypto/sha_ni: Standardize stack alignment prologue
Use a more standard prologue for saving the stack pointer before
realigning the stack.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/5033e1a79867dff1b18e1b4d0783c38897d3f223.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:35 -05:00
Josh Poimboeuf 2b02ed5548 x86/crypto/crc32c-pcl-intel: Standardize jump table
Simplify the jump table code so that it resembles a compiler-generated
table.

This enables ORC unwinding by allowing objtool to follow all the
potential code paths.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/5357a039def90b8ef6b5874ef12cda008ecf18ba.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:34 -05:00
Josh Poimboeuf dabe5167a3 x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack buffer
A conditional stack allocation violates traditional unwinding
requirements when a single instruction can have differing stack layouts.

There's no benefit in allocating the stack buffer conditionally.  Just
do it unconditionally.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/85ac96613ee5784b6239c18d3f68b1f3c509caa3.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:34 -05:00
Josh Poimboeuf e163be86ff x86/crypto/aesni-intel_avx: Standardize stack alignment prologue
Use RBP instead of R14 for saving the old stack pointer before
realignment.  This resembles what compilers normally do.

This enables ORC unwinding by allowing objtool to understand the stack
realignment.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/02d00a0903a0959f4787e186e2a07d271e1f63d4.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:34 -05:00
Josh Poimboeuf ff5796b6db x86/crypto/aesni-intel_avx: Fix register usage comments
Fix register usage comments to match reality.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/8655d4513a0ed1eddec609165064153973010aa2.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:33 -05:00
Josh Poimboeuf 4f08300916 x86/crypto/aesni-intel_avx: Remove unused macros
These macros are no longer used; remove them.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/53f7136ea93ebdbca399959e6d2991ecb46e733e.1614182415.git.jpoimboe@redhat.com
2021-04-19 12:36:33 -05:00
Arnd Bergmann 8d195e7a8a crypto: poly1305 - fix poly1305_core_setkey() declaration
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():

lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
   13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
      |                                                          ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
   21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);

This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.

Change the declaration to match the actual function definition.
poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but it looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.

Fixes: 1c08a10436 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-02 18:28:12 +11:00
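
For reference, a minimal sketch of the corrected declaration, assuming POLY1305_BLOCK_SIZE is 16 as in the kernel headers and eliding the key struct layout:

```c
/* Sketch of include/crypto/internal/poly1305.h after the fix: the declaration
 * carries the same explicit array bound as the definition in
 * lib/crypto/poly1305-donna32.c, so GCC 11's -Warray-parameter check sees
 * matching prototypes in every translation unit. */
#define POLY1305_BLOCK_SIZE 16

typedef unsigned char u8;		/* kernel byte type, from <linux/types.h> */
struct poly1305_core_key;		/* layout elided; see <crypto/poly1305.h> */

void poly1305_core_setkey(struct poly1305_core_key *key,
			  const u8 raw_key[POLY1305_BLOCK_SIZE]);
```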
Ingo Molnar 163b099146 x86: Fix various typos in comments, take #2
Fix another ~42 single-word typos in arch/x86/ code comments
that were missed in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:28 +01:00
Ingo Molnar d9f6e12fb0 x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-18 15:31:53 +01:00
Ard Biesheuvel 65d1e3c415 crypto: aesni - release FPU during skcipher walk API calls
Taking ownership of the FPU in kernel mode disables preemption, and
this may result in excessive scheduling blackouts if the size of the
data being processed on the FPU is unbounded.

Given that taking and releasing the FPU is cheap these days on x86, we
can limit the impact of this issue easily for skcipher implementations,
by moving the FPU begin/end calls inside the skcipher walk processing
loop. Considering that skcipher walks operate on at most one page at a
time, doing so fully mitigates this issue.

This also permits the skcipher walk logic to use non-atomic kmalloc()
calls etc so we can change the 'atomic' bool argument in the calls to
skcipher_walk_virt() to false as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-22 14:58:04 +11:00
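
A minimal sketch of the resulting pattern, with the aesni-internal names (aes_ctx(), aesni_ecb_enc(), AES_BLOCK_MASK) shown roughly as they appear in the glue code but treated as illustrative here; the point is that kernel_fpu_begin()/end() now bracket each walk step rather than the whole request:

```c
static int ecb_encrypt(struct skcipher_request *req)
{
	struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(crypto_skcipher_reqtfm(req)));
	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt(&walk, req, false);	/* 'atomic' may now be false */

	while ((nbytes = walk.nbytes)) {
		kernel_fpu_begin();			/* preemption disabled ... */
		aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
			      nbytes & AES_BLOCK_MASK);
		kernel_fpu_end();			/* ... for at most one page of data */
		err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
	}
	return err;
}
```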
Ard Biesheuvel 64a49b8595 crypto: aesni - replace CTR function pointer with static call
Indirect calls are very expensive on x86, so use a static call to set
the system-wide AES-NI/CTR asm helper.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-22 14:58:04 +11:00
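
A sketch of the static-call pattern, with helper names modeled on the aesni driver; treat the exact signatures as illustrative:

```c
#include <linux/static_call.h>

/* Default to the SSE helper; the call site is patched to the AVX variant
 * once, at module init, instead of going through a retpolined pointer. */
DEFINE_STATIC_CALL(aesni_ctr_enc_tfm, aesni_ctr_enc);

static int __init aesni_ctr_init(void)
{
	if (boot_cpu_has(X86_FEATURE_AVX))
		static_call_update(aesni_ctr_enc_tfm, aesni_ctr_enc_avx_tfm);
	return 0;
}

static void ctr_crypt_chunk(struct crypto_aes_ctx *ctx, u8 *out,
			    const u8 *in, unsigned int len, u8 *iv)
{
	/* A direct, patched call at runtime, with no indirect branch. */
	static_call(aesni_ctr_enc_tfm)(ctx, out, in, len, iv);
}
```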
Ard Biesheuvel a04ea6f7ff crypto: x86 - use local headers for x86 specific shared declarations
The Camellia, Serpent and Twofish related header files only contain
declarations that are shared between different implementations of the
respective algorithms residing under arch/x86/crypto, and none of their
contents should be used elsewhere. So move the header files into the
same location, and use local #includes instead.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:30 +11:00
Ard Biesheuvel 64ca771cd6 crypto: x86 - remove glue helper module
All dependencies on the x86 glue helper module have been replaced by
local instantiations of the new ECB/CBC preprocessor helper macros, so
the glue helper module can be retired.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 165f357334 crypto: x86/twofish - drop dependency on glue helper
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel ea55cfc3f9 crypto: x86/cast6 - drop dependency on glue helper
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 674d40abac crypto: x86/cast5 - drop dependency on glue helper
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 9ad58b46f8 crypto: x86/serpent - drop dependency on glue helper
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 407d409a81 crypto: x86/camellia - drop dependency on glue helper
Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 827ee47228 crypto: x86 - add some helper macros for ECB and CBC modes
The x86 glue helper module is starting to show its age:
- It relies heavily on function pointers to invoke asm helper functions that
  operate on fixed input sizes that are relatively small. This means the
  performance is severely impacted by retpolines.
- It goes to great lengths to amortize the cost of kernel_fpu_begin()/end()
  over as much work as possible, which is no longer necessary now that FPU
  save/restore is done lazily, and doing so may cause unbounded scheduling
  blackouts due to the fact that enabling the FPU in kernel mode disables
  preemption.
- The CBC mode decryption helper makes backward strides through the input, in
  order to avoid a single block size memcpy() between chunks. Consuming the
  input in this manner is highly likely to defeat any hardware prefetchers,
  so it is better to go through the data linearly, and perform the extra
  memcpy() where needed (which is turned into direct loads and stores by the
  compiler anyway). Note that benchmarks won't show this effect, given that
  the memory they use is always cache hot.
- It implements blockwise XOR in terms of le128 pointers, which imply an
  alignment that is not guaranteed by the API, violating the C standard.

GCC does not seem to be smart enough to elide the indirect calls when the
function pointers are passed as arguments to static inline helper routines
modeled after the existing ones. So instead, let's create some CPP macros
that encapsulate the core of the ECB and CBC processing, so we can wire
them up for existing users of the glue helper module, i.e., Camellia,
Serpent, Twofish and CAST6.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
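
A sketch of how such macros are typically wired up in a glue file; the macro names follow arch/x86/crypto/ecb_cbc_helpers.h, and the camellia helper names are illustrative:

```c
static int ecb_encrypt(struct skcipher_request *req)
{
	ECB_WALK_START(req, CAMELLIA_BLOCK_SIZE, CAMELLIA_AESNI_PARALLEL_BLOCKS);
	ECB_BLOCK(CAMELLIA_AESNI_PARALLEL_BLOCKS, camellia_ecb_enc_16way);	/* wide SIMD path */
	ECB_BLOCK(2, camellia_enc_blk_2way);					/* 2-way scalar path */
	ECB_BLOCK(1, camellia_enc_blk);						/* single-block tail */
	ECB_WALK_END();
}

static int cbc_decrypt(struct skcipher_request *req)
{
	CBC_WALK_START(req, CAMELLIA_BLOCK_SIZE, CAMELLIA_AESNI_PARALLEL_BLOCKS);
	CBC_DEC_BLOCK(CAMELLIA_AESNI_PARALLEL_BLOCKS, camellia_cbc_dec_16way);
	CBC_DEC_BLOCK(1, camellia_dec_blk);
	CBC_WALK_END();
}
```

Because the macros expand to plain direct calls to the named helpers, the retpoline overhead of the old function-pointer scheme disappears.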
Ard Biesheuvel c0a64926c5 crypto: x86/blowfish - drop CTR mode implementation
Blowfish in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:29 +11:00
Ard Biesheuvel 768db5fee3 crypto: x86/des - drop CTR mode implementation
DES or Triple DES in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel 89b7ba5c8b crypto: x86/glue-helper - drop CTR helper routines
The glue helper's CTR routines are no longer used, so drop them.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel f43dcaf2c9 crypto: x86/twofish - drop CTR mode implementation
Twofish in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel 7a6623cc68 crypto: x86/cast6 - drop CTR mode implementation
CAST6 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel e2d60e2f59 crypto: x86/cast5 - drop CTR mode implementation
CAST5 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel 2e9440ae6e crypto: x86/serpent - drop CTR mode implementation
Serpent in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel a1f91ecf81 crypto: x86/camellia - drop CTR mode implementation
Camellia in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:28 +11:00
Ard Biesheuvel 31d49c448a crypto: x86/glue-helper - drop XTS helper routines
The glue helper's XTS routines are no longer used, so drop them.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:27 +11:00
Ard Biesheuvel da4df93a94 crypto: x86/twofish - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Twofish in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:27 +11:00
Ard Biesheuvel 9ec0af8aa6 crypto: x86/serpent - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Serpent in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:27 +11:00
Ard Biesheuvel 2cc0fedb81 crypto: x86/cast6 - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement CAST6 in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:27 +11:00
Ard Biesheuvel 55a7e88f01 crypto: x86/camellia - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Camellia in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:27 +11:00
Ard Biesheuvel d6cbf4eaa4 crypto: aesni - replace function pointers with static branches
Replace the function pointers in the GCM implementation with static branches,
which are based on code patching, which occurs only at module load time.
This avoids the severe performance penalty caused by the use of retpolines.

In order to retain the ability to switch between different versions of the
implementation based on the input size on cores that support AVX and AVX2,
use static branches instead of static calls.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:25 +11:00
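
A hedged sketch of the static-branch selection; the static-key names match the driver, but the helper names, thresholds and context type are illustrative:

```c
#include <linux/jump_label.h>

static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx);
static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);

static int __init gcm_simd_init(void)
{
	/* Branches are patched exactly once, when the module loads. */
	if (boot_cpu_has(X86_FEATURE_AVX))
		static_branch_enable(&gcm_use_avx);
	if (boot_cpu_has(X86_FEATURE_AVX2))
		static_branch_enable(&gcm_use_avx2);
	return 0;
}

static void gcm_crypt_one(struct gcm_asm_ctx *ctx, const u8 *in, u8 *out,
			  unsigned long data_len)
{
	/* Input-size-based dispatch is retained, but each test is now a
	 * patched branch rather than an indirect call. */
	if (static_branch_likely(&gcm_use_avx2) && data_len >= 256)
		gcm_asm_avx2(ctx, in, out, data_len);
	else if (static_branch_likely(&gcm_use_avx) && data_len >= 128)
		gcm_asm_avx(ctx, in, out, data_len);
	else
		gcm_asm_sse(ctx, in, out, data_len);
}
```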
Ard Biesheuvel 83c83e6588 crypto: aesni - refactor scatterlist processing
Currently, the gcm(aes-ni) driver open codes the scatterlist handling
that is encapsulated by the skcipher walk API. So let's switch to that
instead.

Also, move the handling at the end of gcmaes_crypt_by_sg() that is
dependent on whether we are encrypting or decrypting into the callers,
which always do one or the other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:25 +11:00
Ard Biesheuvel 2694e23ffd crypto: aesni - clean up mapping of associated data
The gcm(aes-ni) driver is only built for x86_64, which does not make
use of highmem. So testing for PageHighMem is pointless and can be
omitted.

While at it, replace GFP_ATOMIC with the appropriate runtime decided
value based on the context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:25 +11:00
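
The usual pattern for deriving the allocation flags from the request context, as a sketch (the helper name is hypothetical):

```c
static void *alloc_assoc_buffer(struct aead_request *req, unsigned int len)
{
	/* Sleep only when the caller allows it; otherwise stay atomic. */
	gfp_t flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
		      GFP_KERNEL : GFP_ATOMIC;

	return kmalloc(len, flags);
}
```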
Ard Biesheuvel 30f2c18eb5 crypto: aesni - drop unused asm prototypes
Drop some prototypes that are declared but never called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:25 +11:00
Ard Biesheuvel a13ed1d15b crypto: aesni - prevent misaligned buffers on the stack
The GCM mode driver uses 16 byte aligned buffers on the stack to pass
the IV to the asm helpers, but unfortunately, the x86 port does not
guarantee that the stack pointer is 16 byte aligned upon entry in the
first place. Since the compiler is not aware of this, it will not emit
the additional stack realignment sequence that is needed, and so the
alignment is not guaranteed to be more than 8 bytes.

So instead, allocate some padding on the stack, and realign the IV
pointer by hand.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14 17:10:25 +11:00
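
A minimal sketch of the manual realignment using the kernel's PTR_ALIGN(); the function name and the 12-byte IV copy are illustrative:

```c
static int gcm_crypt_sketch(struct aead_request *req)
{
	/* Reserve enough padding that a 16-byte-aligned pointer always fits,
	 * since only 8-byte stack alignment is guaranteed on entry. */
	u8 iv_buf[16 + 8] __aligned(8);
	u8 *iv = PTR_ALIGN(&iv_buf[0], 16);

	memcpy(iv, req->iv, 12);	/* asm helpers may use aligned loads on 'iv' */
	/* ... hand 'iv' to the asm routines ... */
	return 0;
}
```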
Ard Biesheuvel 2481104fe9 crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper
The AES-NI driver implements XTS via the glue helper, which consumes
a struct with sets of function pointers which are invoked on chunks
of input data of the appropriate size, as annotated in the struct.

Let's get rid of this indirection, so that we can perform direct calls
to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
provide a helper which can consume inputs of any size, provided that the
penultimate, full block is passed via the last call if ciphertext stealing
needs to be applied.

This also allows us to enable the XTS mode for i386.

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-08 15:39:47 +11:00
Ard Biesheuvel 86ad60a65f crypto: x86/aes-ni-xts - use direct calls and a 4-way stride
The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.

Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.

As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).

Fixes: 9697fa39ef ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-08 15:39:47 +11:00
Eric Biggers 8c4a93a127 crypto: blake2s - share the "shash" API boilerplate code
Add helper functions for shash implementations of BLAKE2s to
include/crypto/internal/blake2s.h, taking advantage of
__blake2s_update() and __blake2s_final() that were added by the previous
patch to share more code between the library and shash implementations.

crypto_blake2s_setkey() and crypto_blake2s_init() are usable as
shash_alg::setkey and shash_alg::init directly, while
crypto_blake2s_update() and crypto_blake2s_final() take an extra
'blake2s_compress_t' function pointer parameter.  This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.

The new functions are inline functions (similar to those in sha1_base.h,
sha256_base.h, and sm3_base.h) because this avoids needing to add a new
module blake2s_helpers.ko, they aren't *too* long, and this avoids
indirect calls which are expensive these days.  Note that they can't go
in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S
from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.

Finally, use these new helper functions in the x86 implementation of
BLAKE2s.  (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:38 +11:00
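
A sketch of the x86 glue after this change, assuming the arch compression routine is exported as blake2s_compress_arch() (the name is illustrative); only the compression hook differs from the generic code:

```c
static int crypto_blake2s_update_x86(struct shash_desc *desc,
				     const u8 *in, unsigned int inlen)
{
	/* Shared shash boilerplate, arch-specific compression function. */
	return crypto_blake2s_update(desc, in, inlen, blake2s_compress_arch);
}

static int crypto_blake2s_final_x86(struct shash_desc *desc, u8 *out)
{
	return crypto_blake2s_final(desc, out, blake2s_compress_arch);
}
```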
Eric Biggers 1aa90f4cf0 crypto: x86/blake2s - define shash_alg structs using macros
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size.  So, avoid
code duplication by using a macro to define these structs.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:38 +11:00
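
A sketch of the macro-based definition; the field values follow the x86 glue code of that era but should be read as illustrative:

```c
#define BLAKE2S_ALG(name, driver_name, digest_size)			\
	{								\
		.base.cra_name		= name,				\
		.base.cra_driver_name	= driver_name,			\
		.base.cra_flags		= CRYPTO_ALG_OPTIONAL_KEY,	\
		.base.cra_blocksize	= BLAKE2S_BLOCK_SIZE,		\
		.base.cra_ctxsize	= sizeof(struct blake2s_tfm_ctx), \
		.base.cra_module	= THIS_MODULE,			\
		.digestsize		= digest_size,			\
		.setkey			= crypto_blake2s_setkey,	\
		.init			= crypto_blake2s_init,		\
		.update			= crypto_blake2s_update_x86,	\
		.final			= crypto_blake2s_final_x86,	\
		.descsize		= sizeof(struct blake2s_state),	\
	}

static struct shash_alg blake2s_algs[] = {
	BLAKE2S_ALG("blake2s-128", "blake2s-128-x86", BLAKE2S_128_HASH_SIZE),
	BLAKE2S_ALG("blake2s-160", "blake2s-160-x86", BLAKE2S_160_HASH_SIZE),
	BLAKE2S_ALG("blake2s-224", "blake2s-224-x86", BLAKE2S_224_HASH_SIZE),
	BLAKE2S_ALG("blake2s-256", "blake2s-256-x86", BLAKE2S_256_HASH_SIZE),
};
```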
Ard Biesheuvel ddf169a98f crypto: aesni - implement support for cts(cbc(aes))
Follow the same approach as the arm64 driver for implementing a version
of AES-NI in CBC mode that supports ciphertext stealing. This results in
a ~2x speed increase for relatively short inputs (less than 256 bytes),
which is relevant given that AES-CBC with ciphertext stealing is used
for filename encryption in the fscrypt layer. For larger inputs, the
speedup is still significant (~25% on decryption, ~6% on encryption).

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03 08:41:34 +11:00
Linus Torvalds 9e4b0d55d8 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add speed testing on 1420-byte blocks for networking

  Algorithms:
   - Improve performance of chacha on ARM for network packets
   - Improve performance of aegis128 on ARM for network packets

  Drivers:
   - Add support for Keem Bay OCS AES/SM4
   - Add support for QAT 4xxx devices
   - Enable crypto-engine retry mechanism in caam
   - Enable support for crypto engine on sdm845 in qce
   - Add HiSilicon PRNG driver support"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (161 commits)
  crypto: qat - add capability detection logic in qat_4xxx
  crypto: qat - add AES-XTS support for QAT GEN4 devices
  crypto: qat - add AES-CTR support for QAT GEN4 devices
  crypto: atmel-i2c - select CONFIG_BITREVERSE
  crypto: hisilicon/trng - replace atomic_add_return()
  crypto: keembay - Add support for Keem Bay OCS AES/SM4
  dt-bindings: Add Keem Bay OCS AES bindings
  crypto: aegis128 - avoid spurious references crypto_aegis128_update_simd
  crypto: seed - remove trailing semicolon in macro definition
  crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg
  crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg
  crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
  crypto: cpt - Fix sparse warnings in cptpf
  hwrng: ks-sa - Add dependency on IOMEM and OF
  crypto: lib/blake2s - Move selftest prototype into header file
  crypto: arm/aes-ce - work around Cortex-A57/A72 silicon errata
  crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
  crypto: ccree - rework cache parameters handling
  crypto: cavium - Use dma_set_mask_and_coherent to simplify code
  crypto: marvell/octeontx - Use dma_set_mask_and_coherent to simplify code
  ...
2020-12-14 12:18:19 -08:00
Uros Bizjak be169fe3ce crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg
CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg
instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions).

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-04 18:13:16 +11:00
Uros Bizjak 0b837f1ef8 crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg
CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg
instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions).

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-04 18:13:15 +11:00
Uros Bizjak 032d049ea0 crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg
CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg
instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions).

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-04 18:13:15 +11:00
Eric Biggers a24d22b225 crypto: sha - split sha.h into sha1.h and sha2.h
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure.  So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1.  It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20 14:45:33 +11:00
Arvind Sankar 458c0480dc crypto: hash - Use memzero_explicit() for clearing state
Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-30 17:35:03 +11:00
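
The pattern looks like this in any *_final() routine (a sketch; the function name is hypothetical):

```c
static int sha256_finish_sketch(struct shash_desc *desc, u8 *out)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	/* ... write the digest to 'out' ... */

	/* A plain memset() could be elided because *sctx is dead afterwards;
	 * memzero_explicit() contains barrier_data(), so the clear survives. */
	memzero_explicit(sctx, sizeof(*sctx));
	return 0;
}
```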
Eric Biggers d4b3984c9e crypto: x86/aes - remove unused file aes_glue.c
Commit 1d2c327931 ("crypto: x86/aes - drop scalar assembler
implementations") was meant to remove aes_glue.c, but it actually left
it as an unused one-line file.  Remove this unused file.

Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-30 17:35:01 +11:00
Eric Biggers c3a98c3ad5 crypto: x86/poly1305 - add back a needed assignment
One of the assignments that was removed by commit 4a0c1de64b ("crypto:
x86/poly1305 - Remove assignments with no effect") is actually needed,
since it affects the return value.

This fixes the following crypto self-test failure:

    alg: shash: poly1305-simd test failed (wrong result) on test vector 2, cfg="init+update+final aligned buffer"

Fixes: 4a0c1de64b ("crypto: x86/poly1305 - Remove assignments with no effect")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-24 09:38:32 +11:00
Herbert Xu 4a0c1de64b crypto: x86/poly1305 - Remove assignments with no effect
This patch removes a few ineffectual assignments from the function
crypto_poly1305_setdctxkey.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-02 18:02:13 +10:00
Uros Bizjak 7dfd1e01b3 crypto: poly1305-x86_64 - Use XORL r32,32
x86_64 zero extends 32bit operations, so for 64bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11 14:39:13 +10:00
Uros Bizjak db719539fd crypto: curve25519-x86_64 - Use XORL r32,32
x86_64 zero extends 32bit operations, so for 64bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11 14:39:13 +10:00
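
The effect can be seen from user space too; a small C sketch with inline asm (the %k operand modifier selects the 32-bit register name) illustrates why the 32-bit form is preferred for zeroing:

```c
#include <stdio.h>

static unsigned long zero_reg(void)
{
	unsigned long x;

	/* xorl %eXX,%eXX zero-extends into the full 64-bit register but
	 * needs no REX.W prefix, so it encodes one byte shorter than the
	 * xorq form when a legacy register is chosen. */
	asm("xorl %k0, %k0" : "=r" (x));
	return x;
}

int main(void)
{
	printf("%lu\n", zero_reg());	/* always prints 0 */
	return 0;
}
```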
Uros Bizjak 3a95887e27 crypto: x86/crc32c-intel - Use CRC32 mnemonic
Current minimum required version of binutils is 2.23,
which supports the CRC32 instruction mnemonic.

Replace the byte-wise specification of CRC32 with this proper mnemonic.
The compiler is now able to pass memory operand to the instruction,
so there is no need for a temporary register anymore.

Some examples of the improvement:

 12a:	48 8b 08             	mov    (%rax),%rcx
 12d:	f2 48 0f 38 f1 f1    	crc32q %rcx,%rsi
 133:	48 83 c0 08          	add    $0x8,%rax
 137:	48 39 d0             	cmp    %rdx,%rax
 13a:	75 ee                	jne    12a <crc32c_intel_update+0x1a>

to:

 125:	f2 48 0f 38 f1 06    	crc32q (%rsi),%rax
 12b:	48 83 c6 08          	add    $0x8,%rsi
 12f:	48 39 d6             	cmp    %rdx,%rsi
 132:	75 f1                	jne    125 <crc32c_intel_update+0x15>

and:

 146:	0f b6 08             	movzbl (%rax),%ecx
 149:	f2 0f 38 f0 f1       	crc32b %cl,%esi
 14e:	48 83 c0 01          	add    $0x1,%rax
 152:	48 39 d0             	cmp    %rdx,%rax
 155:	75 ef                	jne    146 <crc32c_intel_update+0x36>

to:

 13b:	f2 0f 38 f0 02       	crc32b (%rdx),%eax
 140:	48 83 c2 01          	add    $0x1,%rdx
 144:	48 39 ca             	cmp    %rcx,%rdx
 147:	75 f2                	jne    13b <crc32c_intel_update+0x2b>

As the compiler has some more freedom w.r.t. register allocation,
there are also a couple of reg-reg moves removed.

There are no hidden states for CRC32 insn, so there is no need to mark
assembly as volatile.

v2: Introduce CRC32_INST define.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-08-21 14:45:28 +10:00
Herbert Xu 0c3dc787a6 crypto: algapi - Remove skbuff.h inclusion
The header file algapi.h includes skbuff.h unnecessarily since
all we need is a forward declaration for struct sk_buff.  This
patch removes that inclusion.

Unfortunately skbuff.h pulls in a lot of things and drivers over
the years have come to rely on it so this patch adds a lot of
missing inclusions that result from this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-08-20 14:04:28 +10:00
Herbert Xu 054a5540fb crypto: x86/curve25519 - Remove unused carry variables
The carry variables are assigned but never used, which upsets
the compiler.  This patch removes them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Karthikeyan Bhargavan <karthik.bhargavan@gmail.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-31 18:25:29 +10:00
Arnd Bergmann 44623b2818 crypto: x86/crc32c - fix building with clang ias
The clang integrated assembler complains about movzxw:

arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173:2: error: invalid instruction mnemonic 'movzxw'

It seems that movzwq is the mnemonic that it expects instead,
and this is what objdump prints when disassembling the file.

Fixes: 6a8ce1ef39 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-23 17:34:16 +10:00
Uros Bizjak d7866e503b crypto: x86 - Remove include/asm/inst.h
Current minimum required version of binutils is 2.23,
which supports PSHUFB, PCLMULQDQ, PEXTRD, AESKEYGENASSIST,
AESIMC, AESENC, AESENCLAST, AESDEC, AESDECLAST and MOVQ
instruction mnemonics.

Substitute macros from include/asm/inst.h with the proper
instruction mnemonics in various assembly files from the
x86/crypto directory, and remove the now-unneeded file.

The patch was tested by calculating and comparing sha256sum
hashes of stripped object files before and after the patch,
to be sure that executable code didn't change.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-16 21:49:07 +10:00
Ard Biesheuvel e79a317151 crypto: x86/chacha-sse3 - use unaligned loads for state array
Due to the fact that the x86 port does not support allocating objects
on the stack with an alignment that exceeds 8 bytes, we have a rather
ugly hack in the x86 code for ChaCha to ensure that the state array is
aligned to 16 bytes, allowing the SSE3 implementation of the algorithm
to use aligned loads.

Given that the performance benefit of using aligned loads appears to
be limited (~0.25% for 1k blocks using tcrypt on a Corei7-8650U), and
the fact that this hack has leaked into generic ChaCha code, let's just
remove it.

Cc: Martin Willi <martin@strongswan.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin Willi <martin@strongswan.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-16 21:49:04 +10:00
Sedat Dilek 3347c8a079 crypto: aesni - Fix build with LLVM_IAS=1
Building with LLVM_IAS=1 means using Clang's integrated assembler (IAS)
from LLVM/Clang >= v10.0.1-rc1 instead of GNU as from GNU binutils.
With it, I see the following breakage on Debian/testing AMD64:

<instantiation>:15:74: error: too many positional arguments
 PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
                                                                         ^
 arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation
 GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
 ^
<instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix
 GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
 ^
arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation
 GCM_ENC_DEC dec
 ^
<instantiation>:15:74: error: too many positional arguments
 PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
                                                                         ^
arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation
 GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
 ^
<instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix
 GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
 ^
arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation
 GCM_ENC_DEC enc

Craig Topper suggested to me in ClangBuiltLinux issue #1050:

> I think the "too many positional arguments" is because the parser isn't able
> to handle the trailing commas.
>
> The "unknown use of instruction mnemonic" is because the macro was named
> GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with
> GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the
> macro instantiation, but llvm doesn't.

First, I removed the trailing comma in the PRECOMPUTE line.

Second, I substituted:
1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec
2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc

With these changes I was able to build with LLVM_IAS=1 and boot on bare metal.

I confirmed that this works with Linux-kernel v5.7.5 final.

NOTE: This patch is on top of Linux v5.7 final.

Thanks to Craig and especially Nick for double-checking and his comments.

Suggested-by: Craig Topper <craig.topper@intel.com>
Suggested-by: Craig Topper <craig.topper@gmail.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: "ClangBuiltLinux" <clang-built-linux@googlegroups.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1050
Link: https://bugs.llvm.org/show_bug.cgi?id=24494
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-09 18:25:22 +10:00
Jian Cai 44069737ac crypto: aesni - add compatibility with IAS
Clang's integrated assembler complains "invalid reassignment of
non-absolute variable 'var_ddq_add'" while assembling
arch/x86/crypto/aes_ctrby8_avx-x86_64.S. This is because var_ddq_add was
reassigned with non-absolute values several times, which IAS does not
support. We can avoid the reassignment by replacing the uses of
var_ddq_add with its definitions, which gives compatibility
with IAS.

Link: https://github.com/ClangBuiltLinux/linux/issues/1008
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Fangrui Song <maskray@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot Linux v5.7.5; clang v11.0.0-git
Signed-off-by: Jian Cai <caij2003@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-03 14:18:34 +10:00
Linus Torvalds 69fc06f70f There are a lot of objtool changes in this cycle, all across the map:
- Speed up objtool significantly, especially when there are large number of sections
  - Improve objtool's understanding of special instructions such as IRET,
    to reduce the number of annotations required
  - Implement 'noinstr' validation
  - Do baby steps for non-x86 objtool use
  - Simplify/fix retpoline decoding
  - Add vmlinux validation
  - Improve documentation
  - Fix various bugs and apply smaller cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "There are a lot of objtool changes in this cycle, all across the map:

   - Speed up objtool significantly, especially when there are large
     number of sections

   - Improve objtool's understanding of special instructions such as
     IRET, to reduce the number of annotations required

   - Implement 'noinstr' validation

   - Do baby steps for non-x86 objtool use

   - Simplify/fix retpoline decoding

   - Add vmlinux validation

   - Improve documentation

   - Fix various bugs and apply smaller cleanups"

* tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  objtool: Enable compilation of objtool for all architectures
  objtool: Move struct objtool_file into arch-independent header
  objtool: Exit successfully when requesting help
  objtool: Add check_kcov_mode() to the uaccess safelist
  samples/ftrace: Fix asm function ELF annotations
  objtool: optimize add_dead_ends for split sections
  objtool: use gelf_getsymshndx to handle >64k sections
  objtool: Allow no-op CFI ops in alternatives
  x86/retpoline: Fix retpoline unwind
  x86: Change {JMP,CALL}_NOSPEC argument
  x86: Simplify retpoline declaration
  x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
  objtool: Add support for intra-function calls
  objtool: Move the IRET hack into the arch decoder
  objtool: Remove INSN_STACK
  objtool: Make handle_insn_ops() unconditional
  objtool: Rework allocating stack_ops on decode
  objtool: UNWIND_HINT_RET_OFFSET should not check registers
  objtool: is_fentry_call() crashes if call has no destination
  x86,smap: Fix smap_{save,restore}() alternatives
  ...
2020-06-01 13:13:00 -07:00
Linus Torvalds 81e8c10dac Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Introduce crypto_shash_tfm_digest() and use it wherever possible.
   - Fix use-after-free and race in crypto_spawn_alg.
   - Add support for parallel and batch requests to crypto_engine.

  Algorithms:
   - Update jitter RNG for SP800-90B compliance.
   - Always use jitter RNG as seed in drbg.

  Drivers:
   - Add Arm CryptoCell driver cctrng.
   - Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
  crypto: hisilicon - fix driver compatibility issue with different versions of devices
  crypto: engine - do not requeue in case of fatal error
  crypto: cavium/nitrox - Fix a typo in a comment
  crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
  crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
  crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
  crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
  crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
  crypto: hisilicon/qm - add debugfs to the QM state machine
  crypto: hisilicon/qm - add debugfs for QM
  crypto: stm32/crc32 - protect from concurrent accesses
  crypto: stm32/crc32 - don't sleep in runtime pm
  crypto: stm32/crc32 - fix multi-instance
  crypto: stm32/crc32 - fix run-time self test issue.
  crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
  crypto: hisilicon/zip - Use temporary sqe when doing work
  crypto: hisilicon - add device error report through abnormal irq
  crypto: hisilicon - remove codes of directly report device errors through MSI
  crypto: hisilicon - QM memory management optimization
  crypto: hisilicon - unify initial value assignment into QM
  ...
2020-06-01 12:00:10 -07:00
Ingo Molnar 7c0577f4e6 Linux 5.7-rc6

Merge tag 'v5.7-rc6' into objtool/core, to pick up fixes and resolve semantic conflict

Resolve structural conflict between:

  59566b0b622e: ("x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up")

which introduced a new reference to 'ftrace_epilogue', and:

  0298739b7983: ("x86,ftrace: Fix ftrace_regs_caller() unwind")

Which renamed it to 'ftrace_caller_end'. Rename the new usage site in the merge commit.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-18 13:09:37 +03:00
Eric Biggers 2aaba014b5 crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h
<linux/cryptohash.h> sounds very generic and important, like it's the
header to include if you're doing cryptographic hashing in the kernel.
But actually it only includes the library implementation of the SHA-1
compression function (not even the full SHA-1).  This should basically
never be used anymore; SHA-1 is no longer considered secure, and there
are much better ways to do cryptographic hashing in the kernel.

Most files that include this header don't actually need it.  So in
preparation for removing it, remove all these unneeded includes of it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08 15:32:17 +10:00
Peter Zijlstra 34fdce6981 x86: Change {JMP,CALL}_NOSPEC argument
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line
versions of the retpoline magic, we need to remove the '%' from the
argument, such that we can paste it onto symbol names.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2020-04-30 20:14:34 +02:00
Jason A. Donenfeld a9a8ba90fa crypto: arch/nhpoly1305 - process in explicit 4k chunks
Rather than chunking via PAGE_SIZE, this commit changes the arch
implementations to chunk in explicit 4k parts, so that calculations on
maximum acceptable latency don't suddenly become invalid on platforms
where PAGE_SIZE isn't 4k, such as arm64.

Fixes: 0f961f9f67 ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Fixes: 012c82388c ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305")
Fixes: a00fa0c887 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305")
Fixes: 16aae3595a ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30 15:16:59 +10:00
Jason A. Donenfeld 706024a52c crypto: arch/lib - limit simd usage to 4k chunks
The initial Zinc patchset, after some mailing list discussion, contained
code to ensure that kernel_fpu_enable would not be kept on for more than
a 4k chunk, since it disables preemption. The choice of 4k isn't totally
scientific, but it's not a bad guess either, and it's what's used in
both the x86 poly1305, blake2s, and nhpoly1305 code already (in the form
of PAGE_SIZE, which this commit corrects to be explicitly 4k for the
former two).

Ard did some back of the envelope calculations and found that
at 5 cycles/byte (an overestimate) on a 1 GHz processor (pretty slow), 4k
means we have a maximum preemption-disabled window of 20us, which Sebastian
confirmed was probably a good limit.

Unfortunately the chunking appears to have been left out of the final
patchset that added the glue code. So, this commit adds it back in.

Fixes: 84e03fa39f ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Fixes: b3aad5bad2 ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
Fixes: a44a3430d7 ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
Fixes: d7d7b85356 ("crypto: x86/poly1305 - wire up faster implementations for kernel")
Fixes: f569ca1647 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: a6b803b3dd ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
Fixes: ed0356eda1 ("crypto: blake2s - x86_64 SIMD implementation")
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30 15:16:59 +10:00
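
A sketch of the chunking pattern that was added back, using the kernel's min_t() and SZ_4K; the SIMD helper and function names are hypothetical:

```c
static void poly1305_simd_update_sketch(void *state, const u8 *src,
					unsigned int len)
{
	/* Never keep the FPU (and thus preemption disabled) for more than
	 * 4 KiB of input at a time. */
	do {
		unsigned int todo = min_t(unsigned int, len, SZ_4K);

		kernel_fpu_begin();
		poly1305_blocks_simd(state, src, todo);	/* hypothetical asm helper */
		kernel_fpu_end();

		src += todo;
		len -= todo;
	} while (len);
}
```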
Jason A. Donenfeld e6abef610c x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2
Now that the kernel specifies binutils 2.23 as the minimum version, we
can remove ifdefs for AVX2 and ADX throughout.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-09 00:12:48 +09:00
Masahiro Yamada d7e40ea83e crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'
poly1305-x86_64-cryptogams.S is a generated file, so it should be
cleaned up by 'make clean'.

Assigning it to the variable 'targets' teaches Kbuild that it is a
generated file. However, this line is not evaluated when cleaning
because scripts/Makefile.clean does not include include/config/auto.conf.

Remove the ifneq-conditional, so this file is correctly cleaned up.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:02:00 +09:00
Jason A. Donenfeld 4dcbfc35f7 crypto: x86 - rework configuration based on Kconfig
Now that assembler capabilities are probed inside of Kconfig, we can set
up proper Kconfig-based dependencies. We also take this opportunity to
reorder the Makefile, so that items are grouped logically by primitive.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada 42251572c4 x86: remove always-defined CONFIG_AS_AVX
CONFIG_AS_AVX was introduced by commit ea4d26ae24 ("raid5: add AVX
optimized RAID5 checksumming").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_AVX, which is always defined.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Masahiro Yamada 92203b0280 x86: remove always-defined CONFIG_AS_SSSE3
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6 ("x86/raid6:
correctly check for assembler capabilities").

We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5 ("kbuild: Raise the minimum
required binutils version to 2.21").

I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.

Remove CONFIG_AS_SSSE3, which is always defined.

I added ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors
on non-x86 architectures.

lib/raid6/algos.c is built not only for the kernel but also for
testing the library code from userspace. I added -DCONFIG_X86 to
lib/raid6/test/Makefile to cater to this use case.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09 00:01:59 +09:00
Linus Torvalds ff2ae607c6 SPDX patches for 5.7-rc1.
Here are 3 SPDX patches for 5.7-rc1.
 
 One fixes up the SPDX tag for a single driver, while the other two go
 through the tree and add SPDX tags for all of the .gitignore files as
 needed.
 
 Nothing too complex, but you will get a merge conflict with your current
 tree, that should be trivial to handle (one file modified by two things,
 one file deleted.)
 
 All 3 of these have been in linux-next for a while, with no reported
 issues other than the merge conflict.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX updates from Greg KH:
 "Here are three SPDX patches for 5.7-rc1.

  One fixes up the SPDX tag for a single driver, while the other two go
  through the tree and add SPDX tags for all of the .gitignore files as
  needed.

  Nothing too complex, but you will get a merge conflict with your
  current tree, that should be trivial to handle (one file modified by
  two things, one file deleted.)

  All three of these have been in linux-next for a while, with no
  reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  ASoC: MT6660: make spdxcheck.py happy
  .gitignore: add SPDX License Identifier
  .gitignore: remove too obvious comments
2020-04-03 13:12:26 -07:00
Linus Torvalds 72f35423e8 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Fix out-of-sync IVs in self-test for IPsec AEAD algorithms

  Algorithms:
   - Use formally verified implementation of x86/curve25519

  Drivers:
   - Enhance hwrng support in caam

   - Use crypto_engine for skcipher/aead/rsa/hash in caam

   - Add Xilinx AES driver

   - Add uacce driver

   - Register zip engine to uacce in hisilicon

   - Add support for OCTEON TX CPT engine in marvell"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
  crypto: af_alg - bool type cosmetics
  crypto: arm[64]/poly1305 - add artifact to .gitignore files
  crypto: caam - limit single JD RNG output to maximum of 16 bytes
  crypto: caam - enable prediction resistance in HRWNG
  bus: fsl-mc: add api to retrieve mc version
  crypto: caam - invalidate entropy register during RNG initialization
  crypto: caam - check if RNG job failed
  crypto: caam - simplify RNG implementation
  crypto: caam - drop global context pointer and init_done
  crypto: caam - use struct hwrng's .init for initialization
  crypto: caam - allocate RNG instantiation descriptor with GFP_DMA
  crypto: ccree - remove duplicated include from cc_aead.c
  crypto: chelsio - remove set but not used variable 'adap'
  crypto: marvell - enable OcteonTX cpt options for build
  crypto: marvell - add the Virtual Function driver for CPT
  crypto: marvell - add support for OCTEON TX CPT engine
  crypto: marvell - create common Kconfig and Makefile for Marvell
  crypto: arm/neon - memzero_explicit aes-cbc key
  crypto: bcm - Use scnprintf() for avoiding potential buffer overflow
  crypto: atmel-i2c - Fix wakeup fail
  ...
2020-04-01 14:47:40 -07:00
Ingo Molnar 629b3df7ec Merge branch 'x86/cpu' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/uncore.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25 15:20:44 +01:00
Masahiro Yamada d198b34f38 .gitignore: add SPDX License Identifier
Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 11:50:48 +01:00
Thomas Gleixner f30cfacad1 crypto: Convert to new CPU match macros
The new macro set has a consistent namespace and uses C99 initializers
instead of the grufty C89 ones.
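
For illustration (the table and feature are chosen arbitrarily, not taken
from a specific driver), a module CPU table in the new style looks roughly
like:

    #include <linux/module.h>
    #include <asm/cpu_device_id.h>
    #include <asm/cpufeatures.h>

    static const struct x86_cpu_id example_cpu_ids[] = {
            X86_MATCH_FEATURE(X86_FEATURE_AVX2, NULL),
            {}
    };
    MODULE_DEVICE_TABLE(x86cpu, example_cpu_ids);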

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200320131510.700250889@linutronix.de
2020-03-24 21:36:06 +01:00
Jason A. Donenfeld dc7fc3a53a crypto: x86/curve25519 - leave r12 as spare register
This updates to the newer register selection proved by HACL*, which
leads to a more compact instruction encoding, and saves around 100
cycles.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-06 12:28:25 +11:00
Jason A. Donenfeld 1579f1bc3b crypto: x86/curve25519 - support assemblers with no adx support
Some older versions of GAS do not support the ADX instructions, similarly
to how they also don't support AVX and such. This commit adds the same
build-time detection mechanisms we use for AVX and others for ADX, and
then makes sure that the curve25519 library dispatcher calls the right
functions.
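
A hedged sketch of that dispatch (the ADX entry point name is a stand-in,
not a quote of the glue code):

    #include <crypto/curve25519.h>
    #include <asm/cpufeature.h>

    void curve25519_adx(u8 *out, const u8 *scalar, const u8 *point); /* hypothetical */

    void curve25519_dispatch(u8 out[CURVE25519_KEY_SIZE],
                             const u8 scalar[CURVE25519_KEY_SIZE],
                             const u8 point[CURVE25519_KEY_SIZE])
    {
            /* Only take the ADX path if the assembler could build it. */
            if (IS_ENABLED(CONFIG_AS_ADX) && boot_cpu_has(X86_FEATURE_ADX))
                    curve25519_adx(out, scalar, point);
            else
                    curve25519_generic(out, scalar, point);
    }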

Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-05 18:28:09 +11:00
Jason A. Donenfeld 07b586fe06 crypto: x86/curve25519 - replace with formally verified implementation
This comes from INRIA's HACL*/Vale. It implements the same algorithm and
implementation strategy as the code it replaces, only this code has been
formally verified, sans the base point multiplication, which uses code
similar to prior, only it uses the formally verified field arithmetic
alongside reproducible ladder generation steps. This doesn't have a
pure-bmi2 version, which means Haswell no longer benefits, but the
increased (doubled) code complexity is not worth it for a single
generation of chips that's already old.

Performance-wise, this is around 1% slower on older microarchitectures,
and slightly faster on newer microarchitectures, mainly 10nm ones or
backports of 10nm to 14nm. This implementation is "everest" below:

Xeon E5-2680 v4 (Broadwell)

     armfazh: 133340 cycles per call
     everest: 133436 cycles per call

Xeon Gold 5120 (Sky Lake Server)

     armfazh: 112636 cycles per call
     everest: 113906 cycles per call

Core i5-6300U (Sky Lake Client)

     armfazh: 116810 cycles per call
     everest: 117916 cycles per call

Core i7-7600U (Kaby Lake)

     armfazh: 119523 cycles per call
     everest: 119040 cycles per call

Core i7-8750H (Coffee Lake)

     armfazh: 113914 cycles per call
     everest: 113650 cycles per call

Core i9-9880H (Coffee Lake Refresh)

     armfazh: 112616 cycles per call
     everest: 114082 cycles per call

Core i3-8121U (Cannon Lake)

     armfazh: 113202 cycles per call
     everest: 111382 cycles per call

Core i7-8265U (Whiskey Lake)

     armfazh: 127307 cycles per call
     everest: 127697 cycles per call

Core i7-8550U (Kaby Lake Refresh)

     armfazh: 127522 cycles per call
     everest: 127083 cycles per call

Xeon Platinum 8275CL (Cascade Lake)

     armfazh: 114380 cycles per call
     everest: 114656 cycles per call

Achieving these kinds of results with formally verified code is quite
remarkable, especially considering that performance is favorable for
newer chips.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-02-13 16:58:09 +08:00
Jason A. Donenfeld f9e7fe32a7 crypto: x86/poly1305 - emit does base conversion itself
The emit code does optional base conversion itself in assembly, so we
don't need to do that here. Also, neither one of these functions uses
simd instructions, so checking for that doesn't make sense either.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22 16:21:11 +08:00
Jason A. Donenfeld 1f68689953 crypto: x86/poly1305 - fix .gitignore typo
Amidst the kbuild-robot-induced changes, the .gitignore file for the
generated file wasn't updated with the non-clashing filename. This
commit adjusts that.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22 16:21:11 +08:00
Kees Cook 41419a2890 crypto: x86/sha - Eliminate casts on asm implementations
In order to avoid CFI function prototype mismatches, this removes the
casts on assembly implementations of sha1/256/512 accelerators. The
safety checks from BUILD_BUG_ON() remain.
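
The shape of the change, sketched with the sha256 case (treat the exact
prototype as illustrative rather than a quote of the patch):

    #include <linux/linkage.h>
    #include <crypto/sha256_base.h>

    /* Previously declared with unrelated types and cast to sha256_block_fn
     * at the call site; now declared with the real prototype, so it can be
     * passed to sha256_base_do_update() with no cast for CFI to trip over. */
    asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
                                           const u8 *data, int blocks);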

Additionally, this renames various arguments for clarity, as suggested
by Eric Biggers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22 16:21:08 +08:00
Jason A. Donenfeld d7d7b85356 crypto: x86/poly1305 - wire up faster implementations for kernel
These x86_64 vectorized implementations support AVX, AVX-2, and AVX512F.
The AVX-512F implementation is disabled on Skylake, due to throttling,
but it is quite fast on >= Cannonlake.

On the left is cycle counts on a Core i7 6700HQ using the AVX-2
codepath, comparing this implementation ("new") to the implementation in
the current crypto api ("old"). On the right are benchmarks on a Xeon
Gold 5120 using the AVX-512 codepath. The new implementation is faster
on all benchmarks.

        AVX-2                  AVX-512
      ---------              -----------

    size    old     new      size   old     new
    ----    ----    ----     ----   ----    ----
    0       70      68       0      74      70
    16      92      90       16     96      92
    32      134     104      32     136     106
    48      172     120      48     184     124
    64      218     136      64     218     138
    80      254     158      80     260     160
    96      298     174      96     300     176
    112     342     192      112    342     194
    128     388     212      128    384     212
    144     428     228      144    420     226
    160     466     246      160    464     248
    176     510     264      176    504     264
    192     550     282      192    544     282
    208     594     302      208    582     300
    224     628     316      224    624     318
    240     676     334      240    662     338
    256     716     354      256    708     358
    272     764     374      272    748     372
    288     802     352      288    788     358
    304     420     366      304    422     370
    320     428     360      320    432     364
    336     484     378      336    486     380
    352     426     384      352    434     390
    368     478     400      368    480     408
    384     488     394      384    490     398
    400     542     408      400    542     412
    416     486     416      416    492     426
    432     534     430      432    538     436
    448     544     422      448    546     432
    464     600     438      464    600     448
    480     540     448      480    548     456
    496     594     464      496    594     476
    512     602     456      512    606     470
    528     656     476      528    656     480
    544     600     480      544    606     498
    560     650     494      560    652     512
    576     664     490      576    662     508
    592     714     508      592    716     522
    608     656     514      608    664     538
    624     708     532      624    710     552
    640     716     524      640    720     516
    656     770     536      656    772     526
    672     716     548      672    722     544
    688     770     562      688    768     556
    704     774     552      704    778     556
    720     826     568      720    832     568
    736     768     574      736    780     584
    752     822     592      752    826     600
    768     830     584      768    836     560
    784     884     602      784    888     572
    800     828     610      800    838     588
    816     884     628      816    884     604
    832     888     618      832    894     598
    848     942     632      848    946     612
    864     884     644      864    896     628
    880     936     660      880    942     644
    896     948     652      896    952     608
    912     1000    664      912    1004    616
    928     942     676      928    954     634
    944     994     690      944    1000    646
    960     1002    680      960    1008    646
    976     1054    694      976    1062    658
    992     1002    706      992    1012    674
    1008    1052    720      1008   1058    690

This commit wires in the prior implementation from Andy, and makes the
following changes to be suitable for kernel land.

  - Some cosmetic and structural changes, like renaming labels to
    .Lname, constants, and other Linux conventions, as well as making
    the code easy for us to maintain moving forward.

  - CPU feature checking is done in C by the glue code (see the sketch
    below).

  - We avoid jumping into the middle of functions, to appease objtool,
    and instead parameterize shared code.

  - We maintain frame pointers so that stack traces make sense.

  - We remove the dependency on the perl xlate code, which transforms
    the output for assemblers we don't care about.

Importantly, none of our changes affect the arithmetic or core code, but
just involve the differing environment of kernel space.
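
As a rough, hypothetical sketch of that C-side feature checking (the static
key name and the exact CPU requirements are illustrative):

    #include <linux/cache.h>
    #include <linux/init.h>
    #include <linux/jump_label.h>
    #include <asm/cpufeature.h>

    static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);

    static int __init poly1305_simd_mod_init(void)
    {
            if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2))
                    static_branch_enable(&poly1305_use_avx2);
            return 0;
    }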

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-16 15:18:12 +08:00
Jason A. Donenfeld 0896ca2a0c crypto: x86/poly1305 - import unmodified cryptogams implementation
These x86_64 vectorized implementations come from Andy Polyakov's
CRYPTOGAMS implementation, and are included here in raw form without
modification, so that subsequent commits that fix these up for the
kernel can see how it has changed.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-16 15:18:12 +08:00
Jason A. Donenfeld 1c08a10436 crypto: poly1305 - add new 32 and 64-bit generic versions
These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
depending on the platform -- come from Andrew Moon's public domain
poly1305-donna portable code, modified for usage in the kernel. The
precomputation in the 32-bit version and the use of 64x64 multiplies in
the 64-bit version make these perform better than the code it replaces.
Moon's code is also very widespread and has received many eyeballs of
scrutiny.
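
As a simplified illustration of why the 64x64 flavor wins where the
compiler offers a 128-bit integer type (this is not the donna code itself):

    #include <linux/types.h>

    /* One 64x64->128 multiply; a 32-bit-only build has to assemble the same
     * product out of four 32x32->64 multiplies plus carry handling. */
    static inline unsigned __int128 mul_64x64(u64 a, u64 b)
    {
            return (unsigned __int128)a * b;
    }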

There's a bit of interference with the x86 implementation, which
relies on internal details of the old scalar implementation. In the next
commit, the x86 implementation will be replaced with a faster one that
doesn't rely on this, so none of this matters much. But for now, to keep
this passing the tests, we inline the bits of the old implementation
that the x86 implementation relied on. Also, since we now support a
slightly larger key space, via the union, some offsets had to be fixed
up.

Nonce calculation was folded in with the emit function, to take
advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
nonce handling in emit, so this path was conditionalized. We also
introduced a new struct, poly1305_core_key, to represent the precise
amount of space that particular implementation uses.

Testing with kbench9000, depending on the CPU, the update function for
the 32x32 version has been improved by 4%-7%, and for the 64x64 by
19%-30%. The 32x32 gains are small, but I think there's great value in
having a parallel implementation to the 64x64 one so that the two can be
compared side-by-side as nice stand-alone units.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-16 15:18:12 +08:00
Eric Biggers af5034e8e4 crypto: remove propagation of CRYPTO_TFM_RES_* flags
The CRYPTO_TFM_RES_* flags were apparently meant as a way to make the
->setkey() functions provide more information about errors.  But these
flags weren't actually being used or tested, and in many cases they
weren't being set correctly anyway.  So they've now been removed.

Also, if someone ever actually needs to start better distinguishing
->setkey() errors (which is somewhat unlikely, as this has been unneeded
for a long time), we'd be much better off just defining different return
values, like -EINVAL if the key is invalid for the algorithm vs.
-EKEYREJECTED if the key was rejected by a policy like "no weak keys".
That would be much simpler, less error-prone, and easier to test.
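
A hypothetical ->setkey() in that style (the algorithm, key length, and
weak-key policy here are invented purely for illustration):

    #include <linux/errno.h>
    #include <linux/string.h>
    #include <crypto/skcipher.h>

    static int example_setkey(struct crypto_skcipher *tfm,
                              const u8 *key, unsigned int keylen)
    {
            if (keylen != 32)
                    return -EINVAL;          /* wrong length for the algorithm */

            if (memchr_inv(key, 0, keylen) == NULL)
                    return -EKEYREJECTED;    /* legal length, rejected by policy */

            return 0;
    }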

So just remove CRYPTO_TFM_RES_MASK and all the unneeded logic that
propagates these flags around.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-09 11:30:53 +08:00
Eric Biggers 674f368a95 crypto: remove CRYPTO_TFM_RES_BAD_KEY_LEN
The CRYPTO_TFM_RES_BAD_KEY_LEN flag was apparently meant as a way to
make the ->setkey() functions provide more information about errors.

However, no one actually checks for this flag, which makes it pointless.

Also, many algorithms fail to set this flag when given a bad length key.
Reviewing just the generic implementations, this is the case for
aes-fixed-time, cbcmac, echainiv, nhpoly1305, pcrypt, rfc3686, rfc4309,
rfc7539, rfc7539esp, salsa20, seqiv, and xcbc.  But there are probably
many more in arch/*/crypto/ and drivers/crypto/.

Some algorithms can even set this flag when the key is the correct
length.  For example, authenc and authencesn set it when the key payload
is malformed in any way (not just a bad length), the atmel-sha and ccree
drivers can set it if a memory allocation fails, and the chelsio driver
sets it for bad auth tag lengths, not just bad key lengths.

So even if someone actually wanted to start checking this flag (which
seems unlikely, since it's been unused for a long time), there would be
a lot of work needed to get it working correctly.  But it would probably
be much better to go back to the drawing board and just define different
return values, like -EINVAL if the key is invalid for the algorithm vs.
-EKEYREJECTED if the key was rejected by a policy like "no weak keys".
That would be much simpler, less error-prone, and easier to test.

So just remove this flag.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-09 11:30:53 +08:00
Kees Cook 9c1e8836ed crypto: x86 - Regularize glue function prototypes
The crypto glue performed function prototype casting via macros to make
indirect calls to assembly routines. Instead of performing casts at the
call sites (which trips Control Flow Integrity prototype checking), switch
each prototype to a common standard set of arguments which allows the
removal of the existing macros. In order to keep pointer math unchanged,
internal casting between u128 pointers and u8 pointers is added.
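
Sketching the idea with invented names (the real patch reworks the
glue_helper tables for the various ciphers):

    #include <linux/linkage.h>
    #include <linux/types.h>

    /* One shared prototype for every asm block routine. */
    typedef void (*blk_fn_t)(const void *ctx, u8 *dst, const u8 *src);

    /* Hypothetical asm routine, declared with the common prototype. */
    asmlinkage void example_ecb_enc_8way(const void *ctx, u8 *dst, const u8 *src);

    /* The table entry needs no function-pointer cast anymore. */
    static const blk_fn_t example_ecb_fns[] = { example_ecb_enc_8way };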

Co-developed-by: João Moreira <joao.moreira@intel.com>
Signed-off-by: João Moreira <joao.moreira@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-12-11 16:36:54 +08:00
Linus Torvalds 483847a702 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - Fix build error in crypto lib code when crypto API is off

 - Fix NULL/error check in hisilicon

 - Fix Kconfig-related build error in talitos

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon - fix a NULL vs IS_ERR() bug in sec_create_qp_ctx()
  crypto: talitos - Fix build error by selecting LIB_DES
  crypto: arch - conditionalize crypto api in arch glue for lib code
2019-12-02 17:23:21 -08:00
Jason A. Donenfeld 8394bfec51 crypto: arch - conditionalize crypto api in arch glue for lib code
For glue code that's used by Zinc, the actual Crypto API functions might
not necessarily exist, and don't need to exist either. Before this
patch, there are valid build configurations that lead to an unbuildable
kernel. This fixes it to conditionalize those symbols on the existence
of the proper config entry.
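
A minimal sketch of that conditionalization for a skcipher case (the
algorithm entry and function names are placeholders): the bare library
functions always build, and only the Crypto API registration is gated.

    #include <linux/module.h>
    #include <crypto/internal/skcipher.h>

    static struct skcipher_alg example_algs[] = {
            { .base.cra_name = "example(arch)" },    /* placeholder entry */
    };

    static int __init example_glue_init(void)
    {
            return IS_REACHABLE(CONFIG_CRYPTO_SKCIPHER) ?
                    crypto_register_skciphers(example_algs,
                                              ARRAY_SIZE(example_algs)) : 0;
    }

    static void __exit example_glue_exit(void)
    {
            if (IS_REACHABLE(CONFIG_CRYPTO_SKCIPHER))
                    crypto_unregister_skciphers(example_algs,
                                                ARRAY_SIZE(example_algs));
    }

    module_init(example_glue_init);
    module_exit(example_glue_exit);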

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-27 13:08:49 +08:00
Linus Torvalds 1d87200446 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cross-arch changes to move the linker sections for NOTES and
     EXCEPTION_TABLE into the RO_DATA area, where they belong on most
     architectures. (Kees Cook)

   - Switch the x86 linker fill byte from 0x90 (NOP) to 0xcc (INT3), to
     trap jumps into the middle of those padding areas instead of
     sliding execution. (Kees Cook)

   - A thorough cleanup of symbol definitions within x86 assembler code.
     The rather randomly named macros got streamlined around a
     (hopefully) straightforward naming scheme:

        SYM_START(name, linkage, align...)
        SYM_END(name, sym_type)

        SYM_FUNC_START(name)
        SYM_FUNC_END(name)

        SYM_CODE_START(name)
        SYM_CODE_END(name)

        SYM_DATA_START(name)
        SYM_DATA_END(name)

     etc - with roughly three times as many of these basic primitives,
     each carrying some label, local symbol or attribute variant,
     expressed via postfixes.

     No change in functionality intended. (Jiri Slaby)

   - Misc other changes, cleanups and smaller fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  x86/entry/64: Remove pointless jump in paranoid_exit
  x86/entry/32: Remove unused resume_userspace label
  x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
  m68k: Convert missed RODATA to RO_DATA
  x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
  x86/mm: Report actual image regions in /proc/iomem
  x86/mm: Report which part of kernel image is freed
  x86/mm: Remove redundant address-of operators on addresses
  xtensa: Move EXCEPTION_TABLE to RO_DATA segment
  powerpc: Move EXCEPTION_TABLE to RO_DATA segment
  parisc: Move EXCEPTION_TABLE to RO_DATA segment
  microblaze: Move EXCEPTION_TABLE to RO_DATA segment
  ia64: Move EXCEPTION_TABLE to RO_DATA segment
  h8300: Move EXCEPTION_TABLE to RO_DATA segment
  c6x: Move EXCEPTION_TABLE to RO_DATA segment
  arm64: Move EXCEPTION_TABLE to RO_DATA segment
  alpha: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Actually use _etext for the end of the text segment
  vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
  ...
2019-11-26 10:42:40 -08:00
Eric Biggers b62755aed3 crypto: x86/chacha - only unregister algorithms if registered
It's not valid to call crypto_unregister_skciphers() without a prior
call to crypto_register_skciphers().

Fixes: 84e03fa39f ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-22 18:48:39 +08:00