AEGIS-256 key is two blocks, not one.
Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds optimized implementations of MORUS-640 and MORUS-1280,
utilizing the SSE2 and AVX2 x86 extensions.
For MORUS-1280 (which operates on 256-bit blocks) we provide both AVX2
and SSE2 implementation. Although SSE2 MORUS-1280 is slower than AVX2
MORUS-1280, it is comparable in speed to the SSE2 MORUS-640.
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds optimized implementations of AEGIS-128, AEGIS-128L,
and AEGIS-256, utilizing the AES-NI and SSE2 x86 extensions.
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Trivial fix to spelling mistake in module description text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are no users of the original glue_fpu_begin() anymore, so rename
glue_skwalk_fpu_begin() to glue_fpu_begin() so that it matches
glue_fpu_end() again.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that all glue_helper users have been switched from the blkcipher
interface over to the skcipher interface, remove the versions of the
glue_helper functions that handled the blkcipher interface.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that all users of lrw_crypt() have been removed in favor of the LRW
template wrapping an ECB mode algorithm, remove lrw_crypt(). Also
remove crypto/lrw.h as that is no longer needed either; and fold
'struct lrw_table_ctx' into 'struct priv', lrw_init_table() into
setkey(), and lrw_free_table() into exit_tfm().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the AESNI AVX and AESNI AVX2 implementations of Camellia from
the (deprecated) ablkcipher and blkcipher interfaces over to the
skcipher interface. Note that this includes replacing the use of
ablk_helper with crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the x86 asm implementation of Camellia from the (deprecated)
blkcipher interface over to the skcipher interface.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().
Remove the xts-camellia-asm algorithm which did this. Users who request
xts(camellia) and previously would have gotten xts-camellia-asm will now
get xts(ecb-camellia-asm) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-camellia-asm algorithm which did this. Users who request
lrw(camellia) and previously would have gotten lrw-camellia-asm will now
get lrw(ecb-camellia-asm) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-camellia-aesni-avx2 algorithm which did this. Users who
request lrw(camellia) and previously would have gotten
lrw-camellia-aesni-avx2 will now get lrw(ecb-camellia-aesni-avx2)
instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-camellia-aesni algorithm which did this. Users who
request lrw(camellia) and previously would have gotten
lrw-camellia-aesni will now get lrw(ecb-camellia-aesni) instead, which
is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the x86 asm implementation of Triple DES from the (deprecated)
blkcipher interface over to the skcipher interface.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the x86 asm implementation of Blowfish from the (deprecated)
blkcipher interface over to the skcipher interface.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the AVX implementation of CAST6 from the (deprecated) ablkcipher
and blkcipher interfaces over to the skcipher interface. Note that this
includes replacing the use of ablk_helper with crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-cast6-avx algorithm which did this. Users who request
lrw(cast6) and previously would have gotten lrw-cast6-avx will now get
lrw(ecb-cast6-avx) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the AVX implementation of CAST5 from the (deprecated) ablkcipher
and blkcipher interfaces over to the skcipher interface. Note that this
includes replacing the use of ablk_helper with crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With ecb-cast5-avx, if a 128+ byte scatterlist element followed a
shorter one, then the algorithm accidentally encrypted/decrypted only 8
bytes instead of the expected 128 bytes. Fix it by setting the
encryption/decryption 'fn' correctly.
Fixes: c12ab20b16 ("crypto: cast5/avx - avoid using temporary stack buffers")
Cc: <stable@vger.kernel.org> # v3.8+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the AVX implementation of Twofish from the (deprecated)
ablkcipher and blkcipher interfaces over to the skcipher interface.
Note that this includes replacing the use of ablk_helper with
crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-twofish-avx algorithm which did this. Users who request
lrw(twofish) and previously would have gotten lrw-twofish-avx will now
get lrw(ecb-twofish-avx) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the 3-way implementation of Twofish from the (deprecated)
blkcipher interface over to the skcipher interface.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().
Remove the xts-twofish-3way algorithm which did this. Users who request
xts(twofish) and previously would have gotten xts-twofish-3way will now
get xts(ecb-twofish-3way) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-twofish-3way algorithm which did this. Users who request
lrw(twofish) and previously would have gotten lrw-twofish-3way will now
get lrw(ecb-twofish-3way) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the AVX and AVX2 implementations of Serpent from the
(deprecated) ablkcipher and blkcipher interfaces over to the skcipher
interface. Note that this includes replacing the use of ablk_helper
with crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-serpent-avx algorithm which did this. Users who request
lrw(serpent) and previously would have gotten lrw-serpent-avx will now
get lrw(ecb-serpent-avx) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-serpent-avx2 algorithm which did this. Users who request
lrw(serpent) and previously would have gotten lrw-serpent-avx2 will now
get lrw(ecb-serpent-avx2) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the SSE2 implementation of Serpent from the (deprecated)
ablkcipher and blkcipher interfaces over to the skcipher interface.
Note that this includes replacing the use of ablk_helper with
crypto_simd.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The XTS template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic XTS code themselves via xts_crypt().
Remove the xts-serpent-sse2 algorithm which did this. Users who request
xts(serpent) and previously would have gotten xts-serpent-sse2 will now
get xts(ecb-serpent-sse2) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The LRW template now wraps an ECB mode algorithm rather than the block
cipher directly. Therefore it is now redundant for crypto modules to
wrap their ECB code with generic LRW code themselves via lrw_crypt().
Remove the lrw-serpent-sse2 algorithm which did this. Users who request
lrw(serpent) and previously would have gotten lrw-serpent-sse2 will now
get lrw(ecb-serpent-sse2) instead, which is just as fast.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add ECB, CBC, and CTR functions to glue_helper which use skcipher_walk
rather than blkcipher_walk. This will allow converting the remaining
x86 algorithms from the blkcipher interface over to the skcipher
interface, after which we'll be able to remove the blkcipher_walk
versions of these functions.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add gcmaes_crypt_by_sg routine, that will do scatter/gather
by sg. Either src or dst may contain multiple buffers, so
iterate over both at the same time if they are different.
If the input is the same as the output, iterate only over one.
Currently both the AAD and TAG must be linear, so copy them out
with scatterlist_map_and_copy. If first buffer contains the
entire AAD, we can optimize and not copy. Since the AAD
can be any size, if copied it must be on the heap. TAG can
be on the stack since it is always < 16 bytes.
Only the SSE routines are updated so far, so leave the previous
gcmaes_en/decrypt routines, and branch to the sg ones if the
keysize is inappropriate for avx, or we are SSE only.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The asm macros are all set up now, introduce entry points.
GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount. Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL
macro introduced in b20209c91e for the average case.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.
Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.
The data offset %r11 is also updated after the partial block.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
HashKey computation only needs to happen once per scatter/gather operation,
save it between calls in gcm_context struct instead of on the stack.
Since the asm no longer stores anything on the stack, we can use
%rsp directly, and clean up the frame save/restore macros a bit.
Hashkeys actually only need to be calculated once per key and could
be moved to when set_key is called, however, the current glue code
falls back to generic aes code if fpu is disabled.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls. It is passed
as the second argument (after crypto keys), other args are
renumbered.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Make a macro for the main encode/decode routine. Only a small handful
of lines differ for enc and dec. This will also become the main
scatter/gather update routine.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reduce code duplication by introducting GCM_INIT macro. This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use macro operations to merge implemetations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.
Use macro counter \@ to simplify implementation.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- oversize stack frames on mn10300 in sha3-generic
- warning on old compilers in sha3-generic
- API error in sun4i_ss_prng
- potential dead-lock in sun4i_ss_prng
- null-pointer dereference in sha512-mb
- endless loop when DECO acquire fails in caam
- kernel oops when hashing empty message in talitos"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sun4i_ss_prng - convert lock to _bh in sun4i_ss_prng_generate
crypto: sun4i_ss_prng - fix return value of sun4i_ss_prng_generate
crypto: caam - fix endless loop when DECO acquire fails
crypto: sha3-generic - Use __optimize to support old compilers
compiler-gcc.h: __nostackprotector needs gcc-4.4 and up
compiler-gcc.h: Introduce __optimize function attribute
crypto: sha3-generic - deal with oversize stack frames
crypto: talitos - fix Kernel Oops on hashing an empty file
crypto: sha512-mb - initialize pending lengths correctly