To help avoid confusion, add a comment to ghash-generic.c which explains
the convention that the kernel's implementation of GHASH uses.
Also update the Kconfig help text and module descriptions to call GHASH
a "hash function" rather than a "message digest", since the latter
normally means a real cryptographic hash function, which GHASH is not.
Cc: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the removal of the padata timer, padata_do_serial no longer
needs special CPU handling, so remove it.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Validated assoclen for RFC4543 which expects an assoclen
of 16 or 20, the same as RFC4106.
Based on seqiv, IPsec ESP and RFC4543/RFC4106 the assoclen is sizeof
IP Header (spi, seq_no, extended seq_no) and IV len. This can be 16 or
20 bytes.
Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Check assoclen to solve the extra tests that expect -EINVAL to be
returned when the associated data size is not valid.
Validated assoclen for RFC4543 which expects an assoclen
of 16 or 20, the same as RFC4106.
Based on seqiv, IPsec ESP and RFC4543/RFC4106 the assoclen is sizeof
IP Header (spi, seq_no, extended seq_no) and IV len. This can be 16 or
20 bytes.
Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The function padata_reorder will use a timer when it cannot progress
while completed jobs are outstanding (pd->reorder_objects > 0). This
is suboptimal as if we do end up using the timer then it would have
introduced a gratuitous delay of one second.
In fact we can easily distinguish between whether completed jobs
are outstanding and whether we can make progress. All we have to
do is look at the next pqueue list.
This patch does that by replacing pd->processed with pd->cpu so
that the next pqueue is more accessible.
A work queue is used instead of the original try_again to avoid
hogging the CPU.
Note that we don't bother removing the work queue in
padata_flush_queues because the whole premise is broken. You
cannot flush async crypto requests so it makes no sense to even
try. A subsequent patch will fix it by replacing it with a ref
counting scheme.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Clang sometimes makes very different inlining decisions from gcc.
In case of the aegis crypto algorithms, it decides to turn the innermost
primitives (and, xor, ...) into separate functions but inline most of
the rest.
This results in a huge amount of variables spilled on the stack, leading
to rather slow execution as well as kernel stack usage beyond the 32-bit
warning limit when CONFIG_KASAN is enabled:
crypto/aegis256.c:123:13: warning: stack frame size of 648 bytes in function 'crypto_aegis256_encrypt_chunk' [-Wframe-larger-than=]
crypto/aegis256.c:366:13: warning: stack frame size of 1264 bytes in function 'crypto_aegis256_crypt' [-Wframe-larger-than=]
crypto/aegis256.c:187:13: warning: stack frame size of 656 bytes in function 'crypto_aegis256_decrypt_chunk' [-Wframe-larger-than=]
crypto/aegis128l.c:135:13: warning: stack frame size of 832 bytes in function 'crypto_aegis128l_encrypt_chunk' [-Wframe-larger-than=]
crypto/aegis128l.c:415:13: warning: stack frame size of 1480 bytes in function 'crypto_aegis128l_crypt' [-Wframe-larger-than=]
crypto/aegis128l.c:218:13: warning: stack frame size of 848 bytes in function 'crypto_aegis128l_decrypt_chunk' [-Wframe-larger-than=]
crypto/aegis128.c:116:13: warning: stack frame size of 584 bytes in function 'crypto_aegis128_encrypt_chunk' [-Wframe-larger-than=]
crypto/aegis128.c:351:13: warning: stack frame size of 1064 bytes in function 'crypto_aegis128_crypt' [-Wframe-larger-than=]
crypto/aegis128.c:177:13: warning: stack frame size of 592 bytes in function 'crypto_aegis128_decrypt_chunk' [-Wframe-larger-than=]
Forcing the primitives to all get inlined avoids the issue and the
resulting code is similar to what gcc produces.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use dma_pool_zalloc instead of using dma_pool_alloc to allocate
memory and then zeroing it with memset 0.
This simplifies the code.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
While running ipsec processing for traffic through multiple network
interfaces, it is observed that caam driver gets less time to poll
responses from caam block compared to ethernet driver. This is because
ethernet driver has as many napi instances per cpu as the number of
ethernet interfaces in system. Therefore, caam driver's napi executes
lesser than the ethernet driver's napi instances. This results in
situation that we end up submitting more requests to caam (which it is
able to finish off quite fast), but don't dequeue the responses at same
rate. This makes caam response FQs bloat with large number of frames. In
some situations, it makes kernel crash due to out-of-memory. To prevent
it We increase the napi budget of dpseci driver to a big value so that
caam driver is able to drain its response queues at enough rate.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the new helper devm_platform_ioremap_resource() which wraps the
platform_get_resource() and devm_ioremap_resource() together, to
simplify the code.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the new helper devm_platform_ioremap_resource() which wraps the
platform_get_resource() and devm_ioremap_resource() together, to
simplify the code.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Each of the operations in ccp_run_cmd() needs several hundred
bytes of kernel stack. Depending on the inlining, these may
need separate stack slots that add up to more than the warning
limit, as shown in this clang based build:
drivers/crypto/ccp/ccp-ops.c:871:12: error: stack frame size of 1164 bytes in function 'ccp_run_aes_cmd' [-Werror,-Wframe-larger-than=]
static int ccp_run_aes_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
The problem may also happen when there is no warning, e.g. in the
ccp_run_cmd()->ccp_run_aes_cmd()->ccp_run_aes_gcm_cmd() call chain with
over 2000 bytes.
Mark each individual function as 'noinline_for_stack' to prevent
this from happening, and move the calls to the two special cases for aes
into the top-level function. This will keep the actual combined stack
usage to the mimimum: 828 bytes for ccp_run_aes_gcm_cmd() and
at most 524 bytes for each of the other cases.
Fixes: 63b945091a ("crypto: ccp - CCP device driver and interface support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Redefine pr_fmt so that the module name is prefixed to every
log message produced by the ccp-crypto module
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The directory tools/crypto and the only file under it never gets
built anywhere. This program should instead be incorporated into
one of the existing user-space projects, crconf or libkcapi.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds support to load Asymmetric crypto firmware on
AE cores of CNN55XX device. Firmware is stored on UCD block 2
and all available AE cores are tagged to group 0.
Signed-off-by: Phani Kiran Hemadri <phemadri@marvell.com>
Reviewed-by: Srikanth Jampala <jsrikanth@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CCP driver is able to act as a DMA engine. Add a module parameter that
allows this feature to be enabled/disabled.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Provide the ability to constrain the total number of enabled devices in
the system. Once max_devs devices have been configured, subsequently
probed devices are ignored.
The max_devs parameter may be zero, in which case all CCPs are disabled.
PSPs are always enabled and active.
Disabling the CCPs also disables DMA and RNG registration.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a module parameter to limit the number of queues per CCP. The default
value (nqueues=0) is to set up every available queue on each device.
The count of queues starts from the first one found on the device (which
varies based on the device ID).
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a config option to exclude DebugFS support in the CCP driver.
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, NETLINK_CRYPTO works only in the init network namespace. It
doesn't make much sense to cut it out of the other network namespaces,
so do the minor plumbing work necessary to make it work in any network
namespace. Code inspired by net/core/sock_diag.c.
Tested using kcapi-dgst from libkcapi [1]:
Before:
# unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
libkcapi - Error: Netlink error: sendmsg failed
libkcapi - Error: Netlink error: sendmsg failed
libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
0
After:
# unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
32
[1] https://github.com/smuellerDD/libkcapi
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch recognises the fact that the hardware cannot ever process more
than 2,199,023,386,111 bytes of hash or HMAC payload, so there is no point
in maintaining 128 bit wide byte counters, 64 bits is more than sufficient
Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds support for the following AEAD ciphersuites:
- authenc(hmac(sha1),rfc3686(ctr(aes)))
- authenc(hmac(sha224),rfc3686(ctr(aes)))
- authenc(hmac(sha256),rfc3686(ctr(aes)))
- authenc(hmac(sha384),rfc3686(ctr(aes)))
- authenc(hmac(sha512),rfc3686(ctr(aes)))
Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For spinlocks the type spinlock_t should be used instead of "struct
spinlock".
Use spinlock_t for spinlock's definition.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kmemdup is introduced to duplicate a region of memory in a neat way.
Rather than kmalloc/kzalloc + memcpy, which the programmer needs to
write the size twice (sometimes lead to mistakes), kmemdup improves
readability, leads to smaller code and also reduce the chances of mistakes.
Suggestion to use kmemdup rather than using kmalloc/kzalloc + memcpy.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.
This results in a performance of 2.2 cycles per byte on Cortex-A53,
which is a performance increase of ~11x compared to the generic
code.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add some plumbing to allow the AEGIS128 code to be built with SIMD
routines for acceleration.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The generic AES code provides four sets of lookup tables, where each
set consists of four tables containing the same 32-bit values, but
rotated by 0, 8, 16 and 24 bits, respectively. This makes sense for
CISC architectures such as x86 which support memory operands, but
for other architectures, the rotates are quite cheap, and using all
four tables needlessly thrashes the D-cache, and actually hurts rather
than helps performance.
Since x86 already has its own implementation of AEGIS based on AES-NI
instructions, let's tweak the generic implementation towards other
architectures, and avoid the prerotated tables, and perform the
rotations inline. On ARM Cortex-A53, this results in a ~8% speedup.
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
TFM init/exit routines are optional, so no need to provide empty ones.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Three variants of AEGIS were proposed for the CAESAR competition, and
only one was selected for the final portfolio: AEGIS128.
The other variants, AEGIS128L and AEGIS256, are not likely to ever turn
up in networking protocols or other places where interoperability
between Linux and other systems is a concern, nor are they likely to
be subjected to further cryptanalysis. However, uninformed users may
think that AEGIS128L (which is faster) is equally fit for use.
So let's remove them now, before anyone starts using them and we are
forced to support them forever.
Note that there are no known flaws in the algorithms or in any of these
implementations, but they have simply outlived their usefulness.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
MORUS was not selected as a winner in the CAESAR competition, which
is not surprising since it is considered to be cryptographically
broken [0]. (Note that this is not an implementation defect, but a
flaw in the underlying algorithm). Since it is unlikely to be in use
currently, let's remove it before we're stuck with it.
[0] https://eprint.iacr.org/2019/172.pdf
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The scalar table based AES routines are not used by other drivers, so
let's keep it that way and unexport the symbols.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are a few copies of the AES S-boxes floating around, so export
the ones from the AES library so that we can reuse them in other
modules.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The versions of the AES lookup tables that are only used during the last
round are never used outside of the driver, so there is no need to
export their symbols.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace a couple of occurrences where the "aes-generic" cipher is
instantiated explicitly and only used for encryption of a single block.
Use AES library calls instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the AES library instead of the cipher interface to perform
the single block of AES processing involved in updating the key
of the cmac(aes) hash.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AMCC code for GCM key derivation allocates a AES cipher to
perform a single block encryption. So let's switch to the new
and more lightweight AES library instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The bluetooth code uses a bare AES cipher for the encryption operations.
Given that it carries out a set_key() operation right before every
encryption operation, this is clearly not a hot path, and so the use of
the cipher interface (which provides the best implementation available
on the system) is not really required.
In fact, when using a cipher like AES-NI or AES-CE, both the set_key()
and the encrypt() operations involve en/disabling preemption as well as
stacking and unstacking the SIMD context, and this is most certainly
not worth it for encrypting 16 bytes of data.
So let's switch to the new lightweight library interface instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GHASH is used by the GCM mode, which is often used in contexts where
only synchronous ciphers are permitted. So provide a synchronous version
of GHASH based on the existing code. This requires a non-SIMD fallback
to deal with invocations occurring from a context where SIMD instructions
may not be used.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES in CTR mode is used by modes such as GCM and CCM, which are often
used in contexts where only synchronous ciphers are permitted. So
provide a synchronous version of ctr(aes) based on the existing code.
This requires a non-SIMD fallback to deal with invocations occurring
from a context where SIMD instructions may not be used. We have a
helper for this now in the AES library, so wire that up.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES in CTR mode is used by modes such as GCM and CCM, which are often
used in contexts where only synchronous ciphers are permitted. So
provide a synchronous version of ctr(aes) based on the existing code.
This requires a non-SIMD fallback to deal with invocations occurring
from a context where SIMD instructions may not be used. We have a
helper for this now in the AES library, so wire that up.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Align ARM's hw instruction based AES implementation with other versions
that keep the key schedule in native endianness. This will allow us to
merge the various implementations going forward.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of calling into the table based scalar AES code in situations
where the SIMD unit may not be used, use the generic AES code, which
is more appropriate since it is less likely to be susceptible to
timing attacks.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In preparation of duplicating the sync ctr(aes) functionality to modules
under arch/arm, move the helper function from a inline .h file to the
AES library, which is already depended upon by the drivers that use this
fallback.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a static inline helper modeled after crypto_cbc_encrypt_walk()
that can be reused for SIMD algorithms that need to implement a
non-SIMD fallback for performing CTR encryption.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>