Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.10:
API:
- add skcipher walk interface
- add asynchronous compression (acomp) interface
- fix algif_aed AIO handling of zero buffer
Algorithms:
- fix unaligned access in poly1305
- fix DRBG output to large buffers
Drivers:
- add support for iMX6UL to caam
- fix givenc descriptors (used by IPsec) in caam
- accelerated SHA256/SHA512 for ARM64 from OpenSSL
- add SSE CRCT10DIF and CRC32 to ARM/ARM64
- add AEAD support to Chelsio chcr
- add Armada 8K support to omap-rng"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits)
crypto: testmgr - fix overlap in chunked tests again
crypto: arm/crc32 - accelerated support based on x86 SSE implementation
crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
crypto: arm/crct10dif - port x86 SSE implementation to ARM
crypto: arm64/crct10dif - port x86 SSE implementation to arm64
crypto: testmgr - add/enhance test cases for CRC-T10DIF
crypto: testmgr - avoid overlap in chunked tests
crypto: chcr - checking for IS_ERR() instead of NULL
crypto: caam - check caam_emi_slow instead of re-lookup platform
crypto: algif_aead - fix AIO handling of zero buffer
crypto: aes-ce - Make aes_simd_algs static
crypto: algif_skcipher - set error code when kcalloc fails
crypto: caam - make aamalg_desc a proper module
crypto: caam - pass key buffers with typesafe pointers
crypto: arm64/aes-ce-ccm - Fix AEAD decryption length
MAINTAINERS: add crypto headers to crypto entry
crypt: doc - remove misleading mention of async API
crypto: doc - fix header file name
crypto: api - fix comment typo
crypto: skcipher - Add separate walker for AEAD decryption
..
mv_cesa_hash_std_step() copies the creq->state into the SRAM at each
step, but this is only required on the first one. By doing that, we
overwrite the engine state, and get erroneous results when the crypto
request is split in several chunks to fit in the internal SRAM.
This commit changes the function to copy the state only on the first
step.
Fixes: commit 2786cee8e5 ("crypto: marvell - Move SRAM I/O op...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
No need to copy the template of an hash operation twice into the SRAM
from the step function.
Fixes: commit 85030c5168 ("crypto: marvell - Add support for chai...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the driver breaks chain for all kind of hash requests in order to
don't override intermediate states of partial ahash updates. However, some final
ahash requests can be directly processed by the engine, and so without
intermediate state. This is typically the case for most for the HMAC requests
processed via IPSec.
This commits adds a TDMA descriptor to copy context for these of requests
into the "op" dma pool, then it allow to chain these requests at the DMA level.
The 'complete' operation is also updated to retrieve the MAC digest from the
right location.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, we used a dedicated dma pool to copy the result of outer IV for
cipher requests. Instead of using a dma pool per outer data, we prefer
use the op dma pool that contains all part of the request from the SRAM.
Then, the outer data that is likely to be used by the 'complete'
operation, is copied later. In this way, any type of result can be
retrieved by DMA for cipher or ahash requests.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Don't use 64 'as is', as max block size in mv_cesa_ahash_cache_req. Use
CESA_MAX_HASH_BLOCK_SIZE instead, this is better for readability.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, in mv_cesa_{md5,sha1,sha256}_init creq->state is initialized
before the call to mv_cesa_ahash_init. This is wrong because this
function fills creq with zero by using memset, so its 'state' that
contains the default DIGEST is overwritten. This commit fixes the issue
by initializing creq->state just after the call to mv_cesa_ahash_init.
Fixes: commit b0ef51067c ("crypto: marvell/cesa - initialize hash...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, sub part of mv_cesa_int was responsible of dequeuing complete
requests, then call the 'cleanup' operation on these reqs and call the
crypto api callback 'complete'. The problem is that the transformation
context 'ctx' is retrieved only once before the while loop. Which means
that the wrong 'cleanup' operation might be called on the wrong type of
cesa requests, it can lead to memory corruptions with this message:
marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, 5a5a5a5a/5a5a5a5a (bad dma)
This commit fixes the issue, by updating the transformation context for
each dequeued cesa request.
Fixes: commit 85030c5168 ("crypto: marvell - Add support for chai...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The mv_cesa_ahash_cache_req() function always returns 0, which makes
its return value pretty much useless. However, in addition to
returning a useless value, it also returns a boolean in a variable
passed by reference to indicate if the request was already cached.
So, this commit changes mv_cesa_ahash_cache_req() to return this
boolean. It consequently simplifies the only call site of
mv_cesa_ahash_cache_req(), where the "ret" variable is no longer
needed.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The mv_cesa_ahash_init() function always returns 0, and the return
value is anyway never checked. Turn it into a function returning void.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The dma_iter parameter of mv_cesa_ahash_dma_add_cache() is never used,
so get rid of it.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The mv_cesa_dma_add_op() function builds a mv_cesa_tdma_desc structure
to copy the operation description to the SRAM, but doesn't explicitly
initialize the destination of the copy. It works fine because the
operatin description must be copied at the beginning of the SRAM, and
the mv_cesa_tdma_desc structure is initialized to zero when
allocated. However, it is somewhat confusing to not have a destination
defined.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The IV output vectors should only be copied from the _complete operation
and not from the _process operation, i.e only from the operation that is
designed to copy the result of the request to the right location. This
copy is already done in the _complete operation, so this commit removes
the duplicated code in the _process op.
Fixes: 3610d6cd5231 ("crypto: marvell - Add a complete...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, the cache of the ahash requests was updated from the 'complete'
operation. This complete operation is called from mv_cesa_tdma_process
before the cleanup operation, which means that the content of req->src
can be read and copied when it is still mapped. This commit fixes the
issue by moving this cache update from mv_cesa_ahash_complete to
mv_cesa_ahash_req_cleanup, so the copy is done once the sglist is
unmapped.
Fixes: 1bf6682cb3 ("crypto: marvell - Add a complete operation for..")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The flag CRYPTO_TFM_REQ_MAY_BACKLOG is optional and can be set from the
user to put requests into the backlog queue when the main cryptographic
queue is full. Before calling mv_cesa_tdma_chain we must check the value
of the return status to be sure that the current request has been
correctly queued or added to the backlog.
Fixes: 85030c5168 ("crypto: marvell - Add support for chaining...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far in mv_cesa_ablkcipher_dma_req_init, if an error is thrown while
the tdma chain is built there is a memory leak. This issue exists
because the chain is assigned later at the end of the function, so the
cleanup function is called with the wrong version of the chain.
Fixes: db509a4533 ("crypto: marvell/cesa - add TDMA support")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the parameter 'gfp_flags' instead of 'flag' as second argument of
dma_pool_alloc(). The parameter 'flag' is for the TDMA descriptor, its
content has no sense for the allocator.
Fixes: bac8e805a3 ("crypto: marvell - Copy IV vectors by DMA...")
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that crypto requests are chained together at the DMA level, we
increase the size of the crypto queue for each engine. The result is
that as the backlog list is reached later, it does not stop the crypto
stack from sending asychronous requests, so more cryptographic tasks
are processed by the engines.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The Cryptographic Engines and Security Accelerators (CESA) supports the
Multi-Packet Chain Mode. With this mode enabled, multiple tdma requests
can be chained and processed by the hardware without software
intervention. This mode was already activated, however the crypto
requests were not chained together. By doing so, we reduce significantly
the number of IRQs. Instead of being interrupted at the end of each
crypto request, we are interrupted at the end of the last cryptographic
request processed by the engine.
This commits re-factorizes the code, changes the code architecture and
adds the required data structures to chain cryptographic requests
together before sending them to an engine (stopped or possibly already
running).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This commits adds support for fine grained load balancing on
multi-engine IPs. The engine is pre-selected based on its current load
and on the weight of the crypto request that is about to be processed.
The global crypto queue is also moved to each engine. These changes are
required to allow chaining crypto requests at the DMA level. By using
a crypto queue per engine, we make sure that we keep the state of the
tdma chain synchronized with the crypto queue. We also reduce contention
on 'cesa_dev->lock' and improve parallelism.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently the crypto requests were sent to engines sequentially.
This commit moves the SRAM I/O operations from the prepare to the step
functions. It provides flexibility for future works and allow to prepare
a request while the engine is running.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, the 'process' operation was used to check if the current request
was correctly handled by the engine, if it was the case it copied
information from the SRAM to the main memory. Now, we split this
operation. We keep the 'process' operation, which still checks if the
request was correctly handled by the engine or not, then we add a new
operation for completion. The 'complete' method copies the content of
the SRAM to memory. This will soon become useful if we want to call
the process and the complete operations from different locations
depending on the type of the request (different cleanup logic).
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, the only way to access the tdma chain is to use the 'req'
union from a mv_cesa_{ablkcipher,ahash}. This will soon become a problem
if we want to handle the TDMA chaining vs standard/non-DMA processing in
a generic way (with generic functions at the cesa.c level detecting
whether the request should be queued at the DMA level or not). Hence the
decision to move the chain field a the mv_cesa_req level at the expense
of adding 2 void * fields to all request contexts (including non-DMA
ones) and to remove the type completly. To limit the overhead, we get
rid of the type field, which can now be deduced from the req->chain.first
value. Once these changes are done the union is no longer needed, so
remove it and move mv_cesa_ablkcipher_std_req and mv_cesa_req
to mv_cesa_ablkcipher_req directly. There are also no needs to keep the
'base' field into the union of mv_cesa_ahash_req, so move it into the
upper structure.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a TDMA descriptor at the end of the request for copying the
output IV vector via a DMA transfer. This is a good way for offloading
as much as processing as possible to the DMA and the crypto engine.
This is also required for processing multiple cipher requests
in chained mode, otherwise the content of the IV vector would be
overwritten by the last processed request.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
So far, the way that the type of a TDMA operation was checked was wrong.
We have to use the type mask in order to get the right part of the flag
containing the type of the operation.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a BUG_ON() call when the driver tries to launch a crypto request
while the engine is still processing the previous one. This replaces
a silent system hang by a verbose kernel panic with the associated
backtrace to let the user know that something went wrong in the CESA
driver.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Adding a macro constant to be used for the size of the crypto queue,
instead of using a numeric value directly. It will be easier to
maintain in case we add more than one crypto queue of the same size.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dma_pool_zalloc combines dma_pool_alloc and memset 0. The semantic patch
that makes this transformation is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression d,e;
statement S;
@@
d =
- dma_pool_alloc
+ dma_pool_zalloc
(...);
if (!d) S
- memset(d, 0, sizeof(*d));
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When looking for available engines, the variable "engine" is
assigned to "&cesa->engines[i]" at the beginning of the for loop. Replacing
next occurences of "&cesa->engines[i]" by "engine" and in order to improve
readability.
Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
creq->cache[] is an array inside the struct, it's not a pointer and it
can't be NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
API:
- Fix kzalloc error path crash in ecryptfs added by skcipher
conversion. Note the subject of the commit is screwed up and the
correct subject is actually in the body.
Drivers:
- A number of fixes to the marvell cesa hashing code.
- Remove bogus nested irqsave that clobbers the saved flags in ccp"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/cesa - forward devm_ioremap_resource() error code
crypto: marvell/cesa - initialize hash states
crypto: marvell/cesa - fix memory leak
crypto: ccp - fix lock acquisition code
eCryptfs: Use skcipher and shash
Forward devm_ioremap_resource() error code instead of returning
-ENOMEM.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Fixes: f63601fd61 ("crypto: marvell/cesa - add a new driver for Marvell's CESA")
Cc: <stable@vger.kernel.org> # 4.2+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
->export() might be called before we have done an update operation,
and in this case the ->state field is left uninitialized.
Put the correct default value when initializing the request.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto requests are not guaranteed to be finalized (->final() call),
and can be freed at any moment, without getting any notification from
the core. This can lead to memory leaks of the ->cache buffer.
Make this buffer part of the request object, and allocate an extra buffer
from the DMA cache pool when doing DMA operations.
As a side effect, this patch also fixes another bug related to cache
allocation and DMA operations. When the core allocates a new request and
import an existing state, a cache buffer can be allocated (depending
on the state). The problem is, at that very moment, we don't know yet
whether the request will use DMA or not, and since everything is
likely to be initialized to zero, mv_cesa_ahash_alloc_cache() thinks it
should allocate a buffer for standard operation. But when
mv_cesa_ahash_free_cache() is called, req->type has been set to
CESA_DMA_REQ in the meantime, thus leading to an invalind dma_pool_free()
call (the buffer passed in argument has not been allocated from the pool).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We are checking twice if dma->cache_pool is not NULL but are never testing
dma->padding_pool value.
Cc: stable@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The sg_nents_for_len() function could fail, this patch add a check for
its return value.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull crypto update from Herbert Xu:
"API:
- Add support for cipher output IVs in testmgr
- Add missing crypto_ahash_blocksize helper
- Mark authenc and des ciphers as not allowed under FIPS.
Algorithms:
- Add CRC support to 842 compression
- Add keywrap algorithm
- A number of changes to the akcipher interface:
+ Separate functions for setting public/private keys.
+ Use SG lists.
Drivers:
- Add Intel SHA Extension optimised SHA1 and SHA256
- Use dma_map_sg instead of custom functions in crypto drivers
- Add support for STM32 RNG
- Add support for ST RNG
- Add Device Tree support to exynos RNG driver
- Add support for mxs-dcp crypto device on MX6SL
- Add xts(aes) support to caam
- Add ctr(aes) and xts(aes) support to qat
- A large set of fixes from Russell King for the marvell/cesa driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
hwrng: exynos - Add Device Tree support
hwrng: exynos - Fix missing configuration after suspend to RAM
hwrng: exynos - Add timeout for waiting on init done
dt-bindings: rng: Describe Exynos4 PRNG bindings
crypto: marvell/cesa - use __le32 for hardware descriptors
crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
crypto: marvell/cesa - use gfp_t for gfp flags
crypto: marvell/cesa - use dma_addr_t for cur_dma
crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
crypto: caam - fix indentation of close braces
crypto: caam - only export the state we really need to export
crypto: caam - fix non-block aligned hash calculation
crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
crypto: caam - print errno code when hash registration fails
crypto: marvell/cesa - fix memory leak
crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
crypto: marvell/cesa - rearrange handling for sw padded hashes
...
Much of the driver uses cpu_to_le32() to convert values for descriptors
to little endian before writing. Use __le32 to define the hardware-
accessed parts of the descriptors, and ensure most places where it's
reasonable to do so use cpu_to_le32() when assigning to these.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When tdma->src is freed in mv_cesa_dma_cleanup(), we convert the DMA
address from a little-endian value prior to calling dma_pool_free().
However, mv_cesa_dma_add_op() assigns tdma->src without first converting
the DMA address to little endian. Fix this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the IO memcpy() functions when copying from/to MMIO memory.
These locations were found via sparse.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cur_dma is part of the software state, not read by the hardware.
Storing it in LE32 format is wrong, use dma_addr_t for this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use relaxed IO accessors where appropriate.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To: Boris Brezillon <boris.brezillon@free-electrons.com>,Arnaud Ebalard <arno@natisbad.org>,Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,Jason Cooper <jason@lakedaemon.net>
The local chain variable is not cleaned up if an error occurs in the middle
of DMA chain creation. Fix that by dropping the local chain variable and
using the dreq->chain field which will be cleaned up by
mv_cesa_dma_cleanup() in case of errors.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When adding the software padding, this must be done using the first/mid
fragment mode, and any subsequent operation needs to be a mid-fragment.
Fix this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rearrange the last request handling for hashes which require software
padding.
We prepare the padding to be appended, and then append as much of the
padding to any existing data that's already queued up, adding an
operation block and launching the operation.
Any remainder is then appended as a separate operation.
This ensures that the hardware only ever sees multiples of the hash
block size to be operated on for software padded hashes, thus ensuring
that the engine always indicates that it has finished the calculation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rearrange the last request handling for hardware finished hashes
by moving the generation of the fragment operation into this path.
This results in a simplified sequence to handle this case, and
allows us to move the software padded case further down into the
function. Add comments describing these parts.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the test for the last request out of mv_cesa_ahash_dma_last_req()
to its caller, and move the mv_cesa_dma_add_frag() down into this
function.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Avoid adding the final operation within the loop, but instead add it
outside. We combine this with the handling for the no-data case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When we process the last request of data, and the request contains user
data, the loop in mv_cesa_ahash_dma_req_init() marks the first data size
as being iter.base.op_len which does not include the size of the cache
data. This means we end up hashing an insufficient amount of data.
Fix this by always including the cache size in the first operation
length of any request.
This has the effect that for a request containing no user data,
iter.base.op_len === iter.src.op_offset === creq->cache_ptr
As a result, we include one further change to use iter.base.op_len in
the cache-but-no-user-data case to make the next change clearer.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the presence of the scatterlist to determine whether we should load
any new user data to the engine. The following shall always be true at
this point:
iter.base.op_len == 0 === iter.src.sg
In doing so, we can:
1. eliminate the test for iter.base.op_len inside the loop, which
makes the loop operation more obvious and understandable.
2. move the operation generation for the cache-only case.
This prepares the code for the next step in its transformation, and also
uncovers a bug that will be fixed in the next patch.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the calls to mv_cesa_dma_add_frag() into the parent function,
mv_cesa_ahash_dma_req_init(). This is in preparation to changing
when we generate the operation blocks, as we need to avoid generating
a block for a partial hash block at the end of the user data.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If we add a template first-fragment operation, always update the
template to be a mid-fragment. This ensures that mid-fragments
always follow on from a first fragment in every case.
This means we can move the first to mid-fragment update code out of
mv_cesa_ahash_dma_add_data().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a helper to add the fragment operation block followed by the DMA
entry to launch the operation.
Although at the moment this pattern only strictly appears at one site,
two other sites can be factored as well by slightly changing the order
in which the DMA operations are performed. This should be harmless as
the only thing which matters is to have all the data loaded into SRAM
prior to launching the operation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Multiple locations in the driver test the operation context fragment
type, checking whether it is a first fragment or not. Introduce a
mv_cesa_mac_op_is_first_frag() helper, which returns true if the
fragment operation is for a first fragment.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
mv_cesa_get_op_cfg() does not write to its argument, it only reads.
So, let's make it const.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ensure that the template operation is fully initialised, otherwise we
end up loading data from the kernel stack into the engines, which can
upset the hash results.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The endianness of the bit length used in the final stage depends on the
endianness of the algorithm - md5 hashes need it to be in little endian
format, whereas SHA hashes need it in big endian format. Use the
previously added algorithm endianness flag to control this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rather than determining whether we're using a MD5 hash by looking at
the digest size, switch to a cleaner solution using a per-request flag
initialised by the method type.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, we read/write the state in CPU endian, but on the final
request, we convert its endian according to the requested algorithm.
(md5 is little endian, SHA are big endian.)
Always keep creq->state in CPU native endian format, and perform the
necessary conversion when copying the hash to the result.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There's an easier way to get at the hash transform - rather than
using crypto_ahash_tfm(ahash), we can get it directly from
req->base.tfm.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As all the import functions and export functions are virtually
identical, factor out their common parts into a generic
mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
performs the actual import or export, and we pass the data pointers and
length into these functions.
We have to switch a % const operation to do_div() in the common import
function to avoid provoking gcc to use the expensive 64-bit by 64-bit
modulus operation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Attempting to use the sha1 digest for openssh via openssl reveals that
the result from the hash is wrong: this happens when we export the
state from one socket and import it into another via calling accept().
The reason for this is because the operation is reset to "initial block"
state, whereas we may be past the first fragment of data to be hashed.
Arrange for the operation code to avoid the initialisation of the state,
thereby preserving the imported state.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a AF_ALG fd is accepted a second time (hence hash_accept() is
used), hash_accept_parent() allocates a new private context using
sock_kmalloc(). This context is uninitialised. After use of the new
fd, we eventually end up with the kernel complaining:
marvell-cesa f1090000.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)
where c0627770 is a random address. Poisoning the memory allocated by
the above sock_kmalloc() produces kernel oopses within the marvell hash
code, particularly the interrupt handling.
The following simplfied call sequence occurs:
hash_accept()
crypto_ahash_export()
marvell hash export function
af_alg_accept()
hash_accept_parent() <== allocates uninitialised struct hash_ctx
crypto_ahash_import()
marvell hash import function
hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
and, as the marvell hash import function only partially initialises
this structure, we end up with a lot of members which are left with
whatever data was in memory prior to sock_kmalloc().
Add zero-initialisation of this structure.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Boris Brezillon <boris.brezillon@free-electronc.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Several of the algorithms in marvell/hash.c have a statesize of zero.
When an AF_ALG accept() on an already-accepted file descriptor to
calls into hash_accept(), this causes:
char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req))];
to be zero-sized, but we still pass this to:
err = crypto_ahash_export(req, state);
which proceeds to write to 'state' as if it was a "struct md5_state",
"struct sha1_state" etc. Add the necessary initialisers for the
.statesize member.
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The mv_cesa_queue_req() function calls crypto_enqueue_request() to
enqueue a request. In the normal case (i.e the queue isn't full), this
function returns -EINPROGRESS. The current Marvell CESA crypto driver
takes this into account and cleans up the request only if an error
occured, i.e if the return value is not -EINPROGRESS.
Unfortunately this causes problems with
CRYPTO_TFM_REQ_MAY_BACKLOG-flagged requests. When such a request is
passed to crypto_enqueue_request() and the queue is full,
crypto_enqueue_request() will return -EBUSY, but will keep the request
enqueued nonetheless. This situation was not properly handled by the
Marvell CESA driver, which was anyway cleaning up the request in such
a situation. When later on the request was taken out of the backlog
and actually processed, a kernel crash occured due to the internal
driver data structures for this structure having been cleaned up.
To avoid this situation, this commit adds a
mv_cesa_req_needs_cleanup() helper function which indicates if the
request needs to be cleaned up or not after a call to
crypto_enqueue_request(). This helper allows to do the cleanup only in
the appropriate cases, and all call sites of mv_cesa_queue_req() are
fixed to use this new helper function.
Reported-by: Vincent Donnefort <vdonnefort@gmail.com>
Fixes: db509a4533 ("crypto: marvell/cesa - add TDMA support")
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
platform_driver does not need to set an owner because
platform_driver_register() will set it.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To be consistent with other kernel interface namings, rename
of_get_named_gen_pool() to of_gen_pool_get(). In the original function
name "_named" suffix references to a device tree property, which contains
a phandle to a device and the corresponding device driver is assumed to
register a gen_pool object.
Due to a weak relation and to avoid any confusion (e.g. in future
possible scenario if gen_pool objects are named) the suffix is removed.
[sfr@canb.auug.org.au: crypto/marvell/cesa - fix up for of_get_named_gen_pool() rename]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Boris BREZILLON <boris.brezillon@free-electrons.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the Kirkwood and Dove SoC descriptions, and control the allhwsupport
module parameter to avoid probing the CESA IP when the old CESA driver is
enabled (unless it is explicitly requested to do so).
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the Orion SoC description, and select this implementation by default
to support non-DT probing: Orion is the only platform where non-DT boards
are declaring the CESA block.
Control the allhwsupport module parameter to avoid probing the CESA IP when
the old CESA driver is enabled (unless it is explicitly requested to do
so).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The old and new marvell CESA drivers both support Orion and Kirkwood SoCs.
Add a module parameter to choose whether these SoCs should be attached to
the new or the old driver.
The default policy is to keep attaching those IPs to the old driver if it
is enabled, until we decide the new CESA driver is stable/secure enough.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add CESA IP description for all the missing armada SoCs (XP, 375 and 38x).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for SHA256 operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for MD5 operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for Triple-DES operations.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for DES operations.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CESA IP supports CPU offload through a dedicated DMA engine (TDMA)
which can control the crypto block.
When you use this mode, all the required data (operation metadata and
payload data) are transferred using DMA, and the results are retrieved
through DMA when possible (hash results are not retrieved through DMA yet),
thus reducing the involvement of the CPU and providing better performances
in most cases (for small requests, the cost of DMA preparation might
exceed the performance gain).
Note that some CESA IPs do not embed this dedicated DMA, hence the
activation of this feature on a per platform basis.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The existing mv_cesa driver supports some features of the CESA IP but is
quite limited, and reworking it to support new features (like involving the
TDMA engine to offload the CPU) is almost impossible.
This driver has been rewritten from scratch to take those new features into
account.
This commit introduce the base infrastructure allowing us to add support
for DMA optimization.
It also includes support for one hash (SHA1) and one cipher (AES)
algorithm, and enable those features on the Armada 370 SoC.
Other algorithms and platforms will be added later on.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>