In the quest to remove all stack VLA usage from the kernel[1], this
replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage
with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(),
which uses a fixed stack size.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver queue size can now be configured from userspace. This
allows optimizing the queue usage based on use case. Default queue
size is still 10 entries.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto driver fallback size can now be configured from userspace. This
allows optimizing the DMA usage based on use case. Detault fallback
size of 200 is still used.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch convert the driver to the new crypto engine API.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The usage of of_device_get_match_data reduce the code size a bit.
Furthermore, it prevents an improbable dereference when
of_match_device() return NULL.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch replace GCM IV size value by their constant name.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Propagate the return value of platform_get_irq on failure.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
OMAP AES hw supports AES-GCM mode. This patch adds support for GCM and
RFC4106 GCM mode in omap-aes driver. The GCM implementation is mostly
written into its own source file, which gets built into the same driver
binary as the existing AES support.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: forward port to latest upstream kernel, conversion to use
omap-crypto lib and some additional fixes]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
These are going to be required by the addition of the GCM support.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move over most of the omap-aes driver internal definitions to a separate
header file. This is done so that the same definitions can be used in
the upcoming AES-GCM support code.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the SG alignment APIs from the OMAP crypto support library instead
of using own implementations.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES can have multiple HW accelerator cores in the system, in which case
each core has its own crypto engine in use. Currently, the used hardware
device is stored under the omap_aes_ctx struct, which is global for
the algorithm itself, causing conflicts when used with multiple cores.
Fix this by moving the used HW device under reqctx, which is stored
per-request basis.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix to return error code -ENOMEM from the crypto_engine_alloc_init()
error handling case instead of 0, as done elsewhere in this function.
Fixes: 0529900a01 ("crypto: omap-aes - Support crypto engine framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The crypto engine must be initialized before registering algorithms,
otherwise the test manager will crash as it attempts to execute
tests for the algos while they are being registered.
Fixes: 0529900a01 ("crypto: omap-aes - Support crypto engine framework")
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As setting up the DMA operations is quite costly, add software fallback
support for requests smaller than 200 bytes. This change gives some 10%
extra performance in ipsec use case.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: udpated against latest upstream, to use skcipher mainly]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some SoCs like omap4/omap5/dra7 contain multiple AES crypto accelerator
cores. Adapt the driver to support this. The driver picks the last used
device from a list of AES devices.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: forward ported to 4.7 kernel]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API at the cra_init/exit is bad for power management
purposes, as the lifetime for a CRA can be very long. Instead, use
pm_runtime autosuspend approach for handling the device clocks. Clocks
are enabled when they are actually required, and autosuspend disables
these if they have not been used for a sufficiently long time period.
By default, the timeout value is 1 second.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current crypto engine allow only ablkcipher_request to be enqueued.
Thus denying any use of it for hardware that also handle hash algo.
This patch modify the API for allowing to enqueue ciphers and hash.
Since omap-aes/omap-des are the only users, this patch also convert them
to the new cryptoengine API.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch move the whole crypto engine API to its own header
crypto/engine.h.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We already have a generic function sg_nents_for_len which does
the same thing. This patch switches omap over to it and also
adds error handling in case the SG list is short.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The extra call to dmaengine_terminate_all is not needed, as the DMA
is not running at this point. This improves performance slightly.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Algorithms can be registered only once. So skip registration of
algorithms if already registered (i.e. in case we have two AES cores
in the system.)
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the new dma_request_chan() the client driver does not need to look for
the DMA resource and it does not need to pass filter_fn anymore.
By switching to the new API the driver can now support deferred probing
against DMA.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: David S. Miller <davem@davemloft.net>
CC: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Integrate with the newly added crypto engine to make the crypto hardware
engine underutilized as each block needs to be processed before the crypto
hardware can start working on the next block.
The requests from dm-crypt will be listed into engine queue and processed
by engine automatically, so remove the 'queue' and 'queue_task' things in
omap aes driver.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use BIT()/GENMASK() macros for all register definitions instead of
hand-writing bit masks.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES_CTRL_REG is used to configure AES mode. Before configuring
any mode we need to make sure all other modes are reset or else
driver will misbehave. So mask all modes before configuring
any AES mode.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Increasing the priority of omap-aes hw algos, in order to take
precedence over sw algos.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Algo self tests are failing for CTR mode with omap-aes driver,
giving the following error:
[ 150.053644] omap_aes_crypt: request size is not exact amount of AES blocks
[ 150.061262] alg: skcipher: encryption failed on test 5 for ctr-aes-omap: ret=22
This is because the input length is not aligned with AES_BLOCK_SIZE.
Adding support for omap-aes driver for inputs with length not aligned
with AES_BLOCK_SIZE.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For cases where total length of an input SGs is not same as
length of the input data for encryption, omap-aes driver
crashes. This happens in the case when IPsec is trying to use
omap-aes driver.
To avoid this, we copy all the pages from the input SG list
into a contiguous buffer and prepare a single element SG list
for this buffer with length as the total bytes to crypt, which is
similar thing that is done in case of unaligned lengths.
Fixes: 6242332ff2 ("crypto: omap-aes - Add support for cases of unaligned lengths")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Modify crypto drivers to use the generic SG helper since
both of them are equivalent and the one from crypto is redundant.
See also:
468577abe3 reverted in
b2ab4a57b0
Signed-off-by: Cristian Stoica <cristian.stoica@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES driver currently assumes that pm_runtime_get_sync will always
succeed, which may not always be true, so add error handling for the
same.
This scenario was reported in the following bug:
place. https://bugzilla.kernel.org/show_bug.cgi?id=66441
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
NIST vectors for CTR mode in testmgr.h assume the entire IV as the counter. To
get correct results that match the output of these vectors, we need to set the
counter length correctly.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Local symbols used only in this file are made static.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@nokia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use devm_kzalloc instead of kzalloc. With this change, there is no need to
call kfree in error/exit paths.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For cases where offset/length of on any page of the input SG is not aligned by
AES_BLOCK_SIZE, we copy all the pages from the input SG list into a contiguous
buffer and prepare a single element SG list for this buffer with length as the
total bytes to crypt.
This is requried for cases such as when an SG list of 16 bytes total size
contains 16 pages each containing 1 byte. DMA using the direct buffers of such
instances is not possible.
For this purpose, we first detect if the unaligned case and accordingly
allocate enough number of pages to satisfy the request and prepare SG lists.
We then copy data into the buffer, and copy data out of it on completion.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In cases where requesting for DMA channels fails for some reason, or channel
numbers are not provided in DT or platform data, we switch to PIO-only mode
also checking if platform provides IRQ numbers and interrupt register offsets
in DT and platform data. All dma-only paths are avoided in this mode.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We initialize the scatter gather walk lists needed for PIO mode and avoid all
DMA paths such as mapping/unmapping buffers by checking for the pio_only flag.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We add an IRQ handler that implements a state-machine for PIO-mode and data
structures for walking the scatter-gather list. The IRQ handler is called in
succession both when data is available to read or next data can be sent for
processing. This process continues till the entire in/out SG lists have been
walked. Once the SG-list has been completely walked, the IRQ handler schedules
the done_task tasklet.
Also add a useful macro that is used through out the IRQ code for a common
pattern of calculating how much an SG list has been walked. This improves code
readability and avoids checkpatch errors.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add IRQ information to pdata and helper macros. These are required
for PIO-mode support.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Intermdiate buffers were allocated, mapped and used for DMA. These are no
longer required as we use the SGs from crypto layer directly in previous
commits in the series. Also along with it, remove the logic for copying SGs
etc as they are no longer used, and all the associated variables in omap_aes_device.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Earlier functions that did a similar sync are replaced by the dma_sync_sg_*
which can operate on entire SG list.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In early version of this driver, assumptions were made such as DMA layer
requires contiguous buffers etc. Due to this, new buffers were allocated,
mapped and used for DMA. These assumptions are no longer true and DMAEngine
scatter-gather DMA doesn't have such requirements. We simply the DMA operations
by directly using the scatter-gather buffers provided by the crypto layer
instead of creating our own.
Lot of logic that handled DMA'ing only X number of bytes of the total, or as
much as fitted into a 3rd party buffer is removed and is no longer required.
Also, good performance improvement of atleast ~20% seen with encrypting a
buffer size of 8K (1800 ops/sec vs 1400 ops/sec). Improvement will be higher
for much larger blocks though such benchmarking is left as an exercise for the
reader. Also DMA usage is much more simplified and coherent with rest of the
code.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto layer only passes nbytes but number of SG elements is needed for mapping
or unmapping SGs at one time using dma_map* API and also needed to pass in for
dmaengine prep function.
We call function added to scatterwalk for this purpose in omap_aes_handle_queue
to populate the values which are used later.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When DEBUG is enabled, these macros can be used to print variables in integer
and hex format, and clearly display which registers, offsets and values are
being read/written , including printing the names of the offsets and their values.
Using statement expression macros in read path as,
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API for every block causes serious perf hit to
crypto operations that are done on a long buffer.
As crypto is performed on a page boundary, encrypting large buffers can
cause a series of crypto operations divided by page. The runtime PM API
is also called those many times.
We call runtime_pm_get_sync only at beginning on the session (cra_init)
and runtime_pm_put at the end. This result in upto a 50% speedup as below.
This doesn't make the driver to keep the system awake as runtime get/put
is only called during a crypto session which completes usually quickly.
Before:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 13310 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 64 size blocks: 13040 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 256 size blocks: 9134 aes-128-cbc's in 0.03s
Doing aes-128-cbc for 3s on 1024 size blocks: 8939 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 8192 size blocks: 4299 aes-128-cbc's in 0.00s
After:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 18911 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 64 size blocks: 18878 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 256 size blocks: 11878 aes-128-cbc's in 0.10s
Doing aes-128-cbc for 3s on 1024 size blocks: 11538 aes-128-cbc's in 0.05s
Doing aes-128-cbc for 3s on 8192 size blocks: 4857 aes-128-cbc's in 0.03s
While at it, also drop enter and exit pr_debugs, in related code. tracers
can be used for that.
Tested on a Beaglebone (AM335x SoC) board.
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>