OMAP AES hw supports AES-GCM mode. This patch adds support for GCM and
RFC4106 GCM mode in omap-aes driver. The GCM implementation is mostly
written into its own source file, which gets built into the same driver
binary as the existing AES support.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: forward port to latest upstream kernel, conversion to use
omap-crypto lib and some additional fixes]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
These are going to be required by the addition of the GCM support.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move over most of the omap-aes driver internal definitions to a separate
header file. This is done so that the same definitions can be used in
the upcoming AES-GCM support code.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the SG alignment APIs from the OMAP crypto support library instead
of using own implementations.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES can have multiple HW accelerator cores in the system, in which case
each core has its own crypto engine in use. Currently, the used hardware
device is stored under the omap_aes_ctx struct, which is global for
the algorithm itself, causing conflicts when used with multiple cores.
Fix this by moving the used HW device under reqctx, which is stored
per-request basis.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fix to return error code -ENOMEM from the crypto_engine_alloc_init()
error handling case instead of 0, as done elsewhere in this function.
Fixes: 0529900a01 ("crypto: omap-aes - Support crypto engine framework")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The crypto engine must be initialized before registering algorithms,
otherwise the test manager will crash as it attempts to execute
tests for the algos while they are being registered.
Fixes: 0529900a01 ("crypto: omap-aes - Support crypto engine framework")
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As setting up the DMA operations is quite costly, add software fallback
support for requests smaller than 200 bytes. This change gives some 10%
extra performance in ipsec use case.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: udpated against latest upstream, to use skcipher mainly]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some SoCs like omap4/omap5/dra7 contain multiple AES crypto accelerator
cores. Adapt the driver to support this. The driver picks the last used
device from a list of AES devices.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
[t-kristo@ti.com: forward ported to 4.7 kernel]
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API at the cra_init/exit is bad for power management
purposes, as the lifetime for a CRA can be very long. Instead, use
pm_runtime autosuspend approach for handling the device clocks. Clocks
are enabled when they are actually required, and autosuspend disables
these if they have not been used for a sufficiently long time period.
By default, the timeout value is 1 second.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current crypto engine allow only ablkcipher_request to be enqueued.
Thus denying any use of it for hardware that also handle hash algo.
This patch modify the API for allowing to enqueue ciphers and hash.
Since omap-aes/omap-des are the only users, this patch also convert them
to the new cryptoengine API.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch move the whole crypto engine API to its own header
crypto/engine.h.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We already have a generic function sg_nents_for_len which does
the same thing. This patch switches omap over to it and also
adds error handling in case the SG list is short.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The extra call to dmaengine_terminate_all is not needed, as the DMA
is not running at this point. This improves performance slightly.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Algorithms can be registered only once. So skip registration of
algorithms if already registered (i.e. in case we have two AES cores
in the system.)
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With the new dma_request_chan() the client driver does not need to look for
the DMA resource and it does not need to pass filter_fn anymore.
By switching to the new API the driver can now support deferred probing
against DMA.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: David S. Miller <davem@davemloft.net>
CC: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Integrate with the newly added crypto engine to make the crypto hardware
engine underutilized as each block needs to be processed before the crypto
hardware can start working on the next block.
The requests from dm-crypt will be listed into engine queue and processed
by engine automatically, so remove the 'queue' and 'queue_task' things in
omap aes driver.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use BIT()/GENMASK() macros for all register definitions instead of
hand-writing bit masks.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AES_CTRL_REG is used to configure AES mode. Before configuring
any mode we need to make sure all other modes are reset or else
driver will misbehave. So mask all modes before configuring
any AES mode.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Increasing the priority of omap-aes hw algos, in order to take
precedence over sw algos.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Algo self tests are failing for CTR mode with omap-aes driver,
giving the following error:
[ 150.053644] omap_aes_crypt: request size is not exact amount of AES blocks
[ 150.061262] alg: skcipher: encryption failed on test 5 for ctr-aes-omap: ret=22
This is because the input length is not aligned with AES_BLOCK_SIZE.
Adding support for omap-aes driver for inputs with length not aligned
with AES_BLOCK_SIZE.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For cases where total length of an input SGs is not same as
length of the input data for encryption, omap-aes driver
crashes. This happens in the case when IPsec is trying to use
omap-aes driver.
To avoid this, we copy all the pages from the input SG list
into a contiguous buffer and prepare a single element SG list
for this buffer with length as the total bytes to crypt, which is
similar thing that is done in case of unaligned lengths.
Fixes: 6242332ff2 ("crypto: omap-aes - Add support for cases of unaligned lengths")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Modify crypto drivers to use the generic SG helper since
both of them are equivalent and the one from crypto is redundant.
See also:
468577abe3 reverted in
b2ab4a57b0
Signed-off-by: Cristian Stoica <cristian.stoica@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AES driver currently assumes that pm_runtime_get_sync will always
succeed, which may not always be true, so add error handling for the
same.
This scenario was reported in the following bug:
place. https://bugzilla.kernel.org/show_bug.cgi?id=66441
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
NIST vectors for CTR mode in testmgr.h assume the entire IV as the counter. To
get correct results that match the output of these vectors, we need to set the
counter length correctly.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Local symbols used only in this file are made static.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@nokia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use devm_kzalloc instead of kzalloc. With this change, there is no need to
call kfree in error/exit paths.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For cases where offset/length of on any page of the input SG is not aligned by
AES_BLOCK_SIZE, we copy all the pages from the input SG list into a contiguous
buffer and prepare a single element SG list for this buffer with length as the
total bytes to crypt.
This is requried for cases such as when an SG list of 16 bytes total size
contains 16 pages each containing 1 byte. DMA using the direct buffers of such
instances is not possible.
For this purpose, we first detect if the unaligned case and accordingly
allocate enough number of pages to satisfy the request and prepare SG lists.
We then copy data into the buffer, and copy data out of it on completion.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In cases where requesting for DMA channels fails for some reason, or channel
numbers are not provided in DT or platform data, we switch to PIO-only mode
also checking if platform provides IRQ numbers and interrupt register offsets
in DT and platform data. All dma-only paths are avoided in this mode.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We initialize the scatter gather walk lists needed for PIO mode and avoid all
DMA paths such as mapping/unmapping buffers by checking for the pio_only flag.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We add an IRQ handler that implements a state-machine for PIO-mode and data
structures for walking the scatter-gather list. The IRQ handler is called in
succession both when data is available to read or next data can be sent for
processing. This process continues till the entire in/out SG lists have been
walked. Once the SG-list has been completely walked, the IRQ handler schedules
the done_task tasklet.
Also add a useful macro that is used through out the IRQ code for a common
pattern of calculating how much an SG list has been walked. This improves code
readability and avoids checkpatch errors.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add IRQ information to pdata and helper macros. These are required
for PIO-mode support.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Intermdiate buffers were allocated, mapped and used for DMA. These are no
longer required as we use the SGs from crypto layer directly in previous
commits in the series. Also along with it, remove the logic for copying SGs
etc as they are no longer used, and all the associated variables in omap_aes_device.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Earlier functions that did a similar sync are replaced by the dma_sync_sg_*
which can operate on entire SG list.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In early version of this driver, assumptions were made such as DMA layer
requires contiguous buffers etc. Due to this, new buffers were allocated,
mapped and used for DMA. These assumptions are no longer true and DMAEngine
scatter-gather DMA doesn't have such requirements. We simply the DMA operations
by directly using the scatter-gather buffers provided by the crypto layer
instead of creating our own.
Lot of logic that handled DMA'ing only X number of bytes of the total, or as
much as fitted into a 3rd party buffer is removed and is no longer required.
Also, good performance improvement of atleast ~20% seen with encrypting a
buffer size of 8K (1800 ops/sec vs 1400 ops/sec). Improvement will be higher
for much larger blocks though such benchmarking is left as an exercise for the
reader. Also DMA usage is much more simplified and coherent with rest of the
code.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Crypto layer only passes nbytes but number of SG elements is needed for mapping
or unmapping SGs at one time using dma_map* API and also needed to pass in for
dmaengine prep function.
We call function added to scatterwalk for this purpose in omap_aes_handle_queue
to populate the values which are used later.
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When DEBUG is enabled, these macros can be used to print variables in integer
and hex format, and clearly display which registers, offsets and values are
being read/written , including printing the names of the offsets and their values.
Using statement expression macros in read path as,
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling runtime PM API for every block causes serious perf hit to
crypto operations that are done on a long buffer.
As crypto is performed on a page boundary, encrypting large buffers can
cause a series of crypto operations divided by page. The runtime PM API
is also called those many times.
We call runtime_pm_get_sync only at beginning on the session (cra_init)
and runtime_pm_put at the end. This result in upto a 50% speedup as below.
This doesn't make the driver to keep the system awake as runtime get/put
is only called during a crypto session which completes usually quickly.
Before:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 13310 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 64 size blocks: 13040 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 256 size blocks: 9134 aes-128-cbc's in 0.03s
Doing aes-128-cbc for 3s on 1024 size blocks: 8939 aes-128-cbc's in 0.01s
Doing aes-128-cbc for 3s on 8192 size blocks: 4299 aes-128-cbc's in 0.00s
After:
root@beagleboard:~# time -v openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 18911 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 64 size blocks: 18878 aes-128-cbc's in 0.02s
Doing aes-128-cbc for 3s on 256 size blocks: 11878 aes-128-cbc's in 0.10s
Doing aes-128-cbc for 3s on 1024 size blocks: 11538 aes-128-cbc's in 0.05s
Doing aes-128-cbc for 3s on 8192 size blocks: 4857 aes-128-cbc's in 0.03s
While at it, also drop enter and exit pr_debugs, in related code. tracers
can be used for that.
Tested on a Beaglebone (AM335x SoC) board.
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Replace calls to deprecated devm_request_and_ioremap by devm_ioremap_resource.
Found with coccicheck and this semantic patch:
scripts/coccinelle/api/devm_request_and_ioremap.cocci.
Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
module_platform_driver() makes the code simpler by eliminating boilerplate
code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
After DMA is complete, the omap_aes_finish_req function is called as
a part of the done_task tasklet. During this its atomic and any calls
to pm functions should not assume they wont sleep.
The patch replaces a call to pm_runtime_put_sync (which can sleep) with
pm_runtime_put thus fixing a kernel panic observed on AM33xx SoC during
AES operation.
Tested on an AM33xx SoC device (beaglebone board).
To reproduce the problem, I used the tcrypt kernel module as:
modprobe tcrypt sec=2 mode=500
Signed-off-by: Joel A Fernandes <joelagnel@ti.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The OMAP3 and OMAP4/AM33xx versions of the AES crypto
module support the CTR algorithm in addition to ECB
and CBC that the OMAP2 version of the module supports.
So, OMAP2 and OMAP3 share a common register set but
OMAP3 supports CTR while OMAP2 doesn't. OMAP4/AM33XX
uses a different register set from OMAP2/OMAP3 and
also supports CTR.
To add this support, use the platform_data introduced
in an ealier commit to hold the list of algorithms
supported by the current module. The probe routine
will use that list to register the correct algorithms.
Note: The code being integrated is from the TI AM33xx SDK
and was written by Greg Turner <gkmturner@gmail.com> and
Herman Schuurman (current email unknown) while at TI.
CC: Greg Turner <gkmturner@gmail.com>
CC: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the OMAP4 version of the AES module
that is present on OMAP4 and AM33xx SoCs.
The modules have several differences including register
offsets and how DMA is triggered. To handle these
differences, a platform_data structure is defined and
contains routine pointers, register offsets, and bit
offsets within registers. OMAP2/OMAP3-specific routines
are suffixed with '_omap2' and OMAP4/AM33xx routines are
suffixed with '_omap4'.
Note: The code being integrated is from the TI AM33xx SDK
and was written by Greg Turner <gkmturner@gmail.com> and
Herman Schuurman (current email unknown) while at TI.
CC: Greg Turner <gkmturner@gmail.com>
CC: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the dma_request_slave_channel_compat() call instead of
the dma_request_channel() call to request a DMA channel.
This allows the omap-aes driver use different DMA engines.
CC: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add Device Tree suport to the omap-aes crypto
driver. Currently, only support for OMAP2 and
OMAP3 is being added but support for OMAP4 will
be added in a subsequent patch.
CC: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>