Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Using simple adrp/add pairs to refer to the AES lookup tables exposed by
the generic AES driver (which could be loaded far away from this driver
when KASLR is in effect) was unreliable at module load time before commit
41c066f2c4 ("arm64: assembler: make adr_l work in modules under KASLR"),
which is why the AES code used literals instead.
So now we can get rid of the literals, and switch to the adr_l macro.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Typecast the pointer with correct structure.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
1 Block of encrption can be done with aes-generic. no need of
cbc(aes). This patch replaces cbc(aes-generic) with aes-generic.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The first argument to list_for_each_entry cannot be NULL.
Generated by: scripts/coccinelle/iterators/itnull.cocci
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Change assign flowc id to each outgoing request.Firmware use flowc id
to schedule each request onto HW. FW reply may miss without this change.
Reviewed-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When VERBOSE_DEBUG is defined and SHA_FLAGS_DUMP_REG flag is set in
dd->flags, this patch prints the register names and values when performing
IO accesses.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patchs allows to combine the AES and SHA hardware accelerators on
some Atmel SoCs. Doing so, AES blocks are only written to/read from the
AES hardware. Those blocks are also transferred from the AES to the SHA
accelerator internally, without additionnal accesses to the system busses.
Hence, the AES and SHA accelerators work in parallel to process all the
data blocks, instead of serializing the process by (de)crypting those
blocks first then authenticating them after like the generic
crypto/authenc.c driver does.
Of course, both the AES and SHA hardware accelerators need to be available
before we can start to process the data blocks. Hence we use their crypto
request queue to synchronize both drivers.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes the value returned by atmel_aes_handle_queue(), which
could have been wrong previously when the crypto request was started
synchronously but became asynchronous during the ctx->start() call.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds support to the hmac(shaX) algorithms.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a simple function to perform data transfer with the DMA
controller.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a simple function to perform data transfer with PIO, hence
handled by the CPU.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch defines an alias macro to SHA_MR_MODE_PDC, which is not suited
for DMA usage.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch simply defines a helper function to test the 'Data Ready' flag
of the Status Register. It also gives a chance for the crypto request to
be processed synchronously if this 'Data Ready' flag is already set when
polling the Status Register. Indeed, running synchronously avoid the
latency of the 'Data Ready' interrupt.
When the 'Data Ready' flag has not been set yet, we enable the associated
interrupt and resume processing the crypto request asynchronously from the
'done' task just as before.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch modifies the SHA_FLAGS_SHA* flags: those algo flags are now
organized as values of a single bitfield instead of individual bits.
This allows to reduce the number of bits needed to encode all possible
values. Also the new values match the SHA_MR_ALGO_SHA* values hence
the algorithm bitfield of the SHA_MR register could simply be set with:
mr = (mr & ~SHA_FLAGS_ALGO_MASK) | (ctx->flags & SHA_FLAGS_ALGO_MASK)
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch is a transitional patch. It updates atmel_sha_done_task() to
make it more generic. Indeed, it adds a new .resume() member in the
atmel_sha_dev structure. This hook is called from atmel_sha_done_task()
to resume processing an asynchronous request.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch is a transitional patch. It splits the atmel_sha_handle_queue()
function. Now atmel_sha_handle_queue() only manages the request queue and
calls a new .start() hook from the atmel_sha_ctx structure.
This hook allows to implement different kind of requests still handled by
a single queue.
Also when the req parameter of atmel_sha_handle_queue() refers to the very
same request as the one returned by crypto_dequeue_request(), the queue
management now gives a chance to this crypto request to be handled
synchronously, hence reducing latencies. The .start() hook returns 0 if
the crypto request was handled synchronously and -EINPROGRESS if the
crypto request still need to be handled asynchronously.
Besides, the new .is_async member of the atmel_sha_dev structure helps
tagging this asynchronous state. Indeed, the req->base.complete() callback
should not be called if the crypto request is handled synchronously.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a transitional patch: it creates the atmel_sha_find_dev() function,
which will be used in further patches to share the source code responsible
for finding a Atmel SHA device.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The documentation states that crypto_ahash_reqsize() provides the size
of the state structure used by crypto_ahash_export(). But it's actually
crypto_ahash_statesize() which provides this size.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Check keylen before copying salt to avoid wrap around of Integer.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Kernel panics when userspace program try to access AEAD interface.
Remove node from Linked List before freeing its memory.
Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Reviewed-by: Stephan Müller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load. However, the pcbc module may not
be present for reasons such as its absence on initramfs. This patch
allows the aesni to function even if the pcbc module is enabled but
not present.
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eliminate a double-add by creating a new list to manage
command descriptors when created; move the descriptor to
the pending list when the command is submitted.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
An I/O page fault occurs when the IOMMU is enabled on a
system that supports the v5 CCP. DMA operations use a
Request ID value that does not match what is expected by
the IOMMU, resulting in the I/O page fault. Setting the
Request ID value to 0 corrects this issue.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ensure dev is allocated for crypto uld context before using the device
for crypto operations.
Cc: <stable@vger.kernel.org>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Save DMA mapped sg list addresses to request context buffer.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Zero embedded ram in DH85x devices. This is not
needed for newer generations as it is done by HW.
Cc: <stable@vger.kernel.org>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some accelerators of the c62x series have only two bars.
This patch skips BAR0 if the accelerator does not have it.
Cc: <stable@vger.kernel.org>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some preemptible check warnings were reported from enable_kernel_vsx(). This
patch disables preemption in aes_ctr.c before enabling vsx, and they are now
consistent with other files in the same directory.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch only regroup functions by usage.
This will help to integrate the GCM support patch later by
adjusting some shared code section, such as common code which
will be reused by GCM, AES mode setting, and DMA transfer.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch introduces a new callback 'resume' in the struct mtk_aes_rec.
This callback is run to resume/complete the processing of the crypto
request when woken up by AES interrupts when DMA completion.
This callback will help implementing the GCM mode support in further
patches.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch changes mtk_aes_handle_queue() to make it more generic.
The function argument is now a pointer to struct crypto_async_request,
which is the common base of struct ablkcipher_request and
struct aead_request.
Also this patch introduces struct mtk_aes_base_ctx which will be the
common base of all the transformation contexts.
Hence the very same queue will be used to manage both block cipher and
AEAD requests (such as gcm and authenc implemented in further patches).
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes mtk_aes_xmit() data transfer bug.
The original function uses the same loop and ring->pos
to handle both command and result descriptors. But this
produces incomplete results when src.sg_len != dst.sg_len.
To solve the problem, we splits the descriptors into different
loops and uses cmd_pos and res_pos to record them respectively.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch moves hardware control block members from
mtk_*_rec to transformation context and refines related
definition. This makes operational context to manage its
own control information easily for each DMA transfer.
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A lot of asm-optimized routines in arch/x86/crypto/ keep its
constants in .data. This is wrong, they should be on .rodata.
Mnay of these constants are the same in different modules.
For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F
exists in at least half a dozen places.
There is a way to let linker merge them and use just one copy.
The rules are as follows: mergeable objects of different sizes
should not share sections. You can't put them all in one .rodata
section, they will lose "mergeability".
GCC puts its mergeable constants in ".rodata.cstSIZE" sections,
or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used.
This patch does the same:
.section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16
It is important that all data in such section consists of
16-byte elements, not larger ones, and there are no implicit
use of one element from another.
When this is not the case, use non-mergeable section:
.section .rodata[.VAR_NAME], "a", @progbits
This reduces .data by ~15 kbytes:
text data bss dec hex filename
11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o
11112095 2690672 2630712 16433479 fac147 vmlinux.o
Merged objects are visible in System.map:
ffffffff81a28810 r POLY
ffffffff81a28810 r POLY
ffffffff81a28820 r TWOONE
ffffffff81a28820 r TWOONE
ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
ffffffff81a28830 r SHUF_MASK <------------- the name difference
ffffffff81a28830 r SHUF_MASK
ffffffff81a28830 r SHUF_MASK
..
ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
ffffffff81a28d00 r K512
ffffffff81a28d00 r K512
Use of object names in section name suffixes is not strictly necessary,
but might help if someday link stage will use garbage collection
to eliminate unused sections (ld --gc-sections).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Xiaodong Liu <xiaodong.liu@intel.com>
CC: Megha Dey <megha.dey@intel.com>
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
%progbits form is used on ARM (where @ is a comment char).
x86 consistently uses @progbits everywhere else.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Xiaodong Liu <xiaodong.liu@intel.com>
CC: Megha Dey <megha.dey@intel.com>
CC: George Spelvin <linux@horizon.com>
CC: linux-crypto@vger.kernel.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The GNU assembler for ARM version 2.22 or older fails to infer the
element size from the vmov instructions, and aborts the build in
the following way;
.../aes-neonbs-core.S: Assembler messages:
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
.../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
.../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'
Fix this by setting the element size explicitly, by replacing vmov with
vmov.32.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
tcrypt is very tight-lipped when it succeeds, but a bit more feedback
would be useful when developing or debugging crypto drivers, especially
since even a successful run ends with the module failing to insert. Add
a couple of debug prints, which can be enabled with dynamic debug:
Before:
# insmod tcrypt.ko mode=10
insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable
After:
# insmod tcrypt.ko mode=10 dyndbg
tcrypt: testing ecb(aes)
tcrypt: testing cbc(aes)
tcrypt: testing lrw(aes)
tcrypt: testing xts(aes)
tcrypt: testing ctr(aes)
tcrypt: testing rfc3686(ctr(aes))
tcrypt: all tests passed
insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>