Commit Graph

1983 Commits

Author SHA1 Message Date
Herbert Xu 4473710df1 crypto: user - Prepare for CRYPTO_MAX_ALG_NAME expansion
This patch hard-codes CRYPTO_MAX_NAME in the user-space API to
64, which is the current value of CRYPTO_MAX_ALG_NAME.  This patch
also replaces all remaining occurences of CRYPTO_MAX_ALG_NAME
in the user-space API with CRYPTO_MAX_NAME.

This way the user-space API will not be modified when we raise
the value of CRYPTO_MAX_ALG_NAME.

Furthermore, the code has been updated to handle names longer than
the user-space API.  They will be truncated.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
2017-04-10 19:17:26 +08:00
Ondrej Mosnáček ad1064cd61 crypto: xts - drop gf128mul dependency
Since the gf128mul_x_ble function used by xts.c is now defined inline
in the header file, the XTS module no longer depends on gf128mul.
Therefore, the 'select CRYPTO_GF128MUL' line can be safely removed.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05 21:58:37 +08:00
Ondrej Mosnáček e55318c84f crypto: gf128mul - switch gf128mul_x_ble to le128
Currently, gf128mul_x_ble works with pointers to be128, even though it
actually interprets the words as little-endian. Consequently, it uses
cpu_to_le64/le64_to_cpu on fields of type __be64, which is incorrect.

This patch fixes that by changing the function to accept pointers to
le128 and updating all users accordingly.

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05 21:58:37 +08:00
Ondrej Mosnáček acb9b159c7 crypto: gf128mul - define gf128mul_x_* in gf128mul.h
The gf128mul_x_ble function is currently defined in gf128mul.c, because
it depends on the gf128mul_table_be multiplication table.

However, since the function is very small and only uses two values from
the table, it is better for it to be defined as inline function in
gf128mul.h. That way, the function can be inlined by the compiler for
better performance.

For consistency, the other gf128mul_x_* functions are also moved to the
header file. In addition, the code is rewritten to be constant-time.

After this change, the speed of the generic 'xts(aes)' implementation
increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
and CRYPTO_AES_NI_INTEL disabled).

Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Reviewd-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-04-05 21:58:35 +08:00
Herbert Xu c6dc060906 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to resolve conflict between caam changes.
2017-04-05 21:57:07 +08:00
Stephan Mueller 44068d5999 crypto: DRBG - initialize SGL only once
An SGL to be initialized only once even when its buffers are written
to several times.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 22:03:01 +08:00
Marcelo Cerri 0d8da10484 crypto: testmgr - mark ctr(des3_ede) as fips_allowed
3DES is missing the fips_allowed flag for CTR mode.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 22:03:01 +08:00
Jason A. Donenfeld 3c7eb3cc83 md5: remove from lib and only live in crypto
The md5_transform function is no longer used any where in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 22:02:56 +08:00
Daniel Axtens 146c8688d9 crypto: powerpc - Stress test for vpmsum implementations
vpmsum implementations often don't kick in for short test vectors.
This is a simple test module that does a configurable number of
random tests, each up to 64kB and each with random offsets.

Both CRC-T10DIF and CRC32C are tested.

Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 22:02:54 +08:00
Daniel Axtens b01df1c16c crypto: powerpc - Add CRC-T10DIF acceleration
T10DIF is a CRC16 used heavily in NVMe.

It turns out we can accelerate it with a CRC32 library and a few
little tricks.

Provide the accelerator based the refactored CRC32 code.

Cc: Anton Blanchard <anton@samba.org>
Thanks-to: Hong Bo Peng <penghb@cn.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 22:02:53 +08:00
Herbert Xu 2e6d603e51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merging 4.11-rc3 to pick up md5 removal from /dev/random.
2017-03-24 21:58:58 +08:00
Eric Biggers 9df0eb180c crypto: xts,lrw - fix out-of-bounds write after kmalloc failure
In the generic XTS and LRW algorithms, for input data > 128 bytes, a
temporary buffer is allocated to hold the values to be XOR'ed with the
data before and after encryption or decryption.  If the allocation
fails, the fixed-size buffer embedded in the request buffer is meant to
be used as a fallback --- resulting in more calls to the ECB algorithm,
but still producing the correct result.  However, we weren't correctly
limiting subreq->cryptlen in this case, resulting in pre_crypt()
overrunning the embedded buffer.  Fix this by setting subreq->cryptlen
correctly.

Fixes: f1c131b454 ("crypto: xts - Convert to skcipher")
Fixes: 700cb3f5fe ("crypto: lrw - Convert to skcipher")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-24 21:51:34 +08:00
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
Marcelo Cerri d2c2a85cfe crypto: ctr - Propagate NEED_FALLBACK bit
When requesting a fallback algorithm, we should propagate the
NEED_FALLBACK bit when search for the underlying algorithm.

This will prevents drivers from allocating unnecessary fallbacks that
are never called. For instance, currently the vmx-crypto driver will use
the following chain of calls when calling the fallback implementation:

p8_aes_ctr -> ctr(p8_aes) -> aes-generic

However p8_aes will always delegate its calls to aes-generic. With this
patch, p8_aes_ctr will be able to use ctr(aes-generic) directly as its
fallback. The same applies to aes_s390.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:39 +08:00
Marcelo Cerri e6c2e65c70 crypto: cbc - Propagate NEED_FALLBACK bit
When requesting a fallback algorithm, we should propagate the
NEED_FALLBACK bit when search for the underlying algorithm.

This will prevents drivers from allocating unnecessary fallbacks that
are never called. For instance, currently the vmx-crypto driver will use
the following chain of calls when calling the fallback implementation:

p8_aes_cbc -> cbc(p8_aes) -> aes-generic

However p8_aes will always delegate its calls to aes-generic. With this
patch, p8_aes_cbc will be able to use cbc(aes-generic) directly as its
fallback. The same applies to aes_s390.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:39 +08:00
Eric Biggers b13b1e0c6b crypto: testmgr - constify all test vectors
Cryptographic test vectors should never be modified, so constify them to
enforce this at both compile-time and run-time.  This moves a significant
amount of data from .data to .rodata when the crypto tests are enabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:39 +08:00
Eric Biggers 5527dfb6dd crypto: kpp - constify buffer passed to crypto_kpp_set_secret()
Constify the buffer passed to crypto_kpp_set_secret() and
kpp_alg.set_secret, since it is never modified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:27 +08:00
Ard Biesheuvel 27c539aeff crypto: algapi - annotate expected branch behavior in crypto_inc()
To prevent unnecessary branching, mark the exit condition of the
primary loop as likely(), given that a carry in a 32-bit counter
occurs very rarely.

On arm64, the resulting code is emitted by GCC as

     9a8:   cmp     w1, #0x3
     9ac:   add     x3, x0, w1, uxtw
     9b0:   b.ls    9e0 <crypto_inc+0x38>
     9b4:   ldr     w2, [x3,#-4]!
     9b8:   rev     w2, w2
     9bc:   add     w2, w2, #0x1
     9c0:   rev     w4, w2
     9c4:   str     w4, [x3]
     9c8:   cbz     w2, 9d0 <crypto_inc+0x28>
     9cc:   ret

where the two remaining branch conditions (one for size < 4 and one for
the carry) are statically predicted as non-taken, resulting in optimal
execution in the vast majority of cases.

Also, replace the open coded alignment test with IS_ALIGNED().

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:17 +08:00
Eric Biggers 3ea996ddfb crypto: gf128mul - constify 4k and 64k multiplication tables
Constify the multiplication tables passed to the 4k and 64k
multiplication functions, as they are not modified by these functions.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:16 +08:00
Eric Biggers f33fd64778 crypto: gf128mul - rename the byte overflow tables
Though the GF(2^128) byte overflow tables were named the "lle" and "bbe"
tables, they are not actually tied to these element formats
specifically, but rather to particular a "bit endianness".  For example,
the bbe table is actually used for both bbe and ble multiplication.
Therefore, rename the tables to "le" and "be" and update the comment to
explain this.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:15 +08:00
Eric Biggers 2416e4fa98 crypto: gf128mul - remove xx() macro
The xx() macro serves no purpose and can be removed.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:15 +08:00
Eric Biggers 63be5b53b6 crypto: gf128mul - fix some comments
Fix incorrect references to GF(128) instead of GF(2^128), as these are
two entirely different fields, and fix a few other incorrect comments.

Cc: Alex Cope <alexcope@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-09 18:34:14 +08:00
Linus Torvalds 33a8b3e99d Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - vmalloc stack regression in CCM

 - Build problem in CRC32 on ARM

 - Memory leak in cavium

 - Missing Kconfig dependencies in atmel and mediatek

 - XTS Regression on some platforms (s390 and ppc)

 - Memory overrun in CCM test vector

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: vmx - Use skcipher for xts fallback
  crypto: vmx - Use skcipher for cbc fallback
  crypto: testmgr - Pad aes_ccm_enc_tv_template vector
  crypto: arm/crc32 - add build time test for CRC instruction support
  crypto: arm/crc32 - fix build error with outdated binutils
  crypto: ccm - move cbcmac input off the stack
  crypto: xts - Propagate NEED_FALLBACK bit
  crypto: api - Add crypto_requires_off helper
  crypto: atmel - CRYPTO_DEV_MEDIATEK should depend on HAS_DMA
  crypto: atmel - CRYPTO_DEV_ATMEL_TDES and CRYPTO_DEV_ATMEL_SHA should depend on HAS_DMA
  crypto: cavium - fix leak on curr if curr->head fails to be allocated
  crypto: cavium - Fix couple of static checker errors
2017-03-04 10:42:53 -08:00
Ingo Molnar 03441a3482 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h>
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar 174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar ae7e81c077 sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Laura Abbott 1c68bb0f62 crypto: testmgr - Pad aes_ccm_enc_tv_template vector
Running with KASAN and crypto tests currently gives

 BUG: KASAN: global-out-of-bounds in __test_aead+0x9d9/0x2200 at addr ffffffff8212fca0
 Read of size 16 by task cryptomgr_test/1107
 Address belongs to variable 0xffffffff8212fca0
 CPU: 0 PID: 1107 Comm: cryptomgr_test Not tainted 4.10.0+ #45
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
 Call Trace:
  dump_stack+0x63/0x8a
  kasan_report.part.1+0x4a7/0x4e0
  ? __test_aead+0x9d9/0x2200
  ? crypto_ccm_init_crypt+0x218/0x3c0 [ccm]
  kasan_report+0x20/0x30
  check_memory_region+0x13c/0x1a0
  memcpy+0x23/0x50
  __test_aead+0x9d9/0x2200
  ? kasan_unpoison_shadow+0x35/0x50
  ? alg_test_akcipher+0xf0/0xf0
  ? crypto_skcipher_init_tfm+0x2e3/0x310
  ? crypto_spawn_tfm2+0x37/0x60
  ? crypto_ccm_init_tfm+0xa9/0xd0 [ccm]
  ? crypto_aead_init_tfm+0x7b/0x90
  ? crypto_alloc_tfm+0xc4/0x190
  test_aead+0x28/0xc0
  alg_test_aead+0x54/0xd0
  alg_test+0x1eb/0x3d0
  ? alg_find_test+0x90/0x90
  ? __sched_text_start+0x8/0x8
  ? __wake_up_common+0x70/0xb0
  cryptomgr_test+0x4d/0x60
  kthread+0x173/0x1c0
  ? crypto_acomp_scomp_free_ctx+0x60/0x60
  ? kthread_create_on_node+0xa0/0xa0
  ret_from_fork+0x2c/0x40
 Memory state around the buggy address:
  ffffffff8212fb80: 00 00 00 00 01 fa fa fa fa fa fa fa 00 00 00 00
  ffffffff8212fc00: 00 01 fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
 >ffffffff8212fc80: fa fa fa fa 00 05 fa fa fa fa fa fa 00 00 00 00
                                   ^
  ffffffff8212fd00: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
  ffffffff8212fd80: fa fa fa fa 00 00 00 00 00 05 fa fa fa fa fa fa

This always happens on the same IV which is less than 16 bytes.

Per Ard,

"CCM IVs are 16 bytes, but due to the way they are constructed
internally, the final couple of bytes of input IV are dont-cares.

Apparently, we do read all 16 bytes, which triggers the KASAN errors."

Fix this by padding the IV with null bytes to be at least 16 bytes.

Cc: stable@vger.kernel.org
Fixes: 0bc5a6c5c7 ("crypto: testmgr - Disable rfc4309 test and convert
test vectors")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-01 19:48:00 +08:00
Ard Biesheuvel 3b30460c5b crypto: ccm - move cbcmac input off the stack
Commit f15f05b0a5 ("crypto: ccm - switch to separate cbcmac driver")
refactored the CCM driver to allow separate implementations of the
underlying MAC to be provided by a platform. However, in doing so, it
moved some data from the linear region to the stack, which violates the
SG constraints when the stack is virtually mapped.

So move idata/odata back to the request ctx struct, of which we can
reasonably expect that it has been allocated using kmalloc() et al.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Fixes: f15f05b0a5 ("crypto: ccm - switch to separate cbcmac driver")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-28 17:29:17 +08:00
Herbert Xu 89027579bc crypto: xts - Propagate NEED_FALLBACK bit
When we're used as a fallback algorithm, we should propagate
the NEED_FALLBACK bit when searching for the underlying ECB mode.

This just happens to fix a hang too because otherwise the search
may end up loading the same module that triggered this XTS creation.

Cc: stable@vger.kernel.org #4.10
Fixes: f1c131b454 ("crypto: xts - Convert to skcipher")
Reported-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-27 18:09:41 +08:00
Sven Schmidt 73a15ac6d5 crypto: change LZ4 modules to work with new LZ4 module version
Update the crypto modules using LZ4 compression as well as the test
cases in testmgr.h to work with the new LZ4 module version.

Link: http://lkml.kernel.org/r/1486321748-19085-4-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:57 -08:00
Milan Broz 12cb3a1c41 crypto: xts - Add ECB dependency
Since the
   commit f1c131b454
   crypto: xts - Convert to skcipher
the XTS mode is based on ECB, so the mode must select
ECB otherwise it can fail to initialize.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-23 20:11:06 +08:00
Ard Biesheuvel 5ba8e2a05e crypto: ccm - drop unnecessary minimum 32-bit alignment
The CCM driver forces 32-bit alignment even if the underlying ciphers
don't care about alignment. This is because crypto_xor() used to require
this, but since this is no longer the case, drop the hardcoded minimum
of 32 bits.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-15 13:23:46 +08:00
Ard Biesheuvel 5338ad7065 crypto: ccm - honour alignmask of subordinate MAC cipher
The CCM driver was recently updated to defer the MAC part of the algorithm
to a dedicated crypto transform, and a template for instantiating such
transforms was added at the same time.

However, this new cbcmac template fails to take the alignmask of the
encapsulated cipher into account, which may result in buffer addresses
being passed down that are not sufficiently aligned.

So update the code to ensure that the digest buffer in the desc ctx
appears at a sufficiently aligned offset, and tweak the code so that all
calls to crypto_cipher_encrypt_one() operate on this buffer exclusively.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-15 13:23:45 +08:00
Ard Biesheuvel db91af0fbe crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
Instead of unconditionally forcing 4 byte alignment for all generic
chaining modes that rely on crypto_xor() or crypto_inc() (which may
result in unnecessary copying of data when the underlying hardware
can perform unaligned accesses efficiently), make those functions
deal with unaligned input explicitly, but only if the Kconfig symbol
HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.

For crypto_inc(), this simply involves making the 4-byte stride
conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
it typically operates on 16 byte buffers.

For crypto_xor(), an algorithm is implemented that simply runs through
the input using the largest strides possible if unaligned accesses are
allowed. If they are not, an optimal sequence of memory accesses is
emitted that takes the relative alignment of the input buffers into
account, e.g., if the relative misalignment of dst and src is 4 bytes,
the entire xor operation will be completed using 4 byte loads and stores
(modulo unaligned bits at the start and end). Note that all expressions
involving misalign are simply eliminated by the compiler when
HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:52:28 +08:00
Arnd Bergmann 7d6e910502 crypto: improve gcc optimization flags for serpent and wp512
An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:52:26 +08:00
Ard Biesheuvel f15f05b0a5 crypto: ccm - switch to separate cbcmac driver
Update the generic CCM driver to defer CBC-MAC processing to a
dedicated CBC-MAC ahash transform rather than open coding this
transform (and much of the associated scatterwalk plumbing) in
the CCM driver itself.

This cleans up the code considerably, but more importantly, it allows
the use of alternative CBC-MAC implementations that don't suffer from
performance degradation due to significant setup time (e.g., the NEON
based AES code needs to enable/disable the NEON, and load the S-box
into 16 SIMD registers, which cannot be amortized over the entire input
when using the cipher interface)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:45 +08:00
Ard Biesheuvel 092acf0698 crypto: testmgr - add test cases for cbcmac(aes)
In preparation of splitting off the CBC-MAC transform in the CCM
driver into a separate algorithm, define some test cases for the
AES incarnation of cbcmac.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:44 +08:00
Ard Biesheuvel b5e0b032b6 crypto: aes - add generic time invariant AES cipher
Lookup table based AES is sensitive to timing attacks, which is due to
the fact that such table lookups are data dependent, and the fact that
8 KB worth of tables covers a significant number of cachelines on any
architecture, resulting in an exploitable correlation between the key
and the processing time for known plaintexts.

For network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.

For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by switching
to an implementation that only needs 256 bytes of table data (the actual
S-box itself), which can be prefetched at the start of each block to
eliminate data dependent latencies.

This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware). Decryption is substantially slower.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:43 +08:00
Ard Biesheuvel ec38a93761 crypto: aes-generic - drop alignment requirement
The generic AES code exposes a 32-bit align mask, which forces all
users of the code to use temporary buffers or take other measures to
ensure the alignment requirement is adhered to, even on architectures
that don't care about alignment for software algorithms such as this
one.

So drop the align mask, and fix the code to use get_unaligned_le32()
where appropriate, which will resolve to whatever is optimal for the
architecture.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:50:43 +08:00
Herbert Xu 34cb582139 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree to pick up arm64 output IV patch.
2017-02-03 18:14:10 +08:00
Harsh Jain 0b529f143e crypto: algif_aead - Fix kernel panic on list_del
Kernel panics when userspace program try to access AEAD interface.
Remove node from Linked List before freeing its memory.

Cc: <stable@vger.kernel.org>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Reviewed-by: Stephan Müller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 17:45:48 +08:00
Rabin Vincent 76512f2d8c crypto: tcrypt - Add debug prints
tcrypt is very tight-lipped when it succeeds, but a bit more feedback
would be useful when developing or debugging crypto drivers, especially
since even a successful run ends with the module failing to insert. Add
a couple of debug prints, which can be enabled with dynamic debug:

Before:

 # insmod tcrypt.ko mode=10
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable

After:

 # insmod tcrypt.ko mode=10 dyndbg
 tcrypt: testing ecb(aes)
 tcrypt: testing cbc(aes)
 tcrypt: testing lrw(aes)
 tcrypt: testing xts(aes)
 tcrypt: testing ctr(aes)
 tcrypt: testing rfc3686(ctr(aes))
 tcrypt: all tests passed
 insmod: can't insert 'tcrypt.ko': Resource temporarily unavailable

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:50:24 +08:00
Salvatore Benedetto d6040764ad crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with
the algorithm registration. This fixes qat-dh registration when
driver is restarted

Cc: <stable@vger.kernel.org>
Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23 22:41:32 +08:00
Ard Biesheuvel 21c8e72037 crypto: testmgr - use calculated count for number of test vectors
When working on AES in CCM mode for ARM, my code passed the internal
tcrypt test before I had even bothered to implement the AES-192 and
AES-256 code paths, which is strange because the tcrypt does contain
AES-192 and AES-256 test vectors for CCM.

As it turned out, the define AES_CCM_ENC_TEST_VECTORS was out of sync
with the actual number of test vectors, causing only the AES-128 ones
to be executed.

So get rid of the defines, and wrap the test vector references in a
macro that calculates the number of vectors automatically.

The following test vector counts were out of sync with the respective
defines:

    BF_CTR_ENC_TEST_VECTORS          2 ->  3
    BF_CTR_DEC_TEST_VECTORS          2 ->  3
    TF_CTR_ENC_TEST_VECTORS          2 ->  3
    TF_CTR_DEC_TEST_VECTORS          2 ->  3
    SERPENT_CTR_ENC_TEST_VECTORS     2 ->  3
    SERPENT_CTR_DEC_TEST_VECTORS     2 ->  3
    AES_CCM_ENC_TEST_VECTORS         8 -> 14
    AES_CCM_DEC_TEST_VECTORS         7 -> 17
    AES_CCM_4309_ENC_TEST_VECTORS    7 -> 23
    AES_CCM_4309_DEC_TEST_VECTORS   10 -> 23
    CAMELLIA_CTR_ENC_TEST_VECTORS    2 ->  3
    CAMELLIA_CTR_DEC_TEST_VECTORS    2 ->  3

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 18:47:15 +08:00
Andrew Lutomirski e93acd6f67 crypto: testmgr - Allocate only the required output size for hash tests
There are some hashes (e.g. sha224) that have some internal trickery
to make sure that only the correct number of output bytes are
generated.  If something goes wrong, they could potentially overrun
the output buffer.

Make the test more robust by allocating only enough space for the
correct output size so that memory debugging will catch the error if
the output is overrun.

Tested by intentionally breaking sha224 to output all 256
internally-generated bits while running on KASAN.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:26:45 +08:00
Gideon Israel Dsouza d8c34b949d crypto: Replaced gcc specific attributes with macros from compiler.h
Continuing from this commit: 52f5684c8e
("kernel: use macros from compiler.h instead of __attribute__((...))")

I submitted 4 total patches. They are part of task I've taken up to
increase compiler portability in the kernel. I've cleaned up the
subsystems under /kernel /mm /block and /security, this patch targets
/crypto.

There is <linux/compiler.h> which provides macros for various gcc specific
constructs. Eg: __weak for __attribute__((weak)). I've cleaned all
instances of gcc specific attributes with the right macros for the crypto
subsystem.

I had to make one additional change into compiler-gcc.h for the case when
one wants to use this: __attribute__((aligned) and not specify an alignment
factor. From the gcc docs, this will result in the largest alignment for
that data type on the target machine so I've named the macro
__aligned_largest. Please advise if another name is more appropriate.

Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:24:39 +08:00
Eric Biggers d2110224a6 crypto: testmgr - use kmemdup instead of kmalloc+memcpy
It's recommended to use kmemdup instead of kmalloc followed by memcpy.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13 00:24:37 +08:00
Ard Biesheuvel c821f6ab2e crypto: skcipher - introduce walksize attribute for SIMD algos
In some cases, SIMD algorithms can only perform optimally when
allowed to operate on multiple input blocks in parallel. This is
especially true for bit slicing algorithms, which typically take
the same amount of time processing a single block or 8 blocks in
parallel. However, other SIMD algorithms may benefit as well from
bigger strides.

So add a walksize attribute to the skcipher algorithm definition, and
wire it up to the skcipher walk API. To avoid confusion between the
skcipher and AEAD attributes, rename the skcipher_walk chunksize
attribute to 'stride', and set it from the walksize (in the skcipher
case) or from the chunksize (in the AEAD case).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-30 19:52:47 +08:00
Jiri Slaby 6207119444 crypto: algif_hash - avoid zero-sized array
With this reproducer:
  struct sockaddr_alg alg = {
          .salg_family = 0x26,
          .salg_type = "hash",
          .salg_feat = 0xf,
          .salg_mask = 0x5,
          .salg_name = "digest_null",
  };
  int sock, sock2;

  sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
  bind(sock, (struct sockaddr *)&alg, sizeof(alg));
  sock2 = accept(sock, NULL, NULL);
  setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
  accept(sock2, NULL, NULL);

==== 8< ======== 8< ======== 8< ======== 8< ====

one can immediatelly see an UBSAN warning:
UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
variable length array bound value 0 <= 0
CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
...
Call Trace:
...
 [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
 [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
 [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
 [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
 [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
 [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40

It is a correct warning, as hash state is propagated to accept as zero,
but creating a zero-length variable array is not allowed in C.

Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
similar happens in the code there, so we just allocate one byte even
though we do not use the array.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-27 17:50:52 +08:00
Ard Biesheuvel 9ae433bc79 crypto: chacha20 - convert generic and x86 versions to skcipher
This converts the ChaCha20 code from a blkcipher to a skcipher, which
is now the preferred way to implement symmetric block and stream ciphers.

This ports the generic and x86 versions at the same time because the
latter reuses routines of the former.

Note that the skcipher_walk() API guarantees that all presented blocks
except the final one are a multiple of the chunk size, so we can simplify
the encrypt() routine somewhat.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-27 17:47:31 +08:00