Lots of updates for net-next for this cycle. As usual, we have

a lot of small fixes and cleanups, the bigger items are:
  * proper mac80211 rate control locking, to fix some random crashes
    (this required changing other locking as well)
  * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the
    amount of code we execute while going from ndo_start_xmit() to
    the driver
  * this also clears the way for properly supporting S/G and checksum
    and segmentation offloads
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJVSh65AAoJEDBSmw7B7bqrudwP/0iXyNQhF0mLTENrx+rdsDZS
 qQhB/8wejJaOJb89Re7M+bhwri7Q6S5BM/G24vhMc01dxmqNMcdKfEV3+nlmc5C+
 KeEgTI9aZiCnUt4WAd54Zwbkc9o+1kBtaFuaWDvOdQHUf0WDwEIQxjnV4+SZujV9
 xl1TV5yV35hRQgrDE8ZSbtOYRmhSVoi0MEgwqAjzdN2fEPyWVeqwYULDtpOopjL2
 UHQgv0E2fYVRWennHyQQ88tWBQg+EsRaG1U1/rYHhNBmAJ+f9AOxKi7ErzxYfkbM
 961B+3E++pM+zUeqw6+jaMKqT5jeCCM5ugCNSG4NrIvfxDIDgecAFV9Fs2islnI4
 8xd3GqyA5iqaitAWIUsaYaQfaAcwSIlpSinfQW9EUm2wuCkPyZboFP+GRd2K7sQn
 FnRJSJ9PkGPdWwdDE3gunLHBHtbDS0z+R8VegIeS0qT8LamkqICiNQSyPlsTeluW
 ig2kwHsDdj3k11wyelhfp/RdtsOch/brKpLSjdzPXC1BzIWhQLwmsPh9qZ83vSB9
 qbLsdnM/IPQXocWB6fOhmwaGsLeRalxs2yQFM0zdJCwpaU9dzKsJrxepAXVuq31p
 r0fygWTp8GVevHXzfS7fRya8xjsTRrSs6n2kH7ErOfiep13HQypAjbyLswNe4kW/
 D6x8pVC3AhdGkl/9CW4m
 =oUlh
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-05-06' into iwlwifi-next

Lots of updates for net-next for this cycle. As usual, we have
a lot of small fixes and cleanups, the bigger items are:
 * proper mac80211 rate control locking, to fix some random crashes
   (this required changing other locking as well)
 * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the
   amount of code we execute while going from ndo_start_xmit() to
   the driver
 * this also clears the way for properly supporting S/G and checksum
   and segmentation offloads
This commit is contained in:
Emmanuel Grumbach 2015-05-28 13:36:49 +03:00
commit 31eb07f5f8
2188 changed files with 76579 additions and 41908 deletions

1
.gitignore vendored
View File

@ -24,6 +24,7 @@
*.order
*.elf
*.bin
*.tar
*.gz
*.bz2
*.lzma

View File

@ -100,6 +100,7 @@ Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Rémi Denis-Courmont <rdenis@simphalempin.com>
Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Rudolf Marek <R.Marek@sh.cvut.cz>
Rui Saraiva <rmps@joel.ist.utl.pt>
Sachin P Sant <ssant@in.ibm.com>

17
CREDITS
View File

@ -508,6 +508,10 @@ E: paul@paulbristow.net
W: http://paulbristow.net/linux/idefloppy.html
D: Maintainer of IDE/ATAPI floppy driver
N: Stefano Brivio
E: stefano.brivio@polimi.it
D: Broadcom B43 driver
N: Dominik Brodowski
E: linux@brodo.de
W: http://www.brodo.de/
@ -3008,6 +3012,19 @@ W: http://www.qsl.net/dl1bke/
D: Generic Z8530 driver, AX.25 DAMA slave implementation
D: Several AX.25 hacks
N: Ricardo Ribalda Delgado
E: ricardo.ribalda@gmail.com
W: http://ribalda.com
D: PLX USB338x driver
D: PCA9634 driver
D: Option GTM671WFS
D: Fintek F81216A
D: Various kernel hacks
S: Qtechnology A/S
S: Valby Langgade 142
S: 2500 Valby
S: Denmark
N: Francois-Rene Rideau
E: fare@tunes.org
W: http://www.tunes.org/~fare

View File

@ -0,0 +1,119 @@
What: /sys/block/zram<id>/num_reads
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The num_reads file is read-only and specifies the number of
reads (failed or successful) done on this device.
Now accessible via zram<id>/stat node.
What: /sys/block/zram<id>/num_writes
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The num_writes file is read-only and specifies the number of
writes (failed or successful) done on this device.
Now accessible via zram<id>/stat node.
What: /sys/block/zram<id>/invalid_io
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The invalid_io file is read-only and specifies the number of
non-page-size-aligned I/O requests issued to this device.
Now accessible via zram<id>/io_stat node.
What: /sys/block/zram<id>/failed_reads
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The failed_reads file is read-only and specifies the number of
failed reads happened on this device.
Now accessible via zram<id>/io_stat node.
What: /sys/block/zram<id>/failed_writes
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The failed_writes file is read-only and specifies the number of
failed writes happened on this device.
Now accessible via zram<id>/io_stat node.
What: /sys/block/zram<id>/notify_free
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The notify_free file is read-only. Depending on device usage
scenario it may account a) the number of pages freed because
of swap slot free notifications or b) the number of pages freed
because of REQ_DISCARD requests sent by bio. The former ones
are sent to a swap block device when a swap slot is freed, which
implies that this disk is being used as a swap disk. The latter
ones are sent by filesystem mounted with discard option,
whenever some data blocks are getting discarded.
Now accessible via zram<id>/io_stat node.
What: /sys/block/zram<id>/zero_pages
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The zero_pages file is read-only and specifies number of zero
filled pages written to this disk. No memory is allocated for
such pages.
Now accessible via zram<id>/mm_stat node.
What: /sys/block/zram<id>/orig_data_size
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The orig_data_size file is read-only and specifies uncompressed
size of data stored in this disk. This excludes zero-filled
pages (zero_pages) since no memory is allocated for them.
Unit: bytes
Now accessible via zram<id>/mm_stat node.
What: /sys/block/zram<id>/compr_data_size
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The compr_data_size file is read-only and specifies compressed
size of data stored in this disk. So, compression ratio can be
calculated using orig_data_size and this statistic.
Unit: bytes
Now accessible via zram<id>/mm_stat node.
What: /sys/block/zram<id>/mem_used_total
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The mem_used_total file is read-only and specifies the amount
of memory, including allocator fragmentation and metadata
overhead, allocated for this disk. So, allocator space
efficiency can be calculated using compr_data_size and this
statistic.
Unit: bytes
Now accessible via zram<id>/mm_stat node.
What: /sys/block/zram<id>/mem_used_max
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The mem_used_max file is read/write and specifies the amount
of maximum memory zram have consumed to store compressed data.
For resetting the value, you should write "0". Otherwise,
you could see -EINVAL.
Unit: bytes
Downgraded to write-only node: so it's possible to set new
value only; its current value is stored in zram<id>/mm_stat
node.
What: /sys/block/zram<id>/mem_limit
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The mem_limit file is read/write and specifies the maximum
amount of memory ZRAM can use to store the compressed data.
The limit could be changed in run time and "0" means disable
the limit. No limit is the initial state. Unit: bytes
Downgraded to write-only node: so it's possible to set new
value only; its current value is stored in zram<id>/mm_stat
node.

View File

@ -141,3 +141,28 @@ Description:
amount of memory ZRAM can use to store the compressed data. The
limit could be changed in run time and "0" means disable the
limit. No limit is the initial state. Unit: bytes
What: /sys/block/zram<id>/compact
Date: August 2015
Contact: Minchan Kim <minchan@kernel.org>
Description:
The compact file is write-only and trigger compaction for
allocator zrm uses. The allocator moves some objects so that
it could free fragment space.
What: /sys/block/zram<id>/io_stat
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The io_stat file is read-only and accumulates device's I/O
statistics not accounted by block layer. For example,
failed_reads, failed_writes, etc. File format is similar to
block layer statistics file format.
What: /sys/block/zram<id>/mm_stat
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The mm_stat file is read-only and represents device's mm
statistics (orig_data_size, compr_data_size, etc.) in a format
similar to block layer statistics file format.

View File

@ -100,7 +100,7 @@ Description: read only
Hexadecimal value of the device ID found in this AFU
configuration record.
What: /sys/class/cxl/<afu>/cr<config num>/vendor
What: /sys/class/cxl/<afu>/cr<config num>/class
Date: February 2015
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only

View File

@ -0,0 +1,80 @@
What: /sys/class/leds/<led>/flash_brightness
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read/write
Set the brightness of this LED in the flash strobe mode, in
microamperes. The file is created only for the flash LED devices
that support setting flash brightness.
The value is between 0 and
/sys/class/leds/<led>/max_flash_brightness.
What: /sys/class/leds/<led>/max_flash_brightness
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read only
Maximum brightness level for this LED in the flash strobe mode,
in microamperes.
What: /sys/class/leds/<led>/flash_timeout
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read/write
Hardware timeout for flash, in microseconds. The flash strobe
is stopped after this period of time has passed from the start
of the strobe. The file is created only for the flash LED
devices that support setting flash timeout.
What: /sys/class/leds/<led>/max_flash_timeout
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read only
Maximum flash timeout for this LED, in microseconds.
What: /sys/class/leds/<led>/flash_strobe
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read/write
Flash strobe state. When written with 1 it triggers flash strobe
and when written with 0 it turns the flash off.
On read 1 means that flash is currently strobing and 0 means
that flash is off.
What: /sys/class/leds/<led>/flash_fault
Date: March 2015
KernelVersion: 4.0
Contact: Jacek Anaszewski <j.anaszewski@samsung.com>
Description: read only
Space separated list of flash faults that may have occurred.
Flash faults are re-read after strobing the flash. Possible
flash faults:
* led-over-voltage - flash controller voltage to the flash LED
has exceeded the limit specific to the flash controller
* flash-timeout-exceeded - the flash strobe was still on when
the timeout set by the user has expired; not all flash
controllers may set this in all such conditions
* controller-over-temperature - the flash controller has
overheated
* controller-short-circuit - the short circuit protection
of the flash controller has been triggered
* led-power-supply-over-current - current in the LED power
supply has exceeded the limit specific to the flash
controller
* indicator-led-fault - the flash controller has detected
a short or open circuit condition on the indicator LED
* led-under-voltage - flash controller voltage to the flash
LED has been below the minimum limit specific to
the flash
* controller-under-voltage - the input voltage of the flash
controller is below the limit under which strobing the
flash at full current will not be possible;
the condition persists until this flag is no longer set
* led-over-temperature - the temperature of the LED has exceeded
its allowed upper limit

View File

@ -659,6 +659,19 @@ macros using parameters.
#define CONSTANT 0x4000
#define CONSTEXP (CONSTANT | 3)
5) namespace collisions when defining local variables in macros resembling
functions:
#define FOO(x) \
({ \
typeof(x) ret; \
ret = calc_ret(x); \
(ret); \
)}
ret is a common name for a local variable - __foo_ret is less likely
to collide with an existing variable.
The cpp manual deals with macros exhaustively. The gcc internals manual also
covers RTL which is used frequently with assembly language in the kernel.

View File

@ -509,6 +509,270 @@
select it due to the used type and mask field.
</para>
</sect1>
<sect1><title>Internal Structure of Kernel Crypto API</title>
<para>
The kernel crypto API has an internal structure where a cipher
implementation may use many layers and indirections. This section
shall help to clarify how the kernel crypto API uses
various components to implement the complete cipher.
</para>
<para>
The following subsections explain the internal structure based
on existing cipher implementations. The first section addresses
the most complex scenario where all other scenarios form a logical
subset.
</para>
<sect2><title>Generic AEAD Cipher Structure</title>
<para>
The following ASCII art decomposes the kernel crypto API layers
when using the AEAD cipher with the automated IV generation. The
shown example is used by the IPSEC layer.
</para>
<para>
For other use cases of AEAD ciphers, the ASCII art applies as
well, but the caller may not use the GIVCIPHER interface. In
this case, the caller must generate the IV.
</para>
<para>
The depicted example decomposes the AEAD cipher of GCM(AES) based
on the generic C implementations (gcm.c, aes-generic.c, ctr.c,
ghash-generic.c, seqiv.c). The generic implementation serves as an
example showing the complete logic of the kernel crypto API.
</para>
<para>
It is possible that some streamlined cipher implementations (like
AES-NI) provide implementations merging aspects which in the view
of the kernel crypto API cannot be decomposed into layers any more.
In case of the AES-NI implementation, the CTR mode, the GHASH
implementation and the AES cipher are all merged into one cipher
implementation registered with the kernel crypto API. In this case,
the concept described by the following ASCII art applies too. However,
the decomposition of GCM into the individual sub-components
by the kernel crypto API is not done any more.
</para>
<para>
Each block in the following ASCII art is an independent cipher
instance obtained from the kernel crypto API. Each block
is accessed by the caller or by other blocks using the API functions
defined by the kernel crypto API for the cipher implementation type.
</para>
<para>
The blocks below indicate the cipher type as well as the specific
logic implemented in the cipher.
</para>
<para>
The ASCII art picture also indicates the call structure, i.e. who
calls which component. The arrows point to the invoked block
where the caller uses the API applicable to the cipher type
specified for the block.
</para>
<programlisting>
<![CDATA[
kernel crypto API | IPSEC Layer
|
+-----------+ |
| | (1)
| givcipher | <----------------------------------- esp_output
| (seqiv) | ---+
+-----------+ |
| (2)
+-----------+ |
| | <--+ (2)
| aead | <----------------------------------- esp_input
| (gcm) | ------------+
+-----------+ |
| (3) | (5)
v v
+-----------+ +-----------+
| | | |
| ablkcipher| | ahash |
| (ctr) | ---+ | (ghash) |
+-----------+ | +-----------+
|
+-----------+ | (4)
| | <--+
| cipher |
| (aes) |
+-----------+
]]>
</programlisting>
<para>
The following call sequence is applicable when the IPSEC layer
triggers an encryption operation with the esp_output function. During
configuration, the administrator set up the use of rfc4106(gcm(aes)) as
the cipher for ESP. The following call sequence is now depicted in the
ASCII art above:
</para>
<orderedlist>
<listitem>
<para>
esp_output() invokes crypto_aead_givencrypt() to trigger an encryption
operation of the GIVCIPHER implementation.
</para>
<para>
In case of GCM, the SEQIV implementation is registered as GIVCIPHER
in crypto_rfc4106_alloc().
</para>
<para>
The SEQIV performs its operation to generate an IV where the core
function is seqiv_geniv().
</para>
</listitem>
<listitem>
<para>
Now, SEQIV uses the AEAD API function calls to invoke the associated
AEAD cipher. In our case, during the instantiation of SEQIV, the
cipher handle for GCM is provided to SEQIV. This means that SEQIV
invokes AEAD cipher operations with the GCM cipher handle.
</para>
<para>
During instantiation of the GCM handle, the CTR(AES) and GHASH
ciphers are instantiated. The cipher handles for CTR(AES) and GHASH
are retained for later use.
</para>
<para>
The GCM implementation is responsible to invoke the CTR mode AES and
the GHASH cipher in the right manner to implement the GCM
specification.
</para>
</listitem>
<listitem>
<para>
The GCM AEAD cipher type implementation now invokes the ABLKCIPHER API
with the instantiated CTR(AES) cipher handle.
</para>
<para>
During instantiation of the CTR(AES) cipher, the CIPHER type
implementation of AES is instantiated. The cipher handle for AES is
retained.
</para>
<para>
That means that the ABLKCIPHER implementation of CTR(AES) only
implements the CTR block chaining mode. After performing the block
chaining operation, the CIPHER implementation of AES is invoked.
</para>
</listitem>
<listitem>
<para>
The ABLKCIPHER of CTR(AES) now invokes the CIPHER API with the AES
cipher handle to encrypt one block.
</para>
</listitem>
<listitem>
<para>
The GCM AEAD implementation also invokes the GHASH cipher
implementation via the AHASH API.
</para>
</listitem>
</orderedlist>
<para>
When the IPSEC layer triggers the esp_input() function, the same call
sequence is followed with the only difference that the operation starts
with step (2).
</para>
</sect2>
<sect2><title>Generic Block Cipher Structure</title>
<para>
Generic block ciphers follow the same concept as depicted with the ASCII
art picture above.
</para>
<para>
For example, CBC(AES) is implemented with cbc.c, and aes-generic.c. The
ASCII art picture above applies as well with the difference that only
step (4) is used and the ABLKCIPHER block chaining mode is CBC.
</para>
</sect2>
<sect2><title>Generic Keyed Message Digest Structure</title>
<para>
Keyed message digest implementations again follow the same concept as
depicted in the ASCII art picture above.
</para>
<para>
For example, HMAC(SHA256) is implemented with hmac.c and
sha256_generic.c. The following ASCII art illustrates the
implementation:
</para>
<programlisting>
<![CDATA[
kernel crypto API | Caller
|
+-----------+ (1) |
| | <------------------ some_function
| ahash |
| (hmac) | ---+
+-----------+ |
| (2)
+-----------+ |
| | <--+
| shash |
| (sha256) |
+-----------+
]]>
</programlisting>
<para>
The following call sequence is applicable when a caller triggers
an HMAC operation:
</para>
<orderedlist>
<listitem>
<para>
The AHASH API functions are invoked by the caller. The HMAC
implementation performs its operation as needed.
</para>
<para>
During initialization of the HMAC cipher, the SHASH cipher type of
SHA256 is instantiated. The cipher handle for the SHA256 instance is
retained.
</para>
<para>
At one time, the HMAC implementation requires a SHA256 operation
where the SHA256 cipher handle is used.
</para>
</listitem>
<listitem>
<para>
The HMAC instance now invokes the SHASH API with the SHA256
cipher handle to calculate the message digest.
</para>
</listitem>
</orderedlist>
</sect2>
</sect1>
</chapter>
<chapter id="Development"><title>Developing Cipher Algorithms</title>
@ -808,6 +1072,602 @@
</sect1>
</chapter>
<chapter id="User"><title>User Space Interface</title>
<sect1><title>Introduction</title>
<para>
The concepts of the kernel crypto API visible to kernel space is fully
applicable to the user space interface as well. Therefore, the kernel
crypto API high level discussion for the in-kernel use cases applies
here as well.
</para>
<para>
The major difference, however, is that user space can only act as a
consumer and never as a provider of a transformation or cipher algorithm.
</para>
<para>
The following covers the user space interface exported by the kernel
crypto API. A working example of this description is libkcapi that
can be obtained from [1]. That library can be used by user space
applications that require cryptographic services from the kernel.
</para>
<para>
Some details of the in-kernel kernel crypto API aspects do not
apply to user space, however. This includes the difference between
synchronous and asynchronous invocations. The user space API call
is fully synchronous.
</para>
<para>
[1] http://www.chronox.de/libkcapi.html
</para>
</sect1>
<sect1><title>User Space API General Remarks</title>
<para>
The kernel crypto API is accessible from user space. Currently,
the following ciphers are accessible:
</para>
<itemizedlist>
<listitem>
<para>Message digest including keyed message digest (HMAC, CMAC)</para>
</listitem>
<listitem>
<para>Symmetric ciphers</para>
</listitem>
<listitem>
<para>AEAD ciphers</para>
</listitem>
<listitem>
<para>Random Number Generators</para>
</listitem>
</itemizedlist>
<para>
The interface is provided via socket type using the type AF_ALG.
In addition, the setsockopt option type is SOL_ALG. In case the
user space header files do not export these flags yet, use the
following macros:
</para>
<programlisting>
#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif
</programlisting>
<para>
A cipher is accessed with the same name as done for the in-kernel
API calls. This includes the generic vs. unique naming schema for
ciphers as well as the enforcement of priorities for generic names.
</para>
<para>
To interact with the kernel crypto API, a socket must be
created by the user space application. User space invokes the cipher
operation with the send()/write() system call family. The result of the
cipher operation is obtained with the read()/recv() system call family.
</para>
<para>
The following API calls assume that the socket descriptor
is already opened by the user space application and discusses only
the kernel crypto API specific invocations.
</para>
<para>
To initialize the socket interface, the following sequence has to
be performed by the consumer:
</para>
<orderedlist>
<listitem>
<para>
Create a socket of type AF_ALG with the struct sockaddr_alg
parameter specified below for the different cipher types.
</para>
</listitem>
<listitem>
<para>
Invoke bind with the socket descriptor
</para>
</listitem>
<listitem>
<para>
Invoke accept with the socket descriptor. The accept system call
returns a new file descriptor that is to be used to interact with
the particular cipher instance. When invoking send/write or recv/read
system calls to send data to the kernel or obtain data from the
kernel, the file descriptor returned by accept must be used.
</para>
</listitem>
</orderedlist>
</sect1>
<sect1><title>In-place Cipher operation</title>
<para>
Just like the in-kernel operation of the kernel crypto API, the user
space interface allows the cipher operation in-place. That means that
the input buffer used for the send/write system call and the output
buffer used by the read/recv system call may be one and the same.
This is of particular interest for symmetric cipher operations where a
copying of the output data to its final destination can be avoided.
</para>
<para>
If a consumer on the other hand wants to maintain the plaintext and
the ciphertext in different memory locations, all a consumer needs
to do is to provide different memory pointers for the encryption and
decryption operation.
</para>
</sect1>
<sect1><title>Message Digest API</title>
<para>
The message digest type to be used for the cipher operation is
selected when invoking the bind syscall. bind requires the caller
to provide a filled struct sockaddr data structure. This data
structure must be filled as follows:
</para>
<programlisting>
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "hash", /* this selects the hash logic in the kernel */
.salg_name = "sha1" /* this is the cipher name */
};
</programlisting>
<para>
The salg_type value "hash" applies to message digests and keyed
message digests. Though, a keyed message digest is referenced by
the appropriate salg_name. Please see below for the setsockopt
interface that explains how the key can be set for a keyed message
digest.
</para>
<para>
Using the send() system call, the application provides the data that
should be processed with the message digest. The send system call
allows the following flags to be specified:
</para>
<itemizedlist>
<listitem>
<para>
MSG_MORE: If this flag is set, the send system call acts like a
message digest update function where the final hash is not
yet calculated. If the flag is not set, the send system call
calculates the final message digest immediately.
</para>
</listitem>
</itemizedlist>
<para>
With the recv() system call, the application can read the message
digest from the kernel crypto API. If the buffer is too small for the
message digest, the flag MSG_TRUNC is set by the kernel.
</para>
<para>
In order to set a message digest key, the calling application must use
the setsockopt() option of ALG_SET_KEY. If the key is not set the HMAC
operation is performed without the initial HMAC state change caused by
the key.
</para>
</sect1>
<sect1><title>Symmetric Cipher API</title>
<para>
The operation is very similar to the message digest discussion.
During initialization, the struct sockaddr data structure must be
filled as follows:
</para>
<programlisting>
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "skcipher", /* this selects the symmetric cipher */
.salg_name = "cbc(aes)" /* this is the cipher name */
};
</programlisting>
<para>
Before data can be sent to the kernel using the write/send system
call family, the consumer must set the key. The key setting is
described with the setsockopt invocation below.
</para>
<para>
Using the sendmsg() system call, the application provides the data that should be processed for encryption or decryption. In addition, the IV is
specified with the data structure provided by the sendmsg() system call.
</para>
<para>
The sendmsg system call parameter of struct msghdr is embedded into the
struct cmsghdr data structure. See recv(2) and cmsg(3) for more
information on how the cmsghdr data structure is used together with the
send/recv system call family. That cmsghdr data structure holds the
following information specified with a separate header instances:
</para>
<itemizedlist>
<listitem>
<para>
specification of the cipher operation type with one of these flags:
</para>
<itemizedlist>
<listitem>
<para>ALG_OP_ENCRYPT - encryption of data</para>
</listitem>
<listitem>
<para>ALG_OP_DECRYPT - decryption of data</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
specification of the IV information marked with the flag ALG_SET_IV
</para>
</listitem>
</itemizedlist>
<para>
The send system call family allows the following flag to be specified:
</para>
<itemizedlist>
<listitem>
<para>
MSG_MORE: If this flag is set, the send system call acts like a
cipher update function where more input data is expected
with a subsequent invocation of the send system call.
</para>
</listitem>
</itemizedlist>
<para>
Note: The kernel reports -EINVAL for any unexpected data. The caller
must make sure that all data matches the constraints given in
/proc/crypto for the selected cipher.
</para>
<para>
With the recv() system call, the application can read the result of
the cipher operation from the kernel crypto API. The output buffer
must be at least as large as to hold all blocks of the encrypted or
decrypted data. If the output data size is smaller, only as many
blocks are returned that fit into that output buffer size.
</para>
</sect1>
<sect1><title>AEAD Cipher API</title>
<para>
The operation is very similar to the symmetric cipher discussion.
During initialization, the struct sockaddr data structure must be
filled as follows:
</para>
<programlisting>
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "aead", /* this selects the symmetric cipher */
.salg_name = "gcm(aes)" /* this is the cipher name */
};
</programlisting>
<para>
Before data can be sent to the kernel using the write/send system
call family, the consumer must set the key. The key setting is
described with the setsockopt invocation below.
</para>
<para>
In addition, before data can be sent to the kernel using the
write/send system call family, the consumer must set the authentication
tag size. To set the authentication tag size, the caller must use the
setsockopt invocation described below.
</para>
<para>
Using the sendmsg() system call, the application provides the data that should be processed for encryption or decryption. In addition, the IV is
specified with the data structure provided by the sendmsg() system call.
</para>
<para>
The sendmsg system call parameter of struct msghdr is embedded into the
struct cmsghdr data structure. See recv(2) and cmsg(3) for more
information on how the cmsghdr data structure is used together with the
send/recv system call family. That cmsghdr data structure holds the
following information specified with a separate header instances:
</para>
<itemizedlist>
<listitem>
<para>
specification of the cipher operation type with one of these flags:
</para>
<itemizedlist>
<listitem>
<para>ALG_OP_ENCRYPT - encryption of data</para>
</listitem>
<listitem>
<para>ALG_OP_DECRYPT - decryption of data</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
specification of the IV information marked with the flag ALG_SET_IV
</para>
</listitem>
<listitem>
<para>
specification of the associated authentication data (AAD) with the
flag ALG_SET_AEAD_ASSOCLEN. The AAD is sent to the kernel together
with the plaintext / ciphertext. See below for the memory structure.
</para>
</listitem>
</itemizedlist>
<para>
The send system call family allows the following flag to be specified:
</para>
<itemizedlist>
<listitem>
<para>
MSG_MORE: If this flag is set, the send system call acts like a
cipher update function where more input data is expected
with a subsequent invocation of the send system call.
</para>
</listitem>
</itemizedlist>
<para>
Note: The kernel reports -EINVAL for any unexpected data. The caller
must make sure that all data matches the constraints given in
/proc/crypto for the selected cipher.
</para>
<para>
With the recv() system call, the application can read the result of
the cipher operation from the kernel crypto API. The output buffer
must be at least as large as defined with the memory structure below.
If the output data size is smaller, the cipher operation is not performed.
</para>
<para>
The authenticated decryption operation may indicate an integrity error.
Such breach in integrity is marked with the -EBADMSG error code.
</para>
<sect2><title>AEAD Memory Structure</title>
<para>
The AEAD cipher operates with the following information that
is communicated between user and kernel space as one data stream:
</para>
<itemizedlist>
<listitem>
<para>plaintext or ciphertext</para>
</listitem>
<listitem>
<para>associated authentication data (AAD)</para>
</listitem>
<listitem>
<para>authentication tag</para>
</listitem>
</itemizedlist>
<para>
The sizes of the AAD and the authentication tag are provided with
the sendmsg and setsockopt calls (see there). As the kernel knows
the size of the entire data stream, the kernel is now able to
calculate the right offsets of the data components in the data
stream.
</para>
<para>
The user space caller must arrange the aforementioned information
in the following order:
</para>
<itemizedlist>
<listitem>
<para>
AEAD encryption input: AAD || plaintext
</para>
</listitem>
<listitem>
<para>
AEAD decryption input: AAD || ciphertext || authentication tag
</para>
</listitem>
</itemizedlist>
<para>
The output buffer the user space caller provides must be at least as
large to hold the following data:
</para>
<itemizedlist>
<listitem>
<para>
AEAD encryption output: ciphertext || authentication tag
</para>
</listitem>
<listitem>
<para>
AEAD decryption output: plaintext
</para>
</listitem>
</itemizedlist>
</sect2>
</sect1>
<sect1><title>Random Number Generator API</title>
<para>
Again, the operation is very similar to the other APIs.
During initialization, the struct sockaddr data structure must be
filled as follows:
</para>
<programlisting>
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "rng", /* this selects the symmetric cipher */
.salg_name = "drbg_nopr_sha256" /* this is the cipher name */
};
</programlisting>
<para>
Depending on the RNG type, the RNG must be seeded. The seed is provided
using the setsockopt interface to set the key. For example, the
ansi_cprng requires a seed. The DRBGs do not require a seed, but
may be seeded.
</para>
<para>
Using the read()/recvmsg() system calls, random numbers can be obtained.
The kernel generates at most 128 bytes in one call. If user space
requires more data, multiple calls to read()/recvmsg() must be made.
</para>
<para>
WARNING: The user space caller may invoke the initially mentioned
accept system call multiple times. In this case, the returned file
descriptors have the same state.
</para>
</sect1>
<sect1><title>Zero-Copy Interface</title>
<para>
In addition to the send/write/read/recv system call familty, the AF_ALG
interface can be accessed with the zero-copy interface of splice/vmsplice.
As the name indicates, the kernel tries to avoid a copy operation into
kernel space.
</para>
<para>
The zero-copy operation requires data to be aligned at the page boundary.
Non-aligned data can be used as well, but may require more operations of
the kernel which would defeat the speed gains obtained from the zero-copy
interface.
</para>
<para>
The system-interent limit for the size of one zero-copy operation is
16 pages. If more data is to be sent to AF_ALG, user space must slice
the input into segments with a maximum size of 16 pages.
</para>
<para>
Zero-copy can be used with the following code example (a complete working
example is provided with libkcapi):
</para>
<programlisting>
int pipes[2];
pipe(pipes);
/* input data in iov */
vmsplice(pipes[1], iov, iovlen, SPLICE_F_GIFT);
/* opfd is the file descriptor returned from accept() system call */
splice(pipes[0], NULL, opfd, NULL, ret, 0);
read(opfd, out, outlen);
</programlisting>
</sect1>
<sect1><title>Setsockopt Interface</title>
<para>
In addition to the read/recv and send/write system call handling
to send and retrieve data subject to the cipher operation, a consumer
also needs to set the additional information for the cipher operation.
This additional information is set using the setsockopt system call
that must be invoked with the file descriptor of the open cipher
(i.e. the file descriptor returned by the accept system call).
</para>
<para>
Each setsockopt invocation must use the level SOL_ALG.
</para>
<para>
The setsockopt interface allows setting the following data using
the mentioned optname:
</para>
<itemizedlist>
<listitem>
<para>
ALG_SET_KEY -- Setting the key. Key setting is applicable to:
</para>
<itemizedlist>
<listitem>
<para>the skcipher cipher type (symmetric ciphers)</para>
</listitem>
<listitem>
<para>the hash cipher type (keyed message digests)</para>
</listitem>
<listitem>
<para>the AEAD cipher type</para>
</listitem>
<listitem>
<para>the RNG cipher type to provide the seed</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
ALG_SET_AEAD_AUTHSIZE -- Setting the authentication tag size
for AEAD ciphers. For a encryption operation, the authentication
tag of the given size will be generated. For a decryption operation,
the provided ciphertext is assumed to contain an authentication tag
of the given size (see section about AEAD memory layout below).
</para>
</listitem>
</itemizedlist>
</sect1>
<sect1><title>User space API example</title>
<para>
Please see [1] for libkcapi which provides an easy-to-use wrapper
around the aforementioned Netlink kernel interface. [1] also contains
a test application that invokes all libkcapi API calls.
</para>
<para>
[1] http://www.chronox.de/libkcapi.html
</para>
</sect1>
</chapter>
<chapter id="API"><title>Programming Interface</title>
<sect1><title>Block Cipher Context Data Structures</title>
!Pinclude/linux/crypto.h Block Cipher Context Data Structures

View File

@ -95,8 +95,7 @@ since it doesn't need to allocate a table as large as the largest
hwirq number. The disadvantage is that hwirq to IRQ number lookup is
dependent on how many entries are in the table.
Very few drivers should need this mapping. At the moment, powerpc
iseries is the only user.
Very few drivers should need this mapping.
==== No Map ===-
irq_domain_add_nomap()

View File

@ -614,8 +614,8 @@ The canonical patch message body contains the following:
- An empty line.
- The body of the explanation, which will be copied to the
permanent changelog to describe this patch.
- The body of the explanation, line wrapped at 75 columns, which will
be copied to the permanent changelog to describe this patch.
- The "Signed-off-by:" lines, described above, which will
also go in the changelog.

View File

@ -1,17 +1,31 @@
Network Block Device (TCP version)
What is it: With this compiled in the kernel (or as a module), Linux
can use a remote server as one of its block devices. So every time
the client computer wants to read, e.g., /dev/nb0, it sends a
request over TCP to the server, which will reply with the data read.
This can be used for stations with low disk space (or even diskless)
to borrow disk space from another computer.
Unlike NFS, it is possible to put any filesystem on it, etc.
Network Block Device (TCP version)
==================================
For more information, or to download the nbd-client and nbd-server
tools, go to http://nbd.sf.net/.
1) Overview
-----------
What is it: With this compiled in the kernel (or as a module), Linux
can use a remote server as one of its block devices. So every time
the client computer wants to read, e.g., /dev/nb0, it sends a
request over TCP to the server, which will reply with the data read.
This can be used for stations with low disk space (or even diskless)
to borrow disk space from another computer.
Unlike NFS, it is possible to put any filesystem on it, etc.
For more information, or to download the nbd-client and nbd-server
tools, go to http://nbd.sf.net/.
The nbd kernel module need only be installed on the client
system, as the nbd-server is completely in userspace. In fact,
the nbd-server has been successfully ported to other operating
systems, including Windows.
A) NBD parameters
-----------------
max_part
Number of partitions per device (default: 0).
nbds_max
Number of block devices that should be initialized (default: 16).
The nbd kernel module need only be installed on the client
system, as the nbd-server is completely in userspace. In fact,
the nbd-server has been successfully ported to other operating
systems, including Windows.

View File

@ -98,20 +98,79 @@ size of the disk when not in use so a huge zram is wasteful.
mount /dev/zram1 /tmp
7) Stats:
Per-device statistics are exported as various nodes under
/sys/block/zram<id>/
disksize
num_reads
num_writes
failed_reads
failed_writes
invalid_io
notify_free
zero_pages
orig_data_size
compr_data_size
mem_used_total
mem_used_max
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
A brief description of exported device attritbutes. For more details please
read Documentation/ABI/testing/sysfs-block-zram.
Name access description
---- ------ -----------
disksize RW show and set the device's disk size
initstate RO shows the initialization state of the device
reset WO trigger device reset
num_reads RO the number of reads
failed_reads RO the number of failed reads
num_write RO the number of writes
failed_writes RO the number of failed writes
invalid_io RO the number of non-page-size-aligned I/O requests
max_comp_streams RW the number of possible concurrent compress operations
comp_algorithm RW show and change the compression algorithm
notify_free RO the number of notifications to free pages (either
slot free notifications or REQ_DISCARD requests)
zero_pages RO the number of zero filled pages written to this disk
orig_data_size RO uncompressed size of data stored in this disk
compr_data_size RO compressed size of data stored in this disk
mem_used_total RO the amount of memory allocated for this disk
mem_used_max RW the maximum amount memory zram have consumed to
store compressed data
mem_limit RW the maximum amount of memory ZRAM can use to store
the compressed data
num_migrated RO the number of objects migrated migrated by compaction
WARNING
=======
per-stat sysfs attributes are considered to be deprecated.
The basic strategy is:
-- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
-- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
The list of deprecated attributes can be found here:
Documentation/ABI/obsolete/sysfs-block-zram
Basically, every attribute that has its own read accessible sysfs node
(e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat
or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated.
User space is advised to use the following files to read the device statistics.
File /sys/block/zram<id>/stat
Represents block layer statistics. Read Documentation/block/stat.txt for
details.
File /sys/block/zram<id>/io_stat
The stat file represents device's I/O statistics not accounted by block
layer and, thus, not available in zram<id>/stat file. It consists of a
single line of text and contains the following stats separated by
whitespace:
failed_reads
failed_writes
invalid_io
notify_free
File /sys/block/zram<id>/mm_stat
The stat file represents device's mm statistics. It consists of a single
line of text and contains the following stats separated by whitespace:
orig_data_size
compr_data_size
mem_used_total
mem_limit
mem_used_max
zero_pages
num_migrated
8) Deactivate:
swapoff /dev/zram0

View File

@ -1,205 +0,0 @@
Introduction
============
The concepts of the kernel crypto API visible to kernel space is fully
applicable to the user space interface as well. Therefore, the kernel crypto API
high level discussion for the in-kernel use cases applies here as well.
The major difference, however, is that user space can only act as a consumer
and never as a provider of a transformation or cipher algorithm.
The following covers the user space interface exported by the kernel crypto
API. A working example of this description is libkcapi that can be obtained from
[1]. That library can be used by user space applications that require
cryptographic services from the kernel.
Some details of the in-kernel kernel crypto API aspects do not
apply to user space, however. This includes the difference between synchronous
and asynchronous invocations. The user space API call is fully synchronous.
In addition, only a subset of all cipher types are available as documented
below.
User space API general remarks
==============================
The kernel crypto API is accessible from user space. Currently, the following
ciphers are accessible:
* Message digest including keyed message digest (HMAC, CMAC)
* Symmetric ciphers
Note, AEAD ciphers are currently not supported via the symmetric cipher
interface.
The interface is provided via Netlink using the type AF_ALG. In addition, the
setsockopt option type is SOL_ALG. In case the user space header files do not
export these flags yet, use the following macros:
#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif
A cipher is accessed with the same name as done for the in-kernel API calls.
This includes the generic vs. unique naming schema for ciphers as well as the
enforcement of priorities for generic names.
To interact with the kernel crypto API, a Netlink socket must be created by
the user space application. User space invokes the cipher operation with the
send/write system call family. The result of the cipher operation is obtained
with the read/recv system call family.
The following API calls assume that the Netlink socket descriptor is already
opened by the user space application and discusses only the kernel crypto API
specific invocations.
To initialize a Netlink interface, the following sequence has to be performed
by the consumer:
1. Create a socket of type AF_ALG with the struct sockaddr_alg parameter
specified below for the different cipher types.
2. Invoke bind with the socket descriptor
3. Invoke accept with the socket descriptor. The accept system call
returns a new file descriptor that is to be used to interact with
the particular cipher instance. When invoking send/write or recv/read
system calls to send data to the kernel or obtain data from the
kernel, the file descriptor returned by accept must be used.
In-place cipher operation
=========================
Just like the in-kernel operation of the kernel crypto API, the user space
interface allows the cipher operation in-place. That means that the input buffer
used for the send/write system call and the output buffer used by the read/recv
system call may be one and the same. This is of particular interest for
symmetric cipher operations where a copying of the output data to its final
destination can be avoided.
If a consumer on the other hand wants to maintain the plaintext and the
ciphertext in different memory locations, all a consumer needs to do is to
provide different memory pointers for the encryption and decryption operation.
Message digest API
==================
The message digest type to be used for the cipher operation is selected when
invoking the bind syscall. bind requires the caller to provide a filled
struct sockaddr data structure. This data structure must be filled as follows:
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "hash", /* this selects the hash logic in the kernel */
.salg_name = "sha1" /* this is the cipher name */
};
The salg_type value "hash" applies to message digests and keyed message digests.
Though, a keyed message digest is referenced by the appropriate salg_name.
Please see below for the setsockopt interface that explains how the key can be
set for a keyed message digest.
Using the send() system call, the application provides the data that should be
processed with the message digest. The send system call allows the following
flags to be specified:
* MSG_MORE: If this flag is set, the send system call acts like a
message digest update function where the final hash is not
yet calculated. If the flag is not set, the send system call
calculates the final message digest immediately.
With the recv() system call, the application can read the message digest from
the kernel crypto API. If the buffer is too small for the message digest, the
flag MSG_TRUNC is set by the kernel.
In order to set a message digest key, the calling application must use the
setsockopt() option of ALG_SET_KEY. If the key is not set the HMAC operation is
performed without the initial HMAC state change caused by the key.
Symmetric cipher API
====================
The operation is very similar to the message digest discussion. During
initialization, the struct sockaddr data structure must be filled as follows:
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "skcipher", /* this selects the symmetric cipher */
.salg_name = "cbc(aes)" /* this is the cipher name */
};
Before data can be sent to the kernel using the write/send system call family,
the consumer must set the key. The key setting is described with the setsockopt
invocation below.
Using the sendmsg() system call, the application provides the data that should
be processed for encryption or decryption. In addition, the IV is specified
with the data structure provided by the sendmsg() system call.
The sendmsg system call parameter of struct msghdr is embedded into the
struct cmsghdr data structure. See recv(2) and cmsg(3) for more information
on how the cmsghdr data structure is used together with the send/recv system
call family. That cmsghdr data structure holds the following information
specified with a separate header instances:
* specification of the cipher operation type with one of these flags:
ALG_OP_ENCRYPT - encryption of data
ALG_OP_DECRYPT - decryption of data
* specification of the IV information marked with the flag ALG_SET_IV
The send system call family allows the following flag to be specified:
* MSG_MORE: If this flag is set, the send system call acts like a
cipher update function where more input data is expected
with a subsequent invocation of the send system call.
Note: The kernel reports -EINVAL for any unexpected data. The caller must
make sure that all data matches the constraints given in /proc/crypto for the
selected cipher.
With the recv() system call, the application can read the result of the
cipher operation from the kernel crypto API. The output buffer must be at least
as large as to hold all blocks of the encrypted or decrypted data. If the output
data size is smaller, only as many blocks are returned that fit into that
output buffer size.
Setsockopt interface
====================
In addition to the read/recv and send/write system call handling to send and
retrieve data subject to the cipher operation, a consumer also needs to set
the additional information for the cipher operation. This additional information
is set using the setsockopt system call that must be invoked with the file
descriptor of the open cipher (i.e. the file descriptor returned by the
accept system call).
Each setsockopt invocation must use the level SOL_ALG.
The setsockopt interface allows setting the following data using the mentioned
optname:
* ALG_SET_KEY -- Setting the key. Key setting is applicable to:
- the skcipher cipher type (symmetric ciphers)
- the hash cipher type (keyed message digests)
User space API example
======================
Please see [1] for libkcapi which provides an easy-to-use wrapper around the
aforementioned Netlink kernel interface. [1] also contains a test application
that invokes all libkcapi API calls.
[1] http://www.chronox.de/libkcapi.html
Author
======
Stephan Mueller <smueller@chronox.de>

View File

@ -26,6 +26,13 @@ Required properties:
Optional properties:
- interrupt-affinity : Valid only when using SPIs, specifies a list of phandles
to CPU nodes corresponding directly to the affinity of
the SPIs listed in the interrupts property.
This property should be present when there is more than
a single SPI.
- qcom,no-pc-write : Indicates that this PMU doesn't support the 0xc and 0xd
events.

View File

@ -0,0 +1,123 @@
Imagination Technologies Pistachio SoC clock controllers
========================================================
Pistachio has four clock controllers (core clock, peripheral clock, peripheral
general control, and top general control) which are instantiated individually
from the device-tree.
External clocks:
----------------
There are three external inputs to the clock controllers which should be
defined with the following clock-output-names:
- "xtal": External 52Mhz oscillator (required)
- "audio_clk_in": Alternate audio reference clock (optional)
- "enet_clk_in": Alternate ethernet PHY clock (optional)
Core clock controller:
----------------------
The core clock controller generates clocks for the CPU, RPU (WiFi + BT
co-processor), audio, and several peripherals.
Required properties:
- compatible: Must be "img,pistachio-clk".
- reg: Must contain the base address and length of the core clock controller.
- #clock-cells: Must be 1. The single cell is the clock identifier.
See dt-bindings/clock/pistachio-clk.h for the list of valid identifiers.
- clocks: Must contain an entry for each clock in clock-names.
- clock-names: Must include "xtal" (see "External clocks") and
"audio_clk_in_gate", "enet_clk_in_gate" which are generated by the
top-level general control.
Example:
clk_core: clock-controller@18144000 {
compatible = "img,pistachio-clk";
reg = <0x18144000 0x800>;
clocks = <&xtal>, <&cr_top EXT_CLK_AUDIO_IN>,
<&cr_top EXT_CLK_ENET_IN>;
clock-names = "xtal", "audio_clk_in_gate", "enet_clk_in_gate";
#clock-cells = <1>;
};
Peripheral clock controller:
----------------------------
The peripheral clock controller generates clocks for the DDR, ROM, and other
peripherals. The peripheral system clock ("periph_sys") generated by the core
clock controller is the input clock to the peripheral clock controller.
Required properties:
- compatible: Must be "img,pistachio-periph-clk".
- reg: Must contain the base address and length of the peripheral clock
controller.
- #clock-cells: Must be 1. The single cell is the clock identifier.
See dt-bindings/clock/pistachio-clk.h for the list of valid identifiers.
- clocks: Must contain an entry for each clock in clock-names.
- clock-names: Must include "periph_sys", the peripheral system clock generated
by the core clock controller.
Example:
clk_periph: clock-controller@18144800 {
compatible = "img,pistachio-clk-periph";
reg = <0x18144800 0x800>;
clocks = <&clk_core CLK_PERIPH_SYS>;
clock-names = "periph_sys";
#clock-cells = <1>;
};
Peripheral general control:
---------------------------
The peripheral general control block generates system interface clocks and
resets for various peripherals. It also contains miscellaneous peripheral
control registers. The system clock ("sys") generated by the peripheral clock
controller is the input clock to the system clock controller.
Required properties:
- compatible: Must include "img,pistachio-periph-cr" and "syscon".
- reg: Must contain the base address and length of the peripheral general
control registers.
- #clock-cells: Must be 1. The single cell is the clock identifier.
See dt-bindings/clock/pistachio-clk.h for the list of valid identifiers.
- clocks: Must contain an entry for each clock in clock-names.
- clock-names: Must include "sys", the system clock generated by the peripheral
clock controller.
Example:
cr_periph: syscon@18144800 {
compatible = "img,pistachio-cr-periph", "syscon";
reg = <0x18148000 0x1000>;
clocks = <&clock_periph PERIPH_CLK_PERIPH_SYS>;
clock-names = "sys";
#clock-cells = <1>;
};
Top-level general control:
--------------------------
The top-level general control block contains miscellaneous control registers and
gates for the external clocks "audio_clk_in" and "enet_clk_in".
Required properties:
- compatible: Must include "img,pistachio-cr-top" and "syscon".
- reg: Must contain the base address and length of the top-level
control registers.
- clocks: Must contain an entry for each clock in clock-names.
- clock-names: Two optional clocks, "audio_clk_in" and "enet_clk_in" (see
"External clocks").
- #clock-cells: Must be 1. The single cell is the clock identifier.
See dt-bindings/clock/pistachio-clk.h for the list of valid identifiers.
Example:
cr_top: syscon@18144800 {
compatible = "img,pistachio-cr-top", "syscon";
reg = <0x18149000 0x200>;
clocks = <&audio_refclk>, <&ext_enet_in>;
clock-names = "audio_clk_in", "enet_clk_in";
#clock-cells = <1>;
};

View File

@ -0,0 +1,27 @@
Imagination Technologies hardware hash accelerator
The hash accelerator provides hardware hashing acceleration for
SHA1, SHA224, SHA256 and MD5 hashes
Required properties:
- compatible : "img,hash-accelerator"
- reg : Offset and length of the register set for the module, and the DMA port
- interrupts : The designated IRQ line for the hashing module.
- dmas : DMA specifier as per Documentation/devicetree/bindings/dma/dma.txt
- dma-names : Should be "tx"
- clocks : Clock specifiers
- clock-names : "sys" Used to clock the hash block registers
"hash" Used to clock data through the accelerator
Example:
hash: hash@18149600 {
compatible = "img,hash-accelerator";
reg = <0x18149600 0x100>, <0x18101100 0x4>;
interrupts = <GIC_SHARED 59 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&dma 8 0xffffffff 0>;
dma-names = "tx";
clocks = <&cr_periph SYS_CLK_HASH>, <&clk_periph PERIPH_CLK_ROM>;
clock-names = "sys", "hash";
};

View File

@ -0,0 +1,12 @@
HWRNG support for the iproc-rng200 driver
Required properties:
- compatible : "brcm,iproc-rng200"
- reg : base address and size of control register block
Example:
rng {
compatible = "brcm,iproc-rng200";
reg = <0x18032000 0x28>;
};

View File

@ -0,0 +1,41 @@
Broadcom BCM3380-style Level 1 / Level 2 interrupt controller
This interrupt controller shows up in various forms on many BCM338x/BCM63xx
chipsets. It has the following properties:
- outputs a single interrupt signal to its interrupt controller parent
- contains one or more enable/status word pairs, which often appear at
different offsets in different blocks
- no atomic set/clear operations
Required properties:
- compatible: should be "brcm,bcm3380-l2-intc"
- reg: specifies one or more enable/status pairs, in the following format:
<enable_reg 0x4 status_reg 0x4>...
- interrupt-controller: identifies the node as an interrupt controller
- #interrupt-cells: specifies the number of cells needed to encode an interrupt
source, should be 1.
- interrupt-parent: specifies the phandle to the parent interrupt controller
this one is cascaded from
- interrupts: specifies the interrupt line in the interrupt-parent controller
node, valid values depend on the type of parent interrupt controller
Optional properties:
- brcm,irq-can-wake: if present, this means the L2 controller can be used as a
wakeup source for system suspend/resume.
Example:
irq0_intc: interrupt-controller@10000020 {
compatible = "brcm,bcm3380-l2-intc";
reg = <0x10000024 0x4 0x1000002c 0x4>,
<0x10000020 0x4 0x10000028 0x4>;
interrupt-controller;
#interrupt-cells = <1>;
interrupt-parent = <&cpu_intc>;
interrupts = <2>;
};

View File

@ -0,0 +1,52 @@
Broadcom BCM7038-style Level 1 interrupt controller
This block is a first level interrupt controller that is typically connected
directly to one of the HW INT lines on each CPU. Every BCM7xxx set-top chip
since BCM7038 has contained this hardware.
Key elements of the hardware design include:
- 64, 96, 128, or 160 incoming level IRQ lines
- Most onchip peripherals are wired directly to an L1 input
- A separate instance of the register set for each CPU, allowing individual
peripheral IRQs to be routed to any CPU
- Atomic mask/unmask operations
- No polarity/level/edge settings
- No FIFO or priority encoder logic; software is expected to read all
2-5 status words to determine which IRQs are pending
Required properties:
- compatible: should be "brcm,bcm7038-l1-intc"
- reg: specifies the base physical address and size of the registers;
the number of supported IRQs is inferred from the size argument
- interrupt-controller: identifies the node as an interrupt controller
- #interrupt-cells: specifies the number of cells needed to encode an interrupt
source, should be 1.
- interrupt-parent: specifies the phandle to the parent interrupt controller(s)
this one is cascaded from
- interrupts: specifies the interrupt line(s) in the interrupt-parent controller
node; valid values depend on the type of parent interrupt controller
If multiple reg ranges and interrupt-parent entries are present on an SMP
system, the driver will allow IRQ SMP affinity to be set up through the
/proc/irq/ interface. In the simplest possible configuration, only one
reg range and one interrupt-parent is needed.
Example:
periph_intc: periph_intc@1041a400 {
compatible = "brcm,bcm7038-l1-intc";
reg = <0x1041a400 0x30 0x1041a600 0x30>;
interrupt-controller;
#interrupt-cells = <1>;
interrupt-parent = <&cpu_intc>;
interrupts = <2>, <3>;
};

View File

@ -13,8 +13,7 @@ Such an interrupt controller has the following hardware design:
or if they will output an interrupt signal at this 2nd level interrupt
controller, in particular for UARTs
- typically has one 32-bit enable word and one 32-bit status word, but on
some hardware may have more than one enable/status pair
- has one 32-bit enable word and one 32-bit status word
- no atomic set/clear operations
@ -53,9 +52,7 @@ The typical hardware layout for this controller is represented below:
Required properties:
- compatible: should be "brcm,bcm7120-l2-intc"
- reg: specifies the base physical address and size of the registers;
multiple pairs may be specified, with the first pair handling IRQ offsets
0..31 and the second pair handling 32..63
- reg: specifies the base physical address and size of the registers
- interrupt-controller: identifies the node as an interrupt controller
- #interrupt-cells: specifies the number of cells needed to encode an interrupt
source, should be 1.
@ -66,10 +63,7 @@ Required properties:
- brcm,int-map-mask: 32-bits bit mask describing how many and which interrupts
are wired to this 2nd level interrupt controller, and how they match their
respective interrupt parents. Should match exactly the number of interrupts
specified in the 'interrupts' property, multiplied by the number of
enable/status register pairs implemented by this controller. For
multiple parent IRQs with multiple enable/status words, this looks like:
<irq0_w0 irq0_w1 irq1_w0 irq1_w1 ...>
specified in the 'interrupts' property.
Optional properties:

View File

@ -0,0 +1,18 @@
* Xtensa Interrupt Distributor and Programmable Interrupt Controller (MX)
Required properties:
- compatible: Should be "cdns,xtensa-mx".
Remaining properties have exact same meaning as in Xtensa PIC
(see cdns,xtensa-pic.txt).
Examples:
pic: pic {
compatible = "cdns,xtensa-mx";
/* one cell: internal irq number,
* two cells: second cell == 0: internal irq number
* second cell == 1: external irq number
*/
#interrupt-cells = <2>;
interrupt-controller;
};

View File

@ -0,0 +1,25 @@
* Xtensa built-in Programmable Interrupt Controller (PIC)
Required properties:
- compatible: Should be "cdns,xtensa-pic".
- interrupt-controller: Identifies the node as an interrupt controller.
- #interrupt-cells: The number of cells to define the interrupts.
It may be either 1 or 2.
When it's 1, the first cell is the internal IRQ number.
When it's 2, the first cell is the IRQ number, and the second cell
specifies whether it's internal (0) or external (1).
Periferals are usually connected to a fixed external IRQ, but for different
core variants it may be mapped to different internal IRQ.
IRQ sensitivity and priority are fixed for each core variant and may not be
changed at runtime.
Examples:
pic: pic {
compatible = "cdns,xtensa-pic";
/* one cell: internal irq number,
* two cells: second cell == 0: internal irq number
* second cell == 1: external irq number
*/
#interrupt-cells = <2>;
interrupt-controller;
};

View File

@ -27,8 +27,13 @@ Optional properties:
Required properties for timer sub-node:
- compatible : Should be "mti,gic-timer".
- interrupts : Interrupt for the GIC local timer.
Optional properties for timer sub-node:
- clocks : GIC timer operating clock.
- clock-frequency : Clock frequency at which the GIC timers operate.
Note that one of clocks or clock-frequency must be specified.
Example:
gic: interrupt-controller@1bdc0000 {

View File

@ -14,8 +14,10 @@ Optional properties for child nodes:
- led-sources : List of device current outputs the LED is connected to. The
outputs are identified by the numbers that must be defined
in the LED device binding documentation.
- label : The label for this LED. If omitted, the label is
taken from the node name (excluding the unit address).
- label : The label for this LED. If omitted, the label is taken from the node
name (excluding the unit address). It has to uniquely identify
a device, i.e. no other LED class device can be assigned the same
label.
- linux,default-trigger : This parameter, if present, is a
string defining the trigger assigned to the LED. Current triggers are:

View File

@ -26,16 +26,18 @@ LED sub-node properties:
Examples:
#include <dt-bindings/gpio/gpio.h>
leds {
compatible = "gpio-leds";
hdd {
label = "IDE Activity";
gpios = <&mcu_pio 0 1>; /* Active low */
gpios = <&mcu_pio 0 GPIO_ACTIVE_LOW>;
linux,default-trigger = "ide-disk";
};
fault {
gpios = <&mcu_pio 1 0>;
gpios = <&mcu_pio 1 GPIO_ACTIVE_HIGH>;
/* Keep LED on if BIOS detected hardware fault */
default-state = "keep";
};
@ -44,11 +46,11 @@ leds {
run-control {
compatible = "gpio-leds";
red {
gpios = <&mpc8572 6 0>;
gpios = <&mpc8572 6 GPIO_ACTIVE_HIGH>;
default-state = "off";
};
green {
gpios = <&mpc8572 7 0>;
gpios = <&mpc8572 7 GPIO_ACTIVE_HIGH>;
default-state = "on";
};
};
@ -57,7 +59,7 @@ leds {
compatible = "gpio-leds";
charger-led {
gpios = <&gpio1 2 0>;
gpios = <&gpio1 2 GPIO_ACTIVE_HIGH>;
linux,default-trigger = "max8903-charger-charging";
retain-state-suspended;
};

View File

@ -0,0 +1,43 @@
Binding for Qualcomm PM8941 WLED driver
Required properties:
- compatible: should be "qcom,pm8941-wled"
- reg: slave address
Optional properties:
- label: The label for this led
See Documentation/devicetree/bindings/leds/common.txt
- linux,default-trigger: Default trigger assigned to the LED
See Documentation/devicetree/bindings/leds/common.txt
- qcom,cs-out: bool; enable current sink output
- qcom,cabc: bool; enable content adaptive backlight control
- qcom,ext-gen: bool; use externally generated modulator signal to dim
- qcom,current-limit: mA; per-string current limit; value from 0 to 25
default: 20mA
- qcom,current-boost-limit: mA; boost current limit; one of:
105, 385, 525, 805, 980, 1260, 1400, 1680
default: 805mA
- qcom,switching-freq: kHz; switching frequency; one of:
600, 640, 685, 738, 800, 872, 960, 1066, 1200, 1371,
1600, 1920, 2400, 3200, 4800, 9600,
default: 1600kHz
- qcom,ovp: V; Over-voltage protection limit; one of:
27, 29, 32, 35
default: 29V
- qcom,num-strings: #; number of led strings attached; value from 1 to 3
default: 2
Example:
pm8941-wled@d800 {
compatible = "qcom,pm8941-wled";
reg = <0xd800>;
label = "backlight";
qcom,cs-out;
qcom,current-limit = <20>;
qcom,current-boost-limit = <805>;
qcom,switching-freq = <1600>;
qcom,ovp = <29>;
qcom,num-strings = <2>;
};

View File

@ -0,0 +1,43 @@
ARM MHU Mailbox Driver
======================
The ARM's Message-Handling-Unit (MHU) is a mailbox controller that has
3 independent channels/links to communicate with remote processor(s).
MHU links are hardwired on a platform. A link raises interrupt for any
received data. However, there is no specified way of knowing if the sent
data has been read by the remote. This driver assumes the sender polls
STAT register and the remote clears it after having read the data.
The last channel is specified to be a 'Secure' resource, hence can't be
used by Linux running NS.
Mailbox Device Node:
====================
Required properties:
--------------------
- compatible: Shall be "arm,mhu" & "arm,primecell"
- reg: Contains the mailbox register address range (base
address and length)
- #mbox-cells Shall be 1 - the index of the channel needed.
- interrupts: Contains the interrupt information corresponding to
each of the 3 links of MHU.
Example:
--------
mhu: mailbox@2b1f0000 {
#mbox-cells = <1>;
compatible = "arm,mhu", "arm,primecell";
reg = <0 0x2b1f0000 0x1000>;
interrupts = <0 36 4>, /* LP-NonSecure */
<0 35 4>, /* HP-NonSecure */
<0 37 4>; /* Secure */
clocks = <&clock 0 2 1>;
clock-names = "apb_pclk";
};
mhu_client: scb@2e000000 {
compatible = "fujitsu,mb86s70-scb-1.0";
reg = <0 0x2e000000 0x4000>;
mboxes = <&mhu 1>; /* HP-NonSecure */
};

View File

@ -1,37 +0,0 @@
* Interrupt Controller
Properties:
- compatible: "brcm,bcm3384-intc"
Compatibility with BCM3384 and possibly other BCM33xx/BCM63xx SoCs.
- reg: Address/length pairs for each mask/status register set. Length must
be 8. If multiple register sets are specified, the first set will
handle IRQ offsets 0..31, the second set 32..63, and so on.
- interrupt-controller: This is an interrupt controller.
- #interrupt-cells: Must be <1>. Just a simple IRQ offset; no level/edge
or polarity configuration is possible with this controller.
- interrupt-parent: This controller is cascaded from a MIPS CPU HW IRQ, or
from another INTC.
- interrupts: The IRQ on the parent controller.
Example:
periph_intc: periph_intc@14e00038 {
compatible = "brcm,bcm3384-intc";
/*
* IRQs 0..31: mask reg 0x14e00038, status reg 0x14e0003c
* IRQs 32..63: mask reg 0x14e00340, status reg 0x14e00344
*/
reg = <0x14e00038 0x8 0x14e00340 0x8>;
interrupt-controller;
#interrupt-cells = <1>;
interrupt-parent = <&cpu_intc>;
interrupts = <4>;
};

View File

@ -1,11 +0,0 @@
* Broadcom cable/DSL platforms
SoCs:
Required properties:
- compatible: "brcm,bcm3384", "brcm,bcm33843"
Boards:
Required properties:
- compatible: "brcm,bcm93384wvg"

View File

@ -0,0 +1,12 @@
* Broadcom cable/DSL/settop platforms
Required properties:
- compatible: "brcm,bcm3384", "brcm,bcm33843"
"brcm,bcm3384-viper", "brcm,bcm33843-viper"
"brcm,bcm6328", "brcm,bcm6368",
"brcm,bcm7125", "brcm,bcm7346", "brcm,bcm7358", "brcm,bcm7360",
"brcm,bcm7362", "brcm,bcm7420", "brcm,bcm7425"
The experimental -viper variants are for running Linux on the 3384's
BMIPS4355 cable modem CPU instead of the BMIPS5000 application processor.

View File

@ -0,0 +1,42 @@
Imagination Pistachio SoC
=========================
Required properties:
--------------------
- compatible: Must include "img,pistachio".
CPU nodes:
----------
A "cpus" node is required. Required properties:
- #address-cells: Must be 1.
- #size-cells: Must be 0.
A CPU sub-node is also required for at least CPU 0. Since the topology may
be probed via CPS, it is not necessary to specify secondary CPUs. Required
propertis:
- device_type: Must be "cpu".
- compatible: Must be "mti,interaptiv".
- reg: CPU number.
- clocks: Must include the CPU clock. See ../../clock/clock-bindings.txt for
details on clock bindings.
Example:
cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu0: cpu@0 {
device_type = "cpu";
compatible = "mti,interaptiv";
reg = <0>;
clocks = <&clk_core CLK_MIPS>;
};
};
Boot protocol:
--------------
In accordance with the MIPS UHI specification[1], the bootloader must pass the
following arguments to the kernel:
- $a0: -2.
- $a1: KSEG0 address of the flattened device-tree blob.
[1] http://prplfoundation.org/wiki/MIPS_documentation

View File

@ -19,6 +19,12 @@ The following properties are common to the Ethernet controllers:
- phy: the same as "phy-handle" property, not recommended for new bindings.
- phy-device: the same as "phy-handle" property, not recommended for new
bindings.
- rx-fifo-depth: the size of the controller's receive fifo in bytes. This
is used for components that can have configurable receive fifo sizes,
and is useful for determining certain configuration settings such as
flow control thresholds.
- tx-fifo-depth: the size of the controller's transmit fifo in bytes. This
is used for components that can have configurable fifo sizes.
Child nodes of the Ethernet controller are typically the individual PHY devices
connected via the MDIO bus (sometimes the MDIO bus controller is separate).

View File

@ -45,6 +45,8 @@ Optional properties:
If not passed then the system clock will be used and this is fine on some
platforms.
- snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register.
- tx-fifo-depth: See ethernet.txt file in the same directory
- rx-fifo-depth: See ethernet.txt file in the same directory
Examples:
@ -59,6 +61,8 @@ Examples:
phy-mode = "gmii";
snps,multicast-filter-bins = <256>;
snps,perfect-filter-entries = <128>;
rx-fifo-depth = <16384>;
tx-fifo-depth = <16384>;
clocks = <&clock>;
clock-names = "stmmaceth";
};

View File

@ -0,0 +1,17 @@
Conexant Digicolor Real Time Clock controller
This binding currently supports the CX92755 SoC.
Required properties:
- compatible: should be "cnxt,cx92755-rtc"
- reg: physical base address of the controller and length of memory mapped
region.
- interrupts: rtc alarm interrupt
Example:
rtc@f0000c30 {
compatible = "cnxt,cx92755-rtc";
reg = <0xf0000c30 0x18>;
interrupts = <25>;
};

View File

@ -7,6 +7,11 @@ Required properties:
region.
- interrupts: rtc alarm interrupt
Optional properties:
- stmp,crystal-freq: override crystal frequency as determined from fuse bits.
Only <32000> and <32768> are possible for the hardware. Use <0> for
"no crystal".
Example:
rtc@80056000 {

View File

@ -0,0 +1,34 @@
* STMicroelectronics SAS. ST33ZP24 TPM SoC
Required properties:
- compatible: Should be "st,st33zp24-spi".
- spi-max-frequency: Maximum SPI frequency (<= 10000000).
Optional ST33ZP24 Properties:
- interrupt-parent: phandle for the interrupt gpio controller
- interrupts: GPIO interrupt to which the chip is connected
- lpcpd-gpios: Output GPIO pin used for ST33ZP24 power management D1/D2 state.
If set, power must be present when the platform is going into sleep/hibernate mode.
Optional SoC Specific Properties:
- pinctrl-names: Contains only one value - "default".
- pintctrl-0: Specifies the pin control groups used for this controller.
Example (for ARM-based BeagleBoard xM with ST33ZP24 on SPI4):
&mcspi4 {
status = "okay";
st33zp24@0 {
compatible = "st,st33zp24-spi";
spi-max-frequency = <10000000>;
interrupt-parent = <&gpio5>;
interrupts = <7 IRQ_TYPE_LEVEL_HIGH>;
lpcpd-gpios = <&gpio5 15 GPIO_ACTIVE_HIGH>;
};
};

View File

@ -1,7 +1,7 @@
Ingenic JZ4740 I2S controller
Required properties:
- compatible : "ingenic,jz4740-i2s"
- compatible : "ingenic,jz4740-i2s" or "ingenic,jz4780-i2s"
- reg : I2S registers location and length
- clocks : AIC and I2S PLL clock specifiers.
- clock-names: "aic" and "i2s"

View File

@ -0,0 +1,22 @@
max98925 audio CODEC
This device supports I2C.
Required properties:
- compatible : "maxim,max98925"
- vmon-slot-no : slot number used to send voltage information
- imon-slot-no : slot number used to send current information
- reg : the I2C address of the device for I2C
Example:
codec: max98925@1a {
compatible = "maxim,max98925";
vmon-slot-no = <0>;
imon-slot-no = <2>;
reg = <0x1a>;
};

View File

@ -18,6 +18,7 @@ Required properties:
* Headphones
* Speakers
* Mic Jack
* Int Mic
- nvidia,i2s-controller : The phandle of the Tegra I2S controller that's
connected to the CODEC.

View File

@ -0,0 +1,43 @@
* Qualcomm Technologies LPASS CPU DAI
This node models the Qualcomm Technologies Low-Power Audio SubSystem (LPASS).
Required properties:
- compatible : "qcom,lpass-cpu"
- clocks : Must contain an entry for each entry in clock-names.
- clock-names : A list which must include the following entries:
* "ahbix-clk"
* "mi2s-osr-clk"
* "mi2s-bit-clk"
- interrupts : Must contain an entry for each entry in
interrupt-names.
- interrupt-names : A list which must include the following entries:
* "lpass-irq-lpaif"
- pinctrl-N : One property must exist for each entry in
pinctrl-names. See ../pinctrl/pinctrl-bindings.txt
for details of the property values.
- pinctrl-names : Must contain a "default" entry.
- reg : Must contain an address for each entry in reg-names.
- reg-names : A list which must include the following entries:
* "lpass-lpaif"
Optional properties:
- qcom,adsp : Phandle for the audio DSP node
Example:
lpass@28100000 {
compatible = "qcom,lpass-cpu";
clocks = <&lcc AHBIX_CLK>, <&lcc MI2S_OSR_CLK>, <&lcc MI2S_BIT_CLK>;
clock-names = "ahbix-clk", "mi2s-osr-clk", "mi2s-bit-clk";
interrupts = <0 85 1>;
interrupt-names = "lpass-irq-lpaif";
pinctrl-names = "default", "idle";
pinctrl-0 = <&mi2s_default>;
pinctrl-1 = <&mi2s_idle>;
reg = <0x28100000 0x10000>;
reg-names = "lpass-lpaif";
qcom,adsp = <&adsp>;
};

View File

@ -29,9 +29,17 @@ SSI subnode properties:
- shared-pin : if shared clock pin
- pio-transfer : use PIO transfer mode
- no-busif : BUSIF is not ussed when [mem -> SSI] via DMA case
- dma : Should contain Audio DMAC entry
- dma-names : SSI case "rx" (=playback), "tx" (=capture)
SSIU case "rxu" (=playback), "txu" (=capture)
SRC subnode properties:
no properties at this point
- dma : Should contain Audio DMAC entry
- dma-names : "rx" (=playback), "tx" (=capture)
DVC subnode properties:
- dma : Should contain Audio DMAC entry
- dma-names : "tx" (=playback/capture)
DAI subnode properties:
- playback : list of playback modules
@ -45,56 +53,145 @@ rcar_sound: rcar_sound@ec500000 {
reg = <0 0xec500000 0 0x1000>, /* SCU */
<0 0xec5a0000 0 0x100>, /* ADG */
<0 0xec540000 0 0x1000>, /* SSIU */
<0 0xec541000 0 0x1280>; /* SSI */
<0 0xec541000 0 0x1280>, /* SSI */
<0 0xec740000 0 0x200>; /* Audio DMAC peri peri*/
reg-names = "scu", "adg", "ssiu", "ssi", "audmapp";
clocks = <&mstp10_clks R8A7790_CLK_SSI_ALL>,
<&mstp10_clks R8A7790_CLK_SSI9>, <&mstp10_clks R8A7790_CLK_SSI8>,
<&mstp10_clks R8A7790_CLK_SSI7>, <&mstp10_clks R8A7790_CLK_SSI6>,
<&mstp10_clks R8A7790_CLK_SSI5>, <&mstp10_clks R8A7790_CLK_SSI4>,
<&mstp10_clks R8A7790_CLK_SSI3>, <&mstp10_clks R8A7790_CLK_SSI2>,
<&mstp10_clks R8A7790_CLK_SSI1>, <&mstp10_clks R8A7790_CLK_SSI0>,
<&mstp10_clks R8A7790_CLK_SCU_SRC9>, <&mstp10_clks R8A7790_CLK_SCU_SRC8>,
<&mstp10_clks R8A7790_CLK_SCU_SRC7>, <&mstp10_clks R8A7790_CLK_SCU_SRC6>,
<&mstp10_clks R8A7790_CLK_SCU_SRC5>, <&mstp10_clks R8A7790_CLK_SCU_SRC4>,
<&mstp10_clks R8A7790_CLK_SCU_SRC3>, <&mstp10_clks R8A7790_CLK_SCU_SRC2>,
<&mstp10_clks R8A7790_CLK_SCU_SRC1>, <&mstp10_clks R8A7790_CLK_SCU_SRC0>,
<&mstp10_clks R8A7790_CLK_SCU_DVC0>, <&mstp10_clks R8A7790_CLK_SCU_DVC1>,
<&audio_clk_a>, <&audio_clk_b>, <&audio_clk_c>, <&m2_clk>;
clock-names = "ssi-all",
"ssi.9", "ssi.8", "ssi.7", "ssi.6", "ssi.5",
"ssi.4", "ssi.3", "ssi.2", "ssi.1", "ssi.0",
"src.9", "src.8", "src.7", "src.6", "src.5",
"src.4", "src.3", "src.2", "src.1", "src.0",
"dvc.0", "dvc.1",
"clk_a", "clk_b", "clk_c", "clk_i";
rcar_sound,dvc {
dvc0: dvc@0 { };
dvc1: dvc@1 { };
dvc0: dvc@0 {
dmas = <&audma0 0xbc>;
dma-names = "tx";
};
dvc1: dvc@1 {
dmas = <&audma0 0xbe>;
dma-names = "tx";
};
};
rcar_sound,src {
src0: src@0 { };
src1: src@1 { };
src2: src@2 { };
src3: src@3 { };
src4: src@4 { };
src5: src@5 { };
src6: src@6 { };
src7: src@7 { };
src8: src@8 { };
src9: src@9 { };
src0: src@0 {
interrupts = <0 352 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x85>, <&audma1 0x9a>;
dma-names = "rx", "tx";
};
src1: src@1 {
interrupts = <0 353 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x87>, <&audma1 0x9c>;
dma-names = "rx", "tx";
};
src2: src@2 {
interrupts = <0 354 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x89>, <&audma1 0x9e>;
dma-names = "rx", "tx";
};
src3: src@3 {
interrupts = <0 355 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x8b>, <&audma1 0xa0>;
dma-names = "rx", "tx";
};
src4: src@4 {
interrupts = <0 356 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x8d>, <&audma1 0xb0>;
dma-names = "rx", "tx";
};
src5: src@5 {
interrupts = <0 357 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x8f>, <&audma1 0xb2>;
dma-names = "rx", "tx";
};
src6: src@6 {
interrupts = <0 358 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x91>, <&audma1 0xb4>;
dma-names = "rx", "tx";
};
src7: src@7 {
interrupts = <0 359 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x93>, <&audma1 0xb6>;
dma-names = "rx", "tx";
};
src8: src@8 {
interrupts = <0 360 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x95>, <&audma1 0xb8>;
dma-names = "rx", "tx";
};
src9: src@9 {
interrupts = <0 361 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x97>, <&audma1 0xba>;
dma-names = "rx", "tx";
};
};
rcar_sound,ssi {
ssi0: ssi@0 {
interrupts = <0 370 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x01>, <&audma1 0x02>, <&audma0 0x15>, <&audma1 0x16>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi1: ssi@1 {
interrupts = <0 371 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x03>, <&audma1 0x04>, <&audma0 0x49>, <&audma1 0x4a>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi2: ssi@2 {
interrupts = <0 372 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x05>, <&audma1 0x06>, <&audma0 0x63>, <&audma1 0x64>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi3: ssi@3 {
interrupts = <0 373 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x07>, <&audma1 0x08>, <&audma0 0x6f>, <&audma1 0x70>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi4: ssi@4 {
interrupts = <0 374 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x09>, <&audma1 0x0a>, <&audma0 0x71>, <&audma1 0x72>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi5: ssi@5 {
interrupts = <0 375 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x0b>, <&audma1 0x0c>, <&audma0 0x73>, <&audma1 0x74>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi6: ssi@6 {
interrupts = <0 376 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x0d>, <&audma1 0x0e>, <&audma0 0x75>, <&audma1 0x76>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi7: ssi@7 {
interrupts = <0 377 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x0f>, <&audma1 0x10>, <&audma0 0x79>, <&audma1 0x7a>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi8: ssi@8 {
interrupts = <0 378 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x11>, <&audma1 0x12>, <&audma0 0x7b>, <&audma1 0x7c>;
dma-names = "rx", "tx", "rxu", "txu";
};
ssi9: ssi@9 {
interrupts = <0 379 IRQ_TYPE_LEVEL_HIGH>;
dmas = <&audma0 0x13>, <&audma1 0x14>, <&audma0 0x7d>, <&audma1 0x7e>;
dma-names = "rx", "tx", "rxu", "txu";
};
};

View File

@ -0,0 +1,67 @@
Renesas Sampling Rate Convert Sound Card:
Renesas Sampling Rate Convert Sound Card specifies audio DAI connections of SoC <-> codec.
Required properties:
- compatible : "renesas,rsrc-card,<board>"
Examples with soctypes are:
- "renesas,rsrc-card,lager"
- "renesas,rsrc-card,koelsch"
Optional properties:
- card_name : User specified audio sound card name, one string
property.
- cpu : CPU sub-node
- codec : CODEC sub-node
Optional subnode properties:
- format : CPU/CODEC common audio format.
"i2s", "right_j", "left_j" , "dsp_a"
"dsp_b", "ac97", "pdm", "msb", "lsb"
- frame-master : Indicates dai-link frame master.
phandle to a cpu or codec subnode.
- bitclock-master : Indicates dai-link bit clock master.
phandle to a cpu or codec subnode.
- bitclock-inversion : bool property. Add this if the
dai-link uses bit clock inversion.
- frame-inversion : bool property. Add this if the
dai-link uses frame clock inversion.
- convert-rate : platform specified sampling rate convert
Required CPU/CODEC subnodes properties:
- sound-dai : phandle and port of CPU/CODEC
Optional CPU/CODEC subnodes properties:
- clocks / system-clock-frequency : specify subnode's clock if needed.
it can be specified via "clocks" if system has
clock node (= common clock), or "system-clock-frequency"
(if system doens't support common clock)
If a clock is specified, it is
enabled with clk_prepare_enable()
in dai startup() and disabled with
clk_disable_unprepare() in dai
shutdown().
Example
sound {
compatible = "renesas,rsrc-card,lager";
card-name = "rsnd-ak4643";
format = "left_j";
bitclock-master = <&sndcodec>;
frame-master = <&sndcodec>;
sndcpu: cpu {
sound-dai = <&rcar_sound>;
};
sndcodec: codec {
sound-dai = <&ak4643>;
system-clock-frequency = <11289600>;
};
};

View File

@ -0,0 +1,23 @@
* Sound complex for Storm boards
Models a soundcard for Storm boards with the Qualcomm Technologies IPQ806x SOC
connected to a MAX98357A DAC via I2S.
Required properties:
- compatible : "google,storm-audio"
- cpu : Phandle of the CPU DAI
- codec : Phandle of the codec DAI
Optional properties:
- qcom,model : The user-visible name of this sound card.
Example:
sound {
compatible = "google,storm-audio";
qcom,model = "ipq806x-storm";
cpu = <&lpass_cpu>;
codec = <&max98357a>;
};

View File

@ -10,6 +10,13 @@ Required properties:
- reg : the I2C address of the device for I2C, the chip select
number for SPI.
- PVDD-supply, DVDD-supply : Power supplies for the device, as covered
in Documentation/devicetree/bindings/regulator/regulator.txt
Optional properties:
- wlf,reset-gpio: A GPIO specifier for the GPIO controlling the reset pin
Example:
codec: wm8804@1a {

View File

@ -15,6 +15,7 @@ Table of Contents
1) Entry point for arch/arm
2) Entry point for arch/powerpc
3) Entry point for arch/x86
4) Entry point for arch/mips/bmips
II - The DT block format
1) Header
@ -288,6 +289,33 @@ it with special cases.
or initrd address. It simply holds information which can not be retrieved
otherwise like interrupt routing or a list of devices behind an I2C bus.
4) Entry point for arch/mips/bmips
----------------------------------
Some bootloaders only support a single entry point, at the start of the
kernel image. Other bootloaders will jump to the ELF start address.
Both schemes are supported; CONFIG_BOOT_RAW=y and CONFIG_NO_EXCEPT_FILL=y,
so the first instruction immediately jumps to kernel_entry().
Similar to the arch/arm case (b), a DT-aware bootloader is expected to
set up the following registers:
a0 : 0
a1 : 0xffffffff
a2 : Physical pointer to the device tree block (defined in chapter
II) in RAM. The device tree can be located anywhere in the first
512MB of the physical address space (0x00000000 - 0x1fffffff),
aligned on a 64 bit boundary.
Legacy bootloaders do not use this convention, and they do not pass in a
DT block. In this case, Linux will look for a builtin DTB, selected via
CONFIG_DT_*.
This convention is defined for 32-bit systems only, as there are not
currently any 64-bit BMIPS implementations.
II - The DT block format
========================

View File

@ -289,6 +289,10 @@ IRQ
devm_request_irq()
devm_request_threaded_irq()
LED
devm_led_classdev_register()
devm_led_classdev_unregister()
MDIO
devm_mdiobus_alloc()
devm_mdiobus_alloc_size()

View File

@ -196,7 +196,7 @@ prototypes:
void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(int, struct kiocb *, struct iov_iter *iter, loff_t offset);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
int (*migratepage)(struct address_space *, struct page *, struct page *);
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
@ -429,8 +429,6 @@ prototypes:
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
@ -525,6 +523,7 @@ prototypes:
void (*close)(struct vm_area_struct*);
int (*fault)(struct vm_area_struct*, struct vm_fault *);
int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
int (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
locking rules:
@ -534,6 +533,7 @@ close: yes
fault: yes can return with page locked
map_pages: yes
page_mkwrite: yes can return with page locked
pfn_mkwrite: yes
access: yes
->fault() is called when a previously not present pte is about
@ -560,6 +560,12 @@ the page has been truncated, the filesystem should not look up a new page
like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
will cause the VM to retry the fault.
->pfn_mkwrite() is the same as page_mkwrite but when the pte is
VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is
VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior
after this call is to make the pte read-write, unless pfn_mkwrite returns
an error.
->access() is called when get_user_pages() fails in
access_process_vm(), typically used to debug a process through
/proc/pid/mem or ptrace. This function is needed only for

View File

@ -471,3 +471,15 @@ in your dentry operations instead.
[mandatory]
f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.
--
[mandatory]
never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.
--
[mandatory]
do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.
--
[mandatory]
->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.

View File

@ -200,12 +200,12 @@ contains details information about the process itself. Its fields are
explained in Table 1-4.
(for SMP CONFIG users)
For making accounting scalable, RSS related information are handled in
asynchronous manner and the vaule may not be very precise. To see a precise
For making accounting scalable, RSS related information are handled in an
asynchronous manner and the value may not be very precise. To see a precise
snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
It's slow but very precise.
Table 1-2: Contents of the status files (as of 2.6.30-rc7)
Table 1-2: Contents of the status files (as of 3.20.0)
..............................................................................
Field Content
Name filename of the executable
@ -213,6 +213,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
in an uninterruptible wait, Z is zombie,
T is traced or stopped)
Tgid thread group ID
Ngid NUMA group ID (0 if none)
Pid process id
PPid process id of the parent process
TracerPid PID of process tracing this process (0 if not)
@ -220,6 +221,10 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
Gid Real, effective, saved set, and file system GIDs
FDSize number of file descriptor slots currently allocated
Groups supplementary group list
NStgid descendant namespace thread group ID hierarchy
NSpid descendant namespace process ID hierarchy
NSpgid descendant namespace process group ID hierarchy
NSsid descendant namespace session ID hierarchy
VmPeak peak virtual memory size
VmSize total program size
VmLck locked memory size
@ -1704,6 +1709,10 @@ A typical output is
flags: 0100002
mnt_id: 19
All locks associated with a file descriptor are shown in its fdinfo too.
lock: 1: FLOCK ADVISORY WRITE 359 00:13:11691 0 EOF
The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
pair provide additional information particular to the objects they represent.

View File

@ -590,7 +590,7 @@ struct address_space_operations {
void (*invalidatepage) (struct page *, unsigned int, unsigned int);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(int, struct kiocb *, struct iov_iter *iter, loff_t offset);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset);
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct page *, struct page *);
int (*launder_page) (struct page *);
@ -804,8 +804,6 @@ struct file_operations {
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
int (*iterate) (struct file *, struct dir_context *);
@ -838,14 +836,10 @@ otherwise noted.
read: called by read(2) and related system calls
aio_read: vectored, possibly asynchronous read
read_iter: possibly asynchronous read with iov_iter as destination
write: called by write(2) and related system calls
aio_write: vectored, possibly asynchronous write
write_iter: possibly asynchronous write with iov_iter as source
iterate: called when the VFS needs to read the directory contents

View File

@ -2317,7 +2317,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noexec32=off: disable non-executable mappings
read implies executable mappings
nofpu [SH] Disable hardware FPU at boot time.
nofpu [MIPS,SH] Disable hardware FPU at boot time.
nofxsr [BUGS=X86-32] Disables x86 floating point extended
register save and restore. The kernel will only save

View File

@ -0,0 +1,22 @@
Flash LED handling under Linux
==============================
Some LED devices provide two modes - torch and flash. In the LED subsystem
those modes are supported by LED class (see Documentation/leds/leds-class.txt)
and LED Flash class respectively. The torch mode related features are enabled
by default and the flash ones only if a driver declares it by setting
LED_DEV_CAP_FLASH flag.
In order to enable the support for flash LEDs CONFIG_LEDS_CLASS_FLASH symbol
must be defined in the kernel config. A LED Flash class driver must be
registered in the LED subsystem with led_classdev_flash_register function.
Following sysfs attributes are exposed for controlling flash LED devices:
(see Documentation/ABI/testing/sysfs-class-led-flash)
- flash_brightness
- max_flash_brightness
- flash_timeout
- max_flash_timeout
- flash_strobe
- flash_fault

View File

@ -0,0 +1,301 @@
Wei Yang <weiyang@linux.vnet.ibm.com>
Benjamin Herrenschmidt <benh@au1.ibm.com>
Bjorn Helgaas <bhelgaas@google.com>
26 Aug 2014
This document describes the requirement from hardware for PCI MMIO resource
sizing and assignment on PowerKVM and how generic PCI code handles this
requirement. The first two sections describe the concepts of Partitionable
Endpoints and the implementation on P8 (IODA2). The next two sections talks
about considerations on enabling SRIOV on IODA2.
1. Introduction to Partitionable Endpoints
A Partitionable Endpoint (PE) is a way to group the various resources
associated with a device or a set of devices to provide isolation between
partitions (i.e., filtering of DMA, MSIs etc.) and to provide a mechanism
to freeze a device that is causing errors in order to limit the possibility
of propagation of bad data.
There is thus, in HW, a table of PE states that contains a pair of "frozen"
state bits (one for MMIO and one for DMA, they get set together but can be
cleared independently) for each PE.
When a PE is frozen, all stores in any direction are dropped and all loads
return all 1's value. MSIs are also blocked. There's a bit more state that
captures things like the details of the error that caused the freeze etc., but
that's not critical.
The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
are matched to their corresponding PEs.
The following section provides a rough description of what we have on P8
(IODA2). Keep in mind that this is all per PHB (PCI host bridge). Each PHB
is a completely separate HW entity that replicates the entire logic, so has
its own set of PEs, etc.
2. Implementation of Partitionable Endpoints on P8 (IODA2)
P8 supports up to 256 Partitionable Endpoints per PHB.
* Inbound
For DMA, MSIs and inbound PCIe error messages, we have a table (in
memory but accessed in HW by the chip) that provides a direct
correspondence between a PCIe RID (bus/dev/fn) with a PE number.
We call this the RTT.
- For DMA we then provide an entire address space for each PE that can
contain two "windows", depending on the value of PCI address bit 59.
Each window can be configured to be remapped via a "TCE table" (IOMMU
translation table), which has various configurable characteristics
not described here.
- For MSIs, we have two windows in the address space (one at the top of
the 32-bit space and one much higher) which, via a combination of the
address and MSI value, will result in one of the 2048 interrupts per
bridge being triggered. There's a PE# in the interrupt controller
descriptor table as well which is compared with the PE# obtained from
the RTT to "authorize" the device to emit that specific interrupt.
- Error messages just use the RTT.
* Outbound. That's where the tricky part is.
Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
from the CPU address space to the PCI address space. There is one M32
window and sixteen M64 windows. They have different characteristics.
First what they have in common: they forward a configurable portion of
the CPU address space to the PCIe bus and must be naturally aligned
power of two in size. The rest is different:
- The M32 window:
* Is limited to 4GB in size.
* Drops the top bits of the address (above the size) and replaces
them with a configurable value. This is typically used to generate
32-bit PCIe accesses. We configure that window at boot from FW and
don't touch it from Linux; it's usually set to forward a 2GB
portion of address space from the CPU to PCIe
0x8000_0000..0xffff_ffff. (Note: The top 64KB are actually
reserved for MSIs but this is not a problem at this point; we just
need to ensure Linux doesn't assign anything there, the M32 logic
ignores that however and will forward in that space if we try).
* It is divided into 256 segments of equal size. A table in the chip
maps each segment to a PE#. That allows portions of the MMIO space
to be assigned to PEs on a segment granularity. For a 2GB window,
the segment granularity is 2GB/256 = 8MB.
Now, this is the "main" window we use in Linux today (excluding
SR-IOV). We basically use the trick of forcing the bridge MMIO windows
onto a segment alignment/granularity so that the space behind a bridge
can be assigned to a PE.
Ideally we would like to be able to have individual functions in PEs
but that would mean using a completely different address allocation
scheme where individual function BARs can be "grouped" to fit in one or
more segments.
- The M64 windows:
* Must be at least 256MB in size.
* Do not translate addresses (the address on PCIe is the same as the
address on the PowerBus). There is a way to also set the top 14
bits which are not conveyed by PowerBus but we don't use this.
* Can be configured to be segmented. When not segmented, we can
specify the PE# for the entire window. When segmented, a window
has 256 segments; however, there is no table for mapping a segment
to a PE#. The segment number *is* the PE#.
* Support overlaps. If an address is covered by multiple windows,
there's a defined ordering for which window applies.
We have code (fairly new compared to the M32 stuff) that exploits that
for large BARs in 64-bit space:
We configure an M64 window to cover the entire region of address space
that has been assigned by FW for the PHB (about 64GB, ignore the space
for the M32, it comes out of a different "reserve"). We configure it
as segmented.
Then we do the same thing as with M32, using the bridge alignment
trick, to match to those giant segments.
Since we cannot remap, we have two additional constraints:
- We do the PE# allocation *after* the 64-bit space has been assigned
because the addresses we use directly determine the PE#. We then
update the M32 PE# for the devices that use both 32-bit and 64-bit
spaces or assign the remaining PE# to 32-bit only devices.
- We cannot "group" segments in HW, so if a device ends up using more
than one segment, we end up with more than one PE#. There is a HW
mechanism to make the freeze state cascade to "companion" PEs but
that only works for PCIe error messages (typically used so that if
you freeze a switch, it freezes all its children). So we do it in
SW. We lose a bit of effectiveness of EEH in that case, but that's
the best we found. So when any of the PEs freezes, we freeze the
other ones for that "domain". We thus introduce the concept of
"master PE" which is the one used for DMA, MSIs, etc., and "secondary
PEs" that are used for the remaining M64 segments.
We would like to investigate using additional M64 windows in "single
PE" mode to overlay over specific BARs to work around some of that, for
example for devices with very large BARs, e.g., GPUs. It would make
sense, but we haven't done it yet.
3. Considerations for SR-IOV on PowerKVM
* SR-IOV Background
The PCIe SR-IOV feature allows a single Physical Function (PF) to
support several Virtual Functions (VFs). Registers in the PF's SR-IOV
Capability control the number of VFs and whether they are enabled.
When VFs are enabled, they appear in Configuration Space like normal
PCI devices, but the BARs in VF config space headers are unusual. For
a non-VF device, software uses BARs in the config space header to
discover the BAR sizes and assign addresses for them. For VF devices,
software uses VF BAR registers in the *PF* SR-IOV Capability to
discover sizes and assign addresses. The BARs in the VF's config space
header are read-only zeros.
When a VF BAR in the PF SR-IOV Capability is programmed, it sets the
base address for all the corresponding VF(n) BARs. For example, if the
PF SR-IOV Capability is programmed to enable eight VFs, and it has a
1MB VF BAR0, the address in that VF BAR sets the base of an 8MB region.
This region is divided into eight contiguous 1MB regions, each of which
is a BAR0 for one of the VFs. Note that even though the VF BAR
describes an 8MB region, the alignment requirement is for a single VF,
i.e., 1MB in this example.
There are several strategies for isolating VFs in PEs:
- M32 window: There's one M32 window, and it is split into 256
equally-sized segments. The finest granularity possible is a 256MB
window with 1MB segments. VF BARs that are 1MB or larger could be
mapped to separate PEs in this window. Each segment can be
individually mapped to a PE via the lookup table, so this is quite
flexible, but it works best when all the VF BARs are the same size. If
they are different sizes, the entire window has to be small enough that
the segment size matches the smallest VF BAR, which means larger VF
BARs span several segments.
- Non-segmented M64 window: A non-segmented M64 window is mapped entirely
to a single PE, so it could only isolate one VF.
- Single segmented M64 windows: A segmented M64 window could be used just
like the M32 window, but the segments can't be individually mapped to
PEs (the segment number is the PE#), so there isn't as much
flexibility. A VF with multiple BARs would have to be in a "domain" of
multiple PEs, which is not as well isolated as a single PE.
- Multiple segmented M64 windows: As usual, each window is split into 256
equally-sized segments, and the segment number is the PE#. But if we
use several M64 windows, they can be set to different base addresses
and different segment sizes. If we have VFs that each have a 1MB BAR
and a 32MB BAR, we could use one M64 window to assign 1MB segments and
another M64 window to assign 32MB segments.
Finally, the plan to use M64 windows for SR-IOV, which will be described
more in the next two sections. For a given VF BAR, we need to
effectively reserve the entire 256 segments (256 * VF BAR size) and
position the VF BAR to start at the beginning of a free range of
segments/PEs inside that M64 window.
The goal is of course to be able to give a separate PE for each VF.
The IODA2 platform has 16 M64 windows, which are used to map MMIO
range to PE#. Each M64 window defines one MMIO range and this range is
divided into 256 segments, with each segment corresponding to one PE.
We decide to leverage this M64 window to map VFs to individual PEs, since
SR-IOV VF BARs are all the same size.
But doing so introduces another problem: total_VFs is usually smaller
than the number of M64 window segments, so if we map one VF BAR directly
to one M64 window, some part of the M64 window will map to another
device's MMIO range.
IODA supports 256 PEs, so segmented windows contain 256 segments, so if
total_VFs is less than 256, we have the situation in Figure 1.0, where
segments [total_VFs, 255] of the M64 window may map to some MMIO range on
other devices:
0 1 total_VFs - 1
+------+------+- -+------+------+
| | | ... | | |
+------+------+- -+------+------+
VF(n) BAR space
0 1 total_VFs - 1 255
+------+------+- -+------+------+- -+------+------+
| | | ... | | | ... | | |
+------+------+- -+------+------+- -+------+------+
M64 window
Figure 1.0 Direct map VF(n) BAR space
Our current solution is to allocate 256 segments even if the VF(n) BAR
space doesn't need that much, as shown in Figure 1.1:
0 1 total_VFs - 1 255
+------+------+- -+------+------+- -+------+------+
| | | ... | | | ... | | |
+------+------+- -+------+------+- -+------+------+
VF(n) BAR space + extra
0 1 total_VFs - 1 255
+------+------+- -+------+------+- -+------+------+
| | | ... | | | ... | | |
+------+------+- -+------+------+- -+------+------+
M64 window
Figure 1.1 Map VF(n) BAR space + extra
Allocating the extra space ensures that the entire M64 window will be
assigned to this one SR-IOV device and none of the space will be
available for other devices. Note that this only expands the space
reserved in software; there are still only total_VFs VFs, and they only
respond to segments [0, total_VFs - 1]. There's nothing in hardware that
responds to segments [total_VFs, 255].
4. Implications for the Generic PCI Code
The PCIe SR-IOV spec requires that the base of the VF(n) BAR space be
aligned to the size of an individual VF BAR.
In IODA2, the MMIO address determines the PE#. If the address is in an M32
window, we can set the PE# by updating the table that translates segments
to PE#s. Similarly, if the address is in an unsegmented M64 window, we can
set the PE# for the window. But if it's in a segmented M64 window, the
segment number is the PE#.
Therefore, the only way to control the PE# for a VF is to change the base
of the VF(n) BAR space in the VF BAR. If the PCI core allocates the exact
amount of space required for the VF(n) BAR space, the VF BAR value is fixed
and cannot be changed.
On the other hand, if the PCI core allocates additional space, the VF BAR
value can be changed as long as the entire VF(n) BAR space remains inside
the space allocated by the core.
Ideally the segment size will be the same as an individual VF BAR size.
Then each VF will be in its own PE. The VF BARs (and therefore the PE#s)
are contiguous. If VF0 is in PE(x), then VF(n) is in PE(x+n). If we
allocate 256 segments, there are (256 - numVFs) choices for the PE# of VF0.
If the segment size is smaller than the VF BAR size, it will take several
segments to cover a VF BAR, and a VF will be in several PEs. This is
possible, but the isolation isn't as good, and it reduces the number of PE#
choices because instead of consuming only numVFs segments, the VF(n) BAR
space will consume (numVFs * n) segments. That means there aren't as many
available segments for adjusting base of the VF(n) BAR space.

View File

@ -74,22 +74,23 @@ Causes of transaction aborts
Syscalls
========
Performing syscalls from within transaction is not recommended, and can lead
to unpredictable results.
Syscalls made from within an active transaction will not be performed and the
transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
| TM_CAUSE_PERSISTENT.
Syscalls do not by design abort transactions, but beware: The kernel code will
not be running in transactional state. The effect of syscalls will always
remain visible, but depending on the call they may abort your transaction as a
side-effect, read soon-to-be-aborted transactional data that should not remain
invisible, etc. If you constantly retry a transaction that constantly aborts
itself by calling a syscall, you'll have a livelock & make no progress.
Syscalls made from within a suspended transaction are performed as normal and
the transaction is not explicitly doomed by the kernel. However, what the
kernel does to perform the syscall may result in the transaction being doomed
by the hardware. The syscall is performed in suspended mode so any side
effects will be persistent, independent of transaction success or failure. No
guarantees are provided by the kernel about which syscalls will affect
transaction success.
Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write()
from, say, printf() should be OK as long as the kernel does not access any
memory that was accessed transactionally.
Consider any syscalls that happen to work as debug-only -- not recommended for
production use. Best to queue them up till after the transaction is over.
Care must be taken when relying on syscalls to abort during active transactions
if the calls are made via a library. Libraries may cache values (which may
give the appearance of success) or perform operations that cause transaction
failure before entering the kernel (which may produce different failure codes).
Examples are glibc's getpid() and lazy symbol resolution.
Signals
@ -174,10 +175,9 @@ These are defined in <asm/reg.h>, and distinguish different reasons why the
kernel aborted a transaction:
TM_CAUSE_RESCHED Thread was rescheduled.
TM_CAUSE_TLBI Software TLB invalide.
TM_CAUSE_TLBI Software TLB invalid.
TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap.
TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort
transactions for consistency will use this.
TM_CAUSE_SYSCALL Syscall from active transaction.
TM_CAUSE_SIGNAL Signal delivered.
TM_CAUSE_MISC Currently unused.
TM_CAUSE_ALIGNMENT Alignment fault.
@ -185,7 +185,7 @@ kernel aborted a transaction:
These can be checked by the user program's abort handler as TEXASR[0:7]. If
bit 7 is set, it indicates that the error is consider persistent. For example
a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.
GDB
===

View File

@ -8,6 +8,21 @@ If variable is of Type, use printk format specifier:
unsigned long long %llu or %llx
size_t %zu or %zx
ssize_t %zd or %zx
s32 %d or %x
u32 %u or %x
s64 %lld or %llx
u64 %llu or %llx
If <type> is dependent on a config option for its size (e.g., sector_t,
blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a
format specifier of its largest possible type and explicitly cast to it.
Example:
printk("test: sector number/total blocks: %llu/%llu\n",
(unsigned long long)sector, (unsigned long long)blockcount);
Reminder: sizeof() result is of type size_t.
Raw pointer value SHOULD be printed with %p. The kernel supports
the following extended format specifiers for pointer types:
@ -54,6 +69,7 @@ Struct Resources:
For printing struct resources. The 'R' and 'r' specifiers result in a
printed resource with ('R') or without ('r') a decoded flags member.
Passed by reference.
Physical addresses types phys_addr_t:
@ -132,6 +148,8 @@ MAC/FDDI addresses:
specifier to use reversed byte order suitable for visual interpretation
of Bluetooth addresses which are in the little endian order.
Passed by reference.
IPv4 addresses:
%pI4 1.2.3.4
@ -146,6 +164,8 @@ IPv4 addresses:
host, network, big or little endian order addresses respectively. Where
no specifier is provided the default network/big endian order is used.
Passed by reference.
IPv6 addresses:
%pI6 0001:0002:0003:0004:0005:0006:0007:0008
@ -160,6 +180,8 @@ IPv6 addresses:
print a compressed IPv6 address as described by
http://tools.ietf.org/html/rfc5952
Passed by reference.
IPv4/IPv6 addresses (generic, with port, flowinfo, scope):
%pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008
@ -186,6 +208,8 @@ IPv4/IPv6 addresses (generic, with port, flowinfo, scope):
specifiers can be used as well and are ignored in case of an IPv6
address.
Passed by reference.
Further examples:
%pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789
@ -207,6 +231,8 @@ UUID/GUID addresses:
Where no additional specifiers are used the default little endian
order with lower case hex characters will be printed.
Passed by reference.
dentry names:
%pd{,2,3,4}
%pD{,2,3,4}
@ -216,6 +242,8 @@ dentry names:
equivalent of %s dentry->d_name.name we used to use, %pd<n> prints
n last components. %pD does the same thing for struct file.
Passed by reference.
struct va_format:
%pV
@ -231,23 +259,20 @@ struct va_format:
Do not use this feature without some mechanism to verify the
correctness of the format string and va_list arguments.
u64 SHOULD be printed with %llu/%llx:
Passed by reference.
printk("%llu", u64_var);
struct clk:
s64 SHOULD be printed with %lld/%llx:
%pC pll1
%pCn pll1
%pCr 1560000000
printk("%lld", s64_var);
For printing struct clk structures. '%pC' and '%pCn' print the name
(Common Clock Framework) or address (legacy clock framework) of the
structure; '%pCr' prints the current clock rate.
If <type> is dependent on a config option for its size (e.g., sector_t,
blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a
format specifier of its largest possible type and explicitly cast to it.
Example:
Passed by reference.
printk("test: sector number/total blocks: %llu/%llu\n",
(unsigned long long)sector, (unsigned long long)blockcount);
Reminder: sizeof() result is of type size_t.
Thank you for your cooperation and attention.

View File

@ -33,11 +33,18 @@ The current git repository for Smack user space is:
git://github.com/smack-team/smack.git
This should make and install on most modern distributions.
There are three commands included in smackutil:
There are five commands included in smackutil:
smackload - properly formats data for writing to /smack/load
smackcipso - properly formats data for writing to /smack/cipso
chsmack - display or set Smack extended attribute values
smackctl - load the Smack access rules
smackaccess - report if a process with one label has access
to an object with another
These two commands are obsolete with the introduction of
the smackfs/load2 and smackfs/cipso2 interfaces.
smackload - properly formats data for writing to smackfs/load
smackcipso - properly formats data for writing to smackfs/cipso
In keeping with the intent of Smack, configuration data is
minimal and not strictly required. The most important
@ -47,9 +54,9 @@ of this, but it can be manually as well.
Add this line to /etc/fstab:
smackfs /smack smackfs smackfsdef=* 0 0
smackfs /sys/fs/smackfs smackfs defaults 0 0
and create the /smack directory for mounting.
The /sys/fs/smackfs directory is created by the kernel.
Smack uses extended attributes (xattrs) to store labels on filesystem
objects. The attributes are stored in the extended attribute security
@ -92,13 +99,13 @@ There are multiple ways to set a Smack label on a file:
# attr -S -s SMACK64 -V "value" path
# chsmack -a value path
A process can see the smack label it is running with by
A process can see the Smack label it is running with by
reading /proc/self/attr/current. A process with CAP_MAC_ADMIN
can set the process smack by writing there.
can set the process Smack by writing there.
Most Smack configuration is accomplished by writing to files
in the smackfs filesystem. This pseudo-filesystem is usually
mounted on /smack.
in the smackfs filesystem. This pseudo-filesystem is mounted
on /sys/fs/smackfs.
access
This interface reports whether a subject with the specified
@ -206,23 +213,30 @@ onlycap
file or cleared by writing "-" to the file.
ptrace
This is used to define the current ptrace policy
0 - default: this is the policy that relies on smack access rules.
0 - default: this is the policy that relies on Smack access rules.
For the PTRACE_READ a subject needs to have a read access on
object. For the PTRACE_ATTACH a read-write access is required.
1 - exact: this is the policy that limits PTRACE_ATTACH. Attach is
only allowed when subject's and object's labels are equal.
PTRACE_READ is not affected. Can be overriden with CAP_SYS_PTRACE.
PTRACE_READ is not affected. Can be overridden with CAP_SYS_PTRACE.
2 - draconian: this policy behaves like the 'exact' above with an
exception that it can't be overriden with CAP_SYS_PTRACE.
exception that it can't be overridden with CAP_SYS_PTRACE.
revoke-subject
Writing a Smack label here sets the access to '-' for all access
rules with that subject label.
unconfined
If the kernel is configured with CONFIG_SECURITY_SMACK_BRINGUP
a process with CAP_MAC_ADMIN can write a label into this interface.
Thereafter, accesses that involve that label will be logged and
the access permitted if it wouldn't be otherwise. Note that this
is dangerous and can ruin the proper labeling of your system.
It should never be used in production.
You can add access rules in /etc/smack/accesses. They take the form:
subjectlabel objectlabel access
access is a combination of the letters rwxa which specify the
access is a combination of the letters rwxatb which specify the
kind of access permitted a subject with subjectlabel on an
object with objectlabel. If there is no rule no access is allowed.
@ -318,8 +332,9 @@ each of the subject and the object.
Labels
Smack labels are ASCII character strings, one to twenty-three characters in
length. Single character labels using special characters, that being anything
Smack labels are ASCII character strings. They can be up to 255 characters
long, but keeping them to twenty-three characters is recommended.
Single character labels using special characters, that being anything
other than a letter or digit, are reserved for use by the Smack development
team. Smack labels are unstructured, case sensitive, and the only operation
ever performed on them is comparison for equality. Smack labels cannot
@ -335,10 +350,9 @@ There are some predefined labels:
? Pronounced "huh", a single question mark character.
@ Pronounced "web", a single at sign character.
Every task on a Smack system is assigned a label. System tasks, such as
init(8) and systems daemons, are run with the floor ("_") label. User tasks
are assigned labels according to the specification found in the
/etc/smack/user configuration file.
Every task on a Smack system is assigned a label. The Smack label
of a process will usually be assigned by the system initialization
mechanism.
Access Rules
@ -393,6 +407,7 @@ describe access modes:
w: indicates that write access should be granted.
x: indicates that execute access should be granted.
t: indicates that the rule requests transmutation.
b: indicates that the rule should be reported for bring-up.
Uppercase values for the specification letters are allowed as well.
Access mode specifications can be in any order. Examples of acceptable rules
@ -402,6 +417,7 @@ are:
Secret Unclass R
Manager Game x
User HR w
Snap Crackle rwxatb
New Old rRrRr
Closed Off -
@ -413,7 +429,7 @@ Examples of unacceptable rules are:
Spaces are not allowed in labels. Since a subject always has access to files
with the same label specifying a rule for that case is pointless. Only
valid letters (rwxatRWXAT) and the dash ('-') character are allowed in
valid letters (rwxatbRWXATB) and the dash ('-') character are allowed in
access specifications. The dash is a placeholder, so "a-r" is the same
as "ar". A lone dash is used to specify that no access should be allowed.
@ -462,16 +478,11 @@ receiver. The receiver is not required to have read access to the sender.
Setting Access Rules
The configuration file /etc/smack/accesses contains the rules to be set at
system startup. The contents are written to the special file /smack/load.
Rules can be written to /smack/load at any time and take effect immediately.
For any pair of subject and object labels there can be only one rule, with the
most recently specified overriding any earlier specification.
The program smackload is provided to ensure data is formatted
properly when written to /smack/load. This program reads lines
of the form
subjectlabel objectlabel mode.
system startup. The contents are written to the special file
/sys/fs/smackfs/load2. Rules can be added at any time and take effect
immediately. For any pair of subject and object labels there can be only
one rule, with the most recently specified overriding any earlier
specification.
Task Attribute
@ -488,7 +499,10 @@ only be changed by a process with privilege.
Privilege
A process with CAP_MAC_OVERRIDE is privileged.
A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged.
CAP_MAC_OVERRIDE allows the process access to objects it would
be denied otherwise. CAP_MAC_ADMIN allows a process to change
Smack data, including rules and attributes.
Smack Networking
@ -510,14 +524,14 @@ intervention. Unlabeled packets that come into the system will be given the
ambient label.
Smack requires configuration in the case where packets from a system that is
not smack that speaks CIPSO may be encountered. Usually this will be a Trusted
not Smack that speaks CIPSO may be encountered. Usually this will be a Trusted
Solaris system, but there are other, less widely deployed systems out there.
CIPSO provides 3 important values, a Domain Of Interpretation (DOI), a level,
and a category set with each packet. The DOI is intended to identify a group
of systems that use compatible labeling schemes, and the DOI specified on the
smack system must match that of the remote system or packets will be
discarded. The DOI is 3 by default. The value can be read from /smack/doi and
can be changed by writing to /smack/doi.
Smack system must match that of the remote system or packets will be
discarded. The DOI is 3 by default. The value can be read from
/sys/fs/smackfs/doi and can be changed by writing to /sys/fs/smackfs/doi.
The label and category set are mapped to a Smack label as defined in
/etc/smack/cipso.
@ -539,15 +553,13 @@ The ":" and "," characters are permitted in a Smack label but have no special
meaning.
The mapping of Smack labels to CIPSO values is defined by writing to
/smack/cipso. Again, the format of data written to this special file
is highly restrictive, so the program smackcipso is provided to
ensure the writes are done properly. This program takes mappings
on the standard input and sends them to /smack/cipso properly.
/sys/fs/smackfs/cipso2.
In addition to explicit mappings Smack supports direct CIPSO mappings. One
CIPSO level is used to indicate that the category set passed in the packet is
in fact an encoding of the Smack label. The level used is 250 by default. The
value can be read from /smack/direct and changed by writing to /smack/direct.
value can be read from /sys/fs/smackfs/direct and changed by writing to
/sys/fs/smackfs/direct.
Socket Attributes
@ -565,8 +577,8 @@ sockets.
Smack Netlabel Exceptions
You will often find that your labeled application has to talk to the outside,
unlabeled world. To do this there's a special file /smack/netlabel where you can
add some exceptions in the form of :
unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel
where you can add some exceptions in the form of :
@IP1 LABEL1 or
@IP2/MASK LABEL2
@ -574,22 +586,22 @@ It means that your application will have unlabeled access to @IP1 if it has
write access on LABEL1, and access to the subnet @IP2/MASK if it has write
access on LABEL2.
Entries in the /smack/netlabel file are matched by longest mask first, like in
classless IPv4 routing.
Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask
first, like in classless IPv4 routing.
A special label '@' and an option '-CIPSO' can be used there :
@ means Internet, any application with any label has access to it
-CIPSO means standard CIPSO networking
If you don't know what CIPSO is and don't plan to use it, you can just do :
echo 127.0.0.1 -CIPSO > /smack/netlabel
echo 0.0.0.0/0 @ > /smack/netlabel
echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled
Internet access, you can have :
echo 127.0.0.1 -CIPSO > /smack/netlabel
echo 192.168.0.0/16 -CIPSO > /smack/netlabel
echo 0.0.0.0/0 @ > /smack/netlabel
echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel
echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
Writing Applications for Smack
@ -676,7 +688,7 @@ Smack auditing
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
in your kernel configuration.
By default, all denied events will be audited. You can change this behavior by
writing a single character to the /smack/logging file :
writing a single character to the /sys/fs/smackfs/logging file :
0 : no logging
1 : log denied (default)
2 : log accepted
@ -686,3 +698,20 @@ Events are logged as 'key=value' pairs, for each event you at least will get
the subject, the object, the rights requested, the action, the kernel function
that triggered the event, plus other pairs depending on the type of event
audited.
Bringup Mode
Bringup mode provides logging features that can make application
configuration and system bringup easier. Configure the kernel with
CONFIG_SECURITY_SMACK_BRINGUP to enable these features. When bringup
mode is enabled accesses that succeed due to rules marked with the "b"
access mode will logged. When a new label is introduced for processes
rules can be added aggressively, marked with the "b". The logging allows
tracking of which rules actual get used for that label.
Another feature of bringup mode is the "unconfined" option. Writing
a label to /sys/fs/smackfs/unconfined makes subjects with that label
able to access any object, and objects with that label accessible to
all subjects. Any access that is granted because a label is unconfined
is logged. This feature is dangerous, as files and directories may
be created in places they couldn't if the policy were being enforced.

View File

@ -71,11 +71,11 @@ SOURCE:
HDMI/DP (either HDMI or DisplayPort)
Exceptions (deprecated):
[Digital] Capture Source
[Digital] Capture Switch (aka input gain switch)
[Digital] Capture Volume (aka input gain volume)
[Digital] Playback Switch (aka output gain switch)
[Digital] Playback Volume (aka output gain volume)
[Analogue|Digital] Capture Source
[Analogue|Digital] Capture Switch (aka input gain switch)
[Analogue|Digital] Capture Volume (aka input gain volume)
[Analogue|Digital] Playback Switch (aka output gain switch)
[Analogue|Digital] Playback Volume (aka output gain volume)
Tone Control - Switch
Tone Control - Bass
Tone Control - Treble

View File

@ -466,7 +466,11 @@ The generic parser supports the following hints:
- add_jack_modes (bool): add "xxx Jack Mode" enum controls to each
I/O jack for allowing to change the headphone amp and mic bias VREF
capabilities
- power_down_unused (bool): power down the unused widgets
- power_save_node (bool): advanced power management for each widget,
controlling the power sate (D0/D3) of each widget node depending on
the actual pin and stream states
- power_down_unused (bool): power down the unused widgets, a subset of
power_save_node, and will be dropped in future
- add_hp_mic (bool): add the headphone to capture source if possible
- hp_mic_detect (bool): enable/disable the hp/mic shared input for a
single built-in mic case; default true

View File

@ -0,0 +1,200 @@
The ALSA API can provide two different system timestamps:
- Trigger_tstamp is the system time snapshot taken when the .trigger
callback is invoked. This snapshot is taken by the ALSA core in the
general case, but specific hardware may have synchronization
capabilities or conversely may only be able to provide a correct
estimate with a delay. In the latter two cases, the low-level driver
is responsible for updating the trigger_tstamp at the most appropriate
and precise moment. Applications should not rely solely on the first
trigger_tstamp but update their internal calculations if the driver
provides a refined estimate with a delay.
- tstamp is the current system timestamp updated during the last
event or application query.
The difference (tstamp - trigger_tstamp) defines the elapsed time.
The ALSA API provides reports two basic pieces of information, avail
and delay, which combined with the trigger and current system
timestamps allow for applications to keep track of the 'fullness' of
the ring buffer and the amount of queued samples.
The use of these different pointers and time information depends on
the application needs:
- 'avail' reports how much can be written in the ring buffer
- 'delay' reports the time it will take to hear a new sample after all
queued samples have been played out.
When timestamps are enabled, the avail/delay information is reported
along with a snapshot of system time. Applications can select from
CLOCK_REALTIME (NTP corrections including going backwards),
CLOCK_MONOTONIC (NTP corrections but never going backwards),
CLOCK_MONOTIC_RAW (without NTP corrections) and change the mode
dynamically with sw_params
The ALSA API also provide an audio_tstamp which reflects the passage
of time as measured by different components of audio hardware. In
ascii-art, this could be represented as follows (for the playback
case):
--------------------------------------------------------------> time
^ ^ ^ ^ ^
| | | | |
analog link dma app FullBuffer
time time time time time
| | | | |
|< codec delay >|<--hw delay-->|<queued samples>|<---avail->|
|<----------------- delay---------------------->| |
|<----ring buffer length---->|
The analog time is taken at the last stage of the playback, as close
as possible to the actual transducer
The link time is taken at the output of the SOC/chipset as the samples
are pushed on a link. The link time can be directly measured if
supported in hardware by sample counters or wallclocks (e.g. with
HDAudio 24MHz or PTP clock for networked solutions) or indirectly
estimated (e.g. with the frame counter in USB).
The DMA time is measured using counters - typically the least reliable
of all measurements due to the bursty natured of DMA transfers.
The app time corresponds to the time tracked by an application after
writing in the ring buffer.
The application can query what the hardware supports, define which
audio time it wants reported by selecting the relevant settings in
audio_tstamp_config fields, get an estimate of the timestamp
accuracy. It can also request the delay-to-analog be included in the
measurement. Direct access to the link time is very interesting on
platforms that provide an embedded DSP; measuring directly the link
time with dedicated hardware, possibly synchronized with system time,
removes the need to keep track of internal DSP processing times and
latency.
In case the application requests an audio tstamp that is not supported
in hardware/low-level driver, the type is overridden as DEFAULT and the
timestamp will report the DMA time based on the hw_pointer value.
For backwards compatibility with previous implementations that did not
provide timestamp selection, with a zero-valued COMPAT timestamp type
the results will default to the HDAudio wall clock for playback
streams and to the DMA time (hw_ptr) in all other cases.
The audio timestamp accuracy can be returned to user-space, so that
appropriate decisions are made:
- for dma time (default), the granularity of the transfers can be
inferred from the steps between updates and in turn provide
information on how much the application pointer can be rewound
safely.
- the link time can be used to track long-term drifts between audio
and system time using the (tstamp-trigger_tstamp)/audio_tstamp
ratio, the precision helps define how much smoothing/low-pass
filtering is required. The link time can be either reset on startup
or reported as is (the latter being useful to compare progress of
different streams - but may require the wallclock to be always
running and not wrap-around during idle periods). If supported in
hardware, the absolute link time could also be used to define a
precise start time (patches WIP)
- including the delay in the audio timestamp may
counter-intuitively not increase the precision of timestamps, e.g. if a
codec includes variable-latency DSP processing or a chain of
hardware components the delay is typically not known with precision.
The accuracy is reported in nanosecond units (using an unsigned 32-bit
word), which gives a max precision of 4.29s, more than enough for
audio applications...
Due to the varied nature of timestamping needs, even for a single
application, the audio_tstamp_config can be changed dynamically. In
the STATUS ioctl, the parameters are read-only and do not allow for
any application selection. To work around this limitation without
impacting legacy applications, a new STATUS_EXT ioctl is introduced
with read/write parameters. ALSA-lib will be modified to make use of
STATUS_EXT and effectively deprecate STATUS.
The ALSA API only allows for a single audio timestamp to be reported
at a time. This is a conscious design decision, reading the audio
timestamps from hardware registers or from IPC takes time, the more
timestamps are read the more imprecise the combined measurements
are. To avoid any interpretation issues, a single (system, audio)
timestamp is reported. Applications that need different timestamps
will be required to issue multiple queries and perform an
interpolation of the results
In some hardware-specific configuration, the system timestamp is
latched by a low-level audio subsytem, and the information provided
back to the driver. Due to potential delays in the communication with
the hardware, there is a risk of misalignment with the avail and delay
information. To make sure applications are not confused, a
driver_timestamp field is added in the snd_pcm_status structure; this
timestamp shows when the information is put together by the driver
before returning from the STATUS and STATUS_EXT ioctl. in most cases
this driver_timestamp will be identical to the regular system tstamp.
Examples of typestamping with HDaudio:
1. DMA timestamp, no compensation for DMA+analog delay
$ ./audio_time -p --ts_type=1
playback: systime: 341121338 nsec, audio time 342000000 nsec, systime delta -878662
playback: systime: 426236663 nsec, audio time 427187500 nsec, systime delta -950837
playback: systime: 597080580 nsec, audio time 598000000 nsec, systime delta -919420
playback: systime: 682059782 nsec, audio time 683020833 nsec, systime delta -961051
playback: systime: 852896415 nsec, audio time 853854166 nsec, systime delta -957751
playback: systime: 937903344 nsec, audio time 938854166 nsec, systime delta -950822
2. DMA timestamp, compensation for DMA+analog delay
$ ./audio_time -p --ts_type=1 -d
playback: systime: 341053347 nsec, audio time 341062500 nsec, systime delta -9153
playback: systime: 426072447 nsec, audio time 426062500 nsec, systime delta 9947
playback: systime: 596899518 nsec, audio time 596895833 nsec, systime delta 3685
playback: systime: 681915317 nsec, audio time 681916666 nsec, systime delta -1349
playback: systime: 852741306 nsec, audio time 852750000 nsec, systime delta -8694
3. link timestamp, compensation for DMA+analog delay
$ ./audio_time -p --ts_type=2 -d
playback: systime: 341060004 nsec, audio time 341062791 nsec, systime delta -2787
playback: systime: 426242074 nsec, audio time 426244875 nsec, systime delta -2801
playback: systime: 597080992 nsec, audio time 597084583 nsec, systime delta -3591
playback: systime: 682084512 nsec, audio time 682088291 nsec, systime delta -3779
playback: systime: 852936229 nsec, audio time 852940916 nsec, systime delta -4687
playback: systime: 938107562 nsec, audio time 938112708 nsec, systime delta -5146
Example 1 shows that the timestamp at the DMA level is close to 1ms
ahead of the actual playback time (as a side time this sort of
measurement can help define rewind safeguards). Compensating for the
DMA-link delay in example 2 helps remove the hardware buffering abut
the information is still very jittery, with up to one sample of
error. In example 3 where the timestamps are measured with the link
wallclock, the timestamps show a monotonic behavior and a lower
dispersion.
Example 3 and 4 are with USB audio class. Example 3 shows a high
offset between audio time and system time due to buffering. Example 4
shows how compensating for the delay exposes a 1ms accuracy (due to
the use of the frame counter by the driver)
Example 3: DMA timestamp, no compensation for delay, delta of ~5ms
$ ./audio_time -p -Dhw:1 -t1
playback: systime: 120174019 nsec, audio time 125000000 nsec, systime delta -4825981
playback: systime: 245041136 nsec, audio time 250000000 nsec, systime delta -4958864
playback: systime: 370106088 nsec, audio time 375000000 nsec, systime delta -4893912
playback: systime: 495040065 nsec, audio time 500000000 nsec, systime delta -4959935
playback: systime: 620038179 nsec, audio time 625000000 nsec, systime delta -4961821
playback: systime: 745087741 nsec, audio time 750000000 nsec, systime delta -4912259
playback: systime: 870037336 nsec, audio time 875000000 nsec, systime delta -4962664
Example 4: DMA timestamp, compensation for delay, delay of ~1ms
$ ./audio_time -p -Dhw:1 -t1 -d
playback: systime: 120190520 nsec, audio time 120000000 nsec, systime delta 190520
playback: systime: 245036740 nsec, audio time 244000000 nsec, systime delta 1036740
playback: systime: 370034081 nsec, audio time 369000000 nsec, systime delta 1034081
playback: systime: 495159907 nsec, audio time 494000000 nsec, systime delta 1159907
playback: systime: 620098824 nsec, audio time 619000000 nsec, systime delta 1098824
playback: systime: 745031847 nsec, audio time 744000000 nsec, systime delta 1031847

View File

@ -80,7 +80,7 @@ static void hex_dump(const void *src, size_t length, size_t line_size, char *pre
* Unescape - process hexadecimal escape character
* converts shell input "\x23" -> 0x23
*/
int unespcape(char *_dst, char *_src, size_t len)
static int unescape(char *_dst, char *_src, size_t len)
{
int ret = 0;
char *src = _src;
@ -304,7 +304,7 @@ int main(int argc, char *argv[])
size = strlen(input_tx+1);
tx = malloc(size);
rx = malloc(size);
size = unespcape((char *)tx, input_tx, size);
size = unescape((char *)tx, input_tx, size);
transfer(fd, tx, rx, size);
free(rx);
free(tx);

View File

@ -872,6 +872,27 @@ can be ORed together:
==============================================================
threads-max
This value controls the maximum number of threads that can be created
using fork().
During initialization the kernel sets this value such that even if the
maximum number of threads is created, the thread structures occupy only
a part (1/8th) of the available RAM pages.
The minimum value that can be written to threads-max is 20.
The maximum value that can be written to threads-max is given by the
constant FUTEX_TID_MASK (0x3fffffff).
If a value outside of this range is written to threads-max an error
EINVAL occurs.
The value written is checked against the available RAM pages. If the
thread structures would occupy too much (more than 1/8th) of the
available RAM pages threads-max is reduced accordingly.
==============================================================
unknown_nmi_panic:
The value in this file affects behavior of handling NMI. When the

View File

@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm:
- admin_reserve_kbytes
- block_dump
- compact_memory
- compact_unevictable_allowed
- dirty_background_bytes
- dirty_background_ratio
- dirty_bytes
@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required.
==============================================================
compact_unevictable_allowed
Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
allowed to examine the unevictable lru (mlocked pages) for pages to compact.
This should be used on systems where stalls for minor page faults are an
acceptable trade for large contiguous free memory. Set to 0 to prevent
compaction from moving pages that are unevictable. Default value is 1.
==============================================================
dirty_background_bytes
Contains the amount of dirty memory at which the background kernel

View File

@ -267,21 +267,34 @@ call, then it is required that system administrator mount a file system of
type hugetlbfs:
mount -t hugetlbfs \
-o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \
none /mnt/huge
-o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\
min_size=<value>,nr_inodes=<value> none /mnt/huge
This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
/mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid
options sets the owner and group of the root of the file system. By default
the uid and gid of the current process are taken. The mode option sets the
mode of root of file system to value & 01777. This value is given in octal.
By default the value 0755 is picked. The size option sets the maximum value of
memory (huge pages) allowed for that filesystem (/mnt/huge). The size is
rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of
inodes that /mnt/huge can use. If the size or nr_inodes option is not
provided on command line then no limits are set. For size and nr_inodes
options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For
example, size=2K has the same meaning as size=2048.
By default the value 0755 is picked. If the paltform supports multiple huge
page sizes, the pagesize option can be used to specify the huge page size and
associated pool. pagesize is specified in bytes. If pagesize is not specified
the paltform's default huge page size and associated pool will be used. The
size option sets the maximum value of memory (huge pages) allowed for that
filesystem (/mnt/huge). The size option can be specified in bytes, or as a
percentage of the specified huge page pool (nr_hugepages). The size is
rounded down to HPAGE_SIZE boundary. The min_size option sets the minimum
value of memory (huge pages) allowed for the filesystem. min_size can be
specified in the same way as size, either bytes or a percentage of the
huge page pool. At mount time, the number of huge pages specified by
min_size are reserved for use by the filesystem. If there are not enough
free huge pages available, the mount will fail. As huge pages are allocated
to the filesystem and freed, the reserve count is adjusted so that the sum
of allocated and reserved huge pages is always at least min_size. The option
nr_inodes sets the maximum number of inodes that /mnt/huge can use. If the
size, min_size or nr_inodes option is not provided on command line then
no limits are set. For pagesize, size, min_size and nr_inodes options, you
can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K
has the same meaning as size=2048.
While read system calls are supported on files that reside on hugetlb
file systems, write system calls are not.
@ -289,15 +302,23 @@ file systems, write system calls are not.
Regular chown, chgrp, and chmod commands (with right permissions) could be
used to change the file attributes on hugetlbfs.
Also, it is important to note that no such mount command is required if the
Also, it is important to note that no such mount command is required if
applications are going to use only shmat/shmget system calls or mmap with
MAP_HUGETLB. Users who wish to use hugetlb page via shared memory segment
should be a member of a supplementary group and system admin needs to
configure that gid into /proc/sys/vm/hugetlb_shm_group. It is possible for
same or different applications to use any combination of mmaps and shm*
calls, though the mount of filesystem will be required for using mmap calls
without MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see
map_hugetlb.c.
MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see map_hugetlb
below.
Users who wish to use hugetlb memory via shared memory segment should be a
member of a supplementary group and system admin needs to configure that gid
into /proc/sys/vm/hugetlb_shm_group. It is possible for same or different
applications to use any combination of mmaps and shm* calls, though the mount of
filesystem will be required for using mmap calls without MAP_HUGETLB.
Syscalls that operate on memory backed by hugetlb pages only have their lengths
aligned to the native page size of the processor; they will normally fail with
errno set to EINVAL or exclude hugetlb pages that extend beyond the length if
not hugepage aligned. For example, munmap(2) will fail if memory is backed by
a hugetlb page and the length is smaller than the hugepage size.
Examples
========

View File

@ -22,6 +22,7 @@ CONTENTS
- Filtering special vmas.
- munlock()/munlockall() system call handling.
- Migrating mlocked pages.
- Compacting mlocked pages.
- mmap(MAP_LOCKED) system call handling.
- munmap()/exit()/exec() system call handling.
- try_to_unmap().
@ -450,6 +451,17 @@ list because of a race between munlock and migration, page migration uses the
putback_lru_page() function to add migrated pages back to the LRU.
COMPACTING MLOCKED PAGES
------------------------
The unevictable LRU can be scanned for compactable regions and the default
behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls
this behavior (see Documentation/sysctl/vm.txt). Once scanning of the
unevictable LRU is enabled, the work of compaction is mostly handled by
the page migration code and the same work flow as described in MIGRATING
MLOCKED PAGES will apply.
mmap(MAP_LOCKED) SYSTEM CALL HANDLING
-------------------------------------

View File

@ -0,0 +1,70 @@
zsmalloc
--------
This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).
To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.
For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, page is incompressible and is thus stored "as-is" i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc).
Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocating memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().
stat
----
With CONFIG_ZSMALLOC_STAT, we could see zsmalloc internal information via
/sys/kernel/debug/zsmalloc/<user name>. Here is a sample of stat output:
# cat /sys/kernel/debug/zsmalloc/zram0/classes
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage
..
..
9 176 0 1 186 129 8 4
10 192 1 0 2880 2872 135 3
11 208 0 1 819 795 42 2
12 224 0 1 219 159 12 4
..
..
class: index
size: object size zspage stores
almost_empty: the number of ZS_ALMOST_EMPTY zspages(see below)
almost_full: the number of ZS_ALMOST_FULL zspages(see below)
obj_allocated: the number of objects allocated
obj_used: the number of objects allocated to the user
pages_used: the number of pages allocated for the class
pages_per_zspage: the number of 0-order pages to make a zspage
We assign a zspage to ZS_ALMOST_EMPTY fullness group when:
n <= N / f, where
n = number of allocated objects
N = total number of objects zspage can store
f = fullness_threshold_frac(ie, 4 at the moment)
Similarly, we assign zspage to:
ZS_ALMOST_FULL when n > N / f
ZS_EMPTY when n == 0
ZS_FULL when n == N

21
Kbuild
View File

@ -13,8 +13,9 @@ define sed-y
s:->::; p;}"
endef
quiet_cmd_offsets = GEN $@
define cmd_offsets
# Use filechk to avoid rebuilds when a header changes, but the resulting file
# does not
define filechk_offsets
(set -e; \
echo "#ifndef $2"; \
echo "#define $2"; \
@ -24,9 +25,9 @@ define cmd_offsets
echo " * This file was generated by Kbuild"; \
echo " */"; \
echo ""; \
sed -ne $(sed-y) $<; \
sed -ne $(sed-y); \
echo ""; \
echo "#endif" ) > $@
echo "#endif" )
endef
#####
@ -35,16 +36,15 @@ endef
bounds-file := include/generated/bounds.h
always := $(bounds-file)
targets := $(bounds-file) kernel/bounds.s
targets := kernel/bounds.s
# We use internal kbuild rules to avoid the "is up to date" message from make
kernel/bounds.s: kernel/bounds.c FORCE
$(Q)mkdir -p $(dir $@)
$(call if_changed_dep,cc_s_c)
$(obj)/$(bounds-file): kernel/bounds.s Kbuild
$(Q)mkdir -p $(dir $@)
$(call cmd,offsets,__LINUX_BOUNDS_H__)
$(obj)/$(bounds-file): kernel/bounds.s FORCE
$(call filechk,offsets,__LINUX_BOUNDS_H__)
#####
# 2) Generate asm-offsets.h
@ -53,7 +53,6 @@ $(obj)/$(bounds-file): kernel/bounds.s Kbuild
offsets-file := include/generated/asm-offsets.h
always += $(offsets-file)
targets += $(offsets-file)
targets += arch/$(SRCARCH)/kernel/asm-offsets.s
# We use internal kbuild rules to avoid the "is up to date" message from make
@ -62,8 +61,8 @@ arch/$(SRCARCH)/kernel/asm-offsets.s: arch/$(SRCARCH)/kernel/asm-offsets.c \
$(Q)mkdir -p $(dir $@)
$(call if_changed_dep,cc_s_c)
$(obj)/$(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s Kbuild
$(call cmd,offsets,__ASM_OFFSETS_H__)
$(obj)/$(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s FORCE
$(call filechk,offsets,__ASM_OFFSETS_H__)
#####
# 3) Check for missing system calls

View File

@ -625,16 +625,16 @@ F: drivers/iommu/amd_iommu*.[ch]
F: include/linux/amd-iommu.h
AMD KFD
M: Oded Gabbay <oded.gabbay@amd.com>
L: dri-devel@lists.freedesktop.org
T: git git://people.freedesktop.org/~gabbayo/linux.git
S: Supported
F: drivers/gpu/drm/amd/amdkfd/
M: Oded Gabbay <oded.gabbay@amd.com>
L: dri-devel@lists.freedesktop.org
T: git git://people.freedesktop.org/~gabbayo/linux.git
S: Supported
F: drivers/gpu/drm/amd/amdkfd/
F: drivers/gpu/drm/amd/include/cik_structs.h
F: drivers/gpu/drm/amd/include/kgd_kfd_interface.h
F: drivers/gpu/drm/radeon/radeon_kfd.c
F: drivers/gpu/drm/radeon/radeon_kfd.h
F: include/uapi/linux/kfd_ioctl.h
F: drivers/gpu/drm/radeon/radeon_kfd.c
F: drivers/gpu/drm/radeon/radeon_kfd.h
F: include/uapi/linux/kfd_ioctl.h
AMD MICROCODE UPDATE SUPPORT
M: Borislav Petkov <bp@alien8.de>
@ -1215,6 +1215,7 @@ F: arch/arm/mach-orion5x/ts78xx-*
ARM/Mediatek SoC support
M: Matthias Brugger <matthias.bgg@gmail.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
L: linux-mediatek@lists.infradead.org (moderated for non-subscribers)
S: Maintained
F: arch/arm/boot/dts/mt6*
F: arch/arm/boot/dts/mt8*
@ -1764,7 +1765,7 @@ S: Supported
F: drivers/tty/serial/atmel_serial.c
ATMEL Audio ALSA driver
M: Bo Shen <voice.shen@atmel.com>
M: Nicolas Ferre <nicolas.ferre@atmel.com>
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
S: Supported
F: sound/soc/atmel
@ -1915,16 +1916,14 @@ S: Maintained
F: drivers/media/radio/radio-aztech*
B43 WIRELESS DRIVER
M: Stefano Brivio <stefano.brivio@polimi.it>
L: linux-wireless@vger.kernel.org
L: b43-dev@lists.infradead.org
W: http://wireless.kernel.org/en/users/Drivers/b43
S: Maintained
S: Odd Fixes
F: drivers/net/wireless/b43/
B43LEGACY WIRELESS DRIVER
M: Larry Finger <Larry.Finger@lwfinger.net>
M: Stefano Brivio <stefano.brivio@polimi.it>
L: linux-wireless@vger.kernel.org
L: b43-dev@lists.infradead.org
W: http://wireless.kernel.org/en/users/Drivers/b43
@ -1967,10 +1966,10 @@ F: Documentation/filesystems/befs.txt
F: fs/befs/
BECKHOFF CX5020 ETHERCAT MASTER DRIVER
M: Dariusz Marcinkiewicz <reksio@newterm.pl>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/ec_bhf.c
M: Dariusz Marcinkiewicz <reksio@newterm.pl>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/ec_bhf.c
BFS FILE SYSTEM
M: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>
@ -2825,6 +2824,7 @@ L: linux-crypto@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git
S: Maintained
F: Documentation/crypto/
F: Documentation/DocBook/crypto-API.tmpl
F: arch/*/crypto/
F: crypto/
F: drivers/crypto/
@ -2895,11 +2895,11 @@ S: Supported
F: drivers/net/ethernet/chelsio/cxgb3/
CXGB3 ISCSI DRIVER (CXGB3I)
M: Karen Xie <kxie@chelsio.com>
L: linux-scsi@vger.kernel.org
W: http://www.chelsio.com
S: Supported
F: drivers/scsi/cxgbi/cxgb3i
M: Karen Xie <kxie@chelsio.com>
L: linux-scsi@vger.kernel.org
W: http://www.chelsio.com
S: Supported
F: drivers/scsi/cxgbi/cxgb3i
CXGB3 IWARP RNIC DRIVER (IW_CXGB3)
M: Steve Wise <swise@chelsio.com>
@ -2916,11 +2916,11 @@ S: Supported
F: drivers/net/ethernet/chelsio/cxgb4/
CXGB4 ISCSI DRIVER (CXGB4I)
M: Karen Xie <kxie@chelsio.com>
L: linux-scsi@vger.kernel.org
W: http://www.chelsio.com
S: Supported
F: drivers/scsi/cxgbi/cxgb4i
M: Karen Xie <kxie@chelsio.com>
L: linux-scsi@vger.kernel.org
W: http://www.chelsio.com
S: Supported
F: drivers/scsi/cxgbi/cxgb4i
CXGB4 IWARP RNIC DRIVER (IW_CXGB4)
M: Steve Wise <swise@chelsio.com>
@ -5222,7 +5222,7 @@ F: arch/x86/kernel/tboot.c
INTEL WIRELESS WIMAX CONNECTION 2400
M: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
M: linux-wimax@intel.com
L: wimax@linuxwimax.org (subscribers-only)
L: wimax@linuxwimax.org (subscribers-only)
S: Supported
W: http://linuxwimax.org
F: Documentation/wimax/README.i2400m
@ -5300,6 +5300,13 @@ F: drivers/char/ipmi/
F: include/linux/ipmi*
F: include/uapi/linux/ipmi*
QCOM AUDIO (ASoC) DRIVERS
M: Patrick Lai <plai@codeaurora.org>
M: Banajit Goswami <bgoswami@codeaurora.org>
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
S: Supported
F: sound/soc/qcom/
IPS SCSI RAID DRIVER
M: Adaptec OEM Raid Solutions <aacraid@adaptec.com>
L: linux-scsi@vger.kernel.org
@ -5918,7 +5925,7 @@ F: arch/powerpc/platforms/512x/
F: arch/powerpc/platforms/52xx/
LINUX FOR POWERPC EMBEDDED PPC4XX
M: Alistair Popple <alistair@popple.id.au>
M: Alistair Popple <alistair@popple.id.au>
M: Matt Porter <mporter@kernel.crashing.org>
W: http://www.penguinppc.org/
L: linuxppc-dev@lists.ozlabs.org
@ -6391,7 +6398,7 @@ S: Supported
F: drivers/watchdog/mena21_wdt.c
MEN CHAMELEON BUS (mcb)
M: Johannes Thumshirn <johannes.thumshirn@men.de>
M: Johannes Thumshirn <johannes.thumshirn@men.de>
S: Supported
F: drivers/mcb/
F: include/linux/mcb.h
@ -7947,10 +7954,10 @@ L: rtc-linux@googlegroups.com
S: Maintained
QAT DRIVER
M: Tadeusz Struk <tadeusz.struk@intel.com>
L: qat-linux@intel.com
S: Supported
F: drivers/crypto/qat/
M: Tadeusz Struk <tadeusz.struk@intel.com>
L: qat-linux@intel.com
S: Supported
F: drivers/crypto/qat/
QIB DRIVER
M: Mike Marciniszyn <infinipath@intel.com>
@ -8101,7 +8108,7 @@ S: Maintained
F: drivers/net/wireless/rt2x00/
RAMDISK RAM BLOCK DEVICE DRIVER
M: Nick Piggin <npiggin@kernel.dk>
M: Jens Axboe <axboe@kernel.dk>
S: Maintained
F: Documentation/blockdev/ramdisk.txt
F: drivers/block/brd.c
@ -8177,6 +8184,7 @@ X: kernel/torture.c
REAL TIME CLOCK (RTC) SUBSYSTEM
M: Alessandro Zummo <a.zummo@towertech.it>
M: Alexandre Belloni <alexandre.belloni@free-electrons.com>
L: rtc-linux@googlegroups.com
Q: http://patchwork.ozlabs.org/project/rtc-linux/list/
S: Maintained
@ -8647,11 +8655,9 @@ F: drivers/scsi/sg.c
F: include/scsi/sg.h
SCSI SUBSYSTEM
M: "James E.J. Bottomley" <JBottomley@parallels.com>
M: "James E.J. Bottomley" <JBottomley@odin.com>
L: linux-scsi@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-pending-2.6.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git
S: Maintained
F: drivers/scsi/
F: include/scsi/
@ -9967,6 +9973,7 @@ F: drivers/media/pci/tw68/
TPM DEVICE DRIVER
M: Peter Huewe <peterhuewe@gmx.de>
M: Marcel Selhorst <tpmdd@selhorst.net>
R: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
W: http://tpmdd.sourceforge.net
L: tpmdd-devel@lists.sourceforge.net (moderated for non-subscribers)
Q: git git://github.com/PeterHuewe/linux-tpmdd.git
@ -10120,11 +10127,11 @@ F: include/linux/cdrom.h
F: include/uapi/linux/cdrom.h
UNISYS S-PAR DRIVERS
M: Benjamin Romer <benjamin.romer@unisys.com>
M: David Kershner <david.kershner@unisys.com>
L: sparmaintainer@unisys.com (Unisys internal)
S: Supported
F: drivers/staging/unisys/
M: Benjamin Romer <benjamin.romer@unisys.com>
M: David Kershner <david.kershner@unisys.com>
L: sparmaintainer@unisys.com (Unisys internal)
S: Supported
F: drivers/staging/unisys/
UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER
M: Vinayak Holikatti <vinholikatti@gmail.com>
@ -10681,7 +10688,7 @@ F: drivers/media/rc/winbond-cir.c
WIMAX STACK
M: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
M: linux-wimax@intel.com
L: wimax@linuxwimax.org (subscribers-only)
L: wimax@linuxwimax.org (subscribers-only)
S: Supported
W: http://linuxwimax.org
F: Documentation/wimax/README.wimax
@ -10972,6 +10979,7 @@ L: linux-mm@kvack.org
S: Maintained
F: mm/zsmalloc.c
F: include/linux/zsmalloc.h
F: Documentation/vm/zsmalloc.txt
ZSWAP COMPRESSED SWAP CACHING
M: Seth Jennings <sjennings@variantweb.net>

View File

@ -10,9 +10,10 @@ NAME = Hurr durr I'ma sheep
# Comments in this file are targeted only to the developer, do not
# expect to learn how to build the kernel reading this file.
# Do not use make's built-in rules and variables
# (this increases performance and avoids hard-to-debug behaviour);
MAKEFLAGS += -rR
# o Do not use make's built-in rules and variables
# (this increases performance and avoids hard-to-debug behaviour);
# o Look for make include files relative to root of kernel src
MAKEFLAGS += -rR --include-dir=$(CURDIR)
# Avoid funny character set dependencies
unexport LC_ALL
@ -344,12 +345,9 @@ endif
export COMPILER
endif
# Look for make include files relative to root of kernel src
MAKEFLAGS += --include-dir=$(srctree)
# We need some generic definitions (do not try to remake the file).
$(srctree)/scripts/Kbuild.include: ;
include $(srctree)/scripts/Kbuild.include
scripts/Kbuild.include: ;
include scripts/Kbuild.include
# Make variables (CC, etc...)
AS = $(CROSS_COMPILE)as
@ -533,7 +531,7 @@ ifeq ($(config-targets),1)
# Read arch specific Makefile to set KBUILD_DEFCONFIG as needed.
# KBUILD_DEFCONFIG may point out an alternative default configuration
# used for 'make defconfig'
include $(srctree)/arch/$(SRCARCH)/Makefile
include arch/$(SRCARCH)/Makefile
export KBUILD_DEFCONFIG KBUILD_KCONFIG
config: scripts_basic outputmakefile FORCE
@ -609,7 +607,7 @@ endif # $(dot-config)
# Defaults to vmlinux, but the arch makefile usually adds further targets
all: vmlinux
include $(srctree)/arch/$(SRCARCH)/Makefile
include arch/$(SRCARCH)/Makefile
KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,)
@ -782,8 +780,8 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y)
KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
endif
include $(srctree)/scripts/Makefile.kasan
include $(srctree)/scripts/Makefile.extrawarn
include scripts/Makefile.kasan
include scripts/Makefile.extrawarn
# Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
KBUILD_CPPFLAGS += $(KCPPFLAGS)
@ -1026,12 +1024,6 @@ headerdep:
$(Q)find $(srctree)/include/ -name '*.h' | xargs --max-args 1 \
$(srctree)/scripts/headerdep.pl -I$(srctree)/include
# ---------------------------------------------------------------------------
PHONY += depend dep
depend dep:
@echo '*** Warning: make $@ is unnecessary now.'
# ---------------------------------------------------------------------------
# Firmware install
INSTALL_FW_PATH=$(INSTALL_MOD_PATH)/lib/firmware

View File

@ -32,7 +32,7 @@ config HAVE_OPROFILE
config OPROFILE_NMI_TIMER
def_bool y
depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64
config KPROBES
bool "Kprobes"

View File

@ -44,6 +44,7 @@ struct task_struct;
extern unsigned long thread_saved_pc(struct task_struct *);
/* Do necessary setup to start up a newly executed thread. */
struct pt_regs;
extern void start_thread(struct pt_regs *, unsigned long, unsigned long);
/* Free all resources held by a thread. */

View File

@ -18,7 +18,6 @@ struct thread_info {
unsigned int flags; /* low level flags */
unsigned int ieee_state; /* see fpu.h */
struct exec_domain *exec_domain; /* execution domain */
mm_segment_t addr_limit; /* thread address space */
unsigned cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
@ -35,7 +34,6 @@ struct thread_info {
#define INIT_THREAD_INFO(tsk) \
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
.addr_limit = KERNEL_DS, \
.preempt_count = INIT_PREEMPT_COUNT, \
}

View File

@ -43,7 +43,6 @@ struct thread_info {
int preempt_count; /* 0 => preemptable, <0 => BUG */
struct task_struct *task; /* main task structure */
mm_segment_t addr_limit; /* thread address space */
struct exec_domain *exec_domain;/* execution domain */
__u32 cpu; /* current CPU */
unsigned long thr_ptr; /* TLS ptr */
};
@ -56,7 +55,6 @@ struct thread_info {
#define INIT_THREAD_INFO(tsk) \
{ \
.task = &tsk, \
.exec_domain = &default_exec_domain, \
.flags = 0, \
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \

View File

@ -171,18 +171,6 @@ static inline void __user *get_sigframe(struct ksignal *ksig,
return frame;
}
/*
* translate the signal
*/
static inline int map_sig(int sig)
{
struct thread_info *thread = current_thread_info();
if (thread->exec_domain && thread->exec_domain->signal_invmap
&& sig < 32)
sig = thread->exec_domain->signal_invmap[sig];
return sig;
}
static int
setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
{
@ -231,7 +219,7 @@ setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
return err;
/* #1 arg to the user Signal handler */
regs->r0 = map_sig(ksig->sig);
regs->r0 = ksig->sig;
/* setup PC of user space signal handler */
regs->ret = (unsigned long)ksig->ka.sa.sa_handler;

View File

@ -52,7 +52,7 @@ static void show_callee_regs(struct callee_regs *cregs)
print_reg_file(&(cregs->r13), 13);
}
void print_task_path_n_nm(struct task_struct *tsk, char *buf)
static void print_task_path_n_nm(struct task_struct *tsk, char *buf)
{
struct path path;
char *path_nm = NULL;
@ -77,7 +77,6 @@ void print_task_path_n_nm(struct task_struct *tsk, char *buf)
done:
pr_info("Path: %s\n", path_nm);
}
EXPORT_SYMBOL(print_task_path_n_nm);
static void show_faulting_vma(unsigned long address, char *buf)
{

View File

@ -2133,16 +2133,6 @@ menu "Userspace binary formats"
source "fs/Kconfig.binfmt"
config ARTHUR
tristate "RISC OS personality"
depends on !AEABI
help
Say Y here to include the kernel code necessary if you want to run
Acorn RISC OS/Arthur binaries under Linux. This code is still very
experimental; if this sounds frightening, say N and sleep in peace.
You can also say M here to compile this support as a module (which
will be called arthur).
endmenu
menu "Power management options"
@ -2175,6 +2165,9 @@ source "arch/arm/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
if CRYPTO
source "arch/arm/crypto/Kconfig"
endif
source "lib/Kconfig"

View File

@ -12,7 +12,7 @@
#
ifneq ($(MACHINE),)
include $(srctree)/$(MACHINE)/Makefile.boot
include $(MACHINE)/Makefile.boot
endif
# Note: the following conditions must always be true:

View File

@ -12,7 +12,6 @@ CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_FPE_NWFPE=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_MISC=m
CONFIG_ARTHUR=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y

130
arch/arm/crypto/Kconfig Normal file
View File

@ -0,0 +1,130 @@
menuconfig ARM_CRYPTO
bool "ARM Accelerated Cryptographic Algorithms"
depends on ARM
help
Say Y here to choose from a selection of cryptographic algorithms
implemented using ARM specific CPU features or instructions.
if ARM_CRYPTO
config CRYPTO_SHA1_ARM
tristate "SHA1 digest algorithm (ARM-asm)"
select CRYPTO_SHA1
select CRYPTO_HASH
help
SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
using optimized ARM assembler.
config CRYPTO_SHA1_ARM_NEON
tristate "SHA1 digest algorithm (ARM NEON)"
depends on KERNEL_MODE_NEON
select CRYPTO_SHA1_ARM
select CRYPTO_SHA1
select CRYPTO_HASH
help
SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
using optimized ARM NEON assembly, when NEON instructions are
available.
config CRYPTO_SHA1_ARM_CE
tristate "SHA1 digest algorithm (ARM v8 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_SHA1_ARM
select CRYPTO_HASH
help
SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
using special ARMv8 Crypto Extensions.
config CRYPTO_SHA2_ARM_CE
tristate "SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_SHA256_ARM
select CRYPTO_HASH
help
SHA-256 secure hash standard (DFIPS 180-2) implemented
using special ARMv8 Crypto Extensions.
config CRYPTO_SHA256_ARM
tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
select CRYPTO_HASH
depends on !CPU_V7M
help
SHA-256 secure hash standard (DFIPS 180-2) implemented
using optimized ARM assembler and NEON, when available.
config CRYPTO_SHA512_ARM_NEON
tristate "SHA384 and SHA512 digest algorithm (ARM NEON)"
depends on KERNEL_MODE_NEON
select CRYPTO_SHA512
select CRYPTO_HASH
help
SHA-512 secure hash standard (DFIPS 180-2) implemented
using ARM NEON instructions, when available.
This version of SHA implements a 512 bit hash with 256 bits of
security against collision attacks.
This code also includes SHA-384, a 384 bit hash with 192 bits
of security against collision attacks.
config CRYPTO_AES_ARM
tristate "AES cipher algorithms (ARM-asm)"
depends on ARM
select CRYPTO_ALGAPI
select CRYPTO_AES
help
Use optimized AES assembler routines for ARM platforms.
AES cipher algorithms (FIPS-197). AES uses the Rijndael
algorithm.
Rijndael appears to be consistently a very good performer in
both hardware and software across a wide range of computing
environments regardless of its use in feedback or non-feedback
modes. Its key setup time is excellent, and its key agility is
good. Rijndael's very low memory requirements make it very well
suited for restricted-space environments, in which it also
demonstrates excellent performance. Rijndael's operations are
among the easiest to defend against power and timing attacks.
The AES specifies three key sizes: 128, 192 and 256 bits
See <http://csrc.nist.gov/encryption/aes/> for more information.
config CRYPTO_AES_ARM_BS
tristate "Bit sliced AES using NEON instructions"
depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM
select CRYPTO_ABLK_HELPER
help
Use a faster and more secure NEON based implementation of AES in CBC,
CTR and XTS modes
Bit sliced AES gives around 45% speedup on Cortex-A15 for CTR mode
and for XTS mode encryption, CBC and XTS mode decryption speedup is
around 25%. (CBC encryption speed is not affected by this driver.)
This implementation does not rely on any lookup tables so it is
believed to be invulnerable to cache timing attacks.
config CRYPTO_AES_ARM_CE
tristate "Accelerated AES using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_ABLK_HELPER
help
Use an implementation of AES in CBC, CTR and XTS modes that uses
ARMv8 Crypto Extensions
config CRYPTO_GHASH_ARM_CE
tristate "PMULL-accelerated GHASH using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_CRYPTD
help
Use an implementation of GHASH (used by the GCM AEAD chaining mode)
that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
that is part of the ARMv8 Crypto Extensions
endif

View File

@ -6,13 +6,35 @@ obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM_NEON) += sha512-arm-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
ifneq ($(ce-obj-y)$(ce-obj-m),)
ifeq ($(call as-instr,.fpu crypto-neon-fp-armv8,y,n),y)
obj-y += $(ce-obj-y)
obj-m += $(ce-obj-m)
else
$(warning These ARMv8 Crypto Extensions modules need binutils 2.23 or higher)
$(warning $(ce-obj-y) $(ce-obj-m))
endif
endif
aes-arm-y := aes-armv4.o aes_glue.o
aes-arm-bs-y := aesbs-core.o aesbs-glue.o
sha1-arm-y := sha1-armv4-large.o sha1_glue.o
sha1-arm-neon-y := sha1-armv7-neon.o sha1_neon_glue.o
sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
sha256-arm-y := sha256-core.o sha256_glue.o $(sha256-arm-neon-y)
sha512-arm-neon-y := sha512-armv7-neon.o sha512_neon_glue.o
sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o
sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o
aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
quiet_cmd_perl = PERL $@
cmd_perl = $(PERL) $(<) > $(@)
@ -20,4 +42,7 @@ quiet_cmd_perl = PERL $@
$(src)/aesbs-core.S_shipped: $(src)/bsaes-armv7.pl
$(call cmd,perl)
.PRECIOUS: $(obj)/aesbs-core.S
$(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(call cmd,perl)
.PRECIOUS: $(obj)/aesbs-core.S $(obj)/sha256-core.S

View File

@ -0,0 +1,518 @@
/*
* aes-ce-core.S - AES in CBC/CTR/XTS mode using ARMv8 Crypto Extensions
*
* Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.text
.fpu crypto-neon-fp-armv8
.align 3
.macro enc_round, state, key
aese.8 \state, \key
aesmc.8 \state, \state
.endm
.macro dec_round, state, key
aesd.8 \state, \key
aesimc.8 \state, \state
.endm
.macro enc_dround, key1, key2
enc_round q0, \key1
enc_round q0, \key2
.endm
.macro dec_dround, key1, key2
dec_round q0, \key1
dec_round q0, \key2
.endm
.macro enc_fround, key1, key2, key3
enc_round q0, \key1
aese.8 q0, \key2
veor q0, q0, \key3
.endm
.macro dec_fround, key1, key2, key3
dec_round q0, \key1
aesd.8 q0, \key2
veor q0, q0, \key3
.endm
.macro enc_dround_3x, key1, key2
enc_round q0, \key1
enc_round q1, \key1
enc_round q2, \key1
enc_round q0, \key2
enc_round q1, \key2
enc_round q2, \key2
.endm
.macro dec_dround_3x, key1, key2
dec_round q0, \key1
dec_round q1, \key1
dec_round q2, \key1
dec_round q0, \key2
dec_round q1, \key2
dec_round q2, \key2
.endm
.macro enc_fround_3x, key1, key2, key3
enc_round q0, \key1
enc_round q1, \key1
enc_round q2, \key1
aese.8 q0, \key2
aese.8 q1, \key2
aese.8 q2, \key2
veor q0, q0, \key3
veor q1, q1, \key3
veor q2, q2, \key3
.endm
.macro dec_fround_3x, key1, key2, key3
dec_round q0, \key1
dec_round q1, \key1
dec_round q2, \key1
aesd.8 q0, \key2
aesd.8 q1, \key2
aesd.8 q2, \key2
veor q0, q0, \key3
veor q1, q1, \key3
veor q2, q2, \key3
.endm
.macro do_block, dround, fround
cmp r3, #12 @ which key size?
vld1.8 {q10-q11}, [ip]!
\dround q8, q9
vld1.8 {q12-q13}, [ip]!
\dround q10, q11
vld1.8 {q10-q11}, [ip]!
\dround q12, q13
vld1.8 {q12-q13}, [ip]!
\dround q10, q11
blo 0f @ AES-128: 10 rounds
vld1.8 {q10-q11}, [ip]!
beq 1f @ AES-192: 12 rounds
\dround q12, q13
vld1.8 {q12-q13}, [ip]
\dround q10, q11
0: \fround q12, q13, q14
bx lr
1: \dround q12, q13
\fround q10, q11, q14
bx lr
.endm
/*
* Internal, non-AAPCS compliant functions that implement the core AES
* transforms. These should preserve all registers except q0 - q2 and ip
* Arguments:
* q0 : first in/output block
* q1 : second in/output block (_3x version only)
* q2 : third in/output block (_3x version only)
* q8 : first round key
* q9 : secound round key
* ip : address of 3rd round key
* q14 : final round key
* r3 : number of rounds
*/
.align 6
aes_encrypt:
add ip, r2, #32 @ 3rd round key
.Laes_encrypt_tweak:
do_block enc_dround, enc_fround
ENDPROC(aes_encrypt)
.align 6
aes_decrypt:
add ip, r2, #32 @ 3rd round key
do_block dec_dround, dec_fround
ENDPROC(aes_decrypt)
.align 6
aes_encrypt_3x:
add ip, r2, #32 @ 3rd round key
do_block enc_dround_3x, enc_fround_3x
ENDPROC(aes_encrypt_3x)
.align 6
aes_decrypt_3x:
add ip, r2, #32 @ 3rd round key
do_block dec_dround_3x, dec_fround_3x
ENDPROC(aes_decrypt_3x)
.macro prepare_key, rk, rounds
add ip, \rk, \rounds, lsl #4
vld1.8 {q8-q9}, [\rk] @ load first 2 round keys
vld1.8 {q14}, [ip] @ load last round key
.endm
/*
* aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks)
* aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks)
*/
ENTRY(ce_aes_ecb_encrypt)
push {r4, lr}
ldr r4, [sp, #8]
prepare_key r2, r3
.Lecbencloop3x:
subs r4, r4, #3
bmi .Lecbenc1x
vld1.8 {q0-q1}, [r1, :64]!
vld1.8 {q2}, [r1, :64]!
bl aes_encrypt_3x
vst1.8 {q0-q1}, [r0, :64]!
vst1.8 {q2}, [r0, :64]!
b .Lecbencloop3x
.Lecbenc1x:
adds r4, r4, #3
beq .Lecbencout
.Lecbencloop:
vld1.8 {q0}, [r1, :64]!
bl aes_encrypt
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
bne .Lecbencloop
.Lecbencout:
pop {r4, pc}
ENDPROC(ce_aes_ecb_encrypt)
ENTRY(ce_aes_ecb_decrypt)
push {r4, lr}
ldr r4, [sp, #8]
prepare_key r2, r3
.Lecbdecloop3x:
subs r4, r4, #3
bmi .Lecbdec1x
vld1.8 {q0-q1}, [r1, :64]!
vld1.8 {q2}, [r1, :64]!
bl aes_decrypt_3x
vst1.8 {q0-q1}, [r0, :64]!
vst1.8 {q2}, [r0, :64]!
b .Lecbdecloop3x
.Lecbdec1x:
adds r4, r4, #3
beq .Lecbdecout
.Lecbdecloop:
vld1.8 {q0}, [r1, :64]!
bl aes_decrypt
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
bne .Lecbdecloop
.Lecbdecout:
pop {r4, pc}
ENDPROC(ce_aes_ecb_decrypt)
/*
* aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 iv[])
* aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 iv[])
*/
ENTRY(ce_aes_cbc_encrypt)
push {r4-r6, lr}
ldrd r4, r5, [sp, #16]
vld1.8 {q0}, [r5]
prepare_key r2, r3
.Lcbcencloop:
vld1.8 {q1}, [r1, :64]! @ get next pt block
veor q0, q0, q1 @ ..and xor with iv
bl aes_encrypt
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
bne .Lcbcencloop
vst1.8 {q0}, [r5]
pop {r4-r6, pc}
ENDPROC(ce_aes_cbc_encrypt)
ENTRY(ce_aes_cbc_decrypt)
push {r4-r6, lr}
ldrd r4, r5, [sp, #16]
vld1.8 {q6}, [r5] @ keep iv in q6
prepare_key r2, r3
.Lcbcdecloop3x:
subs r4, r4, #3
bmi .Lcbcdec1x
vld1.8 {q0-q1}, [r1, :64]!
vld1.8 {q2}, [r1, :64]!
vmov q3, q0
vmov q4, q1
vmov q5, q2
bl aes_decrypt_3x
veor q0, q0, q6
veor q1, q1, q3
veor q2, q2, q4
vmov q6, q5
vst1.8 {q0-q1}, [r0, :64]!
vst1.8 {q2}, [r0, :64]!
b .Lcbcdecloop3x
.Lcbcdec1x:
adds r4, r4, #3
beq .Lcbcdecout
vmov q15, q14 @ preserve last round key
.Lcbcdecloop:
vld1.8 {q0}, [r1, :64]! @ get next ct block
veor q14, q15, q6 @ combine prev ct with last key
vmov q6, q0
bl aes_decrypt
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
bne .Lcbcdecloop
.Lcbcdecout:
vst1.8 {q6}, [r5] @ keep iv in q6
pop {r4-r6, pc}
ENDPROC(ce_aes_cbc_decrypt)
/*
* aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
* int blocks, u8 ctr[])
*/
ENTRY(ce_aes_ctr_encrypt)
push {r4-r6, lr}
ldrd r4, r5, [sp, #16]
vld1.8 {q6}, [r5] @ load ctr
prepare_key r2, r3
vmov r6, s27 @ keep swabbed ctr in r6
rev r6, r6
cmn r6, r4 @ 32 bit overflow?
bcs .Lctrloop
.Lctrloop3x:
subs r4, r4, #3
bmi .Lctr1x
add r6, r6, #1
vmov q0, q6
vmov q1, q6
rev ip, r6
add r6, r6, #1
vmov q2, q6
vmov s7, ip
rev ip, r6
add r6, r6, #1
vmov s11, ip
vld1.8 {q3-q4}, [r1, :64]!
vld1.8 {q5}, [r1, :64]!
bl aes_encrypt_3x
veor q0, q0, q3
veor q1, q1, q4
veor q2, q2, q5
rev ip, r6
vst1.8 {q0-q1}, [r0, :64]!
vst1.8 {q2}, [r0, :64]!
vmov s27, ip
b .Lctrloop3x
.Lctr1x:
adds r4, r4, #3
beq .Lctrout
.Lctrloop:
vmov q0, q6
bl aes_encrypt
subs r4, r4, #1
bmi .Lctrhalfblock @ blocks < 0 means 1/2 block
vld1.8 {q3}, [r1, :64]!
veor q3, q0, q3
vst1.8 {q3}, [r0, :64]!
adds r6, r6, #1 @ increment BE ctr
rev ip, r6
vmov s27, ip
bcs .Lctrcarry
teq r4, #0
bne .Lctrloop
.Lctrout:
vst1.8 {q6}, [r5]
pop {r4-r6, pc}
.Lctrhalfblock:
vld1.8 {d1}, [r1, :64]
veor d0, d0, d1
vst1.8 {d0}, [r0, :64]
pop {r4-r6, pc}
.Lctrcarry:
.irp sreg, s26, s25, s24
vmov ip, \sreg @ load next word of ctr
rev ip, ip @ ... to handle the carry
adds ip, ip, #1
rev ip, ip
vmov \sreg, ip
bcc 0f
.endr
0: teq r4, #0
beq .Lctrout
b .Lctrloop
ENDPROC(ce_aes_ctr_encrypt)
/*
* aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds,
* int blocks, u8 iv[], u8 const rk2[], int first)
* aes_xts_decrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds,
* int blocks, u8 iv[], u8 const rk2[], int first)
*/
.macro next_tweak, out, in, const, tmp
vshr.s64 \tmp, \in, #63
vand \tmp, \tmp, \const
vadd.u64 \out, \in, \in
vext.8 \tmp, \tmp, \tmp, #8
veor \out, \out, \tmp
.endm
.align 3
.Lxts_mul_x:
.quad 1, 0x87
ce_aes_xts_init:
vldr d14, .Lxts_mul_x
vldr d15, .Lxts_mul_x + 8
ldrd r4, r5, [sp, #16] @ load args
ldr r6, [sp, #28]
vld1.8 {q0}, [r5] @ load iv
teq r6, #1 @ start of a block?
bxne lr
@ Encrypt the IV in q0 with the second AES key. This should only
@ be done at the start of a block.
ldr r6, [sp, #24] @ load AES key 2
prepare_key r6, r3
add ip, r6, #32 @ 3rd round key of key 2
b .Laes_encrypt_tweak @ tail call
ENDPROC(ce_aes_xts_init)
ENTRY(ce_aes_xts_encrypt)
push {r4-r6, lr}
bl ce_aes_xts_init @ run shared prologue
prepare_key r2, r3
vmov q3, q0
teq r6, #0 @ start of a block?
bne .Lxtsenc3x
.Lxtsencloop3x:
next_tweak q3, q3, q7, q6
.Lxtsenc3x:
subs r4, r4, #3
bmi .Lxtsenc1x
vld1.8 {q0-q1}, [r1, :64]! @ get 3 pt blocks
vld1.8 {q2}, [r1, :64]!
next_tweak q4, q3, q7, q6
veor q0, q0, q3
next_tweak q5, q4, q7, q6
veor q1, q1, q4
veor q2, q2, q5
bl aes_encrypt_3x
veor q0, q0, q3
veor q1, q1, q4
veor q2, q2, q5
vst1.8 {q0-q1}, [r0, :64]! @ write 3 ct blocks
vst1.8 {q2}, [r0, :64]!
vmov q3, q5
teq r4, #0
beq .Lxtsencout
b .Lxtsencloop3x
.Lxtsenc1x:
adds r4, r4, #3
beq .Lxtsencout
.Lxtsencloop:
vld1.8 {q0}, [r1, :64]!
veor q0, q0, q3
bl aes_encrypt
veor q0, q0, q3
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
beq .Lxtsencout
next_tweak q3, q3, q7, q6
b .Lxtsencloop
.Lxtsencout:
vst1.8 {q3}, [r5]
pop {r4-r6, pc}
ENDPROC(ce_aes_xts_encrypt)
ENTRY(ce_aes_xts_decrypt)
push {r4-r6, lr}
bl ce_aes_xts_init @ run shared prologue
prepare_key r2, r3
vmov q3, q0
teq r6, #0 @ start of a block?
bne .Lxtsdec3x
.Lxtsdecloop3x:
next_tweak q3, q3, q7, q6
.Lxtsdec3x:
subs r4, r4, #3
bmi .Lxtsdec1x
vld1.8 {q0-q1}, [r1, :64]! @ get 3 ct blocks
vld1.8 {q2}, [r1, :64]!
next_tweak q4, q3, q7, q6
veor q0, q0, q3
next_tweak q5, q4, q7, q6
veor q1, q1, q4
veor q2, q2, q5
bl aes_decrypt_3x
veor q0, q0, q3
veor q1, q1, q4
veor q2, q2, q5
vst1.8 {q0-q1}, [r0, :64]! @ write 3 pt blocks
vst1.8 {q2}, [r0, :64]!
vmov q3, q5
teq r4, #0
beq .Lxtsdecout
b .Lxtsdecloop3x
.Lxtsdec1x:
adds r4, r4, #3
beq .Lxtsdecout
.Lxtsdecloop:
vld1.8 {q0}, [r1, :64]!
veor q0, q0, q3
add ip, r2, #32 @ 3rd round key
bl aes_decrypt
veor q0, q0, q3
vst1.8 {q0}, [r0, :64]!
subs r4, r4, #1
beq .Lxtsdecout
next_tweak q3, q3, q7, q6
b .Lxtsdecloop
.Lxtsdecout:
vst1.8 {q3}, [r5]
pop {r4-r6, pc}
ENDPROC(ce_aes_xts_decrypt)
/*
* u32 ce_aes_sub(u32 input) - use the aese instruction to perform the
* AES sbox substitution on each byte in
* 'input'
*/
ENTRY(ce_aes_sub)
vdup.32 q1, r0
veor q0, q0, q0
aese.8 q0, q1
vmov r0, s0
bx lr
ENDPROC(ce_aes_sub)
/*
* void ce_aes_invert(u8 *dst, u8 *src) - perform the Inverse MixColumns
* operation on round key *src
*/
ENTRY(ce_aes_invert)
vld1.8 {q0}, [r1]
aesimc.8 q0, q0
vst1.8 {q0}, [r0]
bx lr
ENDPROC(ce_aes_invert)

View File

@ -0,0 +1,524 @@
/*
* aes-ce-glue.c - wrapper code for ARMv8 AES
*
* Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/hwcap.h>
#include <crypto/aes.h>
#include <crypto/ablk_helper.h>
#include <crypto/algapi.h>
#include <linux/module.h>
MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
/* defined in aes-ce-core.S */
asmlinkage u32 ce_aes_sub(u32 input);
asmlinkage void ce_aes_invert(void *dst, void *src);
asmlinkage void ce_aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks);
asmlinkage void ce_aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks);
asmlinkage void ce_aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 iv[]);
asmlinkage void ce_aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 iv[]);
asmlinkage void ce_aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 ctr[]);
asmlinkage void ce_aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 iv[],
u8 const rk2[], int first);
asmlinkage void ce_aes_xts_decrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 iv[],
u8 const rk2[], int first);
struct aes_block {
u8 b[AES_BLOCK_SIZE];
};
static int num_rounds(struct crypto_aes_ctx *ctx)
{
/*
* # of rounds specified by AES:
* 128 bit key 10 rounds
* 192 bit key 12 rounds
* 256 bit key 14 rounds
* => n byte key => 6 + (n/4) rounds
*/
return 6 + ctx->key_length / 4;
}
static int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
unsigned int key_len)
{
/*
* The AES key schedule round constants
*/
static u8 const rcon[] = {
0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36,
};
u32 kwords = key_len / sizeof(u32);
struct aes_block *key_enc, *key_dec;
int i, j;
if (key_len != AES_KEYSIZE_128 &&
key_len != AES_KEYSIZE_192 &&
key_len != AES_KEYSIZE_256)
return -EINVAL;
memcpy(ctx->key_enc, in_key, key_len);
ctx->key_length = key_len;
kernel_neon_begin();
for (i = 0; i < sizeof(rcon); i++) {
u32 *rki = ctx->key_enc + (i * kwords);
u32 *rko = rki + kwords;
rko[0] = ror32(ce_aes_sub(rki[kwords - 1]), 8);
rko[0] = rko[0] ^ rki[0] ^ rcon[i];
rko[1] = rko[0] ^ rki[1];
rko[2] = rko[1] ^ rki[2];
rko[3] = rko[2] ^ rki[3];
if (key_len == AES_KEYSIZE_192) {
if (i >= 7)
break;
rko[4] = rko[3] ^ rki[4];
rko[5] = rko[4] ^ rki[5];
} else if (key_len == AES_KEYSIZE_256) {
if (i >= 6)
break;
rko[4] = ce_aes_sub(rko[3]) ^ rki[4];
rko[5] = rko[4] ^ rki[5];
rko[6] = rko[5] ^ rki[6];
rko[7] = rko[6] ^ rki[7];
}
}
/*
* Generate the decryption keys for the Equivalent Inverse Cipher.
* This involves reversing the order of the round keys, and applying
* the Inverse Mix Columns transformation on all but the first and
* the last one.
*/
key_enc = (struct aes_block *)ctx->key_enc;
key_dec = (struct aes_block *)ctx->key_dec;
j = num_rounds(ctx);
key_dec[0] = key_enc[j];
for (i = 1, j--; j > 0; i++, j--)
ce_aes_invert(key_dec + i, key_enc + j);
key_dec[i] = key_enc[0];
kernel_neon_end();
return 0;
}
static int ce_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
{
struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);
int ret;
ret = ce_aes_expandkey(ctx, in_key, key_len);
if (!ret)
return 0;
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
return -EINVAL;
}
struct crypto_aes_xts_ctx {
struct crypto_aes_ctx key1;
struct crypto_aes_ctx __aligned(8) key2;
};
static int xts_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len)
{
struct crypto_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm);
int ret;
ret = ce_aes_expandkey(&ctx->key1, in_key, key_len / 2);
if (!ret)
ret = ce_aes_expandkey(&ctx->key2, &in_key[key_len / 2],
key_len / 2);
if (!ret)
return 0;
tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
return -EINVAL;
}
static int ecb_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
struct blkcipher_walk walk;
unsigned int blocks;
int err;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
ce_aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, num_rounds(ctx), blocks);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static int ecb_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
struct blkcipher_walk walk;
unsigned int blocks;
int err;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
ce_aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, num_rounds(ctx), blocks);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static int cbc_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
struct blkcipher_walk walk;
unsigned int blocks;
int err;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
ce_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, num_rounds(ctx), blocks,
walk.iv);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static int cbc_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
struct blkcipher_walk walk;
unsigned int blocks;
int err;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
ce_aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_dec, num_rounds(ctx), blocks,
walk.iv);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static int ctr_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
struct blkcipher_walk walk;
int err, blocks;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt_block(desc, &walk, AES_BLOCK_SIZE);
kernel_neon_begin();
while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
ce_aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key_enc, num_rounds(ctx), blocks,
walk.iv);
nbytes -= blocks * AES_BLOCK_SIZE;
if (nbytes && nbytes == walk.nbytes % AES_BLOCK_SIZE)
break;
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
if (nbytes) {
u8 *tdst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
u8 *tsrc = walk.src.virt.addr + blocks * AES_BLOCK_SIZE;
u8 __aligned(8) tail[AES_BLOCK_SIZE];
/*
* Minimum alignment is 8 bytes, so if nbytes is <= 8, we need
* to tell aes_ctr_encrypt() to only read half a block.
*/
blocks = (nbytes <= 8) ? -1 : 1;
ce_aes_ctr_encrypt(tail, tsrc, (u8 *)ctx->key_enc,
num_rounds(ctx), blocks, walk.iv);
memcpy(tdst, tail, nbytes);
err = blkcipher_walk_done(desc, &walk, 0);
}
kernel_neon_end();
return err;
}
static int xts_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_xts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int err, first, rounds = num_rounds(&ctx->key1);
struct blkcipher_walk walk;
unsigned int blocks;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
ce_aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_enc, rounds, blocks,
walk.iv, (u8 *)ctx->key2.key_enc, first);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static int xts_decrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
struct scatterlist *src, unsigned int nbytes)
{
struct crypto_aes_xts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int err, first, rounds = num_rounds(&ctx->key1);
struct blkcipher_walk walk;
unsigned int blocks;
desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
blkcipher_walk_init(&walk, dst, src, nbytes);
err = blkcipher_walk_virt(desc, &walk);
kernel_neon_begin();
for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
ce_aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
(u8 *)ctx->key1.key_dec, rounds, blocks,
walk.iv, (u8 *)ctx->key2.key_enc, first);
err = blkcipher_walk_done(desc, &walk,
walk.nbytes % AES_BLOCK_SIZE);
}
kernel_neon_end();
return err;
}
static struct crypto_alg aes_algs[] = { {
.cra_name = "__ecb-aes-ce",
.cra_driver_name = "__driver-ecb-aes-ce",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_aes_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_blkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ce_aes_setkey,
.encrypt = ecb_encrypt,
.decrypt = ecb_decrypt,
},
}, {
.cra_name = "__cbc-aes-ce",
.cra_driver_name = "__driver-cbc-aes-ce",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_aes_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_blkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ce_aes_setkey,
.encrypt = cbc_encrypt,
.decrypt = cbc_decrypt,
},
}, {
.cra_name = "__ctr-aes-ce",
.cra_driver_name = "__driver-ctr-aes-ce",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct crypto_aes_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_blkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ce_aes_setkey,
.encrypt = ctr_encrypt,
.decrypt = ctr_encrypt,
},
}, {
.cra_name = "__xts-aes-ce",
.cra_driver_name = "__driver-xts-aes-ce",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_aes_xts_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_blkcipher_type,
.cra_module = THIS_MODULE,
.cra_blkcipher = {
.min_keysize = 2 * AES_MIN_KEY_SIZE,
.max_keysize = 2 * AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = xts_set_key,
.encrypt = xts_encrypt,
.decrypt = xts_decrypt,
},
}, {
.cra_name = "ecb(aes)",
.cra_driver_name = "ecb-aes-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER|CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct async_helper_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_ablkcipher_type,
.cra_module = THIS_MODULE,
.cra_init = ablk_init,
.cra_exit = ablk_exit,
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ablk_set_key,
.encrypt = ablk_encrypt,
.decrypt = ablk_decrypt,
}
}, {
.cra_name = "cbc(aes)",
.cra_driver_name = "cbc-aes-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER|CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct async_helper_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_ablkcipher_type,
.cra_module = THIS_MODULE,
.cra_init = ablk_init,
.cra_exit = ablk_exit,
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ablk_set_key,
.encrypt = ablk_encrypt,
.decrypt = ablk_decrypt,
}
}, {
.cra_name = "ctr(aes)",
.cra_driver_name = "ctr-aes-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER|CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct async_helper_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_ablkcipher_type,
.cra_module = THIS_MODULE,
.cra_init = ablk_init,
.cra_exit = ablk_exit,
.cra_ablkcipher = {
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ablk_set_key,
.encrypt = ablk_encrypt,
.decrypt = ablk_decrypt,
}
}, {
.cra_name = "xts(aes)",
.cra_driver_name = "xts-aes-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER|CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct async_helper_ctx),
.cra_alignmask = 7,
.cra_type = &crypto_ablkcipher_type,
.cra_module = THIS_MODULE,
.cra_init = ablk_init,
.cra_exit = ablk_exit,
.cra_ablkcipher = {
.min_keysize = 2 * AES_MIN_KEY_SIZE,
.max_keysize = 2 * AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = ablk_set_key,
.encrypt = ablk_encrypt,
.decrypt = ablk_decrypt,
}
} };
static int __init aes_init(void)
{
if (!(elf_hwcap2 & HWCAP2_AES))
return -ENODEV;
return crypto_register_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
static void __exit aes_exit(void)
{
crypto_unregister_algs(aes_algs, ARRAY_SIZE(aes_algs));
}
module_init(aes_init);
module_exit(aes_exit);

View File

@ -301,7 +301,8 @@ static struct crypto_alg aesbs_algs[] = { {
.cra_name = "__cbc-aes-neonbs",
.cra_driver_name = "__driver-cbc-aes-neonbs",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct aesbs_cbc_ctx),
.cra_alignmask = 7,
@ -319,7 +320,8 @@ static struct crypto_alg aesbs_algs[] = { {
.cra_name = "__ctr-aes-neonbs",
.cra_driver_name = "__driver-ctr-aes-neonbs",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct aesbs_ctr_ctx),
.cra_alignmask = 7,
@ -337,7 +339,8 @@ static struct crypto_alg aesbs_algs[] = { {
.cra_name = "__xts-aes-neonbs",
.cra_driver_name = "__driver-xts-aes-neonbs",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER |
CRYPTO_ALG_INTERNAL,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct aesbs_xts_ctx),
.cra_alignmask = 7,

View File

@ -0,0 +1,94 @@
/*
* Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
*
* Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
SHASH .req q0
SHASH2 .req q1
T1 .req q2
T2 .req q3
MASK .req q4
XL .req q5
XM .req q6
XH .req q7
IN1 .req q7
SHASH_L .req d0
SHASH_H .req d1
SHASH2_L .req d2
T1_L .req d4
MASK_L .req d8
XL_L .req d10
XL_H .req d11
XM_L .req d12
XM_H .req d13
XH_L .req d14
.text
.fpu crypto-neon-fp-armv8
/*
* void pmull_ghash_update(int blocks, u64 dg[], const char *src,
* struct ghash_key const *k, const char *head)
*/
ENTRY(pmull_ghash_update)
vld1.64 {SHASH}, [r3]
vld1.64 {XL}, [r1]
vmov.i8 MASK, #0xe1
vext.8 SHASH2, SHASH, SHASH, #8
vshl.u64 MASK, MASK, #57
veor SHASH2, SHASH2, SHASH
/* do the head block first, if supplied */
ldr ip, [sp]
teq ip, #0
beq 0f
vld1.64 {T1}, [ip]
teq r0, #0
b 1f
0: vld1.64 {T1}, [r2]!
subs r0, r0, #1
1: /* multiply XL by SHASH in GF(2^128) */
#ifndef CONFIG_CPU_BIG_ENDIAN
vrev64.8 T1, T1
#endif
vext.8 T2, XL, XL, #8
vext.8 IN1, T1, T1, #8
veor T1, T1, T2
veor XL, XL, IN1
vmull.p64 XH, SHASH_H, XL_H @ a1 * b1
veor T1, T1, XL
vmull.p64 XL, SHASH_L, XL_L @ a0 * b0
vmull.p64 XM, SHASH2_L, T1_L @ (a1 + a0)(b1 + b0)
vext.8 T1, XL, XH, #8
veor T2, XL, XH
veor XM, XM, T1
veor XM, XM, T2
vmull.p64 T2, XL_L, MASK_L
vmov XH_L, XM_H
vmov XM_H, XL_L
veor XL, XM, T2
vext.8 T2, XL, XL, #8
vmull.p64 XL, XL_L, MASK_L
veor T2, T2, XH
veor XL, XL, T2
bne 0b
vst1.64 {XL}, [r1]
bx lr
ENDPROC(pmull_ghash_update)

View File

@ -0,0 +1,320 @@
/*
* Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
*
* Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
* by the Free Software Foundation.
*/
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#include <crypto/cryptd.h>
#include <crypto/internal/hash.h>
#include <crypto/gf128mul.h>
#include <linux/crypto.h>
#include <linux/module.h>
MODULE_DESCRIPTION("GHASH secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16
struct ghash_key {
u64 a;
u64 b;
};
struct ghash_desc_ctx {
u64 digest[GHASH_DIGEST_SIZE/sizeof(u64)];
u8 buf[GHASH_BLOCK_SIZE];
u32 count;
};
struct ghash_async_ctx {
struct cryptd_ahash *cryptd_tfm;
};
asmlinkage void pmull_ghash_update(int blocks, u64 dg[], const char *src,
struct ghash_key const *k, const char *head);
static int ghash_init(struct shash_desc *desc)
{
struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
*ctx = (struct ghash_desc_ctx){};
return 0;
}
static int ghash_update(struct shash_desc *desc, const u8 *src,
unsigned int len)
{
struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
ctx->count += len;
if ((partial + len) >= GHASH_BLOCK_SIZE) {
struct ghash_key *key = crypto_shash_ctx(desc->tfm);
int blocks;
if (partial) {
int p = GHASH_BLOCK_SIZE - partial;
memcpy(ctx->buf + partial, src, p);
src += p;
len -= p;
}
blocks = len / GHASH_BLOCK_SIZE;
len %= GHASH_BLOCK_SIZE;
kernel_neon_begin();
pmull_ghash_update(blocks, ctx->digest, src, key,
partial ? ctx->buf : NULL);
kernel_neon_end();
src += blocks * GHASH_BLOCK_SIZE;
partial = 0;
}
if (len)
memcpy(ctx->buf + partial, src, len);
return 0;
}
static int ghash_final(struct shash_desc *desc, u8 *dst)
{
struct ghash_desc_ctx *ctx = shash_desc_ctx(desc);
unsigned int partial = ctx->count % GHASH_BLOCK_SIZE;
if (partial) {
struct ghash_key *key = crypto_shash_ctx(desc->tfm);
memset(ctx->buf + partial, 0, GHASH_BLOCK_SIZE - partial);
kernel_neon_begin();
pmull_ghash_update(1, ctx->digest, ctx->buf, key, NULL);
kernel_neon_end();
}
put_unaligned_be64(ctx->digest[1], dst);
put_unaligned_be64(ctx->digest[0], dst + 8);
*ctx = (struct ghash_desc_ctx){};
return 0;
}
static int ghash_setkey(struct crypto_shash *tfm,
const u8 *inkey, unsigned int keylen)
{
struct ghash_key *key = crypto_shash_ctx(tfm);
u64 a, b;
if (keylen != GHASH_BLOCK_SIZE) {
crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
/* perform multiplication by 'x' in GF(2^128) */
b = get_unaligned_be64(inkey);
a = get_unaligned_be64(inkey + 8);
key->a = (a << 1) | (b >> 63);
key->b = (b << 1) | (a >> 63);
if (b >> 63)
key->b ^= 0xc200000000000000UL;
return 0;
}
static struct shash_alg ghash_alg = {
.digestsize = GHASH_DIGEST_SIZE,
.init = ghash_init,
.update = ghash_update,
.final = ghash_final,
.setkey = ghash_setkey,
.descsize = sizeof(struct ghash_desc_ctx),
.base = {
.cra_name = "ghash",
.cra_driver_name = "__driver-ghash-ce",
.cra_priority = 0,
.cra_flags = CRYPTO_ALG_TYPE_SHASH | CRYPTO_ALG_INTERNAL,
.cra_blocksize = GHASH_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct ghash_key),
.cra_module = THIS_MODULE,
},
};
static int ghash_async_init(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);
struct ahash_request *cryptd_req = ahash_request_ctx(req);
struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm;
if (!may_use_simd()) {
memcpy(cryptd_req, req, sizeof(*req));
ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base);
return crypto_ahash_init(cryptd_req);
} else {
struct shash_desc *desc = cryptd_shash_desc(cryptd_req);
struct crypto_shash *child = cryptd_ahash_child(cryptd_tfm);
desc->tfm = child;
desc->flags = req->base.flags;
return crypto_shash_init(desc);
}
}
static int ghash_async_update(struct ahash_request *req)
{
struct ahash_request *cryptd_req = ahash_request_ctx(req);
if (!may_use_simd()) {
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);
struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm;
memcpy(cryptd_req, req, sizeof(*req));
ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base);
return crypto_ahash_update(cryptd_req);
} else {
struct shash_desc *desc = cryptd_shash_desc(cryptd_req);
return shash_ahash_update(req, desc);
}
}
static int ghash_async_final(struct ahash_request *req)
{
struct ahash_request *cryptd_req = ahash_request_ctx(req);
if (!may_use_simd()) {
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);
struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm;
memcpy(cryptd_req, req, sizeof(*req));
ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base);
return crypto_ahash_final(cryptd_req);
} else {
struct shash_desc *desc = cryptd_shash_desc(cryptd_req);
return crypto_shash_final(desc, req->result);
}
}
static int ghash_async_digest(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);
struct ahash_request *cryptd_req = ahash_request_ctx(req);
struct cryptd_ahash *cryptd_tfm = ctx->cryptd_tfm;
if (!may_use_simd()) {
memcpy(cryptd_req, req, sizeof(*req));
ahash_request_set_tfm(cryptd_req, &cryptd_tfm->base);
return crypto_ahash_digest(cryptd_req);
} else {
struct shash_desc *desc = cryptd_shash_desc(cryptd_req);
struct crypto_shash *child = cryptd_ahash_child(cryptd_tfm);
desc->tfm = child;
desc->flags = req->base.flags;
return shash_ahash_digest(req, desc);
}
}
static int ghash_async_setkey(struct crypto_ahash *tfm, const u8 *key,
unsigned int keylen)
{
struct ghash_async_ctx *ctx = crypto_ahash_ctx(tfm);
struct crypto_ahash *child = &ctx->cryptd_tfm->base;
int err;
crypto_ahash_clear_flags(child, CRYPTO_TFM_REQ_MASK);
crypto_ahash_set_flags(child, crypto_ahash_get_flags(tfm)
& CRYPTO_TFM_REQ_MASK);
err = crypto_ahash_setkey(child, key, keylen);
crypto_ahash_set_flags(tfm, crypto_ahash_get_flags(child)
& CRYPTO_TFM_RES_MASK);
return err;
}
static int ghash_async_init_tfm(struct crypto_tfm *tfm)
{
struct cryptd_ahash *cryptd_tfm;
struct ghash_async_ctx *ctx = crypto_tfm_ctx(tfm);
cryptd_tfm = cryptd_alloc_ahash("__driver-ghash-ce",
CRYPTO_ALG_INTERNAL,
CRYPTO_ALG_INTERNAL);
if (IS_ERR(cryptd_tfm))
return PTR_ERR(cryptd_tfm);
ctx->cryptd_tfm = cryptd_tfm;
crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
sizeof(struct ahash_request) +
crypto_ahash_reqsize(&cryptd_tfm->base));
return 0;
}
static void ghash_async_exit_tfm(struct crypto_tfm *tfm)
{
struct ghash_async_ctx *ctx = crypto_tfm_ctx(tfm);
cryptd_free_ahash(ctx->cryptd_tfm);
}
static struct ahash_alg ghash_async_alg = {
.init = ghash_async_init,
.update = ghash_async_update,
.final = ghash_async_final,
.setkey = ghash_async_setkey,
.digest = ghash_async_digest,
.halg.digestsize = GHASH_DIGEST_SIZE,
.halg.base = {
.cra_name = "ghash",
.cra_driver_name = "ghash-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC,
.cra_blocksize = GHASH_BLOCK_SIZE,
.cra_type = &crypto_ahash_type,
.cra_ctxsize = sizeof(struct ghash_async_ctx),
.cra_module = THIS_MODULE,
.cra_init = ghash_async_init_tfm,
.cra_exit = ghash_async_exit_tfm,
},
};
static int __init ghash_ce_mod_init(void)
{
int err;
if (!(elf_hwcap2 & HWCAP2_PMULL))
return -ENODEV;
err = crypto_register_shash(&ghash_alg);
if (err)
return err;
err = crypto_register_ahash(&ghash_async_alg);
if (err)
goto err_shash;
return 0;
err_shash:
crypto_unregister_shash(&ghash_alg);
return err;
}
static void __exit ghash_ce_mod_exit(void)
{
crypto_unregister_ahash(&ghash_async_alg);
crypto_unregister_shash(&ghash_alg);
}
module_init(ghash_ce_mod_init);
module_exit(ghash_ce_mod_exit);

View File

@ -0,0 +1,125 @@
/*
* sha1-ce-core.S - SHA-1 secure hash using ARMv8 Crypto Extensions
*
* Copyright (C) 2015 Linaro Ltd.
* Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.text
.fpu crypto-neon-fp-armv8
k0 .req q0
k1 .req q1
k2 .req q2
k3 .req q3
ta0 .req q4
ta1 .req q5
tb0 .req q5
tb1 .req q4
dga .req q6
dgb .req q7
dgbs .req s28
dg0 .req q12
dg1a0 .req q13
dg1a1 .req q14
dg1b0 .req q14
dg1b1 .req q13
.macro add_only, op, ev, rc, s0, dg1
.ifnb \s0
vadd.u32 tb\ev, q\s0, \rc
.endif
sha1h.32 dg1b\ev, dg0
.ifb \dg1
sha1\op\().32 dg0, dg1a\ev, ta\ev
.else
sha1\op\().32 dg0, \dg1, ta\ev
.endif
.endm
.macro add_update, op, ev, rc, s0, s1, s2, s3, dg1
sha1su0.32 q\s0, q\s1, q\s2
add_only \op, \ev, \rc, \s1, \dg1
sha1su1.32 q\s0, q\s3
.endm
.align 6
.Lsha1_rcon:
.word 0x5a827999, 0x5a827999, 0x5a827999, 0x5a827999
.word 0x6ed9eba1, 0x6ed9eba1, 0x6ed9eba1, 0x6ed9eba1
.word 0x8f1bbcdc, 0x8f1bbcdc, 0x8f1bbcdc, 0x8f1bbcdc
.word 0xca62c1d6, 0xca62c1d6, 0xca62c1d6, 0xca62c1d6
/*
* void sha1_ce_transform(struct sha1_state *sst, u8 const *src,
* int blocks);
*/
ENTRY(sha1_ce_transform)
/* load round constants */
adr ip, .Lsha1_rcon
vld1.32 {k0-k1}, [ip, :128]!
vld1.32 {k2-k3}, [ip, :128]
/* load state */
vld1.32 {dga}, [r0]
vldr dgbs, [r0, #16]
/* load input */
0: vld1.32 {q8-q9}, [r1]!
vld1.32 {q10-q11}, [r1]!
subs r2, r2, #1
#ifndef CONFIG_CPU_BIG_ENDIAN
vrev32.8 q8, q8
vrev32.8 q9, q9
vrev32.8 q10, q10
vrev32.8 q11, q11
#endif
vadd.u32 ta0, q8, k0
vmov dg0, dga
add_update c, 0, k0, 8, 9, 10, 11, dgb
add_update c, 1, k0, 9, 10, 11, 8
add_update c, 0, k0, 10, 11, 8, 9
add_update c, 1, k0, 11, 8, 9, 10
add_update c, 0, k1, 8, 9, 10, 11
add_update p, 1, k1, 9, 10, 11, 8
add_update p, 0, k1, 10, 11, 8, 9
add_update p, 1, k1, 11, 8, 9, 10
add_update p, 0, k1, 8, 9, 10, 11
add_update p, 1, k2, 9, 10, 11, 8
add_update m, 0, k2, 10, 11, 8, 9
add_update m, 1, k2, 11, 8, 9, 10
add_update m, 0, k2, 8, 9, 10, 11
add_update m, 1, k2, 9, 10, 11, 8
add_update m, 0, k3, 10, 11, 8, 9
add_update p, 1, k3, 11, 8, 9, 10
add_only p, 0, k3, 9
add_only p, 1, k3, 10
add_only p, 0, k3, 11
add_only p, 1
/* update state */
vadd.u32 dga, dga, dg0
vadd.u32 dgb, dgb, dg1a0
bne 0b
/* store new state */
vst1.32 {dga}, [r0]
vstr dgbs, [r0, #16]
bx lr
ENDPROC(sha1_ce_transform)

View File

@ -0,0 +1,96 @@
/*
* sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
*
* Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <crypto/internal/hash.h>
#include <crypto/sha.h>
#include <crypto/sha1_base.h>
#include <linux/crypto.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include "sha1.h"
MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void sha1_ce_transform(struct sha1_state *sst, u8 const *src,
int blocks);
static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd() ||
(sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
return sha1_update_arm(desc, data, len);
kernel_neon_begin();
sha1_base_do_update(desc, data, len, sha1_ce_transform);
kernel_neon_end();
return 0;
}
static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (!may_use_simd())
return sha1_finup_arm(desc, data, len, out);
kernel_neon_begin();
if (len)
sha1_base_do_update(desc, data, len, sha1_ce_transform);
sha1_base_do_finalize(desc, sha1_ce_transform);
kernel_neon_end();
return sha1_base_finish(desc, out);
}
static int sha1_ce_final(struct shash_desc *desc, u8 *out)
{
return sha1_ce_finup(desc, NULL, 0, out);
}
static struct shash_alg alg = {
.init = sha1_base_init,
.update = sha1_ce_update,
.final = sha1_ce_final,
.finup = sha1_ce_finup,
.descsize = sizeof(struct sha1_state),
.digestsize = SHA1_DIGEST_SIZE,
.base = {
.cra_name = "sha1",
.cra_driver_name = "sha1-ce",
.cra_priority = 200,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA1_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
};
static int __init sha1_ce_mod_init(void)
{
if (!(elf_hwcap2 & HWCAP2_SHA1))
return -ENODEV;
return crypto_register_shash(&alg);
}
static void __exit sha1_ce_mod_fini(void)
{
crypto_unregister_shash(&alg);
}
module_init(sha1_ce_mod_init);
module_exit(sha1_ce_mod_fini);

View File

@ -7,4 +7,7 @@
extern int sha1_update_arm(struct shash_desc *desc, const u8 *data,
unsigned int len);
extern int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out);
#endif

View File

@ -22,127 +22,47 @@
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <crypto/sha.h>
#include <crypto/sha1_base.h>
#include <asm/byteorder.h>
#include <asm/crypto/sha1.h>
#include "sha1.h"
asmlinkage void sha1_block_data_order(u32 *digest,
const unsigned char *data, unsigned int rounds);
static int sha1_init(struct shash_desc *desc)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
*sctx = (struct sha1_state){
.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
};
return 0;
}
static int __sha1_update(struct sha1_state *sctx, const u8 *data,
unsigned int len, unsigned int partial)
{
unsigned int done = 0;
sctx->count += len;
if (partial) {
done = SHA1_BLOCK_SIZE - partial;
memcpy(sctx->buffer + partial, data, done);
sha1_block_data_order(sctx->state, sctx->buffer, 1);
}
if (len - done >= SHA1_BLOCK_SIZE) {
const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
sha1_block_data_order(sctx->state, data + done, rounds);
done += rounds * SHA1_BLOCK_SIZE;
}
memcpy(sctx->buffer, data + done, len - done);
return 0;
}
int sha1_update_arm(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
int res;
/* make sure casting to sha1_block_fn() is safe */
BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
/* Handle the fast case right here */
if (partial + len < SHA1_BLOCK_SIZE) {
sctx->count += len;
memcpy(sctx->buffer + partial, data, len);
return 0;
}
res = __sha1_update(sctx, data, len, partial);
return res;
return sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_block_data_order);
}
EXPORT_SYMBOL_GPL(sha1_update_arm);
/* Add padding and return the message digest. */
static int sha1_final(struct shash_desc *desc, u8 *out)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int i, index, padlen;
__be32 *dst = (__be32 *)out;
__be64 bits;
static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
bits = cpu_to_be64(sctx->count << 3);
/* Pad out to 56 mod 64 and append length */
index = sctx->count % SHA1_BLOCK_SIZE;
padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
/* We need to fill a whole block for __sha1_update() */
if (padlen <= 56) {
sctx->count += padlen;
memcpy(sctx->buffer + index, padding, padlen);
} else {
__sha1_update(sctx, padding, padlen, index);
}
__sha1_update(sctx, (const u8 *)&bits, sizeof(bits), 56);
/* Store state in digest */
for (i = 0; i < 5; i++)
dst[i] = cpu_to_be32(sctx->state[i]);
/* Wipe context */
memset(sctx, 0, sizeof(*sctx));
return 0;
sha1_base_do_finalize(desc, (sha1_block_fn *)sha1_block_data_order);
return sha1_base_finish(desc, out);
}
static int sha1_export(struct shash_desc *desc, void *out)
int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(out, sctx, sizeof(*sctx));
return 0;
sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_block_data_order);
return sha1_final(desc, out);
}
static int sha1_import(struct shash_desc *desc, const void *in)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(sctx, in, sizeof(*sctx));
return 0;
}
EXPORT_SYMBOL_GPL(sha1_finup_arm);
static struct shash_alg alg = {
.digestsize = SHA1_DIGEST_SIZE,
.init = sha1_init,
.init = sha1_base_init,
.update = sha1_update_arm,
.final = sha1_final,
.export = sha1_export,
.import = sha1_import,
.finup = sha1_finup_arm,
.descsize = sizeof(struct sha1_state),
.statesize = sizeof(struct sha1_state),
.base = {
.cra_name = "sha1",
.cra_driver_name= "sha1-asm",

View File

@ -25,147 +25,60 @@
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <crypto/sha.h>
#include <asm/byteorder.h>
#include <crypto/sha1_base.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/crypto/sha1.h>
#include "sha1.h"
asmlinkage void sha1_transform_neon(void *state_h, const char *data,
unsigned int rounds);
static int sha1_neon_init(struct shash_desc *desc)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
*sctx = (struct sha1_state){
.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
};
return 0;
}
static int __sha1_neon_update(struct shash_desc *desc, const u8 *data,
unsigned int len, unsigned int partial)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int done = 0;
sctx->count += len;
if (partial) {
done = SHA1_BLOCK_SIZE - partial;
memcpy(sctx->buffer + partial, data, done);
sha1_transform_neon(sctx->state, sctx->buffer, 1);
}
if (len - done >= SHA1_BLOCK_SIZE) {
const unsigned int rounds = (len - done) / SHA1_BLOCK_SIZE;
sha1_transform_neon(sctx->state, data + done, rounds);
done += rounds * SHA1_BLOCK_SIZE;
}
memcpy(sctx->buffer, data + done, len - done);
return 0;
}
static int sha1_neon_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
unsigned int len)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
int res;
/* Handle the fast case right here */
if (partial + len < SHA1_BLOCK_SIZE) {
sctx->count += len;
memcpy(sctx->buffer + partial, data, len);
if (!may_use_simd() ||
(sctx->count % SHA1_BLOCK_SIZE) + len < SHA1_BLOCK_SIZE)
return sha1_update_arm(desc, data, len);
return 0;
}
kernel_neon_begin();
sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_transform_neon);
kernel_neon_end();
if (!may_use_simd()) {
res = sha1_update_arm(desc, data, len);
} else {
kernel_neon_begin();
res = __sha1_neon_update(desc, data, len, partial);
kernel_neon_end();
}
return res;
return 0;
}
static int sha1_neon_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (!may_use_simd())
return sha1_finup_arm(desc, data, len, out);
kernel_neon_begin();
if (len)
sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_transform_neon);
sha1_base_do_finalize(desc, (sha1_block_fn *)sha1_transform_neon);
kernel_neon_end();
return sha1_base_finish(desc, out);
}
/* Add padding and return the message digest. */
static int sha1_neon_final(struct shash_desc *desc, u8 *out)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
unsigned int i, index, padlen;
__be32 *dst = (__be32 *)out;
__be64 bits;
static const u8 padding[SHA1_BLOCK_SIZE] = { 0x80, };
bits = cpu_to_be64(sctx->count << 3);
/* Pad out to 56 mod 64 and append length */
index = sctx->count % SHA1_BLOCK_SIZE;
padlen = (index < 56) ? (56 - index) : ((SHA1_BLOCK_SIZE+56) - index);
if (!may_use_simd()) {
sha1_update_arm(desc, padding, padlen);
sha1_update_arm(desc, (const u8 *)&bits, sizeof(bits));
} else {
kernel_neon_begin();
/* We need to fill a whole block for __sha1_neon_update() */
if (padlen <= 56) {
sctx->count += padlen;
memcpy(sctx->buffer + index, padding, padlen);
} else {
__sha1_neon_update(desc, padding, padlen, index);
}
__sha1_neon_update(desc, (const u8 *)&bits, sizeof(bits), 56);
kernel_neon_end();
}
/* Store state in digest */
for (i = 0; i < 5; i++)
dst[i] = cpu_to_be32(sctx->state[i]);
/* Wipe context */
memset(sctx, 0, sizeof(*sctx));
return 0;
}
static int sha1_neon_export(struct shash_desc *desc, void *out)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(out, sctx, sizeof(*sctx));
return 0;
}
static int sha1_neon_import(struct shash_desc *desc, const void *in)
{
struct sha1_state *sctx = shash_desc_ctx(desc);
memcpy(sctx, in, sizeof(*sctx));
return 0;
return sha1_neon_finup(desc, NULL, 0, out);
}
static struct shash_alg alg = {
.digestsize = SHA1_DIGEST_SIZE,
.init = sha1_neon_init,
.init = sha1_base_init,
.update = sha1_neon_update,
.final = sha1_neon_final,
.export = sha1_neon_export,
.import = sha1_neon_import,
.finup = sha1_neon_finup,
.descsize = sizeof(struct sha1_state),
.statesize = sizeof(struct sha1_state),
.base = {
.cra_name = "sha1",
.cra_driver_name = "sha1-neon",

View File

@ -0,0 +1,125 @@
/*
* sha2-ce-core.S - SHA-224/256 secure hash using ARMv8 Crypto Extensions
*
* Copyright (C) 2015 Linaro Ltd.
* Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
.text
.fpu crypto-neon-fp-armv8
k0 .req q7
k1 .req q8
rk .req r3
ta0 .req q9
ta1 .req q10
tb0 .req q10
tb1 .req q9
dga .req q11
dgb .req q12
dg0 .req q13
dg1 .req q14
dg2 .req q15
.macro add_only, ev, s0
vmov dg2, dg0
.ifnb \s0
vld1.32 {k\ev}, [rk, :128]!
.endif
sha256h.32 dg0, dg1, tb\ev
sha256h2.32 dg1, dg2, tb\ev
.ifnb \s0
vadd.u32 ta\ev, q\s0, k\ev
.endif
.endm
.macro add_update, ev, s0, s1, s2, s3
sha256su0.32 q\s0, q\s1
add_only \ev, \s1
sha256su1.32 q\s0, q\s2, q\s3
.endm
.align 6
.Lsha256_rcon:
.word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
.word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
.word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
.word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
.word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
.word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
.word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
.word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
.word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
.word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
.word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
.word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
.word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
.word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
.word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
.word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
/*
* void sha2_ce_transform(struct sha256_state *sst, u8 const *src,
int blocks);
*/
ENTRY(sha2_ce_transform)
/* load state */
vld1.32 {dga-dgb}, [r0]
/* load input */
0: vld1.32 {q0-q1}, [r1]!
vld1.32 {q2-q3}, [r1]!
subs r2, r2, #1
#ifndef CONFIG_CPU_BIG_ENDIAN
vrev32.8 q0, q0
vrev32.8 q1, q1
vrev32.8 q2, q2
vrev32.8 q3, q3
#endif
/* load first round constant */
adr rk, .Lsha256_rcon
vld1.32 {k0}, [rk, :128]!
vadd.u32 ta0, q0, k0
vmov dg0, dga
vmov dg1, dgb
add_update 1, 0, 1, 2, 3
add_update 0, 1, 2, 3, 0
add_update 1, 2, 3, 0, 1
add_update 0, 3, 0, 1, 2
add_update 1, 0, 1, 2, 3
add_update 0, 1, 2, 3, 0
add_update 1, 2, 3, 0, 1
add_update 0, 3, 0, 1, 2
add_update 1, 0, 1, 2, 3
add_update 0, 1, 2, 3, 0
add_update 1, 2, 3, 0, 1
add_update 0, 3, 0, 1, 2
add_only 1, 1
add_only 0, 2
add_only 1, 3
add_only 0
/* update state */
vadd.u32 dga, dga, dg0
vadd.u32 dgb, dgb, dg1
bne 0b
/* store new state */
vst1.32 {dga-dgb}, [r0]
bx lr
ENDPROC(sha2_ce_transform)

View File

@ -0,0 +1,114 @@
/*
* sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
*
* Copyright (C) 2015 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <crypto/internal/hash.h>
#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <linux/crypto.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/simd.h>
#include <asm/neon.h>
#include <asm/unaligned.h>
#include "sha256_glue.h"
MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void sha2_ce_transform(struct sha256_state *sst, u8 const *src,
int blocks);
static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
struct sha256_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd() ||
(sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
return crypto_sha256_arm_update(desc, data, len);
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
return 0;
}
static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (!may_use_simd())
return crypto_sha256_arm_finup(desc, data, len, out);
kernel_neon_begin();
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha2_ce_transform);
sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
return sha256_base_finish(desc, out);
}
static int sha2_ce_final(struct shash_desc *desc, u8 *out)
{
return sha2_ce_finup(desc, NULL, 0, out);
}
static struct shash_alg algs[] = { {
.init = sha224_base_init,
.update = sha2_ce_update,
.final = sha2_ce_final,
.finup = sha2_ce_finup,
.descsize = sizeof(struct sha256_state),
.digestsize = SHA224_DIGEST_SIZE,
.base = {
.cra_name = "sha224",
.cra_driver_name = "sha224-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
}, {
.init = sha256_base_init,
.update = sha2_ce_update,
.final = sha2_ce_final,
.finup = sha2_ce_finup,
.descsize = sizeof(struct sha256_state),
.digestsize = SHA256_DIGEST_SIZE,
.base = {
.cra_name = "sha256",
.cra_driver_name = "sha256-ce",
.cra_priority = 300,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
} };
static int __init sha2_ce_mod_init(void)
{
if (!(elf_hwcap2 & HWCAP2_SHA2))
return -ENODEV;
return crypto_register_shashes(algs, ARRAY_SIZE(algs));
}
static void __exit sha2_ce_mod_fini(void)
{
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
}
module_init(sha2_ce_mod_init);
module_exit(sha2_ce_mod_fini);

View File

@ -0,0 +1,716 @@
#!/usr/bin/env perl
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ====================================================================
# SHA256 block procedure for ARMv4. May 2007.
# Performance is ~2x better than gcc 3.4 generated code and in "abso-
# lute" terms is ~2250 cycles per 64-byte block or ~35 cycles per
# byte [on single-issue Xscale PXA250 core].
# July 2010.
#
# Rescheduling for dual-issue pipeline resulted in 22% improvement on
# Cortex A8 core and ~20 cycles per processed byte.
# February 2011.
#
# Profiler-assisted and platform-specific optimization resulted in 16%
# improvement on Cortex A8 core and ~15.4 cycles per processed byte.
# September 2013.
#
# Add NEON implementation. On Cortex A8 it was measured to process one
# byte in 12.5 cycles or 23% faster than integer-only code. Snapdragon
# S4 does it in 12.5 cycles too, but it's 50% faster than integer-only
# code (meaning that latter performs sub-optimally, nothing was done
# about it).
# May 2014.
#
# Add ARMv8 code path performing at 2.0 cpb on Apple A7.
while (($output=shift) && ($output!~/^\w[\w\-]*\.\w+$/)) {}
open STDOUT,">$output";
$ctx="r0"; $t0="r0";
$inp="r1"; $t4="r1";
$len="r2"; $t1="r2";
$T1="r3"; $t3="r3";
$A="r4";
$B="r5";
$C="r6";
$D="r7";
$E="r8";
$F="r9";
$G="r10";
$H="r11";
@V=($A,$B,$C,$D,$E,$F,$G,$H);
$t2="r12";
$Ktbl="r14";
@Sigma0=( 2,13,22);
@Sigma1=( 6,11,25);
@sigma0=( 7,18, 3);
@sigma1=(17,19,10);
sub BODY_00_15 {
my ($i,$a,$b,$c,$d,$e,$f,$g,$h) = @_;
$code.=<<___ if ($i<16);
#if __ARM_ARCH__>=7
@ ldr $t1,[$inp],#4 @ $i
# if $i==15
str $inp,[sp,#17*4] @ make room for $t4
# endif
eor $t0,$e,$e,ror#`$Sigma1[1]-$Sigma1[0]`
add $a,$a,$t2 @ h+=Maj(a,b,c) from the past
eor $t0,$t0,$e,ror#`$Sigma1[2]-$Sigma1[0]` @ Sigma1(e)
# ifndef __ARMEB__
rev $t1,$t1
# endif
#else
@ ldrb $t1,[$inp,#3] @ $i
add $a,$a,$t2 @ h+=Maj(a,b,c) from the past
ldrb $t2,[$inp,#2]
ldrb $t0,[$inp,#1]
orr $t1,$t1,$t2,lsl#8
ldrb $t2,[$inp],#4
orr $t1,$t1,$t0,lsl#16
# if $i==15
str $inp,[sp,#17*4] @ make room for $t4
# endif
eor $t0,$e,$e,ror#`$Sigma1[1]-$Sigma1[0]`
orr $t1,$t1,$t2,lsl#24
eor $t0,$t0,$e,ror#`$Sigma1[2]-$Sigma1[0]` @ Sigma1(e)
#endif
___
$code.=<<___;
ldr $t2,[$Ktbl],#4 @ *K256++
add $h,$h,$t1 @ h+=X[i]
str $t1,[sp,#`$i%16`*4]
eor $t1,$f,$g
add $h,$h,$t0,ror#$Sigma1[0] @ h+=Sigma1(e)
and $t1,$t1,$e
add $h,$h,$t2 @ h+=K256[i]
eor $t1,$t1,$g @ Ch(e,f,g)
eor $t0,$a,$a,ror#`$Sigma0[1]-$Sigma0[0]`
add $h,$h,$t1 @ h+=Ch(e,f,g)
#if $i==31
and $t2,$t2,#0xff
cmp $t2,#0xf2 @ done?
#endif
#if $i<15
# if __ARM_ARCH__>=7
ldr $t1,[$inp],#4 @ prefetch
# else
ldrb $t1,[$inp,#3]
# endif
eor $t2,$a,$b @ a^b, b^c in next round
#else
ldr $t1,[sp,#`($i+2)%16`*4] @ from future BODY_16_xx
eor $t2,$a,$b @ a^b, b^c in next round
ldr $t4,[sp,#`($i+15)%16`*4] @ from future BODY_16_xx
#endif
eor $t0,$t0,$a,ror#`$Sigma0[2]-$Sigma0[0]` @ Sigma0(a)
and $t3,$t3,$t2 @ (b^c)&=(a^b)
add $d,$d,$h @ d+=h
eor $t3,$t3,$b @ Maj(a,b,c)
add $h,$h,$t0,ror#$Sigma0[0] @ h+=Sigma0(a)
@ add $h,$h,$t3 @ h+=Maj(a,b,c)
___
($t2,$t3)=($t3,$t2);
}
sub BODY_16_XX {
my ($i,$a,$b,$c,$d,$e,$f,$g,$h) = @_;
$code.=<<___;
@ ldr $t1,[sp,#`($i+1)%16`*4] @ $i
@ ldr $t4,[sp,#`($i+14)%16`*4]
mov $t0,$t1,ror#$sigma0[0]
add $a,$a,$t2 @ h+=Maj(a,b,c) from the past
mov $t2,$t4,ror#$sigma1[0]
eor $t0,$t0,$t1,ror#$sigma0[1]
eor $t2,$t2,$t4,ror#$sigma1[1]
eor $t0,$t0,$t1,lsr#$sigma0[2] @ sigma0(X[i+1])
ldr $t1,[sp,#`($i+0)%16`*4]
eor $t2,$t2,$t4,lsr#$sigma1[2] @ sigma1(X[i+14])
ldr $t4,[sp,#`($i+9)%16`*4]
add $t2,$t2,$t0
eor $t0,$e,$e,ror#`$Sigma1[1]-$Sigma1[0]` @ from BODY_00_15
add $t1,$t1,$t2
eor $t0,$t0,$e,ror#`$Sigma1[2]-$Sigma1[0]` @ Sigma1(e)
add $t1,$t1,$t4 @ X[i]
___
&BODY_00_15(@_);
}
$code=<<___;
#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
# define __ARM_MAX_ARCH__ 7
#endif
.text
#if __ARM_ARCH__<7
.code 32
#else
.syntax unified
# ifdef __thumb2__
# define adrl adr
.thumb
# else
.code 32
# endif
#endif
.type K256,%object
.align 5
K256:
.word 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
.word 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
.word 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
.word 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
.word 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
.word 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
.word 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
.word 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
.word 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
.word 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
.word 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
.word 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
.word 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
.word 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
.word 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
.word 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
.size K256,.-K256
.word 0 @ terminator
#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__)
.LOPENSSL_armcap:
.word OPENSSL_armcap_P-sha256_block_data_order
#endif
.align 5
.global sha256_block_data_order
.type sha256_block_data_order,%function
sha256_block_data_order:
#if __ARM_ARCH__<7
sub r3,pc,#8 @ sha256_block_data_order
#else
adr r3,sha256_block_data_order
#endif
#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__)
ldr r12,.LOPENSSL_armcap
ldr r12,[r3,r12] @ OPENSSL_armcap_P
tst r12,#ARMV8_SHA256
bne .LARMv8
tst r12,#ARMV7_NEON
bne .LNEON
#endif
add $len,$inp,$len,lsl#6 @ len to point at the end of inp
stmdb sp!,{$ctx,$inp,$len,r4-r11,lr}
ldmia $ctx,{$A,$B,$C,$D,$E,$F,$G,$H}
sub $Ktbl,r3,#256+32 @ K256
sub sp,sp,#16*4 @ alloca(X[16])
.Loop:
# if __ARM_ARCH__>=7
ldr $t1,[$inp],#4
# else
ldrb $t1,[$inp,#3]
# endif
eor $t3,$B,$C @ magic
eor $t2,$t2,$t2
___
for($i=0;$i<16;$i++) { &BODY_00_15($i,@V); unshift(@V,pop(@V)); }
$code.=".Lrounds_16_xx:\n";
for (;$i<32;$i++) { &BODY_16_XX($i,@V); unshift(@V,pop(@V)); }
$code.=<<___;
#if __ARM_ARCH__>=7
ite eq @ Thumb2 thing, sanity check in ARM
#endif
ldreq $t3,[sp,#16*4] @ pull ctx
bne .Lrounds_16_xx
add $A,$A,$t2 @ h+=Maj(a,b,c) from the past
ldr $t0,[$t3,#0]
ldr $t1,[$t3,#4]
ldr $t2,[$t3,#8]
add $A,$A,$t0
ldr $t0,[$t3,#12]
add $B,$B,$t1
ldr $t1,[$t3,#16]
add $C,$C,$t2
ldr $t2,[$t3,#20]
add $D,$D,$t0
ldr $t0,[$t3,#24]
add $E,$E,$t1
ldr $t1,[$t3,#28]
add $F,$F,$t2
ldr $inp,[sp,#17*4] @ pull inp
ldr $t2,[sp,#18*4] @ pull inp+len
add $G,$G,$t0
add $H,$H,$t1
stmia $t3,{$A,$B,$C,$D,$E,$F,$G,$H}
cmp $inp,$t2
sub $Ktbl,$Ktbl,#256 @ rewind Ktbl
bne .Loop
add sp,sp,#`16+3`*4 @ destroy frame
#if __ARM_ARCH__>=5
ldmia sp!,{r4-r11,pc}
#else
ldmia sp!,{r4-r11,lr}
tst lr,#1
moveq pc,lr @ be binary compatible with V4, yet
bx lr @ interoperable with Thumb ISA:-)
#endif
.size sha256_block_data_order,.-sha256_block_data_order
___
######################################################################
# NEON stuff
#
{{{
my @X=map("q$_",(0..3));
my ($T0,$T1,$T2,$T3,$T4,$T5)=("q8","q9","q10","q11","d24","d25");
my $Xfer=$t4;
my $j=0;
sub Dlo() { shift=~m|q([1]?[0-9])|?"d".($1*2):""; }
sub Dhi() { shift=~m|q([1]?[0-9])|?"d".($1*2+1):""; }
sub AUTOLOAD() # thunk [simplified] x86-style perlasm
{ my $opcode = $AUTOLOAD; $opcode =~ s/.*:://; $opcode =~ s/_/\./;
my $arg = pop;
$arg = "#$arg" if ($arg*1 eq $arg);
$code .= "\t$opcode\t".join(',',@_,$arg)."\n";
}
sub Xupdate()
{ use integer;
my $body = shift;
my @insns = (&$body,&$body,&$body,&$body);
my ($a,$b,$c,$d,$e,$f,$g,$h);
&vext_8 ($T0,@X[0],@X[1],4); # X[1..4]
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
&vext_8 ($T1,@X[2],@X[3],4); # X[9..12]
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T2,$T0,$sigma0[0]);
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 (@X[0],@X[0],$T1); # X[0..3] += X[9..12]
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T1,$T0,$sigma0[2]);
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T2,$T0,32-$sigma0[0]);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T3,$T0,$sigma0[1]);
eval(shift(@insns));
eval(shift(@insns));
&veor ($T1,$T1,$T2);
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T3,$T0,32-$sigma0[1]);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T4,&Dhi(@X[3]),$sigma1[0]);
eval(shift(@insns));
eval(shift(@insns));
&veor ($T1,$T1,$T3); # sigma0(X[1..4])
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T4,&Dhi(@X[3]),32-$sigma1[0]);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T5,&Dhi(@X[3]),$sigma1[2]);
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 (@X[0],@X[0],$T1); # X[0..3] += sigma0(X[1..4])
eval(shift(@insns));
eval(shift(@insns));
&veor ($T5,$T5,$T4);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T4,&Dhi(@X[3]),$sigma1[1]);
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T4,&Dhi(@X[3]),32-$sigma1[1]);
eval(shift(@insns));
eval(shift(@insns));
&veor ($T5,$T5,$T4); # sigma1(X[14..15])
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 (&Dlo(@X[0]),&Dlo(@X[0]),$T5);# X[0..1] += sigma1(X[14..15])
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T4,&Dlo(@X[0]),$sigma1[0]);
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T4,&Dlo(@X[0]),32-$sigma1[0]);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T5,&Dlo(@X[0]),$sigma1[2]);
eval(shift(@insns));
eval(shift(@insns));
&veor ($T5,$T5,$T4);
eval(shift(@insns));
eval(shift(@insns));
&vshr_u32 ($T4,&Dlo(@X[0]),$sigma1[1]);
eval(shift(@insns));
eval(shift(@insns));
&vld1_32 ("{$T0}","[$Ktbl,:128]!");
eval(shift(@insns));
eval(shift(@insns));
&vsli_32 ($T4,&Dlo(@X[0]),32-$sigma1[1]);
eval(shift(@insns));
eval(shift(@insns));
&veor ($T5,$T5,$T4); # sigma1(X[16..17])
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 (&Dhi(@X[0]),&Dhi(@X[0]),$T5);# X[2..3] += sigma1(X[16..17])
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 ($T0,$T0,@X[0]);
while($#insns>=2) { eval(shift(@insns)); }
&vst1_32 ("{$T0}","[$Xfer,:128]!");
eval(shift(@insns));
eval(shift(@insns));
push(@X,shift(@X)); # "rotate" X[]
}
sub Xpreload()
{ use integer;
my $body = shift;
my @insns = (&$body,&$body,&$body,&$body);
my ($a,$b,$c,$d,$e,$f,$g,$h);
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
&vld1_32 ("{$T0}","[$Ktbl,:128]!");
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
&vrev32_8 (@X[0],@X[0]);
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
eval(shift(@insns));
&vadd_i32 ($T0,$T0,@X[0]);
foreach (@insns) { eval; } # remaining instructions
&vst1_32 ("{$T0}","[$Xfer,:128]!");
push(@X,shift(@X)); # "rotate" X[]
}
sub body_00_15 () {
(
'($a,$b,$c,$d,$e,$f,$g,$h)=@V;'.
'&add ($h,$h,$t1)', # h+=X[i]+K[i]
'&eor ($t1,$f,$g)',
'&eor ($t0,$e,$e,"ror#".($Sigma1[1]-$Sigma1[0]))',
'&add ($a,$a,$t2)', # h+=Maj(a,b,c) from the past
'&and ($t1,$t1,$e)',
'&eor ($t2,$t0,$e,"ror#".($Sigma1[2]-$Sigma1[0]))', # Sigma1(e)
'&eor ($t0,$a,$a,"ror#".($Sigma0[1]-$Sigma0[0]))',
'&eor ($t1,$t1,$g)', # Ch(e,f,g)
'&add ($h,$h,$t2,"ror#$Sigma1[0]")', # h+=Sigma1(e)
'&eor ($t2,$a,$b)', # a^b, b^c in next round
'&eor ($t0,$t0,$a,"ror#".($Sigma0[2]-$Sigma0[0]))', # Sigma0(a)
'&add ($h,$h,$t1)', # h+=Ch(e,f,g)
'&ldr ($t1,sprintf "[sp,#%d]",4*(($j+1)&15)) if (($j&15)!=15);'.
'&ldr ($t1,"[$Ktbl]") if ($j==15);'.
'&ldr ($t1,"[sp,#64]") if ($j==31)',
'&and ($t3,$t3,$t2)', # (b^c)&=(a^b)
'&add ($d,$d,$h)', # d+=h
'&add ($h,$h,$t0,"ror#$Sigma0[0]");'. # h+=Sigma0(a)
'&eor ($t3,$t3,$b)', # Maj(a,b,c)
'$j++; unshift(@V,pop(@V)); ($t2,$t3)=($t3,$t2);'
)
}
$code.=<<___;
#if __ARM_MAX_ARCH__>=7
.arch armv7-a
.fpu neon
.global sha256_block_data_order_neon
.type sha256_block_data_order_neon,%function
.align 4
sha256_block_data_order_neon:
.LNEON:
stmdb sp!,{r4-r12,lr}
sub $H,sp,#16*4+16
adrl $Ktbl,K256
bic $H,$H,#15 @ align for 128-bit stores
mov $t2,sp
mov sp,$H @ alloca
add $len,$inp,$len,lsl#6 @ len to point at the end of inp
vld1.8 {@X[0]},[$inp]!
vld1.8 {@X[1]},[$inp]!
vld1.8 {@X[2]},[$inp]!
vld1.8 {@X[3]},[$inp]!
vld1.32 {$T0},[$Ktbl,:128]!
vld1.32 {$T1},[$Ktbl,:128]!
vld1.32 {$T2},[$Ktbl,:128]!
vld1.32 {$T3},[$Ktbl,:128]!
vrev32.8 @X[0],@X[0] @ yes, even on
str $ctx,[sp,#64]
vrev32.8 @X[1],@X[1] @ big-endian
str $inp,[sp,#68]
mov $Xfer,sp
vrev32.8 @X[2],@X[2]
str $len,[sp,#72]
vrev32.8 @X[3],@X[3]
str $t2,[sp,#76] @ save original sp
vadd.i32 $T0,$T0,@X[0]
vadd.i32 $T1,$T1,@X[1]
vst1.32 {$T0},[$Xfer,:128]!
vadd.i32 $T2,$T2,@X[2]
vst1.32 {$T1},[$Xfer,:128]!
vadd.i32 $T3,$T3,@X[3]
vst1.32 {$T2},[$Xfer,:128]!
vst1.32 {$T3},[$Xfer,:128]!
ldmia $ctx,{$A-$H}
sub $Xfer,$Xfer,#64
ldr $t1,[sp,#0]
eor $t2,$t2,$t2
eor $t3,$B,$C
b .L_00_48
.align 4
.L_00_48:
___
&Xupdate(\&body_00_15);
&Xupdate(\&body_00_15);
&Xupdate(\&body_00_15);
&Xupdate(\&body_00_15);
$code.=<<___;
teq $t1,#0 @ check for K256 terminator
ldr $t1,[sp,#0]
sub $Xfer,$Xfer,#64
bne .L_00_48
ldr $inp,[sp,#68]
ldr $t0,[sp,#72]
sub $Ktbl,$Ktbl,#256 @ rewind $Ktbl
teq $inp,$t0
it eq
subeq $inp,$inp,#64 @ avoid SEGV
vld1.8 {@X[0]},[$inp]! @ load next input block
vld1.8 {@X[1]},[$inp]!
vld1.8 {@X[2]},[$inp]!
vld1.8 {@X[3]},[$inp]!
it ne
strne $inp,[sp,#68]
mov $Xfer,sp
___
&Xpreload(\&body_00_15);
&Xpreload(\&body_00_15);
&Xpreload(\&body_00_15);
&Xpreload(\&body_00_15);
$code.=<<___;
ldr $t0,[$t1,#0]
add $A,$A,$t2 @ h+=Maj(a,b,c) from the past
ldr $t2,[$t1,#4]
ldr $t3,[$t1,#8]
ldr $t4,[$t1,#12]
add $A,$A,$t0 @ accumulate
ldr $t0,[$t1,#16]
add $B,$B,$t2
ldr $t2,[$t1,#20]
add $C,$C,$t3
ldr $t3,[$t1,#24]
add $D,$D,$t4
ldr $t4,[$t1,#28]
add $E,$E,$t0
str $A,[$t1],#4
add $F,$F,$t2
str $B,[$t1],#4
add $G,$G,$t3
str $C,[$t1],#4
add $H,$H,$t4
str $D,[$t1],#4
stmia $t1,{$E-$H}
ittte ne
movne $Xfer,sp
ldrne $t1,[sp,#0]
eorne $t2,$t2,$t2
ldreq sp,[sp,#76] @ restore original sp
itt ne
eorne $t3,$B,$C
bne .L_00_48
ldmia sp!,{r4-r12,pc}
.size sha256_block_data_order_neon,.-sha256_block_data_order_neon
#endif
___
}}}
######################################################################
# ARMv8 stuff
#
{{{
my ($ABCD,$EFGH,$abcd)=map("q$_",(0..2));
my @MSG=map("q$_",(8..11));
my ($W0,$W1,$ABCD_SAVE,$EFGH_SAVE)=map("q$_",(12..15));
my $Ktbl="r3";
$code.=<<___;
#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__)
# ifdef __thumb2__
# define INST(a,b,c,d) .byte c,d|0xc,a,b
# else
# define INST(a,b,c,d) .byte a,b,c,d
# endif
.type sha256_block_data_order_armv8,%function
.align 5
sha256_block_data_order_armv8:
.LARMv8:
vld1.32 {$ABCD,$EFGH},[$ctx]
# ifdef __thumb2__
adr $Ktbl,.LARMv8
sub $Ktbl,$Ktbl,#.LARMv8-K256
# else
adrl $Ktbl,K256
# endif
add $len,$inp,$len,lsl#6 @ len to point at the end of inp
.Loop_v8:
vld1.8 {@MSG[0]-@MSG[1]},[$inp]!
vld1.8 {@MSG[2]-@MSG[3]},[$inp]!
vld1.32 {$W0},[$Ktbl]!
vrev32.8 @MSG[0],@MSG[0]
vrev32.8 @MSG[1],@MSG[1]
vrev32.8 @MSG[2],@MSG[2]
vrev32.8 @MSG[3],@MSG[3]
vmov $ABCD_SAVE,$ABCD @ offload
vmov $EFGH_SAVE,$EFGH
teq $inp,$len
___
for($i=0;$i<12;$i++) {
$code.=<<___;
vld1.32 {$W1},[$Ktbl]!
vadd.i32 $W0,$W0,@MSG[0]
sha256su0 @MSG[0],@MSG[1]
vmov $abcd,$ABCD
sha256h $ABCD,$EFGH,$W0
sha256h2 $EFGH,$abcd,$W0
sha256su1 @MSG[0],@MSG[2],@MSG[3]
___
($W0,$W1)=($W1,$W0); push(@MSG,shift(@MSG));
}
$code.=<<___;
vld1.32 {$W1},[$Ktbl]!
vadd.i32 $W0,$W0,@MSG[0]
vmov $abcd,$ABCD
sha256h $ABCD,$EFGH,$W0
sha256h2 $EFGH,$abcd,$W0
vld1.32 {$W0},[$Ktbl]!
vadd.i32 $W1,$W1,@MSG[1]
vmov $abcd,$ABCD
sha256h $ABCD,$EFGH,$W1
sha256h2 $EFGH,$abcd,$W1
vld1.32 {$W1},[$Ktbl]
vadd.i32 $W0,$W0,@MSG[2]
sub $Ktbl,$Ktbl,#256-16 @ rewind
vmov $abcd,$ABCD
sha256h $ABCD,$EFGH,$W0
sha256h2 $EFGH,$abcd,$W0
vadd.i32 $W1,$W1,@MSG[3]
vmov $abcd,$ABCD
sha256h $ABCD,$EFGH,$W1
sha256h2 $EFGH,$abcd,$W1
vadd.i32 $ABCD,$ABCD,$ABCD_SAVE
vadd.i32 $EFGH,$EFGH,$EFGH_SAVE
it ne
bne .Loop_v8
vst1.32 {$ABCD,$EFGH},[$ctx]
ret @ bx lr
.size sha256_block_data_order_armv8,.-sha256_block_data_order_armv8
#endif
___
}}}
$code.=<<___;
.asciz "SHA256 block transform for ARMv4/NEON/ARMv8, CRYPTOGAMS by <appro\@openssl.org>"
.align 2
#if __ARM_MAX_ARCH__>=7 && !defined(__KERNEL__)
.comm OPENSSL_armcap_P,4,4
#endif
___
open SELF,$0;
while(<SELF>) {
next if (/^#!/);
last if (!s/^#/@/ and !/^$/);
print;
}
close SELF;
{ my %opcode = (
"sha256h" => 0xf3000c40, "sha256h2" => 0xf3100c40,
"sha256su0" => 0xf3ba03c0, "sha256su1" => 0xf3200c40 );
sub unsha256 {
my ($mnemonic,$arg)=@_;
if ($arg =~ m/q([0-9]+)(?:,\s*q([0-9]+))?,\s*q([0-9]+)/o) {
my $word = $opcode{$mnemonic}|(($1&7)<<13)|(($1&8)<<19)
|(($2&7)<<17)|(($2&8)<<4)
|(($3&7)<<1) |(($3&8)<<2);
# since ARMv7 instructions are always encoded little-endian.
# correct solution is to use .inst directive, but older
# assemblers don't implement it:-(
sprintf "INST(0x%02x,0x%02x,0x%02x,0x%02x)\t@ %s %s",
$word&0xff,($word>>8)&0xff,
($word>>16)&0xff,($word>>24)&0xff,
$mnemonic,$arg;
}
}
}
foreach (split($/,$code)) {
s/\`([^\`]*)\`/eval $1/geo;
s/\b(sha256\w+)\s+(q.*)/unsha256($1,$2)/geo;
s/\bret\b/bx lr/go or
s/\bbx\s+lr\b/.word\t0xe12fff1e/go; # make it possible to compile with -march=armv4
print $_,"\n";
}
close STDOUT; # enforce flush

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,128 @@
/*
* Glue code for the SHA256 Secure Hash Algorithm assembly implementation
* using optimized ARM assembler and NEON instructions.
*
* Copyright © 2015 Google Inc.
*
* This file is based on sha256_ssse3_glue.c:
* Copyright (C) 2013 Intel Corporation
* Author: Tim Chen <tim.c.chen@linux.intel.com>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
#include <crypto/internal/hash.h>
#include <linux/crypto.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <linux/string.h>
#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <asm/simd.h>
#include <asm/neon.h>
#include "sha256_glue.h"
asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
unsigned int num_blks);
int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/* make sure casting to sha256_block_fn() is safe */
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
}
EXPORT_SYMBOL(crypto_sha256_arm_update);
static int sha256_final(struct shash_desc *desc, u8 *out)
{
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
return sha256_base_finish(desc, out);
}
int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
return sha256_final(desc, out);
}
EXPORT_SYMBOL(crypto_sha256_arm_finup);
static struct shash_alg algs[] = { {
.digestsize = SHA256_DIGEST_SIZE,
.init = sha256_base_init,
.update = crypto_sha256_arm_update,
.final = sha256_final,
.finup = crypto_sha256_arm_finup,
.descsize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha256",
.cra_driver_name = "sha256-asm",
.cra_priority = 150,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
}, {
.digestsize = SHA224_DIGEST_SIZE,
.init = sha224_base_init,
.update = crypto_sha256_arm_update,
.final = sha256_final,
.finup = crypto_sha256_arm_finup,
.descsize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha224",
.cra_driver_name = "sha224-asm",
.cra_priority = 150,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA224_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
} };
static int __init sha256_mod_init(void)
{
int res = crypto_register_shashes(algs, ARRAY_SIZE(algs));
if (res < 0)
return res;
if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && cpu_has_neon()) {
res = crypto_register_shashes(sha256_neon_algs,
ARRAY_SIZE(sha256_neon_algs));
if (res < 0)
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
}
return res;
}
static void __exit sha256_mod_fini(void)
{
crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && cpu_has_neon())
crypto_unregister_shashes(sha256_neon_algs,
ARRAY_SIZE(sha256_neon_algs));
}
module_init(sha256_mod_init);
module_exit(sha256_mod_fini);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("SHA256 Secure Hash Algorithm (ARM), including NEON");
MODULE_ALIAS_CRYPTO("sha256");

View File

@ -0,0 +1,14 @@
#ifndef _CRYPTO_SHA256_GLUE_H
#define _CRYPTO_SHA256_GLUE_H
#include <linux/crypto.h>
extern struct shash_alg sha256_neon_algs[2];
int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
unsigned int len);
int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *hash);
#endif /* _CRYPTO_SHA256_GLUE_H */

View File

@ -0,0 +1,101 @@
/*
* Glue code for the SHA256 Secure Hash Algorithm assembly implementation
* using NEON instructions.
*
* Copyright © 2015 Google Inc.
*
* This file is based on sha512_neon_glue.c:
* Copyright © 2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
* Software Foundation; either version 2 of the License, or (at your option)
* any later version.
*
*/
#include <crypto/internal/hash.h>
#include <linux/cryptohash.h>
#include <linux/types.h>
#include <linux/string.h>
#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <asm/byteorder.h>
#include <asm/simd.h>
#include <asm/neon.h>
#include "sha256_glue.h"
asmlinkage void sha256_block_data_order_neon(u32 *digest, const void *data,
unsigned int num_blks);
static int sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
struct sha256_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd() ||
(sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
return crypto_sha256_arm_update(desc, data, len);
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order_neon);
kernel_neon_end();
return 0;
}
static int sha256_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (!may_use_simd())
return crypto_sha256_arm_finup(desc, data, len, out);
kernel_neon_begin();
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order_neon);
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order_neon);
kernel_neon_end();
return sha256_base_finish(desc, out);
}
static int sha256_final(struct shash_desc *desc, u8 *out)
{
return sha256_finup(desc, NULL, 0, out);
}
struct shash_alg sha256_neon_algs[] = { {
.digestsize = SHA256_DIGEST_SIZE,
.init = sha256_base_init,
.update = sha256_update,
.final = sha256_final,
.finup = sha256_finup,
.descsize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha256",
.cra_driver_name = "sha256-neon",
.cra_priority = 250,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
}, {
.digestsize = SHA224_DIGEST_SIZE,
.init = sha224_base_init,
.update = sha256_update,
.final = sha256_final,
.finup = sha256_finup,
.descsize = sizeof(struct sha256_state),
.base = {
.cra_name = "sha224",
.cra_driver_name = "sha224-neon",
.cra_priority = 250,
.cra_flags = CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize = SHA224_BLOCK_SIZE,
.cra_module = THIS_MODULE,
}
} };

View File

@ -21,6 +21,7 @@ generic-y += preempt.h
generic-y += resource.h
generic-y += rwsem.h
generic-y += scatterlist.h
generic-y += seccomp.h
generic-y += sections.h
generic-y += segment.h
generic-y += sembuf.h

View File

@ -269,6 +269,16 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
static inline bool __kvm_cpu_uses_extended_idmap(void)
{
return false;
}
static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
pgd_t *hyp_pgd,
pgd_t *merged_hyp_pgd,
unsigned long hyp_idmap_start) { }
#endif /* !__ASSEMBLY__ */
#endif /* __ARM_KVM_MMU_H__ */

View File

@ -1,11 +0,0 @@
#ifndef _ASM_ARM_SECCOMP_H
#define _ASM_ARM_SECCOMP_H
#include <linux/unistd.h>
#define __NR_seccomp_read __NR_read
#define __NR_seccomp_write __NR_write
#define __NR_seccomp_exit __NR_exit
#define __NR_seccomp_sigreturn __NR_rt_sigreturn
#endif /* _ASM_ARM_SECCOMP_H */

Some files were not shown because too many files have changed in this diff Show More