Merge branch 'for-6.2/apple' into for-linus

- new quirks for select Apple keyboards (Kerem Karabay, Aditya Garg)
Jiri Kosina 2022-12-13 14:27:16 +01:00
commit cfd1f6c16f
5394 changed files with 238379 additions and 75045 deletions

@ -104,6 +104,7 @@ Christoph Hellwig <hch@lst.de>
Colin Ian King <colin.i.king@gmail.com> <colin.king@canonical.com>
Corey Minyard <minyard@acm.org>
Damian Hobson-Garcia <dhobsong@igel.co.jp>
Dan Carpenter <error27@gmail.com> <dan.carpenter@oracle.com>
Daniel Borkmann <daniel@iogearbox.net> <danborkmann@googlemail.com>
Daniel Borkmann <daniel@iogearbox.net> <danborkmann@iogearbox.net>
Daniel Borkmann <daniel@iogearbox.net> <daniel.borkmann@tik.ee.ethz.ch>
@ -137,6 +138,7 @@ Filipe Lautert <filipe@icewall.org>
Finn Thain <fthain@linux-m68k.org> <fthain@telegraphics.com.au>
Franck Bui-Huu <vagabon.xyz@gmail.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@am.sony.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@sony.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@sonymobile.com>
Frank Rowand <frowand.list@gmail.com> <frowand@mvista.com>
Frank Zago <fzago@systemfabricworks.com>
@ -336,6 +338,7 @@ Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <fixed-term.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <o.rempel@pengutronix.de>
Oleksij Rempel <linux@rempel-privat.de> <ore@pengutronix.de>
Oliver Upton <oliver.upton@linux.dev> <oupton@google.com>
Pali Rohár <pali@kernel.org> <pali.rohar@gmail.com>
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Patrick Mochel <mochel@digitalimplant.org>
@ -351,7 +354,8 @@ Peter Oruba <peter@oruba.de>
Pratyush Anand <pratyush.anand@gmail.com> <pratyush.anand@st.com>
Praveen BP <praveenbp@ti.com>
Punit Agrawal <punitagrawal@gmail.com> <punit.agrawal@arm.com>
Qais Yousef <qyousef@layalina.io> <qais.yousef@imgtec.com>
Qais Yousef <qyousef@layalina.io> <qais.yousef@arm.com>
Quentin Monnet <quentin@isovalent.com> <quentin.monnet@netronome.com>
Quentin Perret <qperret@qperret.net> <quentin.perret@arm.com>
Rafael J. Wysocki <rjw@rjwysocki.net> <rjw@sisk.pl>

@ -227,6 +227,17 @@ Contact: dmaengine@vger.kernel.org
Description: Indicates the number of retries for an enqcmds submission on a shared WQ.
The maximum value that can be set is 64.
What: /sys/bus/dsa/devices/wq<m>.<n>/op_config
Date: Sept 14, 2022
KernelVersion: 6.0.0
Contact: dmaengine@vger.kernel.org
Description: Shows the operation capability bits in the bitmap format
produced by the %*pb printk() format specifier.
The attribute can be written while the WQ is disabled in
order to configure which operations the WQ accepts; each
bit corresponds to an allowed operation. It is visible only
on platforms that support the capability.
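Userspace can decode the op_config value by undoing the %*pb formatting. The following is a minimal Python sketch. It assumes the usual %*pb layout: hexadecimal digits, most significant 32-bit word first, and words separated by commas. Which operation each bit stands for is device-specific and is not decoded here. The helper name parse_pb_bitmap is hypothetical.

```python
def parse_pb_bitmap(text: str) -> list[int]:
    """Return the positions of the set bits in a %*pb-style bitmap string.

    Assumes hex digits, most significant 32-bit word first, comma separated.
    """
    value = int(text.strip().replace(",", ""), 16)
    bits = []
    pos = 0
    while value:
        if value & 1:  # bit `pos` is set
            bits.append(pos)
        value >>= 1
        pos += 1
    return bits

# Hypothetical sample value; real op_config contents depend on the device.
print(parse_pb_bitmap("00000000,0000000b"))
```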
What: /sys/bus/dsa/devices/engine<m>.<n>/group_id
Date: Oct 25, 2019
KernelVersion: 5.6.0
@ -255,3 +266,27 @@ Contact: dmaengine@vger.kernel.org
Description: Indicates the number of Read Buffers reserved for the use of
engines in the group. See DSA spec v1.2 9.2.18 GRPCFG Read Buffers
Reserved.
What: /sys/bus/dsa/devices/group<m>.<n>/desc_progress_limit
Date: Sept 14, 2022
KernelVersion: 6.0.0
Contact: dmaengine@vger.kernel.org
Description: Allows control of the number of work descriptors that can be
concurrently processed by an engine in the group as a fraction
of the Maximum Work Descriptors in Progress value specified in
the ENGCAP register. The acceptable values are 0 (default),
1 (1/2 of the max value), 2 (1/4 of the max value), and 3 (1/8 of
the max value). It is visible only on platforms that support
the capability.
What: /sys/bus/dsa/devices/group<m>.<n>/batch_progress_limit
Date: Sept 14, 2022
KernelVersion: 6.0.0
Contact: dmaengine@vger.kernel.org
Description: Allows control of the number of batch descriptors that can be
concurrently processed by an engine in the group as a fraction
of the Maximum Batch Descriptors in Progress value specified in
the ENGCAP register. The acceptable values are 0 (default),
1 (1/2 of the max value), 2 (1/4 of the max value), and 3 (1/8 of
the max value). It is visible only on platforms that support
the capability.
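The following Python sketch maps a desc_progress_limit or batch_progress_limit setting to an effective limit, as a rough illustration of the fractions listed above. It rests on two assumptions. First, 0 (the default) leaves the ENGCAP maximum unchanged. Second, fractional results round down. The helper name is hypothetical, and the hardware remains the authority on the actual limit.

```python
def effective_progress_limit(max_in_progress: int, setting: int) -> int:
    """Illustrative mapping of a *_progress_limit setting to a descriptor count.

    0 is assumed to mean the full ENGCAP maximum; 1, 2 and 3 select 1/2, 1/4
    and 1/8 of it. Rounding down (via the right shift) is an assumption.
    """
    if setting not in (0, 1, 2, 3):
        raise ValueError("accepted values are 0, 1, 2 and 3")
    return max_in_progress >> setting  # each step halves the limit

print(effective_progress_limit(64, 2))  # 1/4 of a hypothetical maximum of 64
```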

@ -516,3 +516,11 @@ Contact: Mathieu Poirier <mathieu.poirier@linaro.org>
Description: (Read) Returns the number of special conditional P1 right-hand keys
that the trace unit can use (0x194). The value is taken
directly from the HW.
What: /sys/bus/coresight/devices/etm<N>/ts_source
Date: October 2022
KernelVersion: 6.1
Contact: Mathieu Poirier <mathieu.poirier@linaro.org> or Suzuki K Poulose <suzuki.poulose@arm.com>
Description: (Read) When FEAT_TRF is implemented, the value of TRFCR_ELx.TS used for
the trace session. Otherwise, -1, indicating an unknown time source. Check
trcidr0.tssize to see if a global timestamp is available.

@ -4,6 +4,12 @@ Contact: linux-iio@vger.kernel.org
Description:
Count data of Count Y represented as a string.
What: /sys/bus/counter/devices/counterX/countY/capture
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Historical capture of the Count Y count data.
What: /sys/bus/counter/devices/counterX/countY/ceiling
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
@ -203,6 +209,13 @@ Description:
both edges:
Any state transition.
What: /sys/bus/counter/devices/counterX/countY/num_overflows
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
This attribute indicates the number of overflows of count Y.
What: /sys/bus/counter/devices/counterX/countY/capture_component_id
What: /sys/bus/counter/devices/counterX/countY/ceiling_component_id
What: /sys/bus/counter/devices/counterX/countY/floor_component_id
What: /sys/bus/counter/devices/counterX/countY/count_mode_component_id
@ -213,11 +226,14 @@ What: /sys/bus/counter/devices/counterX/countY/prescaler_component_id
What: /sys/bus/counter/devices/counterX/countY/preset_component_id
What: /sys/bus/counter/devices/counterX/countY/preset_enable_component_id
What: /sys/bus/counter/devices/counterX/countY/signalZ_action_component_id
What: /sys/bus/counter/devices/counterX/countY/num_overflows_component_id
What: /sys/bus/counter/devices/counterX/signalY/cable_fault_component_id
What: /sys/bus/counter/devices/counterX/signalY/cable_fault_enable_component_id
What: /sys/bus/counter/devices/counterX/signalY/filter_clock_prescaler_component_id
What: /sys/bus/counter/devices/counterX/signalY/index_polarity_component_id
What: /sys/bus/counter/devices/counterX/signalY/polarity_component_id
What: /sys/bus/counter/devices/counterX/signalY/synchronous_mode_component_id
What: /sys/bus/counter/devices/counterX/signalY/frequency_component_id
KernelVersion: 5.16
Contact: linux-iio@vger.kernel.org
Description:
@ -303,6 +319,19 @@ Description:
Discrete set of available values for the respective Signal Y
configuration are listed in this file.
What: /sys/bus/counter/devices/counterX/signalY/polarity
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Active level of Signal Y. The following polarity values are
available:
positive:
Signal high state considered active level (rising edge).
negative:
Signal low state considered active level (falling edge).
What: /sys/bus/counter/devices/counterX/signalY/name
KernelVersion: 5.2
Contact: linux-iio@vger.kernel.org
@ -345,3 +374,9 @@ Description:
via index_polarity. The index function (as enabled via
preset_enable) is performed synchronously with the
quadrature clock on the active level of the index input.
What: /sys/bus/counter/devices/counterX/signalY/frequency
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Read-only attribute that indicates the signal Y frequency, in Hz.

@ -196,7 +196,7 @@ Description:
Raw capacitance measurement from channel Y. Units after
application of scale and offset are nanofarads.
What: /sys/.../iio:deviceX/in_capacitanceY-capacitanceZ_raw
KernelVersion: 3.2
Contact: linux-iio@vger.kernel.org
Description:
@ -207,6 +207,25 @@ Description:
is required is a consistent labeling. Units after application
of scale and offset are nanofarads.
What: /sys/.../iio:deviceX/in_capacitanceY-capacitanceZ_zeropoint
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
For differential channels, this is an offset that is applied
equally to both inputs. As the reading is of the difference
between the two inputs, this should not be applied to the _raw
reading by userspace (unlike _offset). Unlike calibbias,
it does not affect the measured differential value, because
the effect of _zeropoint cancels out across the two inputs
that make up the differential pair. Its purpose is to bring
the individual signals, before the differential is measured,
within the measurement range of the device. The name was
chosen because, if the separate inputs that make up the
differential pair are drawn on a graph in their
_raw units, this is the value that the zero point on the
measurement axis represents. It is expressed with the
same scaling as _raw.
What: /sys/bus/iio/devices/iio:deviceX/in_temp_raw
What: /sys/bus/iio/devices/iio:deviceX/in_tempX_raw
What: /sys/bus/iio/devices/iio:deviceX/in_temp_x_raw
@ -241,6 +260,15 @@ Description:
Has all of the equivalent parameters as per voltageY. Units
after application of scale and offset are m/s^2.
What: /sys/bus/iio/devices/iio:deviceX/in_accel_linear_x_raw
What: /sys/bus/iio/devices/iio:deviceX/in_accel_linear_y_raw
What: /sys/bus/iio/devices/iio:deviceX/in_accel_linear_z_raw
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
As per in_accel_X_raw attributes, but minus the
acceleration due to gravity.
What: /sys/bus/iio/devices/iio:deviceX/in_gravity_x_raw
What: /sys/bus/iio/devices/iio:deviceX/in_gravity_y_raw
What: /sys/bus/iio/devices/iio:deviceX/in_gravity_z_raw
@ -2038,3 +2066,99 @@ Description:
Available range for the forced calibration value, expressed as:
- a range specified as "[min step max]"
What: /sys/bus/iio/devices/iio:deviceX/in_voltageX_sampling_frequency
What: /sys/bus/iio/devices/iio:deviceX/in_powerY_sampling_frequency
What: /sys/bus/iio/devices/iio:deviceX/in_currentZ_sampling_frequency
KernelVersion: 5.20
Contact: linux-iio@vger.kernel.org
Description:
Some devices have separate controls of sampling frequency for
individual channels. If multiple channels are enabled in a scan,
then the sampling_frequency of the scan may be computed from the
per channel sampling frequencies.
What: /sys/.../events/in_accel_gesture_singletap_en
What: /sys/.../events/in_accel_gesture_doubletap_en
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Device generates an event on a single or double tap.
What: /sys/.../events/in_accel_gesture_singletap_value
What: /sys/.../events/in_accel_gesture_doubletap_value
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Specifies the threshold value that the device compares
against to generate the tap gesture event. A lower
threshold value increases the sensitivity of tap detection.
Units and the exact meaning of value are device-specific.
What: /sys/.../events/in_accel_gesture_tap_value_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Lists all available threshold values which can be used to
modify the sensitivity of the tap detection.
What: /sys/.../events/in_accel_gesture_singletap_reset_timeout
What: /sys/.../events/in_accel_gesture_doubletap_reset_timeout
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Specifies the timeout, in seconds, after a tap event has
occurred during which the tap detector does not look for
another tap event; that is, the minimum quiet time between
two single taps or two double taps.
What: /sys/.../events/in_accel_gesture_tap_reset_timeout_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Lists all available tap reset timeout values. Units in seconds.
What: /sys/.../events/in_accel_gesture_doubletap_tap2_min_delay
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Specifies the minimum quiet time in seconds between the two
taps of a double tap.
What: /sys/.../events/in_accel_gesture_doubletap_tap2_min_delay_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Lists all available delay values between two taps in the double
tap. Units in seconds.
What: /sys/.../events/in_accel_gesture_tap_maxtomin_time
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Specifies the maximum time difference allowed between the
upper and lower peaks of a tap for it to be considered a
valid tap event. Units in seconds.
What: /sys/.../events/in_accel_gesture_tap_maxtomin_time_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Lists all available time values between the upper and lower
peaks. Units in seconds.
What: /sys/bus/iio/devices/iio:deviceX/in_rot_yaw_raw
What: /sys/bus/iio/devices/iio:deviceX/in_rot_pitch_raw
What: /sys/bus/iio/devices/iio:deviceX/in_rot_roll_raw
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Raw (unscaled) Euler angle readings. Units after
application of scale are degrees.
What: /sys/bus/iio/devices/iio:deviceX/serialnumber
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
An example format is a 16-byte hexadecimal string, two
digits per byte, representing the sensor's unique ID number.
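A value in the example format above can be decoded into raw bytes with the standard library. The following is a minimal Python sketch that assumes exactly that format. Other devices may use a different layout, and the helper name is hypothetical.

```python
def parse_serialnumber(text: str) -> bytes:
    """Decode a serialnumber in the example format: 16 bytes as 32 hex digits."""
    raw = bytes.fromhex(text.strip())  # two hex digits per byte
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw

# Hypothetical ID, for illustration only.
print(parse_serialnumber("00112233445566778899aabbccddeeff").hex())
```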

@ -0,0 +1,81 @@
What: /sys/bus/iio/devices/iio:deviceX/in_accel_raw_range
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Raw (unscaled) range for acceleration readings. Unit after
application of scale is m/s^2. Note that this does not affect
the scale; use the scale attribute instead when changing the
maximum and minimum readable values also changes the reading
scaling factor.
What: /sys/bus/iio/devices/iio:deviceX/in_anglvel_raw_range
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Range for angular velocity readings in radians per second. Note
that this does not affect the scale; use the scale attribute
instead when changing the maximum and minimum readable values
also changes the reading scaling factor.
What: /sys/bus/iio/devices/iio:deviceX/in_accel_raw_range_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
List of allowed values for the in_accel_raw_range attribute.
What: /sys/bus/iio/devices/iio:deviceX/in_anglvel_raw_range_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
List of allowed values for the in_anglvel_raw_range attribute.
What: /sys/bus/iio/devices/iio:deviceX/in_magn_calibration_fast_enable
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Can be 1 or 0. Enables/disables the "Fast Magnetometer
Calibration" HW function.
What: /sys/bus/iio/devices/iio:deviceX/fusion_enable
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Can be 1 or 0. Enables/disables the "sensor fusion" (a.k.a.
NDOF) HW function.
What: /sys/bus/iio/devices/iio:deviceX/calibration_data
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reports the binary calibration data blob for the IMU sensors.
What: /sys/bus/iio/devices/iio:deviceX/in_accel_calibration_auto_status
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reports the autocalibration status for the accelerometer sensor.
Can be 0 (calibration not even enabled) or 1 to 5, where a greater
number means a better calibration status.
What: /sys/bus/iio/devices/iio:deviceX/in_gyro_calibration_auto_status
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reports the autocalibration status for the gyroscope sensor.
Can be 0 (calibration not even enabled) or 1 to 5, where a greater
number means a better calibration status.
What: /sys/bus/iio/devices/iio:deviceX/in_magn_calibration_auto_status
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reports the autocalibration status for the magnetometer sensor.
Can be 0 (calibration not even enabled) or 1 to 5, where a greater
number means a better calibration status.
What: /sys/bus/iio/devices/iio:deviceX/sys_calibration_auto_status
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reports the status for the IMU overall autocalibration.
Can be 0 (calibration not even enabled) or 1 to 5, where a greater
number means a better calibration status.
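The four autocalibration status attributes above can be collected in one pass. The following Python sketch reads whichever of them a device directory exposes, for example /sys/bus/iio/devices/iio:deviceX, and range-checks them against the documented 0 to 5 scale. Two choices are assumptions: the helper names are hypothetical, and treating 5 on every attribute as "fully calibrated" is an illustrative choice that the ABI does not define.

```python
import os

# Attribute names taken from the ABI entries above.
STATUS_ATTRS = (
    "in_accel_calibration_auto_status",
    "in_gyro_calibration_auto_status",
    "in_magn_calibration_auto_status",
    "sys_calibration_auto_status",
)

def read_calibration_status(device_dir: str) -> dict:
    """Return {attribute: value} for each status attribute present in device_dir."""
    status = {}
    for name in STATUS_ATTRS:
        path = os.path.join(device_dir, name)
        if not os.path.exists(path):
            continue  # a driver may not expose every attribute
        with open(path) as f:
            value = int(f.read().strip())
        if not 0 <= value <= 5:  # 0 = not enabled, 1..5 = worst..best
            raise ValueError(f"{name}: unexpected value {value}")
        status[name] = value
    return status

def fully_calibrated(status: dict) -> bool:
    # Illustrative threshold: every reported attribute at the top score.
    return bool(status) and all(v == 5 for v in status.values())
```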

@ -0,0 +1,11 @@
What: /sys/.../iio:deviceX/in_capacitableY_calibbias_calibration
What: /sys/.../iio:deviceX/in_capacitableY_calibscale_calibration
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Write 1 to trigger a calibration of the calibbias or
calibscale. For calibscale, a full scale capacitance should
be connected to the capacitance input and a
calibscale_calibration then started. For calibbias see
the device datasheet section on "capacitive system offset
calibration".

@ -457,3 +457,36 @@ Description:
The file is writable if the PF is bound to a driver that
implements ->sriov_set_msix_vec_count().
What: /sys/bus/pci/devices/.../resourceN_resize
Date: September 2022
Contact: Alex Williamson <alex.williamson@redhat.com>
Description:
These files provide an interface to PCIe Resizable BAR support.
A file is created for each BAR resource (N) supported by the
PCIe Resizable BAR extended capability of the device. Reading
each file exposes the bitmap of available resource sizes:
# cat resource1_resize
00000000000001c0
The bitmap represents supported resource sizes for the BAR,
where bit0 = 1MB, bit1 = 2MB, bit2 = 4MB, etc. In the above
example the device supports 64MB, 128MB, and 256MB BAR sizes.
When writing the file, the user provides the bit position of
the desired resource size, for example:
# echo 7 > resource1_resize
This indicates to set the size value corresponding to bit 7,
128MB. The resulting size is 2 ^ (bit# + 20). This definition
matches the PCIe specification of this capability.
In order to make use of resource resizing, all PCI drivers must
be unbound from the device and peer devices under the same
parent bridge may need to be soft removed. In the case of
VGA devices, writing a resize value will remove low level
console drivers from the device. Raw users of pci-sysfs
resourceN attributes must be terminated prior to resizing.
Success of the resizing operation is not guaranteed.
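The following Python sketch checks the bitmap encoding and the size rule above, size = 2 ^ (bit# + 20). It uses the sample value from the example. The helper names are hypothetical. The sketch only decodes and encodes values; it does not perform the resize.

```python
def supported_bar_sizes(bitmap_text: str) -> list[int]:
    """Decode a resourceN_resize bitmap into supported BAR sizes in bytes."""
    bitmap = int(bitmap_text.strip(), 16)
    # Bit n set means a size of 2 ** (n + 20) bytes (bit0 = 1MB) is supported.
    return [1 << (bit + 20) for bit in range(bitmap.bit_length())
            if (bitmap >> bit) & 1]

def bit_for_size(size_bytes: int) -> int:
    """Return the bit position to write to resourceN_resize for a size."""
    if size_bytes < (1 << 20) or size_bytes & (size_bytes - 1):
        raise ValueError("size must be a power of two of at least 1MB")
    return size_bytes.bit_length() - 1 - 20

print([s >> 20 for s in supported_bar_sizes("00000000000001c0")])  # sizes in MB
print(bit_for_size(128 << 20))  # value to echo for a 128MB BAR
```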

@ -153,7 +153,7 @@ Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports the number of RX lanes the device is
using simultaneously through its upstream port.
What: /sys/bus/thunderbolt/devices/.../tx_speed
Date: Jan 2020
@ -167,7 +167,7 @@ Date: Jan 2020
KernelVersion: 5.5
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: This attribute reports the number of TX lanes the device is
using simultaneously through its upstream port.
What: /sys/bus/thunderbolt/devices/.../vendor
Date: Sep 2017

@ -0,0 +1,61 @@
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: This directory contains files for tuning the PCIe link
parameters (events). Each file is named after the event
of the PCIe link.
See Documentation/trace/hisi-ptt.rst for more information.
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune/qos_tx_cpl
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: (RW) Controls the weight of Tx completion TLPs, which influences
the proportion of outbound completion TLPs on the PCIe link.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune/qos_tx_np
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: (RW) Controls the weight of Tx non-posted TLPs, which influences
the proportion of outbound non-posted TLPs on the PCIe link.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune/qos_tx_p
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: (RW) Controls the weight of Tx posted TLPs, which influences the
proportion of outbound posted TLPs on the PCIe link.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune/rx_alloc_buf_level
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: (RW) Controls the allocated buffer watermark for inbound packets.
The packets are stored in the buffer first and then transmitted
either when the watermark is reached or when a timeout occurs.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune/tx_alloc_buf_level
Date: October 2022
KernelVersion: 6.1
Contact: Yicong Yang <yangyicong@hisilicon.com>
Description: (RW) Controls the allocated buffer watermark for outbound packets.
The packets are stored in the buffer first and then transmitted
either when the watermark is reached or when a timeout occurs.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
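A userspace tool might model the documented write rule before writing a tune value, for example to warn the user. The following Python sketch does that. It only mirrors the behaviour described above: negative values are rejected and values above 2 are converted to 2. The conversion itself happens in the driver, and the helper name is hypothetical.

```python
def normalize_tune_value(value: int) -> int:
    """Predict the tune data the driver will apply for a written value."""
    if value < 0:
        raise ValueError("negative tune values are rejected")
    return min(value, 2)  # out-of-range values are converted to 2

print([normalize_tune_value(v) for v in (0, 1, 2, 5)])
```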

@ -0,0 +1,8 @@
What: /sys/.../<device>/vfio-dev/vfioX/
Date: September 2022
Contact: Yi Liu <yi.l.liu@intel.com>
Description:
This directory is created when the device is bound to a
vfio driver. The layout under this directory matches what
exists for a standard 'struct device'. 'X' is a unique
index marking this device in vfio.

@ -16,7 +16,7 @@ Description: Version of the application running on the device's CPU
What: /sys/class/habanalabs/hl<n>/clk_max_freq_mhz
Date: Jun 2019
KernelVersion: 5.7
Contact: ogabbay@kernel.org
Description: Allows the user to set the maximum clock frequency, in MHz.
The device clock might be set to lower value than the maximum.
@ -26,7 +26,7 @@ Description: Allows the user to set the maximum clock frequency, in MHz.
What: /sys/class/habanalabs/hl<n>/clk_cur_freq_mhz
Date: Jun 2019
KernelVersion: 5.7
Contact: ogabbay@kernel.org
Description: Displays the current frequency, in MHz, of the device clock.
This property is valid only for the Gaudi ASIC family
@ -176,6 +176,12 @@ KernelVersion: 5.1
Contact: ogabbay@kernel.org
Description: Version of the device's preboot F/W code
What: /sys/class/habanalabs/hl<n>/security_enabled
Date: Oct 2022
KernelVersion: 6.1
Contact: obitton@habana.ai
Description: Displays the device's security status
What: /sys/class/habanalabs/hl<n>/soft_reset
Date: Jan 2019
KernelVersion: 5.1
@ -230,6 +236,6 @@ Description: Version of the u-boot running on the device's CPU
What: /sys/class/habanalabs/hl<n>/vrm_ver
Date: Jan 2022
KernelVersion: 5.17
Contact: ogabbay@kernel.org
Description: Version of the Device's Voltage Regulator Monitor F/W code. N/A to GOYA and GAUDI

@ -1417,6 +1417,15 @@ Description: This node is used to set or display whether UFS WriteBooster is
platform that doesn't support UFSHCD_CAP_CLK_SCALING, we can
disable/enable WriteBooster through this sysfs node.
What: /sys/bus/platform/drivers/ufshcd/*/enable_wb_buf_flush
What: /sys/bus/platform/devices/*.ufs/enable_wb_buf_flush
Date: July 2022
Contact: Jinyoung Choi <j-young.choi@samsung.com>
Description: This entry shows the status of WriteBooster buffer flushing
and it can be used to enable or disable the flushing.
If flushing is enabled, the device executes the flush
operation when the command queue is empty.
What: /sys/bus/platform/drivers/ufshcd/*/device_descriptor/hpb_version
What: /sys/bus/platform/devices/*.ufs/device_descriptor/hpb_version
Date: June 2021
@ -1591,6 +1600,43 @@ Description: This entry shows the status of HPB.
The file is read only.
Contact: Daniil Lunev <dlunev@chromium.org>
What: /sys/bus/platform/drivers/ufshcd/*/capabilities/
What: /sys/bus/platform/devices/*.ufs/capabilities/
Date: August 2022
Description: The group represents the effective capabilities of the
host-device pair, i.e. the capabilities that are enabled in the
driver for the specific host controller, supported by the host
controller, and supported by (or compatibly configured on) the
device side.
Contact: Daniil Lunev <dlunev@chromium.org>
What: /sys/bus/platform/drivers/ufshcd/*/capabilities/clock_scaling
What: /sys/bus/platform/devices/*.ufs/capabilities/clock_scaling
Date: August 2022
Contact: Daniil Lunev <dlunev@chromium.org>
Description: Indicates status of clock scaling.
== ============================
0 Clock scaling is not supported.
1 Clock scaling is supported.
== ============================
The file is read only.
What: /sys/bus/platform/drivers/ufshcd/*/capabilities/write_booster
What: /sys/bus/platform/devices/*.ufs/capabilities/write_booster
Date: August 2022
Contact: Daniil Lunev <dlunev@chromium.org>
Description: Indicates status of Write Booster.
== ============================
0 Write Booster can not be enabled.
1 Write Booster can be enabled.
== ============================
The file is read only.
What: /sys/class/scsi_device/*/device/hpb_param_sysfs/activation_thld
Date: February 2021
Contact: Avri Altman <avri.altman@wdc.com>

@ -466,6 +466,30 @@ Description: Show status of f2fs superblock in real time.
0x4000 SBI_IS_FREEZING freefs is in process
====== ===================== =================================
What: /sys/fs/f2fs/<disk>/stat/cp_status
Date: September 2022
Contact: "Chao Yu" <chao.yu@oppo.com>
Description: Show status of f2fs checkpoint in real time.
=============================== ==============================
cp flag value
CP_UMOUNT_FLAG 0x00000001
CP_ORPHAN_PRESENT_FLAG 0x00000002
CP_COMPACT_SUM_FLAG 0x00000004
CP_ERROR_FLAG 0x00000008
CP_FSCK_FLAG 0x00000010
CP_FASTBOOT_FLAG 0x00000020
CP_CRC_RECOVERY_FLAG 0x00000040
CP_NAT_BITS_FLAG 0x00000080
CP_TRIMMED_FLAG 0x00000100
CP_NOCRC_RECOVERY_FLAG 0x00000200
CP_LARGE_NAT_BITMAP_FLAG 0x00000400
CP_QUOTA_NEED_FSCK_FLAG 0x00000800
CP_DISABLED_FLAG 0x00001000
CP_DISABLED_QUICK_FLAG 0x00002000
CP_RESIZEFS_FLAG 0x00004000
=============================== ==============================
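The cp_status value is a bitwise OR of the flags in the table above, so it can be split back into flag names. The following Python sketch does this, assuming the file prints the value as a hexadecimal number, with or without a 0x prefix. Any bits not listed in the table are returned separately. The helper name is hypothetical.

```python
# Flag values copied from the cp_status table above.
CP_FLAGS = {
    0x00000001: "CP_UMOUNT_FLAG",
    0x00000002: "CP_ORPHAN_PRESENT_FLAG",
    0x00000004: "CP_COMPACT_SUM_FLAG",
    0x00000008: "CP_ERROR_FLAG",
    0x00000010: "CP_FSCK_FLAG",
    0x00000020: "CP_FASTBOOT_FLAG",
    0x00000040: "CP_CRC_RECOVERY_FLAG",
    0x00000080: "CP_NAT_BITS_FLAG",
    0x00000100: "CP_TRIMMED_FLAG",
    0x00000200: "CP_NOCRC_RECOVERY_FLAG",
    0x00000400: "CP_LARGE_NAT_BITMAP_FLAG",
    0x00000800: "CP_QUOTA_NEED_FSCK_FLAG",
    0x00001000: "CP_DISABLED_FLAG",
    0x00002000: "CP_DISABLED_QUICK_FLAG",
    0x00004000: "CP_RESIZEFS_FLAG",
}

def decode_cp_status(text: str) -> tuple[list[str], int]:
    """Split a cp_status value into known flag names and leftover unknown bits."""
    value = int(text.strip(), 16)  # base 16 also accepts a "0x" prefix
    names = [name for bit, name in CP_FLAGS.items() if value & bit]
    unknown = value & ~sum(CP_FLAGS)  # bits not covered by the table
    return names, unknown
```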
What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
Date: January 2021
Contact: "Daeho Jeong" <daehojeong@google.com>

@ -55,6 +55,14 @@ Description:
The object directory contains subdirectories for each function
that is patched within the object.
What: /sys/kernel/livepatch/<patch>/<object>/patched
Date: August 2022
KernelVersion: 6.1.0
Contact: live-patching@vger.kernel.org
Description:
An attribute which indicates whether the object is currently
patched.
What: /sys/kernel/livepatch/<patch>/<object>/<function,sympos>
Date: Nov 2014
KernelVersion: 3.19.0

@ -0,0 +1,25 @@
What: /sys/devices/virtual/memory_tiering/
Date: August 2022
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: A collection of all the memory tiers allocated.
Individual memory tier details are contained in subdirectories
named by the abstract distance of the memory tier.
/sys/devices/virtual/memory_tiering/memory_tierN/
What: /sys/devices/virtual/memory_tiering/memory_tierN/
/sys/devices/virtual/memory_tiering/memory_tierN/nodes
Date: August 2022
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Directory with details of a specific memory tier.
This is the directory containing information about a particular
memory tier, memory_tierN, where N is derived from the abstract distance.
A smaller value of N implies a higher (faster) memory tier in the
hierarchy.
nodes: NUMA nodes that are part of this memory tier.
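The following Python sketch lists tiers from fastest to slowest and expands each tier's nodes. It relies on the rule that a smaller N means a higher (faster) tier. It assumes the nodes file uses the kernel's usual range-list format, for example "0-1,3". The helper names and sample tier numbers are hypothetical.

```python
import re

def parse_nodelist(text: str) -> list[int]:
    """Expand a range list such as "0-2,4" into [0, 1, 2, 4]."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no nodes
        if "-" in part:
            first, last = part.split("-")
            nodes.extend(range(int(first), int(last) + 1))
        else:
            nodes.append(int(part))
    return nodes

def order_tiers(dir_names) -> list[str]:
    """Return memory_tierN directory names, fastest (smallest N) first."""
    tiers = []
    for name in dir_names:
        match = re.fullmatch(r"memory_tier(\d+)", name)
        if match:  # ignore any entry that is not a memory tier directory
            tiers.append((int(match.group(1)), name))
    return [name for _, name in sorted(tiers)]
```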

@ -13,7 +13,7 @@ a) waiting for a CPU (while being runnable)
b) completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
e) thrashing
f) direct compact
g) write-protect copy

@ -299,7 +299,7 @@ Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
lruvec->lru_lock; PG_lru bit of page->flags is cleared before
isolating a page from its LRU under lruvec->lru_lock.
2.7 Kernel Memory Extension
-----------------------------------------------
With the Kernel memory extension, the Memory Controller is able to limit
@ -386,8 +386,6 @@ U != 0, K >= U:
a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
-------------------------------------------------------------------

@ -976,6 +976,29 @@ All cgroup core files are prefixed with "cgroup."
killing cgroups is a process directed operation, i.e. it affects
the whole thread-group.
cgroup.pressure
A read-write single value file whose allowed values are "0" and "1".
The default is "1".
Writing "0" to the file will disable the cgroup PSI accounting.
Writing "1" to the file will re-enable the cgroup PSI accounting.
This control attribute is not hierarchical, so disabling or enabling PSI
accounting in a cgroup does not affect PSI accounting in its descendants
and does not need to be passed down via ancestors from the root.
The reason this control attribute exists is that PSI accounts stalls for
each cgroup separately and aggregates them at each level of the hierarchy.
This may cause non-negligible overhead for some workloads when they sit
deep in the hierarchy, in which case this control attribute can
be used to disable PSI accounting in the non-leaf cgroups.
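The attribute is not hierarchical, so the use case above means writing "0" into each non-leaf cgroup individually. The following Python sketch walks a cgroup v2 tree and does that. It assumes the mount point is passed in, that every subdirectory is a child cgroup, and that the caller has permission to write the files. The helper name is hypothetical.

```python
import os

def disable_psi_in_nonleaf(cgroup_root: str) -> list[str]:
    """Write "0" to cgroup.pressure in every cgroup that has child cgroups.

    Leaf cgroups keep PSI accounting enabled. Returns the changed directories.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(cgroup_root):
        # A directory with subdirectories is a non-leaf cgroup.
        if dirnames and "cgroup.pressure" in filenames:
            with open(os.path.join(dirpath, "cgroup.pressure"), "w") as f:
                f.write("0")
            changed.append(dirpath)
    return changed
```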
irq.pressure
A read-write nested-keyed file.
Shows pressure stall information for IRQ/SOFTIRQ. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
Controllers
===========
@ -1355,6 +1378,11 @@ PAGE_SIZE multiple when read back.
pagetables
Amount of memory allocated for page tables.
sec_pagetables
Amount of memory allocated for secondary page tables,
this currently includes KVM mmu allocations on x86
and arm64.
percpu (npn)
Amount of memory used for storing per-cpu kernel
data structures.
@ -2185,75 +2213,93 @@ Cpuset Interface Files
It accepts only the following input values when written to.
========== =====================================
"member"   Non-root member of a partition
"root"     Partition root
"isolated" Partition root without load balancing
========== =====================================
The root cgroup is always a partition root and its state
cannot be changed. All other non-root cgroups start out as
"member".
When set to "root", the current cgroup is the root of a new
partition or scheduling domain that comprises itself and all
its descendants except those that are separate partition roots
themselves and their descendants.
There are constraints on where a partition root can be set.
It can only be set in a cgroup if all the following conditions
are true.
1) The "cpuset.cpus" is not empty and the list of CPUs is
exclusive, i.e. they are not shared by any of its siblings.
2) The parent cgroup is a partition root.
3) The "cpuset.cpus" is also a proper subset of the parent's
"cpuset.cpus.effective".
4) There are no child cgroups with cpuset enabled. This is for
eliminating corner cases that would have to be handled if such a
condition were allowed.
When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
individual CPUs for optimal performance.
Setting it to partition root will take the CPUs away from the
effective CPUs of the parent cgroup. Once it is set, this
file cannot be reverted back to "member" if there are any child
cgroups with cpuset enabled.
The value shown in "cpuset.cpus.effective" of a partition root
is the CPUs that the partition root can dedicate to a potential
new child partition root. The new child subtracts available
CPUs from its parent "cpuset.cpus.effective".
A parent partition cannot distribute all its CPUs to its
child partitions. There must be at least one CPU left in the
parent partition.
A partition root ("root" or "isolated") can be in one of the
two possible states - valid or invalid. An invalid partition
root is in a degraded state where some state information may
be retained, but behaves more like a "member".
Once a cgroup becomes a partition root, changes to "cpuset.cpus" are
generally allowed as long as the first condition above is true,
the change does not take away all the CPUs from the parent
partition, and the new "cpuset.cpus" value is a superset of its
children's "cpuset.cpus" values.
All possible state transitions among "member", "root" and
"isolated" are allowed.
Sometimes, external factors like changes to ancestors'
"cpuset.cpus" or cpu hotplug can cause the state of the partition
root to change. On read, the "cpuset.sched.partition" file
can show the following values.
On read, the "cpuset.cpus.partition" file can show the following
values.
============== ==============================
"member" Non-root member of a partition
"root" Partition root
"root invalid" Invalid partition root
============== ==============================
============================= =====================================
"member" Non-root member of a partition
"root" Partition root
"isolated" Partition root without load balancing
"root invalid (<reason>)" Invalid partition root
"isolated invalid (<reason>)" Invalid isolated partition root
============================= =====================================
It is a partition root if the first 2 partition root conditions
above are true and at least one CPU from "cpuset.cpus" is
granted by the parent cgroup.
In the case of an invalid partition root, a descriptive string on
why the partition is invalid is included within parentheses.
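A monitoring agent that reads "cpuset.cpus.partition" has to separate the partition kind from the optional invalid reason. The Python sketch below does that with a regular expression built from the value table above. The function name is hypothetical, and the reason string in the example is invented rather than taken from the kernel.

```python
import re

# "root" or "isolated", optionally followed by " invalid (<reason>)".
_PARTITION_RE = re.compile(r"^(root|isolated)(?: invalid \((.*)\))?$")


def parse_partition_state(value):
    """Split a cpuset.cpus.partition read-back into (kind, reason).

    reason is None for "member" and for valid partition roots; for an
    invalid partition root it holds the text inside the parentheses.
    """
    text = value.strip()
    if text == "member":
        return "member", None
    match = _PARTITION_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected partition state: {value!r}")
    return match.group(1), match.group(2)


print(parse_partition_state("isolated invalid (example reason)\n"))
```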
A partition root can become invalid if none of the CPUs requested
in "cpuset.cpus" can be granted by the parent cgroup or the
parent cgroup is no longer a partition root itself. In this
case, it is not a real partition even though the restriction
of the first partition root condition above will still apply.
The cpu affinity of all the tasks in the cgroup will then be
associated with CPUs in the nearest ancestor partition.
For a partition root to become valid, the following conditions
must be met.
An invalid partition root can be transitioned back to a
real partition root if at least one of the requested CPUs
can now be granted by its parent. In this case, the cpu
affinity of all the tasks in the formerly invalid partition
will be associated with the CPUs of the newly formed partition.
Changing the partition state of an invalid partition root to
"member" is always allowed even if child cpusets are present.
1) The "cpuset.cpus" is exclusive with its siblings, i.e. they
are not shared by any of its siblings (exclusivity rule).
2) The parent cgroup is a valid partition root.
3) The "cpuset.cpus" is not empty and must contain at least
one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
4) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
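The four validity conditions can be modelled with plain set operations. The Python sketch below is only an illustration of the rules as listed, not the kernel's implementation. It uses hypothetical helper names, represents CPU lists as sets of CPU numbers, and expects the caller to supply the effective CPUs, which the kernel computes itself.

```python
def parse_cpulist(text):
    """Turn a CPU list such as "0-3,6" into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty cpuset.cpus reads back as an empty string
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def partition_is_valid(cpus, sibling_cpus, parent_is_valid_root,
                       parent_cpus, effective_cpus, has_tasks):
    """Model the four partition validity conditions (not kernel code)."""
    # 1) Exclusivity rule: no CPU may be shared with any sibling.
    if any(cpus & sibling for sibling in sibling_cpus):
        return False
    # 2) The parent must itself be a valid partition root.
    if not parent_is_valid_root:
        return False
    # 3) cpuset.cpus must be non-empty and overlap the parent's cpuset.cpus.
    if not cpus or not (cpus & parent_cpus):
        return False
    # 4) Effective CPUs may only be empty when no task is in the partition.
    if not effective_cpus and has_tasks:
        return False
    return True


parent = parse_cpulist("0-7")
print(partition_is_valid(parse_cpulist("2-3"), [parse_cpulist("4-5")],
                         True, parent, parse_cpulist("2-3"), True))
```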
External events like hotplug or changes to "cpuset.cpus" can
cause a valid partition root to become invalid and vice versa.
Note that a task cannot be moved to a cgroup with empty
"cpuset.cpus.effective".
For a valid partition root with the sibling CPU exclusivity
rule enabled, changes made to "cpuset.cpus" that violate the
exclusivity rule will invalidate the partition as well as its
sibling partitions with conflicting cpuset.cpus values. So
care must be taken in changing "cpuset.cpus".
A valid non-root parent partition may distribute out all its CPUs
to its child partitions when there is no task associated with it.
Care must be taken to change a valid partition root to
"member" as all its child partitions, if present, will become
invalid causing disruption to tasks running in those child
partitions. These inactivated partitions could be recovered if
their parent is switched back to a partition root with a proper
set of "cpuset.cpus".
Poll and inotify events are triggered whenever the state of
"cpuset.cpus.partition" changes. That includes changes caused
by writes to "cpuset.cpus.partition", CPU hotplug or other
changes that modify the validity status of the partition.
This will allow user space agents to monitor unexpected changes
to "cpuset.cpus.partition" without the need to do continuous
polling.
Device controller


@ -141,6 +141,10 @@ root_hash_sig_key_desc <key_description>
also gain new certificates at run time if they are signed by a certificate
already in the secondary trusted keyring.
try_verify_in_tasklet
If verity hashes are in cache, verify data blocks in a kernel tasklet instead
of a workqueue. This option can reduce IO latency.
Theory of operation
===================


@ -5,143 +5,115 @@ Dynamic debug
Introduction
============
This document describes how to use the dynamic debug (dyndbg) feature.
Dynamic debug allows you to dynamically enable/disable kernel
debug-print code to obtain additional kernel information.
Dynamic debug is designed to allow you to dynamically enable/disable
kernel code to obtain additional kernel information. Currently, if
``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.
If ``/proc/dynamic_debug/control`` exists, your kernel has dynamic
debug. You'll need root access (sudo su) to use this.
If you do not want to enable dynamic debug globally (e.g. on some embedded
systems), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` for basic dynamic debug
support and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` to the Makefile of any
modules which you'd like to dynamically debug later.
Dynamic debug provides:
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just a
shortcut for ``print_hex_dump(KERN_DEBUG)``.
* a Catalog of all *prdbgs* in your kernel.
``cat /proc/dynamic_debug/control`` to see them.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, the format string is
its ``prefix_str`` argument if that is a constant string, or ``hexdump``
if ``prefix_str`` is built dynamically.
Dynamic debug has even more useful features:
* Simple query language allows turning on and off debugging
statements by matching any combination of 0 or 1 of:
* a Simple query/command language to alter *prdbgs* by selecting on
any combination of 0 or 1 of:
- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string
* Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
which can be read to display the complete list of known debug
statements, to help guide you.
Controlling dynamic debug Behaviour
===================================
The behaviour of ``pr_debug()``/``dev_dbg()`` is controlled via writing to a
control file in the 'debugfs' filesystem. Thus, you must first mount
the debugfs filesystem in order to make use of this feature.
Subsequently, we refer to the control file as:
``<debugfs>/dynamic_debug/control``. For example, if you want to enable
printing from source file ``svcsock.c``, line 1603 you simply do::
nullarbor:~ # echo 'file svcsock.c line 1603 +p' > \
<debugfs>/dynamic_debug/control
If you make a mistake with the syntax, the write will fail thus::
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' > \
<debugfs>/dynamic_debug/control
-bash: echo: write error: Invalid argument
Note, for systems without 'debugfs' enabled, the control file can be
found in ``/proc/dynamic_debug/control``.
- class name (as known/declared by each module)
Viewing Dynamic Debug Behaviour
===============================
You can view the currently configured behaviour of all the debug
statements via::
You can view the currently configured behaviour in the *prdbg* catalog::
nullarbor:~ # cat <debugfs>/dynamic_debug/control
:#> head -n7 /proc/dynamic_debug/control
# filename:lineno [module]function flags format
net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
...
init/main.c:1179 [main]initcall_blacklist =_ "blacklisting initcall %s\012"
init/main.c:1218 [main]initcall_blacklisted =_ "initcall %s blacklisted\012"
init/main.c:1424 [main]run_init_process =_ " with arguments:\012"
init/main.c:1426 [main]run_init_process =_ " %s\012"
init/main.c:1427 [main]run_init_process =_ " with environment:\012"
init/main.c:1429 [main]run_init_process =_ " %s\012"
The 3rd space-delimited column shows the current flags, preceded by
a ``=`` for easy use with grep/cut. ``=p`` shows enabled callsites.
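For scripted use, the same listing can be split into fields instead of using grep/cut. The Python sketch below assumes exactly the ``filename:lineno [module]function =flags "format"`` layout shown in the listing above and ignores anything after the closing quote; the function names are hypothetical.

```python
import re

# Layout assumed from the listing above:
#   filename:lineno [module]function =flags "format"
_CONTROL_RE = re.compile(
    r'^(?P<file>\S+):(?P<lineno>\d+) '
    r'\[(?P<module>[^\]]*)\](?P<function>\S+) '
    r'=(?P<flags>\S+) '
    r'"(?P<format>.*)"'
)


def parse_control(text):
    """Return one dict per callsite line; header and unmatched lines are skipped."""
    entries = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = _CONTROL_RE.match(line)
        if match is None:
            continue
        entry = match.groupdict()
        entry["lineno"] = int(entry["lineno"])
        entries.append(entry)
    return entries


def enabled_callsites(entries):
    """Rough equivalent of grep =p: callsites whose flags include p."""
    return [e for e in entries if "p" in e["flags"]]


sample = r'''# filename:lineno [module]function flags format
init/main.c:1218 [main]initcall_blacklisted =_ "initcall %s blacklisted\012"
init/main.c:1424 [main]run_init_process =p " with arguments:\012"
'''
for entry in enabled_callsites(parse_control(sample)):
    print(entry["file"], entry["lineno"], entry["function"])
```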
You can also apply standard Unix text manipulation filters to this
data, e.g.::
Controlling dynamic debug Behaviour
===================================
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
62
The behaviour of *prdbg* sites is controlled by writing
query/commands to the control file. Example::
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
42
# grease the interface
:#> alias ddcmd='echo $* > /proc/dynamic_debug/control'
The third column shows the currently enabled flags for each debug
statement callsite (see below for definitions of the flags). The
default value, with no flags enabled, is ``=_``. So you can view all
the debug statement callsites with any non-default flags::
:#> ddcmd '-p; module main func run* +p'
:#> grep =p /proc/dynamic_debug/control
init/main.c:1424 [main]run_init_process =p " with arguments:\012"
init/main.c:1426 [main]run_init_process =p " %s\012"
init/main.c:1427 [main]run_init_process =p " with environment:\012"
init/main.c:1429 [main]run_init_process =p " %s\012"
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
Error messages go to console/syslog::
:#> ddcmd mode foo +p
dyndbg: unknown keyword "mode"
dyndbg: query parse failed
bash: echo: write error: Invalid argument
If debugfs is also enabled and mounted, ``dynamic_debug/control`` is
also under the mount-dir, typically ``/sys/kernel/debug/``.
Command Language Reference
==========================
At the lexical level, a command comprises a sequence of words separated
At the basic lexical level, a command is a sequence of words separated
by spaces or tabs. So these are all equivalent::
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > \
<debugfs>/dynamic_debug/control
nullarbor:~ # echo -n ' file svcsock.c line 1603 +p ' > \
<debugfs>/dynamic_debug/control
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd file svcsock.c line 1603 +p
:#> ddcmd "file svcsock.c line 1603 +p"
:#> ddcmd ' file svcsock.c line 1603 +p '
Command submissions are bounded by a write() system call.
Multiple commands can be written together, separated by ``;`` or ``\n``::
~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
> <debugfs>/dynamic_debug/control
:#> ddcmd "func pnpacpi_get_resources +p; func pnp_assign_mem +p"
:#> cat <<"EOC" > /proc/dynamic_debug/control
func pnpacpi_get_resources +p
func pnp_assign_mem +p
EOC
:#> cat query-batch-file > /proc/dynamic_debug/control
If your query set is big, you can batch them too::
You can also use wildcards in each query term. The match rule supports
``*`` (matches zero or more characters) and ``?`` (matches exactly one
character). For example, you can match all usb drivers::
~# cat query-batch-file > <debugfs>/dynamic_debug/control
:#> ddcmd file "drivers/usb/*" +p # "" to suppress shell expansion
Another way is to use wildcards. The match rule supports ``*`` (matches
zero or more characters) and ``?`` (matches exactly one character). For
example, you can match all usb drivers::
~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
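The ``*``/``?`` rule described above can be implemented with a small backtracking matcher. The Python sketch below is an independent illustration of those two wildcards, not the kernel's code; the function name is hypothetical.

```python
def wildcard_match(pattern, text):
    """Return True if text matches pattern; '*' = any run, '?' = one char."""
    p = t = 0
    star = -1    # index of the most recent '*' in pattern
    resume = 0   # text position where that '*' stopped absorbing characters
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star, resume = p, t
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star != -1:
            # Backtrack: let the last '*' swallow one more character.
            resume += 1
            p, t = star + 1, resume
        else:
            return False
    # Any trailing '*' can match the empty string.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


print(wildcard_match("drivers/usb/*", "drivers/usb/core/hub.c"))
print(wildcard_match("svc?ock.c", "svcsock.c"))
```

Checking for ``*`` before comparing literal characters means a ``*`` in the pattern is always treated as a wildcard, even when the text itself contains a ``*``.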
At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification::
Syntactically, a command is a sequence of keyword/value pairs, followed by a
flags change or setting::
command ::= match-spec* flags-spec
The match-specs are used to choose a subset of the known pr_debug()
callsites to which to apply the flags-spec. Think of them as a query
with implicit ANDs between each pair. Note that an empty list of
match-specs will select all debug statement callsites.
The match-specs select *prdbgs* from the catalog, upon which to apply
the flags-spec; all constraints are ANDed together. An absent keyword
is the same as keyword "*".
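The Python sketch below models how a write is split into commands, how each command becomes keyword/value pairs plus a trailing flags-spec, and how the ANDed match-specs select a callsite. It is a simplified model with hypothetical names, not the kernel parser. Assumptions: quoting and ``#`` comments are approximated with ``shlex``, a quoted value may not contain ``;``, ``fnmatch`` stands in for wildcard matching (it also accepts ``[...]`` classes, which the kernel matcher does not), ``file`` is tried against both the full path and its basename, ``format`` is a substring match, and line ranges take the forms ``N``, ``N-M``, ``-M`` and ``N-``.

```python
import fnmatch
import os
import shlex

KEYWORDS = {"func", "file", "module", "format", "class", "line"}


def parse_commands(text):
    """Split a control write into [(match_specs, flags_spec), ...]."""
    commands = []
    for raw in text.replace("\n", ";").split(";"):
        words = shlex.split(raw, comments=True)  # quotes and "#" comments
        if not words:
            continue
        *pairs, flags = words
        if len(pairs) % 2:
            raise ValueError(f"keyword without value in {raw!r}")
        specs = {}
        for keyword, value in zip(pairs[0::2], pairs[1::2]):
            if keyword not in KEYWORDS:
                raise ValueError(f"unknown keyword {keyword!r}")
            specs[keyword] = value
        commands.append((specs, flags))
    return commands


def in_line_range(lineno, spec):
    """Accept "N", "N-M", "-M" and "N-" line-range forms."""
    first, dash, last = spec.partition("-")
    low = int(first) if first else 0
    high = (int(last) if last else float("inf")) if dash else low
    return low <= lineno <= high


def selects(specs, site):
    """All match-specs must hold (implicit AND); absent keywords match anything."""
    for keyword, value in specs.items():
        if keyword == "line":
            ok = in_line_range(site["line"], value)
        elif keyword == "format":
            ok = value in site["format"]
        elif keyword == "class":
            ok = site.get("class") == value
        elif keyword == "file":
            ok = (fnmatch.fnmatchcase(site["file"], value) or
                  fnmatch.fnmatchcase(os.path.basename(site["file"]), value))
        else:  # func, module
            ok = fnmatch.fnmatchcase(site[keyword], value)
        if not ok:
            return False
    return True


site = {"file": "net/sunrpc/svcsock.c", "func": "svc_send", "module": "sunrpc",
        "format": "svc_process: st_sendto returned %d\n", "line": 1603}
for specs, flags in parse_commands("file svcsock.c line 1600-1605 +p; func svc_process -p"):
    print(specs, flags, selects(specs, site))
```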
A match specification comprises a keyword, which controls the
attribute of the callsite to be compared, and a value to compare
against. Possible keywords are::
A match specification is a keyword, which selects the attribute of
the callsite to be compared, and a value to compare against. Possible
keywords are::
match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
'class' string |
'line' line-range
line-range ::= lineno |
@ -203,6 +175,16 @@ format
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
class
The given class_name is validated against each module, which may
have declared a list of known class_names. If the class_name is
found for a module, callsite & class matching and adjustment
proceed. Examples::
class DRM_UT_KMS # a DRM.debug category
class JUNK # silent non-match
// class TLD_* # NOTICE: no wildcard in class names
line
The given line number or range of line numbers is compared
against the line number of each ``pr_debug()`` callsite. A single
@ -228,17 +210,16 @@ of the characters::
The flags are::
p enables the pr_debug() callsite.
f Include the function name in the printed message
l Include line number in the printed message
m Include module name in the printed message
t Include thread ID in messages not generated from interrupt context
_ No flags are set. (Or'd with others on input)
_ enables no flags.
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only the ``p`` flag
has meaning; other flags are ignored.
Decorator flags add to the message-prefix, in order:
t Include thread ID, or <intr>
m Include module name
f Include the function name
l Include line number
For display, the flags are preceded by ``=``
(mnemonic: what the flags are currently equal to).
For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only
the ``p`` flag has meaning, other flags are ignored.
Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
To clear all flags at once, use ``=_`` or ``-flmpt``.
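The three operators can be modelled as set operations on the flag letters, using the regexp above for validation. The Python sketch below is illustrative and uses hypothetical names; its display helper sorts the letters alphabetically, which may differ from the order the kernel prints them in.

```python
import re

FLAGS_RE = re.compile(r"^[-+=][flmpt_]+$")


def apply_flags(current, spec):
    """Return the new flag set after applying a flags-spec such as "+mf"."""
    if not FLAGS_RE.match(spec):
        raise ValueError(f"invalid flags spec {spec!r}")
    op, letters = spec[0], set(spec[1:]) - {"_"}  # "_" means "no flags"
    if op == "+":
        return current | letters
    if op == "-":
        return current - letters
    return letters  # "=" replaces the flags outright


def show(flags):
    """Display form: "=" plus the letters (sorted here), or "=_" when empty."""
    return "=" + ("".join(sorted(flags)) or "_")


flags = apply_flags(set(), "+p")
flags = apply_flags(flags, "+mf")
print(show(flags))                     # =fmp
print(show(apply_flags(flags, "=_")))  # =_
```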
@ -313,7 +294,7 @@ For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
the debugfs interface if the debug messages are no longer needed::
echo "module module_name -p" > <debugfs>/dynamic_debug/control
echo "module module_name -p" > /proc/dynamic_debug/control
Examples
========
@ -321,37 +302,31 @@ Examples
::
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'file svcsock.c line 1603 +p'
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'file svcsock.c +p'
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'module nfsd +p'
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'func svc_process +p'
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'func svc_process -p'
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
nullarbor:~ # echo -n 'format "nfsd: READ" +p' > \
<debugfs>/dynamic_debug/control
:#> ddcmd 'format "nfsd: READ" +p'
// enable messages in files of which the paths include string "usb"
nullarbor:~ # echo -n 'file *usb* +p' > <debugfs>/dynamic_debug/control
:#> ddcmd 'file *usb* +p'
// enable all messages
nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
:#> ddcmd '+p'
// add module, function to all enabled messages
nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
:#> ddcmd '+mf'
// boot-args example, with newlines and comments for readability
Kernel command line: ...
@ -364,3 +339,38 @@ Examples
dyndbg="file init/* +p #cmt ; func parse_one +p"
// enable pr_debugs in 2 functions in a module loaded later
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
Kernel Configuration
====================
Dynamic Debug is enabled via kernel config items::
CONFIG_DYNAMIC_DEBUG=y # build catalog, enables CORE
CONFIG_DYNAMIC_DEBUG_CORE=y # enable mechanics only, skip catalog
If you do not want to enable dynamic debug globally (e.g. on some embedded
systems), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` for basic dynamic debug
support and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` to the Makefile of any
modules which you'd like to dynamically debug later.
Kernel *prdbg* API
==================
The following functions are cataloged and controllable when dynamic
debug is enabled::
pr_debug()
dev_dbg()
print_hex_dump_debug()
print_hex_dump_bytes()
Otherwise, they are off by default; ``ccflags += -DDEBUG`` or
``#define DEBUG`` in a source file will enable them appropriately.
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is
just a shortcut for ``print_hex_dump(KERN_DEBUG)``.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, the format string is
its ``prefix_str`` argument if that is a constant string, or ``hexdump``
if ``prefix_str`` is built dynamically.


@ -321,6 +321,8 @@
force_enable - Force enable the IOMMU on platforms known
to be buggy with IOMMU enabled. Use this
option with care.
pgtbl_v1 - Use v1 page table for DMA-API (Default).
pgtbl_v2 - Use v2 page table for DMA-API.
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
@ -1467,6 +1469,14 @@
Permit 'security.evm' to be updated regardless of
current integrity status.
early_page_ext [KNL] Enforces page_ext initialization to earlier
stages to cover more early boot allocations.
Please note that as a side effect some optimizations
might be disabled to achieve that (e.g. parallelized
memory initialization is disabled) so the boot process
might take longer, especially on systems with a lot of
memory. Available with CONFIG_PAGE_EXTENSION=y.
failslab=
fail_usercopy=
fail_page_alloc=
@ -3629,7 +3639,7 @@
(bounds check bypass). With this option data leaks are
possible in the system.
nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
nospectre_v2 [X86,PPC_E500,ARM64] Disable all mitigations for
the Spectre variant 2 (indirect branch prediction)
vulnerability. System may allow data leaks with this
option.
@ -3748,9 +3758,9 @@
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
no-steal-acc [X86,PV_OPS,ARM64] Disable paravirtualized steal time
accounting. Steal time is computed, but won't
influence scheduler behaviour.
no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized
steal time accounting. Steal time is computed, but
won't influence scheduler behaviour.
nolapic [X86-32,APIC] Do not enable or use the local APIC.
@ -6039,12 +6049,6 @@
This parameter controls use of the Protected
Execution Facility on pSeries.
swapaccount= [KNL]
Format: [0|1]
Enable accounting of swap in memory resource
controller if no parameter or 1 is given or disable
it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
Format: { <int> [,<int>] | force | noforce }
<int> -- Number of I/O TLB slabs
@ -6847,6 +6851,12 @@
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.
xen_msr_safe= [X86,XEN]
Format: <bool>
Select whether to always use non-faulting (safe) MSR
access functions when running as Xen PV guest. The
default value is controlled by CONFIG_XEN_PV_MSR_SAFE.
xen_nopvspin [X86,XEN]
Disables the qspinlock slowpath using Xen PV optimizations.
This parameter is obsoleted by "nopvspin" parameter, which


@ -5,10 +5,10 @@ CMA Debugfs Interface
The CMA debugfs interface is useful to retrieve basic information out of the
different CMA areas and to test allocation/release in each of the areas.
Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
kernel's CMA index. So the first CMA zone would be:
Each CMA area represents a directory under <debugfs>/cma/, named after
the area's CMA name, as shown below:
<debugfs>/cma/cma-0
<debugfs>/cma/<cma_name>
The structure of the files created under that directory is as follows:
@ -18,8 +18,8 @@ The structure of the files created under that directory is as follows:
- [RO] bitmap: The bitmap of page states in the zone.
- [WO] alloc: Allocate N pages from that CMA area. For example::
echo 5 > <debugfs>/cma/cma-2/alloc
echo 5 > <debugfs>/cma/<cma_name>/alloc
would try to allocate 5 pages from the cma-2 area.
would try to allocate 5 pages from the 'cma_name' area.
- [WO] free: Free N pages from that CMA area, similar to the above.


@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
========================
Monitoring Data Accesses
========================
==========================
DAMON: Data Access MONitor
==========================
:doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
Using DAMON, users can analyze the memory access patterns of their systems and

View File

@ -29,16 +29,9 @@ called DAMON Operator (DAMO). It is available at
https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
your ``$PATH``. It's not mandatory, though.
Because DAMO is using the debugfs interface (refer to :doc:`usage` for the
detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as
below::
# mount -t debugfs none /sys/kernel/debug/
or append the following line to your ``/etc/fstab`` file so that your system
can automatically mount debugfs upon booting::
debugfs /sys/kernel/debug debugfs defaults 0 0
Because DAMO is using the sysfs interface (refer to :doc:`usage` for the
detail) of DAMON, you should ensure :doc:`sysfs </filesystems/sysfs>` is
mounted.
Recording Data Access Patterns


@ -393,6 +393,11 @@ the files as above. Above is only for an example.
debugfs Interface
=================
.. note::
The DAMON debugfs interface will be removed after the next LTS kernel is released, so
users should move to the :ref:`sysfs interface <sysfs_interface>`.
DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.


@ -32,6 +32,7 @@ the Linux memory management.
idle_page_tracking
ksm
memory-hotplug
multigen_lru
nommu-mmap
numa_memory_policy
numaperf


@ -184,6 +184,42 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
be increased accordingly.
Monitoring KSM profit
=====================
KSM can save memory by merging identical pages, but it can also consume
additional memory, because it needs to generate a number of rmap_items to
save each scanned page's brief rmap information. Some of these pages may
be merged, but some may not be mergeable even after being checked
several times; the rmap_items for such pages are unprofitable memory consumption.
1) How can one determine whether KSM saves or consumes memory system-wide?
Here is a simple approximate calculation for reference::
general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
sizeof(rmap_item);
where all_rmap_items can be easily obtained by summing ``pages_sharing``,
``pages_shared``, ``pages_unshared`` and ``pages_volatile``.
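The system-wide estimate can be computed directly from the four counters. The Python sketch below assumes the counters are read from files of the same names under ``/sys/kernel/mm/ksm/`` and defaults to a 4K page and a 64-byte rmap_item (the 64-bit figures given later in this section); the function names and the sample counter values are invented for illustration.

```python
from pathlib import Path

KSM_SYSFS = Path("/sys/kernel/mm/ksm")
COUNTERS = ("pages_sharing", "pages_shared", "pages_unshared", "pages_volatile")


def read_ksm_counters(base=KSM_SYSFS):
    """Read the four counters used by the estimate (needs a KSM-enabled kernel)."""
    return {name: int((base / name).read_text()) for name in COUNTERS}


def general_profit(c, page_size=4096, rmap_item_size=64):
    """Approximate bytes saved (positive) or lost (negative) by KSM."""
    all_rmap_items = sum(c[name] for name in COUNTERS)
    return c["pages_sharing"] * page_size - all_rmap_items * rmap_item_size


# Invented counter values, for illustration only.
counters = {"pages_sharing": 1000, "pages_shared": 100,
            "pages_unshared": 5000, "pages_volatile": 400}
print(general_profit(counters))  # 1000*4096 - 6500*64 = 3680000
```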
2) The KSM profit within a single process can be similarly obtained by the
following approximate calculation::
process_profit =~ ksm_merging_pages * sizeof(page) -
ksm_rmap_items * sizeof(rmap_item).
where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.
From the perspective of an application, a high ratio of ``ksm_rmap_items`` to
``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
administrators have to rethink how to change the madvise policy. As an example
for reference, a page's size is usually 4K, and the rmap_item's size is
32B on 32-bit CPU architectures and 64B on 64-bit CPU architectures,
so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on a 64-bit CPU
or exceeds 128 on a 32-bit CPU, then the app's madvise policy should be dropped,
because the KSM profit is approximately zero or negative.
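The per-process estimate and the ratio rule above reduce to the same comparison, because the ratio exceeds ``page_size / rmap_item_size`` exactly when the process profit is negative. The Python sketch below assumes ``/proc/<pid>/ksm_stat`` contains whitespace-separated ``name value`` lines. The function names and sample numbers are hypothetical.

```python
def parse_ksm_stat(text):
    """Parse "name value" lines, such as those in /proc/<pid>/ksm_stat."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def process_profit(merging_pages, rmap_items, page_size=4096, rmap_item_size=64):
    """ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item)."""
    return merging_pages * page_size - rmap_items * rmap_item_size


def policy_unprofitable(merging_pages, rmap_items, page_size=4096, rmap_item_size=64):
    """True when rmap_items / merging_pages exceeds page_size / rmap_item_size."""
    # Cross-multiplied so that merging_pages == 0 does not divide by zero.
    return rmap_items * rmap_item_size > merging_pages * page_size


stat = parse_ksm_stat("ksm_rmap_items 6500\n")  # invented sample content
merging = 100  # invented value of /proc/<pid>/ksm_merging_pages
print(process_profit(merging, stat["ksm_rmap_items"]),
      policy_unprofitable(merging, stat["ksm_rmap_items"]))
```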
Monitoring KSM events
=====================


@ -0,0 +1,162 @@
.. SPDX-License-Identifier: GPL-2.0
=============
Multi-Gen LRU
=============
The multi-gen LRU is an alternative LRU implementation that optimizes
page reclaim and improves performance under memory pressure. Page
reclaim decides the kernel's caching policy and ability to overcommit
memory. It directly impacts the kswapd CPU usage and RAM efficiency.
Quick start
===========
Build the kernel with the following configurations.
* ``CONFIG_LRU_GEN=y``
* ``CONFIG_LRU_GEN_ENABLED=y``
All set!
Runtime options
===============
``/sys/kernel/mm/lru_gen/`` contains stable ABIs described in the
following subsections.
Kill switch
-----------
``enabled`` accepts different values to enable or disable the
following components. Its default value depends on
``CONFIG_LRU_GEN_ENABLED``. All the components should be enabled
unless some of them have unforeseen side effects. Writing to
``enabled`` has no effect when a component is not supported by the
hardware, and valid values will be accepted even when the main switch
is off.
====== ===============================================================
Values Components
====== ===============================================================
0x0001 The main switch for the multi-gen LRU.
0x0002 Clearing the accessed bit in leaf page table entries in large
batches, when MMU sets it (e.g., on x86). This behavior can
theoretically worsen lock contention (mmap_lock). If it is
disabled, the multi-gen LRU will suffer a minor performance
degradation for workloads that contiguously map hot pages,
whose accessed bits can be otherwise cleared by fewer larger
batches.
0x0004 Clearing the accessed bit in non-leaf page table entries as
well, when MMU sets it (e.g., on x86). This behavior was not
verified on x86 varieties other than Intel and AMD. If it is
disabled, the multi-gen LRU will suffer a negligible
performance degradation.
[yYnN] Apply to all the components above.
====== ===============================================================
E.g.,
::
echo y >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0007
echo 5 >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0005
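Each component is one bit, so a read-back value is decoded with a bitwise AND and a value to write is built with a bitwise OR. The Python sketch below uses the bit assignments from the table above. The short component labels and function names are invented for illustration.

```python
COMPONENTS = {
    0x0001: "main switch",
    0x0002: "leaf page table entries: batched accessed-bit clearing",
    0x0004: "non-leaf page table entries: accessed-bit clearing",
}


def decode_enabled(text):
    """Map a read-back value such as "0x0005" to the enabled components."""
    value = int(text.strip(), 16)  # base 16 also accepts the "0x" prefix
    return [name for bit, name in COMPONENTS.items() if value & bit]


def encode_enabled(bits):
    """OR component bits together, e.g. encode_enabled([0x0001, 0x0004]) == 5."""
    value = 0
    for bit in bits:
        value |= bit
    return value


print(decode_enabled("0x0005"))
print(encode_enabled([0x0001, 0x0004]))
```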
Thrashing prevention
--------------------
Personal computers are more sensitive to thrashing because it can
cause janks (lags when rendering UI) and negatively impact user
experience. The multi-gen LRU offers thrashing prevention to the
majority of laptop and desktop users who do not have ``oomd``.
Users can write ``N`` to ``min_ttl_ms`` to prevent the working set of
``N`` milliseconds from getting evicted. The OOM killer is triggered
if this working set cannot be kept in memory. In other words, this
option works as an adjustable pressure relief valve, and when open, it
terminates applications that are hopefully not being used.
Based on the average human detectable lag (~100ms), ``N=1000`` usually
eliminates intolerable janks due to thrashing. Larger values like
``N=3000`` make janks less noticeable at the risk of premature OOM
kills.
The default value ``0`` means disabled.
Experimental features
=====================
``/sys/kernel/debug/lru_gen`` accepts commands described in the
following subsections. Multiple command lines are supported, as is
concatenation with delimiters ``,`` and ``;``.
``/sys/kernel/debug/lru_gen_full`` provides additional stats for
debugging. ``CONFIG_LRU_GEN_STATS=y`` keeps historical stats from
evicted generations in this file.
Working set estimation
----------------------
Working set estimation measures how much memory an application needs
in a given time interval, and it is usually done with little impact on
the performance of the application. E.g., data centers want to
optimize job scheduling (bin packing) to improve memory utilization.
When a new job comes in, the job scheduler needs to find out whether
each server it manages can allocate a certain amount of memory for
this new job before it can pick a candidate. To do so, the job
scheduler needs to estimate the working sets of the existing jobs.
When it is read, ``lru_gen`` returns a histogram of numbers of pages
accessed over different time intervals for each memcg and node.
``MAX_NR_GENS`` decides the number of bins for each histogram. The
histograms are noncumulative.
::
memcg memcg_id memcg_path
node node_id
min_gen_nr age_in_ms nr_anon_pages nr_file_pages
...
max_gen_nr age_in_ms nr_anon_pages nr_file_pages
Each bin contains an estimated number of pages that have been accessed
within ``age_in_ms``. E.g., ``min_gen_nr`` contains the coldest pages
and ``max_gen_nr`` contains the hottest pages, since ``age_in_ms`` of
the former is the largest and that of the latter is the smallest.
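Because the bins are noncumulative, the amount of memory idle for at least some interval is the sum of the bins whose ``age_in_ms`` is at least that interval. The Python sketch below parses text in the layout shown above and computes that sum. It assumes each bin line starts with the four numeric columns listed and ignores ``memcg_path``. The sample numbers and function names are invented.

```python
def parse_lru_gen(text):
    """Return {(memcg_id, node_id): [(gen_nr, age_ms, anon, file), ...]}."""
    histograms = {}
    memcg_id = None
    key = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "memcg":
            memcg_id = int(fields[1])  # memcg_path is ignored
        elif fields[0] == "node":
            key = (memcg_id, int(fields[1]))
            histograms[key] = []
        else:
            gen_nr, age_ms, anon, file_pages = (int(f) for f in fields[:4])
            histograms[key].append((gen_nr, age_ms, anon, file_pages))
    return histograms


def cold_pages(bins, min_age_ms):
    """Sum anon + file pages in bins at least min_age_ms old."""
    return sum(anon + file_pages
               for _, age_ms, anon, file_pages in bins if age_ms >= min_age_ms)


# Invented sample in the layout shown above.
sample = """memcg 1 /
 node 0
  5 30000 100 200
  6 10000 50 60
  7 1000 10 20
"""
hist = parse_lru_gen(sample)
print(cold_pages(hist[(1, 0)], 5000))  # 300 + 110 = 410
```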
Users can write the following command to ``lru_gen`` to create a new
generation ``max_gen_nr+1``:
``+ memcg_id node_id max_gen_nr [can_swap [force_scan]]``
``can_swap`` defaults to the swap setting and, if it is set to ``1``,
it forces the scan of anon pages when swap is off, and vice versa.
``force_scan`` defaults to ``1`` and, if it is set to ``0``, it
employs heuristics to reduce the overhead, which is likely to reduce
the coverage as well.
A typical use case is that a job scheduler runs this command at a
certain time interval to create new generations, and it ranks the
servers it manages based on the sizes of their cold pages defined by
this time interval.
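The ranking step could look like the sketch below, assuming each server's histogram has already been reduced to a single count of cold pages for the chosen interval. ``struct server`` and ``rank_servers()`` are hypothetical; sorting purely by cold pages is one simple policy, and a real scheduler would likely weigh other factors as well.

```c
#include <stdlib.h>

/* Hypothetical per-server summary built from each server's lru_gen output. */
struct server {
	const char *name;
	unsigned long cold_pages; /* pages older than the chosen interval */
};

/* qsort() comparator: most cold pages first. */
static int by_cold_pages_desc(const void *a, const void *b)
{
	const struct server *x = a, *y = b;

	if (x->cold_pages != y->cold_pages)
		return x->cold_pages < y->cold_pages ? 1 : -1;
	return 0;
}

/* Rank servers so the best reclaim candidates come first. */
static void rank_servers(struct server *servers, size_t n)
{
	qsort(servers, n, sizeof(*servers), by_cold_pages_desc);
}
```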
Proactive reclaim
-----------------
Proactive reclaim induces page reclaim when there is no memory
pressure. It usually targets cold pages only. E.g., when a new job
comes in, the job scheduler wants to proactively reclaim cold pages on
the server it selected, to improve the chance of successfully landing
this new job.
Users can write the following command to ``lru_gen`` to evict
generations less than or equal to ``min_gen_nr``.
``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
``min_gen_nr`` should be less than ``max_gen_nr-1``, since
``max_gen_nr`` and ``max_gen_nr-1`` are not fully aged (equivalent to
the active list) and therefore cannot be evicted. ``swappiness``
overrides the default value in ``/proc/sys/vm/swappiness``.
``nr_to_reclaim`` limits the number of pages to evict.
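The constraint on ``min_gen_nr`` can be checked before the command is sent. In this hypothetical helper, a negative ``swappiness`` omits both optional arguments and ``nr_to_reclaim == 0`` omits only the last one; those conventions belong to the sketch, not to the kernel interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]".
 * Returns -1 unless min_gen_nr < max_gen_nr - 1, since the two youngest
 * generations are not fully aged and cannot be evicted.
 */
static int format_evict_cmd(char *buf, size_t size, unsigned int memcg_id,
			    int node_id, unsigned long min_gen_nr,
			    unsigned long max_gen_nr, int swappiness,
			    unsigned long nr_to_reclaim)
{
	if (max_gen_nr < 1 || min_gen_nr >= max_gen_nr - 1)
		return -1;
	if (swappiness < 0)
		return snprintf(buf, size, "- %u %d %lu", memcg_id, node_id,
				min_gen_nr);
	if (nr_to_reclaim == 0)
		return snprintf(buf, size, "- %u %d %lu %d", memcg_id, node_id,
				min_gen_nr, swappiness);
	return snprintf(buf, size, "- %u %d %lu %d %lu", memcg_id, node_id,
			min_gen_nr, swappiness, nr_to_reclaim);
}
```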
A typical use case is that a job scheduler runs this command before it
tries to land a new job on a server. If it fails to materialize enough
cold pages because of the overestimation, it retries on the next
server according to the ranking result obtained from the working set
estimation step. This less forceful approach limits the impact on the
existing jobs.
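The retry loop can be modelled as follows. ``reclaim_fn`` stands in for writing the eviction command to one server and measuring how many pages were actually freed; it, ``pick_server()`` and the table-driven ``reclaim_from_table()`` used for testing are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical: evict cold pages on one server, return pages actually freed. */
typedef unsigned long (*reclaim_fn)(size_t server_idx, void *ctx);

/*
 * Walk servers in ranking order and return the index of the first one
 * that frees at least need_pages, or -1 if none does.
 */
static long pick_server(size_t nr_servers, unsigned long need_pages,
			reclaim_fn reclaim, void *ctx)
{
	for (size_t i = 0; i < nr_servers; i++) {
		if (reclaim(i, ctx) >= need_pages)
			return (long)i;
		/* Cold pages were overestimated here: try the next server. */
	}
	return -1;
}

/* Test stand-in: ctx points to an array of per-server reclaim results. */
static unsigned long reclaim_from_table(size_t idx, void *ctx)
{
	const unsigned long *amounts = ctx;

	return amounts[idx];
}
```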

@ -191,7 +191,14 @@ allocation failure to throttle the next allocation attempt::
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
The khugepaged progress can be seen in the number of pages collapsed (note
that this counter may not be an exact count of the number of pages
collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping
being replaced by a PMD mapping, or (2) All 4K physical pages replaced by
one 2M hugepage. Each may happen independently, or together, depending on
the type of memory and the failures that occur. As such, this value should
be interpreted roughly as a sign of progress, and counters in /proc/vmstat
consulted for more accurate accounting)::
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
@ -366,10 +373,9 @@ thp_split_pmd
page table entry.
thp_zero_page_alloc
is incremented every time a huge zero page used for thp is
successfully allocated. Note, it doesn't count every map of
the huge zero page, only its allocation.
thp_zero_page_alloc_failed
is incremented if kernel fails to allocate

@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
Design
======
Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
Userspace creates a new userfaultfd, initializes it, and registers one or more
regions of virtual memory with it. Then, any page faults which occur within the
region(s) result in a message being delivered to the userfaultfd, notifying
userspace of the fault.
The ``userfaultfd`` (aside from registering and unregistering virtual
memory ranges) provides two primary functionalities:
@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the
``userfaultfd`` runtime load never takes the mmap_lock for writing).
Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span
Terabytes. Too many vmas would be needed for that.
The ``userfaultfd``, once created, can also be
passed using unix domain sockets to a manager process, so the same
manager process could handle the userfaults of a multitude of
different processes without them being aware of what is going on
@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
API
===
Creating a userfaultfd
----------------------
There are two ways to create a new userfaultfd, each of which provides ways to
restrict access to this functionality (since historically userfaultfds which
handle kernel page faults have been a useful tool for exploiting the kernel).
The first way, supported since userfaultfd was introduced, is the
userfaultfd(2) syscall. Access to this is controlled in several ways:
- Any user can always create a userfaultfd which traps userspace page faults
only. Such a userfaultfd can be created using the userfaultfd(2) syscall
with the flag UFFD_USER_MODE_ONLY.
- In order to also trap kernel page faults for the address space, either the
process needs the CAP_SYS_PTRACE capability, or the system must have
vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
is set to 0.
The second way, added to the kernel more recently, is by opening
/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
yields equivalent userfaultfds to the userfaultfd(2) syscall.
Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
filesystem permissions (user/group/mode), which gives fine-grained access to
userfaultfd specifically, without also granting other unrelated privileges at
the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
vm.unprivileged_userfaultfd is not considered.
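A minimal userspace sketch of both creation paths, assuming Linux UAPI headers that provide ``UFFD_USER_MODE_ONLY`` and, for the second path, ``USERFAULTFD_IOC_NEW``. When only user-mode faults are needed it asks the syscall for ``UFFD_USER_MODE_ONLY``; otherwise it tries the syscall first and falls back to ``/dev/userfaultfd``. The helper name ``open_userfaultfd()`` is hypothetical, and the returned descriptor still has to be initialized with ``UFFDIO_API`` as described below before use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1 /* value from the Linux UAPI header */
#endif

/* Hypothetical helper: returns a userfaultfd, or -1 with errno set. */
static int open_userfaultfd(int want_kernel_faults)
{
	int flags = O_CLOEXEC | O_NONBLOCK;
	int fd = -1;

	errno = ENOSYS;
#ifdef SYS_userfaultfd
	/* Path 1: userfaultfd(2). Kernel faults need privilege or the sysctl. */
	fd = (int)syscall(SYS_userfaultfd,
			  flags | (want_kernel_faults ? 0 : UFFD_USER_MODE_ONLY));
	if (fd >= 0 || !want_kernel_faults)
		return fd;
#endif
#ifdef USERFAULTFD_IOC_NEW
	/* Path 2: /dev/userfaultfd, gated only by file permissions. */
	int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
	if (dev < 0)
		return -1;
	fd = ioctl(dev, USERFAULTFD_IOC_NEW, flags);
	int saved_errno = errno;
	close(dev); /* the new userfaultfd stays valid */
	errno = saved_errno;
#endif
	return fd;
}
```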
Initializing a userfaultfd
--------------------------
When first opened, the ``userfaultfd`` must be enabled by invoking the
``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
a later API version) which will specify the ``read/POLLIN`` protocol

@ -65,6 +65,11 @@ combining the following values:
4 s3_beep
= =======
arch
====
The machine hardware name, the same output as ``uname -m``
(e.g. ``x86_64`` or ``aarch64``).
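The same string is available to programs through uname(2). This small sketch (the helper name ``machine_name()`` is hypothetical) copies the ``machine`` field of ``struct utsname`` into a caller-supplied buffer.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: copy the `uname -m` string into buf; 0 on success. */
static int machine_name(char *buf, size_t size)
{
	struct utsname u;

	if (size == 0 || uname(&u) != 0)
		return -1;
	strncpy(buf, u.machine, size - 1);
	buf[size - 1] = '\0'; /* strncpy() may not terminate */
	return 0;
}
```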
auto_msgmni
===========
@ -635,6 +640,17 @@ different types of memory (represented as different NUMA nodes) to
place the hot pages in the fast memory. This is implemented based on
unmapping and page fault too.
numa_balancing_promote_rate_limit_MBps
======================================
Too high promotion/demotion throughput between different memory types
may hurt application latency. This can be used to rate limit the
promotion throughput. The per-node max promotion throughput in MB/s
will be limited to be no more than the set value.
A rule of thumb is to set this to less than 1/10 of the PMEM node
write bandwidth.
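As a worked example of the rule of thumb: a PMEM node measured at 5000 MB/s write bandwidth would suggest a limit below 500 MB/s. The 5000 MB/s figure is made up; the real value has to come from measurement or platform documentation, and the helper name is hypothetical.

```c
/*
 * Hypothetical helper: upper bound for numa_balancing_promote_rate_limit_MBps
 * following the "less than 1/10 of PMEM write bandwidth" rule of thumb.
 * The value actually written should stay below this bound.
 */
static unsigned long promote_rate_limit_upper_bound(unsigned long pmem_write_bw_MBps)
{
	return pmem_write_bw_MBps / 10;
}
```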
oops_all_cpu_backtrace
======================

@ -926,6 +926,9 @@ calls without any restrictions.
The default value is 0.
Another way to control permissions for userfaultfd is to use
/dev/userfaultfd instead of userfaultfd(2). See
Documentation/admin-guide/mm/userfaultfd.rst.
user_reserve_kbytes
===================

@ -59,6 +59,7 @@ SoC-specific documents
stm32/stm32f429-overview
stm32/stm32mp13-overview
stm32/stm32mp157-overview
stm32/stm32-dma-mdma-chaining
sunxi

@ -0,0 +1,415 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
STM32 DMA-MDMA chaining
=======================
Introduction
------------
This document describes the STM32 DMA-MDMA chaining feature. But before going
further, let's introduce the peripherals involved.
To offload data transfers from the CPU, STM32 microprocessors (MPUs) embed
direct memory access controllers (DMA).
STM32MP1 SoCs embed both STM32 DMA and STM32 MDMA controllers. STM32 DMA
request routing capabilities are enhanced by a DMA request multiplexer
(STM32 DMAMUX).
**STM32 DMAMUX**
STM32 DMAMUX routes any DMA request from a given peripheral to any channel of
the STM32 DMA controllers (STM32MP1 has two STM32 DMA controllers).
**STM32 DMA**
STM32 DMA is mainly used to implement central data buffer storage (usually in
the system SRAM) for different peripherals. It can access external RAMs, but
cannot generate the burst transfers that would ensure the best load of the
AXI bus.
**STM32 MDMA**
STM32 MDMA (Master DMA) is mainly used to manage direct data transfers between
RAM data buffers without CPU intervention. It can also be used in a
hierarchical structure that uses STM32 DMA as first level data buffer
interfaces for AHB peripherals, while the STM32 MDMA acts as a second level
DMA with better performance. As an AXI/AHB master, STM32 MDMA can take control
of the AXI/AHB bus.
Principles
----------
STM32 DMA-MDMA chaining feature relies on the strengths of STM32 DMA and
STM32 MDMA controllers.
STM32 DMA has a circular Double Buffer Mode (DBM). At each end of transaction
(when the DMA data counter, DMA_SxNDTR, reaches 0), the memory pointers
(configured with DMA_SxM0AR and DMA_SxM1AR) are swapped and the DMA data
counter is automatically reloaded. This allows the SW or the STM32 MDMA to
process one memory area while the second memory area is being filled/used by
the STM32 DMA transfer.
With STM32 MDMA linked-list mode, a single request initiates the transfer of
the data array (collection of nodes) until the linked-list pointer for the
channel is null. The channel transfer complete of the last node is the end of
the transfer, unless the first and last nodes are linked to each other, in
which case the linked list loops to create a circular MDMA transfer.
STM32 MDMA has direct connections with STM32 DMA. This enables autonomous
communication and synchronization between peripherals, thus saving CPU
resources and reducing bus congestion. The Transfer Complete signal of an
STM32 DMA channel can trigger an STM32 MDMA transfer. STM32 MDMA can clear the
request generated by the STM32 DMA by writing to its Interrupt Clear register
(whose address is stored in MDMA_CxMAR, and bit mask in MDMA_CxMDR).
.. table:: STM32 MDMA interconnect table with STM32 DMA
+--------------+----------------+-----------+------------+
| STM32 DMAMUX | STM32 DMA | STM32 DMA | STM32 MDMA |
| channels | channels | Transfer | request |
| | | complete | |
| | | signal | |
+==============+================+===========+============+
| Channel *0* | DMA1 channel 0 | dma1_tcf0 | *0x00* |
+--------------+----------------+-----------+------------+
| Channel *1* | DMA1 channel 1 | dma1_tcf1 | *0x01* |
+--------------+----------------+-----------+------------+
| Channel *2* | DMA1 channel 2 | dma1_tcf2 | *0x02* |
+--------------+----------------+-----------+------------+
| Channel *3* | DMA1 channel 3 | dma1_tcf3 | *0x03* |
+--------------+----------------+-----------+------------+
| Channel *4* | DMA1 channel 4 | dma1_tcf4 | *0x04* |
+--------------+----------------+-----------+------------+
| Channel *5* | DMA1 channel 5 | dma1_tcf5 | *0x05* |
+--------------+----------------+-----------+------------+
| Channel *6* | DMA1 channel 6 | dma1_tcf6 | *0x06* |
+--------------+----------------+-----------+------------+
| Channel *7* | DMA1 channel 7 | dma1_tcf7 | *0x07* |
+--------------+----------------+-----------+------------+
| Channel *8* | DMA2 channel 0 | dma2_tcf0 | *0x08* |
+--------------+----------------+-----------+------------+
| Channel *9* | DMA2 channel 1 | dma2_tcf1 | *0x09* |
+--------------+----------------+-----------+------------+
| Channel *10* | DMA2 channel 2 | dma2_tcf2 | *0x0A* |
+--------------+----------------+-----------+------------+
| Channel *11* | DMA2 channel 3 | dma2_tcf3 | *0x0B* |
+--------------+----------------+-----------+------------+
| Channel *12* | DMA2 channel 4 | dma2_tcf4 | *0x0C* |
+--------------+----------------+-----------+------------+
| Channel *13* | DMA2 channel 5 | dma2_tcf5 | *0x0D* |
+--------------+----------------+-----------+------------+
| Channel *14* | DMA2 channel 6 | dma2_tcf6 | *0x0E* |
+--------------+----------------+-----------+------------+
| Channel *15* | DMA2 channel 7 | dma2_tcf7 | *0x0F* |
+--------------+----------------+-----------+------------+
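The table follows a regular pattern: DMAMUX channels 0 to 7 map to DMA1 channels 0 to 7, channels 8 to 15 map to DMA2 channels 0 to 7, and the MDMA request number equals the DMAMUX channel ID. The hypothetical helpers below derive a row of the table from the DMAMUX channel ID; they only encode this table and do not touch any hardware.

```c
#include <stdio.h>
#include <string.h>

/* One row of the interconnect table (hypothetical representation). */
struct dmamux_route {
	unsigned int dma_ctrl;     /* 1 or 2 */
	unsigned int dma_chan;     /* 0..7 */
	unsigned int mdma_request; /* equals the DMAMUX channel ID */
};

/* Fill *r for DMAMUX channel 0..15; returns -1 for an invalid channel. */
static int dmamux_to_route(unsigned int dmamux_chan, struct dmamux_route *r)
{
	if (dmamux_chan > 15)
		return -1;
	r->dma_ctrl = 1 + dmamux_chan / 8;
	r->dma_chan = dmamux_chan % 8;
	r->mdma_request = dmamux_chan;
	return 0;
}

/* Build the Transfer Complete signal name, e.g. "dma2_tcf3". */
static void tcf_signal_name(const struct dmamux_route *r, char *buf, size_t size)
{
	snprintf(buf, size, "dma%u_tcf%u", r->dma_ctrl, r->dma_chan);
}
```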
STM32 DMA-MDMA chaining feature then uses an SRAM buffer. STM32MP1 SoCs embed
three fast access static internal RAMs of various sizes, used for data storage.
Due to its legacy (it comes from microcontrollers), STM32 DMA performs poorly
with DDR, while it performs optimally with SRAM. Hence the SRAM buffer used
between STM32 DMA and STM32 MDMA. This buffer is split into two equal periods,
and STM32 DMA uses one period while STM32 MDMA uses the other period
simultaneously.
::

                         dma[1:2]-tcf[0:7]
                       .--------------------.
         ____________  '   ____________     V____________
        | STM32 DMA  |    /   __|>__   \    | STM32 MDMA |
        |------------|   |   /      \   |   |------------|
        | DMA_SxM0AR |<=>|   | SRAM |   |<=>| []-[]...[] |
        | DMA_SxM1AR |   |   \______/   |   |            |
        |____________|    \____<|______/    |____________|

STM32 DMA-MDMA chaining uses (struct dma_slave_config).peripheral_config to
exchange the parameters needed to configure MDMA. These parameters are
gathered into a u32 array with three values:
* the STM32 MDMA request (which is actually the DMAMUX channel ID),
* the address of the STM32 DMA register to clear the Transfer Complete
interrupt flag,
* the mask of the Transfer Complete interrupt flag of the STM32 DMA channel.
Device Tree updates for STM32 DMA-MDMA chaining support
-------------------------------------------------------
**1. Allocate a SRAM buffer**
The SRAM device tree node is defined in the SoC device tree. You can refer to
it in your board device tree to define your SRAM pool.
::

    &sram {
            my_foo_device_dma_pool: dma-sram@0 {
                    reg = <0x0 0x1000>;
            };
    };

Be careful of the start index, in case there are other SRAM consumers.
Define your pool size strategically: to optimise chaining, STM32 DMA and
STM32 MDMA should be able to work simultaneously, each on one half of the
SRAM buffer.
If the SRAM period is greater than the expected DMA transfer, then STM32 DMA
and STM32 MDMA will work sequentially instead of simultaneously. This is not a
functional issue, but it is not optimal.
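The sizing rule above can be expressed with a little arithmetic. The sketch assumes, as described earlier, that the pool is split into two equal periods; all helper names are hypothetical. With a 4 KiB pool (the ``0x1000`` used in the device tree example), each period is 2 KiB, so only transfers longer than 2 KiB let STM32 DMA and STM32 MDMA overlap.

```c
#include <stdbool.h>
#include <stddef.h>

/* The pool is split into two equal periods (DMA fills one, MDMA drains the other). */
static size_t sram_period(size_t pool_size)
{
	return pool_size / 2;
}

/* DMA and MDMA only overlap if the transfer spans more than one period. */
static bool dma_mdma_can_overlap(size_t pool_size, size_t transfer_len)
{
	return transfer_len > sram_period(pool_size);
}

/* Number of period-sized chunks (and so of sg entries) for one transfer. */
static size_t nr_periods(size_t pool_size, size_t transfer_len)
{
	size_t period = sram_period(pool_size);

	if (period == 0)
		return 0; /* pool too small to be split */
	return (transfer_len + period - 1) / period;
}
```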
Don't forget to refer to your SRAM pool in your device node. You need to
define a new property holding a phandle to the pool.
::

    &my_foo_device {
            ...
            my_dma_pool = <&my_foo_device_dma_pool>;
    };

Then get this SRAM pool in your foo driver and allocate your SRAM buffer
(for instance with of_gen_pool_get() and gen_pool_dma_alloc()).
**2. Allocate a STM32 DMA channel and a STM32 MDMA channel**
You need to define an extra channel in your device tree node, in addition to
the one you should already have for "classic" DMA operation.
This new channel must be taken from STM32 MDMA channels, so the phandle of
the DMA controller to use is the MDMA controller's one.
::

    &my_foo_device {
            [...]
            my_dma_pool = <&my_foo_device_dma_pool>;
            dmas = <&dmamux1 ...>,                // STM32 DMA channel
                   <&mdma1 0 0x3 0x1200000a 0 0>; // + STM32 MDMA channel
    };

Concerning STM32 MDMA bindings:
1. The request line number: whatever the value here, it will be overwritten
   by the MDMA driver with the STM32 DMAMUX channel ID passed through
   (struct dma_slave_config).peripheral_config
2. The priority level: choose Very High (0x3) so that your channel takes
   priority over the others during request arbitration
3. A 32-bit mask specifying the DMA channel configuration: source and
   destination address increment, block transfer with 128 bytes per single
   transfer
4. The 32-bit value specifying the register to be used to acknowledge the
   request: it will be overwritten by the MDMA driver with the DMA channel
   interrupt flag clear register address passed through
   (struct dma_slave_config).peripheral_config
5. The 32-bit mask specifying the value to be written to acknowledge the
   request: it will be overwritten by the MDMA driver with the DMA channel
   Transfer Complete flag passed through
   (struct dma_slave_config).peripheral_config
Driver updates for STM32 DMA-MDMA chaining support in foo driver
----------------------------------------------------------------
**0. (optional) Refactor the original sg_table if using dmaengine_prep_slave_sg()**
When using dmaengine_prep_slave_sg(), the original sg_table can't be used as
is. Two new sg_tables must be created from the original one: one for the
STM32 DMA transfer (where the memory address now targets the SRAM buffer
instead of the DDR buffer) and one for the STM32 MDMA transfer (where the
memory address targets the DDR buffer).
The new sg_list items must fit within the SRAM period length. Here is an
example for DMA_DEV_TO_MEM:
::

    /*
     * Assuming sgl and nents, respectively the initial scatterlist and its
     * length.
     * Assuming sram_dma_buf and sram_period, respectively the memory
     * allocated from the pool for DMA usage, and the length of the period,
     * which is half of the sram_buf size.
     * Assuming dev is the foo device, and that this code runs in a function
     * returning an int error code.
     */
    struct sg_table new_dma_sgt, new_mdma_sgt;
    struct scatterlist *s, *_sgl;
    dma_addr_t ddr_dma_buf;
    u32 new_nents = 0, len;
    int i, ret;

    /* Count the number of entries needed: each one must fit in a period */
    for_each_sg(sgl, s, nents, i)
            if (sg_dma_len(s) > sram_period)
                    new_nents += DIV_ROUND_UP(sg_dma_len(s), sram_period);
            else
                    new_nents++;

    /* Create sg table for STM32 DMA channel */
    ret = sg_alloc_table(&new_dma_sgt, new_nents, GFP_ATOMIC);
    if (ret) {
            dev_err(dev, "DMA sg table alloc failed\n");
            return ret;
    }

    /*
     * Split the original entries into period-sized chunks. Even items
     * target the first half of the SRAM buffer, odd items the second half.
     */
    _sgl = sgl;
    len = sg_dma_len(sgl);
    for_each_sg(new_dma_sgt.sgl, s, new_dma_sgt.nents, i) {
            size_t bytes = min_t(size_t, len, sram_period);

            sg_dma_len(s) = bytes;
            sg_dma_address(s) = sram_dma_buf;
            if (i & 1)
                    sg_dma_address(s) += sram_period;

            len -= bytes;
            if (!len && sg_next(_sgl)) {
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
            }
    }

    /* Create sg table for STM32 MDMA channel */
    ret = sg_alloc_table(&new_mdma_sgt, new_nents, GFP_ATOMIC);
    if (ret) {
            dev_err(dev, "MDMA sg_table alloc failed\n");
            sg_free_table(&new_dma_sgt);
            return ret;
    }

    /* Same chunking, but the MDMA items walk through the DDR buffer */
    _sgl = sgl;
    len = sg_dma_len(sgl);
    ddr_dma_buf = sg_dma_address(sgl);
    for_each_sg(new_mdma_sgt.sgl, s, new_mdma_sgt.nents, i) {
            size_t bytes = min_t(size_t, len, sram_period);

            sg_dma_len(s) = bytes;
            sg_dma_address(s) = ddr_dma_buf;
            len -= bytes;

            if (!len && sg_next(_sgl)) {
                    _sgl = sg_next(_sgl);
                    len = sg_dma_len(_sgl);
                    ddr_dma_buf = sg_dma_address(_sgl);
            } else {
                    ddr_dma_buf += bytes;
            }
    }

Don't forget to release these new sg_tables after getting the descriptors
with dmaengine_prep_slave_sg().
**1. Set controller specific parameters**
First, use dmaengine_slave_config() with a struct dma_slave_config to
configure the STM32 DMA channel. You just have to take care of DMA addresses:
the memory address (depending on the transfer direction) must point to your
SRAM buffer, and (struct dma_slave_config).peripheral_size must be set != 0.
STM32 DMA driver will check (struct dma_slave_config).peripheral_size to
determine if chaining is being used or not. If it is used, then STM32 DMA
driver fills (struct dma_slave_config).peripheral_config with an array of
three u32: the first one containing the STM32 DMAMUX channel ID, the second
one the channel interrupt flag clear register address, and the third one the
channel Transfer Complete flag mask.
Then, use dmaengine_slave_config() with another struct dma_slave_config to
configure the STM32 MDMA channel. Take care of DMA addresses: the device
address (depending on the transfer direction) must point to your SRAM buffer,
and the memory address must point to the buffer originally used for "classic"
DMA operation. Copy the .peripheral_size and .peripheral_config that the
STM32 DMA driver has just updated in the first struct dma_slave_config into
the struct dma_slave_config used to configure the STM32 MDMA channel.
::

    struct dma_slave_config dma_conf;
    struct dma_slave_config mdma_conf;

    memset(&dma_conf, 0, sizeof(dma_conf));
    [...]
    dma_conf.direction = DMA_DEV_TO_MEM;
    dma_conf.dst_addr = sram_dma_buf;        // SRAM buffer
    dma_conf.peripheral_size = 1;            // peripheral_size != 0 => chaining

    dmaengine_slave_config(dma_chan, &dma_conf);

    memset(&mdma_conf, 0, sizeof(mdma_conf));
    mdma_conf.direction = DMA_DEV_TO_MEM;
    mdma_conf.src_addr = sram_dma_buf;       // SRAM buffer
    mdma_conf.dst_addr = rx_dma_buf;         // original memory buffer
    mdma_conf.peripheral_size = dma_conf.peripheral_size;     // <- dma_conf
    mdma_conf.peripheral_config = dma_conf.peripheral_config; // <- dma_conf

    dmaengine_slave_config(mdma_chan, &mdma_conf);

**2. Get a descriptor for STM32 DMA channel transaction**
In the same way you get your descriptor for your "classic" DMA operation,
you just have to replace the original sg_list (in case of
dmaengine_prep_slave_sg()) with the new sg_list using SRAM buffer, or to
replace the original buffer address, length and period (in case of
dmaengine_prep_dma_cyclic()) with the new SRAM buffer.
**3. Get a descriptor for STM32 MDMA channel transaction**
If you previously got the descriptor (for STM32 DMA) with
* dmaengine_prep_slave_sg(), then use dmaengine_prep_slave_sg() for
STM32 MDMA;
* dmaengine_prep_dma_cyclic(), then use dmaengine_prep_dma_cyclic() for
STM32 MDMA.
Use the new MDMA sg_list, whose entries target the DDR buffer (in case of
dmaengine_prep_slave_sg()) or, depending on the transfer direction, either the
original DDR buffer (in case of DMA_DEV_TO_MEM) or the SRAM buffer (in case of
DMA_MEM_TO_DEV), the source address being previously set with
dmaengine_slave_config().
**4. Submit both transactions**
Before submitting your transactions, you may need to define on which
descriptor you want a callback to be called at the end of the transfer
(dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()).
Depending on the direction, set the callback on the descriptor that finishes
the overall transfer:
* DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor
* DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor
Then, submit the descriptors in any order, with dmaengine_submit().
**5. Issue pending requests (and wait for callback notification)**
As the STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue
the STM32 MDMA channel before the STM32 DMA channel (with
dma_async_issue_pending()).
If any, your callback will be called to notify you of the end of the overall
transfer or of the period completion.
Don't forget to terminate both channels. The STM32 DMA channel is configured
in cyclic Double-Buffer mode, so it won't be disabled by HW: you need to
terminate it. The STM32 MDMA channel will be stopped by HW in case of sg
transfer, but not in case of cyclic transfer. You can terminate it whatever
the kind of transfer.
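Steps 4 and 5 boil down to two decisions that depend only on the transfer direction: which descriptor carries the callback, and in which order the channels are issued. The userspace model below captures them; the enums, ``struct chaining_plan`` and ``make_plan()`` are hypothetical stand-ins and do not call the dmaengine API.

```c
/* Hypothetical stand-ins for the two channels and directions. */
enum chan { CHAN_DMA, CHAN_MDMA };
enum xfer_dir { XFER_DEV_TO_MEM, XFER_MEM_TO_DEV };

struct chaining_plan {
	enum chan callback_on;    /* descriptor that finishes the transfer */
	enum chan issue_order[2]; /* order for issuing pending requests */
};

static struct chaining_plan make_plan(enum xfer_dir dir)
{
	struct chaining_plan p = {
		/* DEV_TO_MEM ends with MDMA copying SRAM to DDR; MEM_TO_DEV
		 * ends with DMA reading SRAM towards the device. */
		.callback_on = dir == XFER_DEV_TO_MEM ? CHAN_MDMA : CHAN_DMA,
		/* MDMA is triggered by DMA, so it must be armed first. */
		.issue_order = { CHAN_MDMA, CHAN_DMA },
	};

	return p;
}
```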
**STM32 DMA-MDMA chaining DMA_MEM_TO_DEV special case**
STM32 DMA-MDMA chaining in DMA_MEM_TO_DEV is a special case. Indeed, the
STM32 MDMA feeds the SRAM buffer with the DDR data, and the STM32 DMA reads
data from the SRAM buffer. So some data (the first period) must already be in
the SRAM buffer when the STM32 DMA starts to read.
A trick could be pausing the STM32 DMA channel (that will raise a Transfer
Complete signal, triggering the STM32 MDMA channel), but the first data read
by the STM32 DMA could be "wrong". The proper way is to prepare the first SRAM
period with dmaengine_prep_dma_memcpy(). Then this first period should be
"removed" from the sg or the cyclic transfer.
Due to this complexity, prefer STM32 DMA-MDMA chaining for DMA_DEV_TO_MEM and
keep the "classic" DMA usage for DMA_MEM_TO_DEV, unless you are ready to
handle this special case.
Resources
---------
Application notes, datasheets and reference manuals are available on the ST
website (STM32MP1_).
Three application notes (AN5224_, AN4031_ & AN5001_) deal specifically with
STM32 DMAMUX, STM32 DMA and STM32 MDMA.
.. _STM32MP1: https://www.st.com/en/microcontrollers-microprocessors/stm32mp1-series.html
.. _AN5224: https://www.st.com/resource/en/application_note/an5224-stm32-dmamux-the-dma-request-router-stmicroelectronics.pdf
.. _AN4031: https://www.st.com/resource/en/application_note/dm00046011-using-the-stm32f2-stm32f4-and-stm32f7-series-dma-controller-stmicroelectronics.pdf
.. _AN5001: https://www.st.com/resource/en/application_note/an5001-stm32cube-expansion-package-for-stm32h7-series-mdma-stmicroelectronics.pdf
:Authors:
- Amelie Delaunay <amelie.delaunay@foss.st.com>

@ -65,10 +65,6 @@ linux,uefi-mmap-desc-size 32-bit Size in bytes of each entry in the UEFI
linux,uefi-mmap-desc-ver 32-bit Version of the mmap descriptor format.
linux,initrd-start 64-bit Physical start address of an initrd
linux,initrd-end 64-bit Physical end address of an initrd
kaslr-seed 64-bit Entropy used to randomize the kernel image
base address location.
========================== ====== ===========================================

@ -76,6 +76,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A55 | #2441007 | ARM64_ERRATUM_2441007 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A57 | #852523 | N/A |

@ -144,6 +144,42 @@ managing and controlling ublk devices with help of several control commands:
For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
responsibility to save IO target specific info in userspace.
- ``UBLK_CMD_START_USER_RECOVERY``
This command is valid if the ``UBLK_F_USER_RECOVERY`` feature is enabled. It
is accepted after the old process has exited, the ublk device is quiesced and
``/dev/ublkc*`` is released. The user should send this command before starting
a new process which re-opens ``/dev/ublkc*``. When this command returns, the
ublk device is ready for the new process.
- ``UBLK_CMD_END_USER_RECOVERY``
This command is valid if the ``UBLK_F_USER_RECOVERY`` feature is enabled. It
is accepted after the ublk device is quiesced and a new process has opened
``/dev/ublkc*`` and made all ublk queues ready. When this command returns, the
ublk device is unquiesced and new I/O requests are passed to the new process.
- user recovery feature description
Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
``UBLK_F_USER_RECOVERY_REISSUE``.
With ``UBLK_F_USER_RECOVERY`` set, after a ubq_daemon (the ublk server's I/O
handler) dies, ublk does not delete ``/dev/ublkb*`` during the whole
recovery stage, and the ublk device ID is kept. It is the ublk server's
responsibility to recover the device context using its own knowledge.
Requests which have not been issued to userspace are requeued. Requests
which have been issued to userspace are aborted.
With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after a ubq_daemon (the ublk
server's I/O handler) dies, contrary to ``UBLK_F_USER_RECOVERY``,
requests which have been issued to userspace are requeued and will be
re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends that tolerate
double writes, since the driver may issue the same I/O request twice. It
might be useful to a read-only FS or a VM backend.
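The requeue/abort rules above can be summarized as a small decision function. The flag values here are local stand-ins rather than the real ``UBLK_F_*`` constants from the ublk UAPI header, the function name is hypothetical, and the sketch assumes ``UBLK_F_USER_RECOVERY_REISSUE`` is used together with ``UBLK_F_USER_RECOVERY``.

```c
#include <stdbool.h>

/* Local stand-ins; NOT the real UBLK_F_* bit values. */
#define F_USER_RECOVERY         (1u << 0)
#define F_USER_RECOVERY_REISSUE (1u << 1)

enum req_outcome { REQ_REQUEUE, REQ_ABORT, REQ_NO_RECOVERY };

/* Fate of an in-flight request when a ubq_daemon dies. */
static enum req_outcome on_daemon_death(unsigned int flags, bool issued_to_user)
{
	if (!(flags & F_USER_RECOVERY))
		return REQ_NO_RECOVERY; /* not covered by this section */
	if (!issued_to_user)
		return REQ_REQUEUE;
	/* Issued requests are only replayed if the backend tolerates it. */
	return (flags & F_USER_RECOVERY_REISSUE) ? REQ_REQUEUE : REQ_ABORT;
}
```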
Data plane
----------

@ -37,6 +37,7 @@ Library functionality that is used throughout the kernel.
kref
assoc_array
xarray
maple_tree
idr
circular-buffers
rbtree

@ -0,0 +1,217 @@
.. SPDX-License-Identifier: GPL-2.0+
==========
Maple Tree
==========
:Author: Liam R. Howlett
Overview
========
The Maple Tree is a B-Tree data type which is optimized for storing
non-overlapping ranges, including ranges of size 1. The tree was designed to
be simple to use and does not require a user written search method. It
supports iterating over a range of entries and going to the previous or next
entry in a cache-efficient manner. The tree can also be put into an RCU-safe
mode of operation which allows reading and writing concurrently. Writers must
synchronize on a lock, which can be the default spinlock, or the user can set
the lock to an external lock of a different type.
The Maple Tree maintains a small memory footprint and was designed to use
modern processor cache efficiently. The majority of the users will be able to
use the normal API. An :ref:`maple-tree-advanced-api` exists for more complex
scenarios. The most important usage of the Maple Tree is the tracking of the
virtual memory areas.
The Maple Tree can store values between ``0`` and ``ULONG_MAX``. The Maple
Tree reserves values with the bottom two bits set to '10' which are below 4096
(i.e. 2, 6, 10 .. 4094) for internal use. If the entries may use reserved
values, then the users can convert the entries using xa_mk_value() and convert
them back by calling xa_to_value(). If the user needs to use a reserved
value, then the user can convert the value when using the
:ref:`maple-tree-advanced-api`; such values are blocked by the normal API.
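To make the reserved range concrete, the userspace sketch below tests whether a value falls in it and mirrors the xa_mk_value()/xa_to_value() encoding (shift left by one and set bit 0). An encoded value is always odd, so it can never have its bottom two bits equal to '10'. The helper names are hypothetical; kernel code should use xa_mk_value() and xa_to_value() themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reserved: below 4096 with the bottom two bits equal to binary 10. */
static bool mt_value_is_reserved(uintptr_t v)
{
	return v < 4096 && (v & 3) == 2;
}

/* Userspace mirror of xa_mk_value(): (v << 1) | 1, always odd. */
static uintptr_t mk_value(uintptr_t v)
{
	return (v << 1) | 1;
}

/* Userspace mirror of xa_to_value(). */
static uintptr_t to_value(uintptr_t entry)
{
	return entry >> 1;
}
```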
The Maple Tree can also be configured to support searching for a gap of a given
size (or larger).
Pre-allocation of nodes is also supported using the
:ref:`maple-tree-advanced-api`. This is useful for users who must guarantee a
successful store operation within a given code segment where allocating
cannot be done. Node allocations are relatively small, at around 256 bytes.
.. _maple-tree-normal-api:
Normal API
==========
Start by initialising a maple tree, either with DEFINE_MTREE() for statically
allocated maple trees or mt_init() for dynamically allocated ones. A
freshly-initialised maple tree contains a ``NULL`` pointer for the range ``0``
- ``ULONG_MAX``. There are currently two types of maple trees supported: the
allocation tree and the regular tree. The regular tree has a higher branching
factor for internal nodes. The allocation tree has a lower branching factor
but allows the user to search for a gap of a given size or larger from either
``0`` upwards or ``ULONG_MAX`` down. An allocation tree can be used by
passing in the ``MT_FLAGS_ALLOC_RANGE`` flag when initialising the tree.
You can then set entries using mtree_store() or mtree_store_range().
mtree_store() will overwrite any entry with the new entry and return 0 on
success or an error code otherwise. mtree_store_range() works in the same way
but takes a range. mtree_load() is used to retrieve the entry stored at a
given index. You can use mtree_erase() to erase an entire range by only
knowing one value within that range, or call mtree_store() with a ``NULL``
entry to partially erase a range or many ranges at once.
If you want to only store a new entry to a range (or index) if that range is
currently ``NULL``, you can use mtree_insert_range() or mtree_insert() which
return -EEXIST if the range is not empty.
You can search for an entry from an index upwards by using mt_find().
You can walk each entry within a range by calling mt_for_each(). You must
provide a temporary variable to store a cursor. If you want to walk each
element of the tree then ``0`` and ``ULONG_MAX`` may be used as the range. If
the caller is going to hold the lock for the duration of the walk then it is
worth looking at the mas_for_each() API in the :ref:`maple-tree-advanced-api`
section.
Sometimes it is necessary to ensure the next call to store to a maple tree does
not allocate memory; please see :ref:`maple-tree-advanced-api` for this use case.
Finally, you can remove all entries from a maple tree by calling
mtree_destroy(). If the maple tree entries are pointers, you may wish to free
the entries first.
Allocating Nodes
----------------
The allocations are handled by the internal tree code. See
:ref:`maple-tree-advanced-alloc` for other options.
Locking
-------
You do not have to worry about locking. See :ref:`maple-tree-advanced-locks`
for other options.
The Maple Tree uses RCU and an internal spinlock to synchronise access:
Takes RCU read lock:
* mtree_load()
* mt_find()
* mt_for_each()
* mt_next()
* mt_prev()
Takes ma_lock internally:
* mtree_store()
* mtree_store_range()
* mtree_insert()
* mtree_insert_range()
* mtree_erase()
* mtree_destroy()
* mt_set_in_rcu()
* mt_clear_in_rcu()
If you want to take advantage of the internal lock to protect the data
structures that you are storing in the Maple Tree, you can call mtree_lock()
before calling mtree_load(), then take a reference count on the object you
have found before calling mtree_unlock(). This will prevent stores from
removing the object from the tree between looking up the object and
incrementing the refcount. You can also use RCU to avoid dereferencing
freed memory, but an explanation of that is beyond the scope of this
document.
.. _maple-tree-advanced-api:
Advanced API
============
The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
You must take care of your own locking while using the advanced API.
You can use the ma_lock, RCU or an external lock for protection.
You can mix advanced and normal operations on the same tree, as long
as the locking is compatible. The :ref:`maple-tree-normal-api` is implemented
in terms of the advanced API.
The advanced API is based around the ma_state; this is where the 'mas'
prefix originates. The ma_state struct keeps track of tree operations to make
life easier for both internal and external tree users.
Initialising the maple tree is the same as in the :ref:`maple-tree-normal-api`.
Please see above.
The maple state keeps track of the range start and end in mas->index and
mas->last, respectively.
mas_walk() will walk the tree to the location of mas->index and set
mas->index and mas->last according to the range for the entry.
You can set entries using mas_store(). mas_store() will overwrite any entry
with the new entry and return the first existing entry that is overwritten.
The range is passed in as members of the maple state: index and last.
You can use mas_erase() to erase an entire range by setting index and
last of the maple state to the desired range to erase. This will erase
the first range that is found in that range, set the maple state index
and last as the range that was erased and return the entry that existed
at that location.
You can walk each entry within a range by using mas_for_each(). If you want
to walk each element of the tree then ``0`` and ``ULONG_MAX`` may be used as
the range. If the lock needs to be dropped periodically, see mas_pause().
Using a maple state allows mas_next() and mas_prev() to function as if the
tree was a linked list. With such a high branching factor the amortized
performance penalty is outweighed by cache optimization. mas_next() will
return the next entry which occurs after the entry at index. mas_prev()
will return the previous entry which occurs before the entry at index.
mas_find() will find the first entry which exists at or above index on
the first call, and the next entry on every subsequent call.
mas_find_rev() will find the first entry which exists at or below last on
the first call, and the previous entry on every subsequent call.
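The first-call/subsequent-call behaviour of mas_find() and mas_find_rev() can be sketched in userspace. Everything below is hypothetical: ``struct toy_state`` only imitates the ``index``/``last`` fields of a maple state plus a cursor, the ranges live in a sorted array rather than a tree, and the ``max``/``min`` parameters loosely stand in for the search limits of the real calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the kernel's ma_state or maple tree. */
struct toy_range {
	unsigned long first, last; /* inclusive bounds */
	void *entry;
};

struct toy_state {
	const struct toy_range *r; /* sorted by first, non-overlapping */
	size_t n;
	unsigned long index, last; /* range of the current entry */
	size_t pos;                /* slot of the last entry returned */
	bool started;
};

/* First call: first entry at or above index. Later calls: the next one. */
void *toy_find(struct toy_state *s, unsigned long max)
{
	size_t i = 0;

	if (s->started)
		i = s->pos + 1;
	else
		while (i < s->n && s->r[i].last < s->index)
			i++;
	if (i >= s->n || s->r[i].first > max)
		return NULL;
	s->started = true;
	s->pos = i;
	s->index = s->r[i].first;
	s->last = s->r[i].last;
	return s->r[i].entry;
}

/* First call: first entry at or below last. Later calls: the previous one. */
void *toy_find_rev(struct toy_state *s, unsigned long min)
{
	size_t i = 0;

	if (s->started) {
		if (s->pos == 0)
			return NULL;
		i = s->pos - 1;
	} else {
		while (i < s->n && s->r[i].first <= s->last)
			i++;
		if (i == 0)
			return NULL; /* nothing starts at or below last */
		i--;
	}
	if (s->r[i].last < min)
		return NULL;
	s->started = true;
	s->pos = i;
	s->index = s->r[i].first;
	s->last = s->r[i].last;
	return s->r[i].entry;
}
```

For example, with ranges ``[5, 9]``, ``[20, 29]`` and ``[40, 40]`` and a starting index of 7, successive toy_find() calls return the entries for ``[5, 9]`` (which contains index 7), ``[20, 29]`` and ``[40, 40]``, then NULL.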
If the user needs to yield the lock during an operation, then the maple state
must be paused using mas_pause().
There are a few extra interfaces provided when using an allocation tree.
If you wish to search for a gap within a range, then mas_empty_area()
or mas_empty_area_rev() can be used. mas_empty_area() searches for a gap
starting at the lowest index given up to the maximum of the range.
mas_empty_area_rev() searches for a gap starting at the highest index given
and continues downward to the lower bound of the range.
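The difference between the two gap searches can be sketched with a userspace toy working on a sorted list of occupied ranges. The names toy_empty_area() and toy_empty_area_rev() are hypothetical, the ``min``/``max``/``size`` arguments are loosely modelled on the real functions as we understand them, and the error codes are the toy's own; the real functions operate on a maple state and an allocation tree.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical toy model: occupied ranges, sorted and non-overlapping. */
struct toy_range {
	unsigned long first, last; /* inclusive bounds */
};

/* Lowest start s with [s, s + size - 1] free and inside [min, max]. */
int toy_empty_area(const struct toy_range *r, size_t n, unsigned long min,
		   unsigned long max, unsigned long size, unsigned long *start)
{
	unsigned long cand = min; /* lowest index the gap could start at */

	if (size == 0 || min > max || max - min < size - 1)
		return -EINVAL;
	for (size_t i = 0; i < n; i++) {
		if (r[i].last < cand)
			continue;       /* range lies below the candidate */
		if (r[i].first > cand && r[i].first - cand >= size)
			break;          /* the gap before r[i] is big enough */
		if (r[i].last == ULONG_MAX)
			return -EBUSY;  /* nothing free above this range */
		cand = r[i].last + 1;   /* skip past r[i] */
	}
	if (cand > max || max - cand < size - 1)
		return -EBUSY;
	*start = cand;
	return 0;
}

/* Highest start s with [s, s + size - 1] free and inside [min, max]. */
int toy_empty_area_rev(const struct toy_range *r, size_t n, unsigned long min,
		       unsigned long max, unsigned long size,
		       unsigned long *start)
{
	unsigned long end = max; /* highest index the gap could end at */

	if (size == 0 || min > max || max - min < size - 1)
		return -EINVAL;
	for (size_t i = n; i-- > 0;) {
		if (r[i].first > end)
			continue;       /* range lies above the candidate */
		if (r[i].last < end && end - r[i].last >= size)
			break;          /* the gap after r[i] is big enough */
		if (r[i].first == 0)
			return -EBUSY;  /* nothing free below this range */
		end = r[i].first - 1;   /* skip below r[i] */
	}
	if (end < min || end - min < size - 1)
		return -EBUSY;
	*start = end - size + 1;
	return 0;
}
```

With ``[0, 9]``, ``[12, 19]`` and ``[40, 49]`` occupied, a search for 5 free indices in ``[0, 100]`` yields 20 going upward and 96 going downward.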
.. _maple-tree-advanced-alloc:
Advanced Allocating Nodes
-------------------------
Allocations are usually handled internally to the tree. However, if allocations
need to occur before a write occurs, then calling mas_expected_entries() will
allocate the worst-case number of needed nodes to insert the provided number of
ranges. This also causes the tree to enter mass insertion mode. Once
insertions are complete, calling mas_destroy() on the maple state will free the
unused allocations.
.. _maple-tree-advanced-locks:
Advanced Locking
----------------
The maple tree uses a spinlock by default, but external locks can be used for
tree updates as well. To use an external lock, the tree must be initialized
with the ``MT_FLAGS_LOCK_EXTERN`` flag; this is usually done with the
MTREE_INIT_EXT() macro, which takes an external lock as an argument.
Functions and structures
========================
.. kernel-doc:: include/linux/maple_tree.h
.. kernel-doc:: lib/maple_tree.c

@ -19,9 +19,6 @@ User Space Memory Access
Memory Allocation Controls
==========================
.. kernel-doc:: include/linux/gfp.h
:internal:
.. kernel-doc:: include/linux/gfp_types.h
:doc: Page mobility and placement hints

@ -612,6 +612,13 @@ Commit message
See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
**BAD_FIXES_TAG**
The Fixes: tag is malformed or does not follow the community conventions.
This can occur if the tag has been split into multiple lines (e.g., when
pasted in an email program with word wrapping enabled).
See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
Comparison style
----------------

@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
kcov
gcov
kasan
kmsan
ubsan
kmemleak
kcsan

@ -111,9 +111,17 @@ parameter can be used to control panic and reporting behaviour:
report or also panic the kernel (default: ``report``). The panic happens even
if ``kasan_multi_shot`` is enabled.
Hardware Tag-Based KASAN mode (see the section about various modes below) is
intended for use in production as a security mitigation. Therefore, it supports
additional boot parameters that allow disabling KASAN or controlling features:
Software and Hardware Tag-Based KASAN modes (see the section about various
modes below) support altering stack trace collection behavior:
- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
traces collection (default: ``on``).
- ``kasan.stack_ring_size=<number of entries>`` specifies the number of entries
in the stack ring (default: ``32768``).
Hardware Tag-Based KASAN mode is intended for use in production as a security
mitigation. Therefore, it supports additional boot parameters that allow
disabling KASAN altogether or controlling its features:
- ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).
@ -132,9 +140,6 @@ additional boot parameters that allow disabling KASAN or controlling features:
- ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
allocations (default: ``on``).
- ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
traces collection (default: ``on``).
Error reports
~~~~~~~~~~~~~

@ -0,0 +1,427 @@
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.
===================================
The Kernel Memory Sanitizer (KMSAN)
===================================
KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.
An important note is that KMSAN is not intended for production use, because it
drastically increases kernel memory footprint and slows the whole system down.
Usage
=====
Building the kernel
-------------------
In order to build a kernel with KMSAN you will need a fresh Clang (14.0.6+).
Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
Now configure and build the kernel with CONFIG_KMSAN enabled.
Example report
--------------
Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.
The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
uninitialized in the local variable, as well as the stack where the value was
copied to another memory location before use.
A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
- in a condition, e.g. ``if (v) { ... }``;
- in indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
- when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
- when it is passed as an argument to a function, and
``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.
Disabling the instrumentation
-----------------------------
A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.
Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. that marked with ``noinstr`` which implicitly adds
``__no_sanitize_memory``).
This however comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.
As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
their code gets broken by KMSAN (e.g. runs at early boot time).
Support
=======
In order for KMSAN to work the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.
The runtime library only supports x86_64 at the moment.
How KMSAN works
===============
KMSAN shadow memory
-------------------
KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning, marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized). Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.
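As a rough illustration of poisoning and unpoisoning, the userspace sketch below pairs each data byte with a shadow byte. It is a hypothetical model, not the runtime in ``mm/kmsan/``: toy_alloc() stands in for a heap allocation, and its ``zero`` flag plays the role of ``__GFP_ZERO``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy: one shadow byte per data byte, 0xff = uninitialized. */
struct toy_buf {
	unsigned char *data;
	unsigned char *shadow;
	size_t size;
};

static void toy_poison(struct toy_buf *b, size_t off, size_t len)
{
	memset(b->shadow + off, 0xff, len); /* every bit uninitialized */
}

static void toy_unpoison(struct toy_buf *b, size_t off, size_t len)
{
	memset(b->shadow + off, 0x00, len); /* every bit initialized */
}

void toy_free(struct toy_buf *b)
{
	free(b->data);
	free(b->shadow);
	b->data = b->shadow = NULL;
	b->size = 0;
}

/* New memory is poisoned unless zeroed memory was requested. */
bool toy_alloc(struct toy_buf *b, size_t size, bool zero)
{
	b->data = zero ? calloc(size, 1) : malloc(size);
	b->shadow = malloc(size);
	b->size = size;
	if (!b->data || !b->shadow) {
		toy_free(b);
		return false;
	}
	if (zero)
		toy_unpoison(b, 0, size);
	else
		toy_poison(b, 0, size);
	return true;
}

/* Writing a constant initializes (unpoisons) the written byte. */
void toy_store_byte(struct toy_buf *b, size_t off, unsigned char v)
{
	b->data[off] = v;
	toy_unpoison(b, off, 1);
}
```

A fresh non-zeroed 4-byte buffer starts fully poisoned; storing a constant into byte 2 clears only that byte's shadow.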
Compiler instrumentation also tracks the shadow values as they flow through
the code. When needed, instrumentation code invokes the runtime library in
``mm/kmsan/`` to persist shadow values.
The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.
Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
``c`` are uninitialized, while the lower byte is initialized.
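The shadow of ``c`` above can be computed with a closed-form rule. The sketch below assumes the bit-level rules that LLVM's MemorySanitizer uses for ``or`` and ``and``, as we understand them; the function names are hypothetical and the code is a userspace model, not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shadow of a | b: a result bit is initialized when both input bits are,
 * or when either input bit is an initialized 1 (1 | x == 1 whatever x is).
 */
uint32_t toy_or_shadow(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (~a & sb) | (sa & ~b);
}

/* Shadow of a & b: here an initialized 0 forces the result bit to 0. */
uint32_t toy_and_shadow(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (a & sb) | (sa & b);
}
```

Whatever garbage ``b`` happens to hold, toy_or_shadow(0xff, 0, b, 0xffffffff) is ``0xffffff00``, matching the shadow of ``c`` given above. For ``a & b`` the picture flips: with ``a = 0x0f`` only the low four bits depend on ``b``, so only they are poisoned.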
Origin tracking
---------------
Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).
When an uninitialized variable is allocated on the stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow. For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.
Example 1::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.
Several variables may share the same origin address if they are stored in the
same four-byte chunk. In this case every write to either variable updates the
origin for all of them. We have to sacrifice precision in this case, because
storing origins for individual bits (and even bytes) would be too costly.
Example 2::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
``0xffff0000``, and the origin of the result would be the origin of ``b``.
``ret.s[0]`` would have the same origin, but it will never be used, because
that variable is initialized.
If both function arguments are uninitialized, only the origin of the second
argument is preserved.
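The shared-origin behaviour of ``combine()`` can be modelled in userspace. This is a hypothetical sketch, not the KMSAN runtime: a single 4-byte chunk with a 32-bit shadow and one origin id, little-endian half placement as on x86_64, and the rule that only a poisoned store rewrites the chunk's origin. ``TOY_LOCAL_ORIGIN`` and the origin ids used with it are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy: one 4-byte chunk with a single shared origin id. */
struct toy_chunk {
	uint32_t shadow;
	uint32_t origin;
};

/* Hypothetical origin id standing for "local variable ret created". */
#define TOY_LOCAL_ORIGIN 99u

/*
 * Store a 16-bit value with shadow @shadow into half @idx of the chunk
 * (half 0 = low 16 bits, as on little-endian x86_64). Only a poisoned
 * store rewrites the origin shared by the whole chunk.
 */
void toy_store_short(struct toy_chunk *c, int idx, uint16_t shadow,
		     uint32_t origin)
{
	unsigned int shift = idx ? 16 : 0;

	c->shadow &= ~((uint32_t)0xffff << shift);
	c->shadow |= (uint32_t)shadow << shift;
	if (shadow)
		c->origin = origin;
}

/* Model of combine(): ret.s[0] = a; ret.s[1] = b; return ret.i; */
struct toy_chunk toy_combine(uint16_t sa, uint32_t oa, uint16_t sb,
			     uint32_t ob)
{
	/* A fresh local starts out poisoned, with its own origin. */
	struct toy_chunk ret = { 0xffffffffu, TOY_LOCAL_ORIGIN };

	toy_store_short(&ret, 0, sa, oa);
	toy_store_short(&ret, 1, sb, ob);
	return ret;
}
```

With ``a`` initialized and ``b`` not, the modelled result has shadow ``0xffff0000`` and carries ``b``'s origin; when both are uninitialized, the second store overwrites the first origin, so only ``b``'s survives.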
Origin chaining
~~~~~~~~~~~~~~~
To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had. This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.
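The chaining idea can be sketched as a linked list with a depth cap. The sketch is hypothetical: ``TOY_MAX_CHAIN_DEPTH`` is an arbitrary toy value rather than the runtime's limit, strings stand in for stack traces, and when the cap is reached the toy simply keeps the existing origin, which is one plausible policy and not necessarily what the runtime does.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary limit chosen for the toy, not the runtime's actual value. */
#define TOY_MAX_CHAIN_DEPTH 3

struct toy_origin {
	const char *where;             /* stands in for a stack trace */
	const struct toy_origin *prev; /* origin the value had before */
	int depth;                     /* 1 for the creation origin */
};

/*
 * Called on a store of an uninitialized value: fill @slot with a new
 * origin that records @where and links back to @prev. Once the chain is
 * TOY_MAX_CHAIN_DEPTH long, keep @prev instead of growing it further.
 */
const struct toy_origin *toy_chain_origin(struct toy_origin *slot,
					  const char *where,
					  const struct toy_origin *prev)
{
	if (prev->depth >= TOY_MAX_CHAIN_DEPTH)
		return prev;
	slot->where = where;
	slot->prev = prev;
	slot->depth = prev->depth + 1;
	return slot;
}
```

Following ``prev`` from the newest origin walks back through every recorded store to the place where the value was created, which is what makes chained reports easier to debug.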
Clang instrumentation API
-------------------------
The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.
Shadow manipulation
~~~~~~~~~~~~~~~~~~~
For every memory access the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)

The function name depends on the memory access size.
The compiler makes sure that for every loaded value its shadow and origin
values are read from memory. When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.
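The shape of the metadata lookup can be illustrated with a userspace toy in which "kernel memory" is a 64-byte array with parallel shadow and origin arrays. All names are hypothetical; toy_metadata_ptr() only mimics the shadow/origin pointer pair returned by the ``__msan_metadata_ptr_for_*`` functions, and the real address mapping is the one described under Metadata allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy memory with parallel metadata, NOT KMSAN's layout. */
#define TOY_MEM_SIZE 64

static unsigned char toy_mem[TOY_MEM_SIZE];
static unsigned char toy_shadow[TOY_MEM_SIZE]; /* one byte per byte */
static uint32_t toy_origin[TOY_MEM_SIZE / 4];  /* one per 4-byte chunk */

typedef struct {
	void *shadow, *origin;
} toy_shadow_origin_ptr_t;

/* Metadata pointers for [addr, addr + size), or NULLs if out of range. */
toy_shadow_origin_ptr_t toy_metadata_ptr(void *addr, size_t size)
{
	uintptr_t off = (uintptr_t)addr - (uintptr_t)toy_mem;
	toy_shadow_origin_ptr_t ret = { NULL, NULL };

	if (off >= TOY_MEM_SIZE || size > TOY_MEM_SIZE - off)
		return ret;
	ret.shadow = &toy_shadow[off];
	ret.origin = &toy_origin[off / 4];
	return ret;
}

/*
 * What an instrumented 4-byte store does in the toy: write the value,
 * write its shadow through the shadow pointer, and write its origin only
 * if some bit is poisoned. Assumes @p sits at a multiple-of-4 offset.
 */
void toy_instrumented_store_4(void *p, uint32_t v, uint32_t v_shadow,
			      uint32_t v_origin)
{
	toy_shadow_origin_ptr_t m = toy_metadata_ptr(p, 4);

	assert(m.shadow != NULL);
	memcpy(p, &v, sizeof(v));
	memcpy(m.shadow, &v_shadow, sizeof(v_shadow));
	if (v_shadow)
		memcpy(m.origin, &v_origin, sizeof(v_origin));
}
```

A store at offset 8 lands its shadow at ``toy_shadow[8]`` and its origin in ``toy_origin[2]``, since origins cover four bytes each.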
Handling locals
~~~~~~~~~~~~~~~
A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~
At the beginning of every instrumented function KMSAN inserts a call to
``__msan_get_context_state()``::

  kmsan_context_state *__msan_get_context_state(void)

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.
The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.
Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
String functions
~~~~~~~~~~~~~~~~
The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied alongside
the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
  void *__msan_memmove(void *dst, void *src, uintptr_t n)
  void *__msan_memset(void *dst, int c, uintptr_t n)

Error reporting
~~~~~~~~~~~~~~~
For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin)

``__msan_warning()`` causes the KMSAN runtime to print an error report.
Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
KMSAN instruments every inline assembly output with a call to the following
function, which unpoisons the memory region::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics etc.
Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.
Runtime library
---------------
The code is located in ``mm/kmsan/``.
Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~
Every task_struct has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task flag disallowing KMSAN reports::

  struct kmsan_context {
    ...
    bool allow_reporting;
    struct kmsan_context_state cstate;
    ...
  };

  struct task_struct {
    ...
    struct kmsan_context kmsan;
    ...
  };

KMSAN contexts
~~~~~~~~~~~~~~
When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.
However, when the kernel is running in interrupt, softirq or NMI context,
where ``current`` is unavailable, KMSAN switches to a per-CPU interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

Metadata allocation
~~~~~~~~~~~~~~~~~~~
There are several places in the kernel where the metadata is stored.
1. Each ``struct page`` instance contains two pointers to its shadow and
origin pages::

  struct page {
    ...
    struct page *shadow, *origin;
    ...
  };

At boot-time, the kernel allocates shadow and origin pages for every available
kernel page. This is done quite late, when the kernel address space is already
fragmented, so normal data pages may arbitrarily interleave with the metadata
pages.
This means that in general for two contiguous memory pages their shadow/origin
pages may not be contiguous. Consequently, if a memory access crosses the
boundary of a memory block, accesses to shadow/origin memory may potentially
corrupt other pages or read incorrect values from them.
In practice, contiguous memory pages returned by the same ``alloc_pages()``
call will have contiguous metadata, whereas if these pages belong to two
different allocations their metadata pages can be fragmented.
For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
there are also no guarantees on metadata contiguity.
In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::

  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
All stores to ``dummy_store_page`` are ignored.
2. For vmalloc memory and modules, there is a direct mapping between the memory
range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
area contains shadow memory for the first quarter, the third one holds the
origins. A small part of the fourth quarter contains shadow and origins for the
kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
more details.
When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.
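The fallback to the dummy pages can be sketched with a userspace toy: four 16-byte "pages" whose shadow pages are carved out of a pool in an arbitrary order, so neighbouring data pages may or may not have neighbouring shadow. Apart from ``dummy_load_page`` and ``dummy_store_page``, which mirror the arrays described above, all names and sizes are hypothetical, and accesses are limited to one page in size so they cross at most one boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy sizes, far smaller than a real PAGE_SIZE. */
#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES 4

/* Shadow pages come from a pool, not necessarily in data-page order. */
static unsigned char toy_shadow_pool[2 * TOY_NR_PAGES * TOY_PAGE_SIZE];
static unsigned char *toy_shadow_page[TOY_NR_PAGES]; /* like page->shadow */

static unsigned char dummy_load_page[TOY_PAGE_SIZE];  /* always zero */
static unsigned char dummy_store_page[TOY_PAGE_SIZE]; /* writes ignored */

/*
 * Shadow pointer for the toy address range [addr, addr + size). If the
 * range crosses into a page whose shadow is not adjacent, fall back to a
 * dummy page instead of touching unrelated memory.
 */
unsigned char *toy_shadow_ptr(size_t addr, size_t size, int is_store)
{
	size_t page = addr / TOY_PAGE_SIZE;
	size_t last_page = (addr + size - 1) / TOY_PAGE_SIZE;

	assert(size > 0 && size <= TOY_PAGE_SIZE);
	assert(last_page < TOY_NR_PAGES);
	if (last_page != page &&
	    toy_shadow_page[page] + TOY_PAGE_SIZE != toy_shadow_page[last_page])
		return is_store ? dummy_store_page : dummy_load_page;
	return toy_shadow_page[page] + addr % TOY_PAGE_SIZE;
}
```

If pages 0 and 1 get adjacent shadow but page 2's shadow lives elsewhere in the pool, an 8-byte access at offset 12 uses the real shadow, while the same access at offset 28 gets a dummy page.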
References
==========
E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.
.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/

@ -251,14 +251,15 @@ command line arguments:
compiling a kernel (using ``build`` or ``run`` commands). For example:
to enable compiler warnings, we can pass ``--make_options W=1``.
- ``--alltests``: Builds a UML kernel with all config options enabled
using ``make allyesconfig``. This allows us to run as many tests as
possible.
- ``--alltests``: Enable a predefined set of options in order to build
as many tests as possible.
.. note:: It is slow and prone to breakage as new options are
added or modified. Instead, enable all tests
which have satisfied dependencies by adding
``CONFIG_KUNIT_ALL_TESTS=y`` to your ``.kunitconfig``.
.. note:: The list of enabled options can be found in
``tools/testing/kunit/configs/all_tests.config``.
If you only want to enable all tests with otherwise satisfied
dependencies, instead add ``CONFIG_KUNIT_ALL_TESTS=y`` to your
``.kunitconfig``.
- ``--kunitconfig``: Specifies the path or the directory of the ``.kunitconfig``
file. For example:

@ -75,3 +75,6 @@ always-$(CHECK_DT_BINDING) += $(patsubst $(srctree)/$(src)/%.yaml,%.example.dtb,
# build artifacts here before they are processed by scripts/Makefile.clean
clean-files = $(shell find $(obj) \( -name '*.example.dts' -o \
-name '*.example.dtb' \) -delete 2>/dev/null)
dt_compatible_check: $(obj)/processed-schema.json
$(Q)$(srctree)/scripts/dtc/dt-extract-compatibles $(srctree) | xargs dt-check-compatible -v -s $<

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/actions.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Actions Semi platforms device tree bindings
title: Actions Semi platforms
maintainers:
- Andreas Färber <afaerber@suse.de>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/airoha.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Airoha SoC based Platforms Device Tree Bindings
title: Airoha SoC based Platforms
maintainers:
- Felix Fietkau <nbd@nbd.name>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/altera.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Altera's SoCFPGA platform device tree bindings
title: Altera's SoCFPGA platform
maintainers:
- Dinh Nguyen <dinguyen@kernel.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/amazon,al.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Amazon's Annapurna Labs Alpine Platform Device Tree Bindings
title: Amazon's Annapurna Labs Alpine Platform
maintainers:
- Hanna Hawa <hhhawa@amazon.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/amlogic.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Amlogic MesonX device tree bindings
title: Amlogic MesonX
maintainers:
- Kevin Hilman <khilman@baylibre.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/apple.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Apple ARM Machine Device Tree Bindings
title: Apple ARM Machine
maintainers:
- Hector Martin <marcan@marcan.st>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,cci-400.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM CCI Cache Coherent Interconnect Device Tree Binding
title: ARM CCI Cache Coherent Interconnect
maintainers:
- Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

@ -61,6 +61,9 @@ properties:
maxItems: 1
description: Address translation error interrupt
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports
additionalProperties: false

@ -98,6 +98,9 @@ properties:
base cti node if compatible string arm,coresight-cti-v8-arch is used,
or may appear in a trig-conns child node when appropriate.
power-domains:
maxItems: 1
arm,cti-ctm-id:
$ref: /schemas/types.yaml#/definitions/uint32
description:

@ -54,6 +54,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports

@ -54,6 +54,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
qcom,replicator-loses-context:
type: boolean
description:

@ -54,6 +54,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports
additionalProperties: false

@ -73,6 +73,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
arm,coresight-loses-context-with-cpu:
type: boolean
description:

@ -27,6 +27,9 @@ properties:
compatible:
const: arm,coresight-static-funnel
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports

@ -27,6 +27,9 @@ properties:
compatible:
const: arm,coresight-static-replicator
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports
additionalProperties: false

@ -61,6 +61,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
out-ports:
$ref: /schemas/graph.yaml#/properties/ports
additionalProperties: false

@ -55,6 +55,12 @@ properties:
- const: apb_pclk
- const: atclk
iommus:
maxItems: 1
power-domains:
maxItems: 1
arm,buffer-size:
$ref: /schemas/types.yaml#/definitions/uint32
deprecated: true

@ -54,6 +54,9 @@ properties:
- const: apb_pclk
- const: atclk
power-domains:
maxItems: 1
in-ports:
$ref: /schemas/graph.yaml#/properties/ports
additionalProperties: false

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,corstone1000.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM Corstone1000 Device Tree Bindings
title: ARM Corstone1000
maintainers:
- Vishnu Banavath <vishnu.banavath@arm.com>

@ -33,6 +33,9 @@ properties:
Handle to the cpu this ETE is bound to.
$ref: /schemas/types.yaml#/definitions/phandle
power-domains:
maxItems: 1
out-ports:
description: |
Output connections from the ETE to legacy CoreSight trace bus.

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,integrator.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM Integrator Boards Device Tree Bindings
title: ARM Integrator Boards
maintainers:
- Linus Walleij <linus.walleij@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,realview.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM RealView Boards Device Tree Bindings
title: ARM RealView Boards
maintainers:
- Linus Walleij <linus.walleij@linaro.org>

@ -0,0 +1,35 @@
# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/arm/arm,versatile-sysreg.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Arm Versatile system registers
maintainers:
- Linus Walleij <linus.walleij@linaro.org>
description:
This is a system control registers block, providing multiple low level
platform functions like board detection and identification, software
interrupt generation, MMC and NOR Flash control, etc.
properties:
compatible:
items:
- const: arm,versatile-sysreg
- const: syscon
- const: simple-mfd
reg:
maxItems: 1
panel:
type: object
required:
- compatible
- reg
additionalProperties: false
...

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,versatile.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM Versatile Boards Device Tree Bindings
title: ARM Versatile Boards
maintainers:
- Linus Walleij <linus.walleij@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/arm,vexpress-juno.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ARM Versatile Express and Juno Boards Device Tree Bindings
title: ARM Versatile Express and Juno Boards
maintainers:
- Sudeep Holla <sudeep.holla@arm.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/atmel-at91.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Atmel AT91 device tree bindings.
title: Atmel AT91.
maintainers:
- Alexandre Belloni <alexandre.belloni@bootlin.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/axxia.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Axxia AXM55xx device tree bindings
title: Axxia AXM55xx
maintainers:
- Anders Berg <anders.berg@lsi.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/bitmain.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Bitmain platform device tree bindings
title: Bitmain platform
maintainers:
- Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/calxeda.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Calxeda Platforms Device Tree Bindings
title: Calxeda Platforms
maintainers:
- Rob Herring <robh@kernel.org>

@ -174,6 +174,7 @@ properties:
- nvidia,tegra194-carmel
- qcom,krait
- qcom,kryo
- qcom,kryo240
- qcom,kryo250
- qcom,kryo260
- qcom,kryo280

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/digicolor.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Conexant Digicolor Platforms Device Tree Bindings
title: Conexant Digicolor Platforms
maintainers:
- Baruch Siach <baruch@tkos.co.il>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/fsl.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Freescale i.MX Platforms Device Tree Bindings
title: Freescale i.MX Platforms
maintainers:
- Shawn Guo <shawnguo@kernel.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/intel,keembay.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Keem Bay platform device tree bindings
title: Keem Bay platform
maintainers:
- Paul J. Murphy <paul.j.murphy@intel.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/intel,socfpga.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Intel SoCFPGA platform device tree bindings
title: Intel SoCFPGA platform
maintainers:
- Dinh Nguyen <dinguyen@kernel.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/intel-ixp4xx.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Intel IXP4xx Device Tree Bindings
title: Intel IXP4xx
maintainers:
- Linus Walleij <linus.walleij@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/mediatek.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: MediaTek SoC based Platforms Device Tree Bindings
title: MediaTek SoC based Platforms
maintainers:
- Sean Wang <sean.wang@mediatek.com>

@ -23,6 +23,7 @@ properties:
- mediatek,mt2701-infracfg
- mediatek,mt2712-infracfg
- mediatek,mt6765-infracfg
- mediatek,mt6795-infracfg
- mediatek,mt6779-infracfg_ao
- mediatek,mt6797-infracfg
- mediatek,mt7622-infracfg
@ -60,6 +61,7 @@ if:
enum:
- mediatek,mt2701-infracfg
- mediatek,mt2712-infracfg
- mediatek,mt6795-infracfg
- mediatek,mt7622-infracfg
- mediatek,mt7986-infracfg
- mediatek,mt8135-infracfg

@ -25,6 +25,7 @@ properties:
- mediatek,mt2712-mmsys
- mediatek,mt6765-mmsys
- mediatek,mt6779-mmsys
- mediatek,mt6795-mmsys
- mediatek,mt6797-mmsys
- mediatek,mt8167-mmsys
- mediatek,mt8173-mmsys
@ -52,7 +53,8 @@ properties:
description:
Using mailbox to communicate with GCE, it should have this
property and list of phandle, mailbox specifiers. See
Documentation/devicetree/bindings/mailbox/mtk-gce.txt for details.
Documentation/devicetree/bindings/mailbox/mediatek,gce-mailbox.yaml
for details.
$ref: /schemas/types.yaml#/definitions/phandle-array
mediatek,gce-client-reg:

@ -21,6 +21,7 @@ properties:
- mediatek,mt2701-pericfg
- mediatek,mt2712-pericfg
- mediatek,mt6765-pericfg
- mediatek,mt6795-pericfg
- mediatek,mt7622-pericfg
- mediatek,mt7629-pericfg
- mediatek,mt8135-pericfg

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/microchip,sparx5.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Microchip Sparx5 Boards Device Tree Bindings
title: Microchip Sparx5 Boards
maintainers:
- Lars Povlsen <lars.povlsen@microchip.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/moxart.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: MOXA ART device tree bindings
title: MOXA ART
maintainers:
- Jonas Jensen <jonas.jensen@gmail.com>

@ -4,7 +4,7 @@
$id: "http://devicetree.org/schemas/arm/nvidia,tegra194-ccplex.yaml#"
$schema: "http://devicetree.org/meta-schemas/core.yaml#"
title: NVIDIA Tegra194 CPU Complex device tree bindings
title: NVIDIA Tegra194 CPU Complex
maintainers:
- Thierry Reding <thierry.reding@gmail.com>

@ -41,31 +41,26 @@ properties:
For implementations complying to PSCI versions prior to 0.2.
const: arm,psci
- description:
For implementations complying to PSCI 0.2.
const: arm,psci-0.2
- description:
For implementations complying to PSCI 0.2.
Function IDs are not required and should be ignored by an OS with
PSCI 0.2 support, but are permitted to be present for compatibility
with existing software when "arm,psci" is later in the compatible
list.
minItems: 1
items:
- const: arm,psci-0.2
- const: arm,psci
- description:
For implementations complying to PSCI 1.0.
const: arm,psci-1.0
- description:
For implementations complying to PSCI 1.0.
PSCI 1.0 is backward compatible with PSCI 0.2 with minor
specification updates, as defined in the PSCI specification[2].
minItems: 1
items:
- const: arm,psci-1.0
- const: arm,psci-0.2
- const: arm,psci
method:
description: The method of calling the PSCI firmware.
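
The PSCI hunk reworks the `compatible` alternatives. This sketch assumes they sit under a JSON Schema `oneOf`, and that the item lists accept one or more entries (`minItems: 1`). Under `oneOf`, a value is valid only when exactly one alternative matches, so a separate single-string entry that duplicates the first item of a list makes that lone string ambiguous. The sketch below models each alternative as a prefix match against an ordered list. The names `matches`, `one_of` and `PSCI` are hypothetical. Treating `maxItems` as the list length is an assumption about dt-schema's defaults.

```python
from typing import List, Sequence, Tuple

# Hypothetical model of one "compatible" alternative: an ordered list of
# allowed strings plus its minItems. A bare "const: X" is modelled as (["X"], 1).
# Assumption: maxItems defaults to len(items), as dt-schema fills it in.
Alternative = Tuple[Sequence[str], int]


def matches(compatible: Sequence[str], alt: Alternative) -> bool:
    """True if `compatible` is a prefix of the items list, of an allowed length."""
    items, min_items = alt
    n = len(compatible)
    return min_items <= n <= len(items) and list(compatible) == list(items[:n])


def one_of(compatible: Sequence[str], alts: List[Alternative]) -> bool:
    """JSON Schema 'oneOf': valid only if exactly one alternative matches."""
    return sum(matches(compatible, a) for a in alts) == 1


# Assumed alternatives once the single-string PSCI 0.2 / 1.0 entries are gone.
PSCI: List[Alternative] = [
    (["arm,psci"], 1),
    (["arm,psci-0.2", "arm,psci"], 1),
    (["arm,psci-1.0", "arm,psci-0.2", "arm,psci"], 1),
]

if __name__ == "__main__":
    print(one_of(["arm,psci-1.0", "arm,psci-0.2"], PSCI))  # True
    # Re-adding a bare "const: arm,psci-0.2" makes a lone "arm,psci-0.2"
    # match two alternatives, which "oneOf" rejects.
    print(one_of(["arm,psci-0.2"], PSCI + [(["arm,psci-0.2"], 1)]))  # False
```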

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/qcom.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: QCOM device tree bindings
title: QCOM
maintainers:
- Bjorn Andersson <bjorn.andersson@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/rda.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: RDA Micro platforms device tree bindings
title: RDA Micro platforms
maintainers:
- Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/realtek.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Realtek platforms device tree bindings
title: Realtek platforms
maintainers:
- Andreas Färber <afaerber@suse.de>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/renesas.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Renesas SH-Mobile, R-Mobile, and R-Car Platform Device Tree Bindings
title: Renesas SH-Mobile, R-Mobile, and R-Car Platform
maintainers:
- Geert Uytterhoeven <geert+renesas@glider.be>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/rockchip.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Rockchip platforms device tree bindings
title: Rockchip platforms
maintainers:
- Heiko Stuebner <heiko@sntech.de>

@ -22,7 +22,6 @@ properties:
description: |
should contain 3 regions: control register, revision register,
operation register, in this order.
minItems: 3
maxItems: 3
interrupts:
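
This hunk drops `minItems: 3` and keeps `maxItems: 3` for `reg`. That change has no effect only if the schema tooling treats a lone `maxItems` as an exact count. The sketch below assumes dt-schema applies such a fixup, and approximates it with a hypothetical `fill_min_items` helper. It is not dt-schema's actual code.

```python
from typing import Any, Dict


def fill_min_items(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical approximation of a dt-schema style fixup.

    If a property gives maxItems but no minItems, assume an exact count
    (minItems = maxItems). Returns a new dict; the input is not modified.
    """
    fixed = dict(schema)
    if "maxItems" in fixed and "minItems" not in fixed:
        fixed["minItems"] = fixed["maxItems"]
    return fixed


if __name__ == "__main__":
    before = {"minItems": 3, "maxItems": 3}  # "reg" constraints before the hunk
    after = {"maxItems": 3}                  # "reg" constraints after the hunk
    print(fill_min_items(before) == fill_min_items(after))  # True
```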

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/spear.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ST SPEAr Platforms Device Tree Bindings
title: ST SPEAr Platforms
maintainers:
- Viresh Kumar <vireshk@kernel.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/sti.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: ST STi Platforms Device Tree Bindings
title: ST STi Platforms
maintainers:
- Patrice Chotard <patrice.chotard@foss.st.com>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/sunxi.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Allwinner platforms device tree bindings
title: Allwinner platforms
maintainers:
- Chen-Yu Tsai <wens@csie.org>

@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/arm/tegra.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: NVIDIA Tegra device tree bindings
title: NVIDIA Tegra
maintainers:
- Thierry Reding <thierry.reding@gmail.com>

Some files were not shown because too many files have changed in this diff.