This tag contains habanalabs driver changes for v5.12:

- Add feature called "staged command submissions". In this feature,
  the driver allows the user to submit multiple command submissions
  that describe a single pass on the deep learning graph. The driver
  tracks the completion of the entire pass by the last stage CS.
- Update code to support the latest firmware image
- Optimizations and improvements to MMU code:
  - Support page size that is not power-of-2
  - Make the locks scheme simpler
- mmap areas in device configuration space to userspace
- Security fixes:
  - Make ETR non-secured
  - Remove access to kernel memory through debug-fs interface
  - Remove access through PCI bar to SyncManager register block in Gaudi
- Many small bug fixes

-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE7TEboABC71LctBLFZR1NuKta54AFAmARvmgTHG9nYWJiYXlA
a2VybmVsLm9yZwAKCRBlHU24q1rngJ67B/9eSEhEXDoYVXjdt0qebOf2sAI65csq
ZZ5FXcnkQHjStytpSfBTztlz1fvRF9sged7Kta98Bl+H70JqebzRhv076ZDT5IEs
0DI//FoMYIShItTtFwgjINU8QGBww42Cod4SXNJ6wpRBrIhtBQF3Yn9XpWA7nesY
ido3O7Vf73mU+gCA+mj1TBkhmGg+tZ8c1rwhItBkNYjU9mQwSZSEY/fGwtadwsB/
GECYAu3ekZn/RmUC9YvJ68o6b/CLpAmOGSqcOsj6mRzL9CsI73KuVU23N0plnLaX
kuCCSLRZb2AbNnj5u7Hp7FvwBa8LVlxYRsCKbTJ9KXpmSlbrj67I4sHw
=cqv6
-----END PGP SIGNATURE-----

Merge tag 'misc-habanalabs-next-2021-01-27' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.12:

- Add feature called "staged command submissions". In this feature,
  the driver allows the user to submit multiple command submissions
  that describe a single pass on the deep learning graph. The driver
  tracks the completion of the entire pass by the last stage CS.
- Update code to support the latest firmware image
- Optimizations and improvements to MMU code:
  - Support page size that is not power-of-2
  - Make the locks scheme simpler
- mmap areas in device configuration space to userspace
- Security fixes:
  - Make ETR non-secured
  - Remove access to kernel memory through debug-fs interface
  - Remove access through PCI bar to SyncManager register block in Gaudi
- Many small bug fixes

* tag 'misc-habanalabs-next-2021-01-27' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (41 commits)
  habanalabs: update to latest hl_boot_if.h spec from F/W
  habanalabs/gaudi: unmask HBM interrupts after handling
  habanalabs: update SyncManager interrupt handling
  habanalabs: fix ETR security issue
  habanalabs: staged submission support
  habanalabs: modify device_idle interface
  habanalabs: add CS completion and timeout properties
  habanalabs: add new mem ioctl op for mapping hw blocks
  habanalabs: fix MMU debugfs related nodes
  habanalabs: add user available interrupt to hw_ip
  habanalabs: always try to use the hint address
  CREDITS: update email address and home address
  habanalabs: update email address in sysfs/debugfs docs
  habanalabs: add security violations dump to debugfs
  habanalabs: ignore F/W BMC errors in case no BMC present
  habanalabs/gaudi: print sync manager SEI interrupt info
  habanalabs: Use 'dma_set_mask_and_coherent()'
  habanalabs/gaudi: remove PCI access to SM block
  habanalabs: add driver support for internal cb scheduling
  habanalabs: increment ctx ref from within a cs allocation
  ...
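As an illustration of the staged-submission flow described above, a user-space runtime would mark the first and last command submissions of a pass through the CS ioctl flags. The following sketch is not part of this series; the HL_IOCTL_CS request code, the misc/habanalabs.h header path and the out.seq field are assumptions, while the HL_CS_FLAGS_STAGED_SUBMISSION* flags and the in.cs_flags/in.seq fields do appear in the diff below. Error handling and chunk setup are omitted.

/* Hypothetical user-space sketch, not taken from this patch set. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <misc/habanalabs.h>	/* assumed uapi header location */

static void submit_pass(int fd)
{
	union hl_cs_args args;
	uint64_t staged_seq;

	/* First CS: opens the staged submission; its sequence number
	 * identifies the whole pass. It gets the timeout but no completion.
	 */
	memset(&args, 0, sizeof(args));
	args.in.cs_flags = HL_CS_FLAGS_STAGED_SUBMISSION |
			   HL_CS_FLAGS_STAGED_SUBMISSION_FIRST;
	/* args.in.chunks_execute / num_chunks_execute filled as usual */
	ioctl(fd, HL_IOCTL_CS, &args);	/* assumed request code */
	staged_seq = args.out.seq;	/* assumed output field */

	/* Middle CS: carries the sequence of the first CS and gets neither
	 * a completion nor a timeout of its own.
	 */
	memset(&args, 0, sizeof(args));
	args.in.cs_flags = HL_CS_FLAGS_STAGED_SUBMISSION;
	args.in.seq = staged_seq;
	ioctl(fd, HL_IOCTL_CS, &args);

	/* Last CS: the only one that gets a completion; its release drops
	 * the references taken for the earlier stages.
	 */
	memset(&args, 0, sizeof(args));
	args.in.cs_flags = HL_CS_FLAGS_STAGED_SUBMISSION |
			   HL_CS_FLAGS_STAGED_SUBMISSION_LAST;
	args.in.seq = staged_seq;
	ioctl(fd, HL_IOCTL_CS, &args);
}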
commit 15b3d7f190
CREDITS

@@ -1244,10 +1244,10 @@ S: 80050-430 - Curitiba - Paraná
 S: Brazil
 
 N: Oded Gabbay
-E: oded.gabbay@gmail.com
-D: HabanaLabs and AMD KFD maintainer
-S: 12 Shraga Raphaeli
-S: Petah-Tikva, 4906418
+E: ogabbay@kernel.org
+D: HabanaLabs maintainer
+S: 29 Duchifat St.
+S: Ra'anana 4372029
 S: Israel
 
 N: Kumar Gala
|
|
@ -1,7 +1,7 @@
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/addr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the device address to be used for read or write through
|
||||
PCI bar, or the device VA of a host mapped memory to be read or
|
||||
written directly from the host. The latter option is allowed
|
||||
|
@ -11,7 +11,7 @@ Description: Sets the device address to be used for read or write through
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/clk_gate
|
||||
Date: May 2020
|
||||
KernelVersion: 5.8
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allow the root user to disable/enable in runtime the clock
|
||||
gating mechanism in Gaudi. Due to how Gaudi is built, the
|
||||
clock gating needs to be disabled in order to access the
|
||||
|
@ -34,28 +34,28 @@ Description: Allow the root user to disable/enable in runtime the clock
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/command_buffers
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays a list with information about the currently allocated
|
||||
command buffers
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/command_submission
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays a list with information about the currently active
|
||||
command submissions
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/command_submission_jobs
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays a list with detailed information about each JOB (CB) of
|
||||
each active command submission
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/data32
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the root user to read or write directly through the
|
||||
device's PCI bar. Writing to this file generates a write
|
||||
transaction while reading from the file generates a read
|
||||
|
@ -70,7 +70,7 @@ Description: Allows the root user to read or write directly through the
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/data64
|
||||
Date: Jan 2020
|
||||
KernelVersion: 5.6
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the root user to read or write 64 bit data directly
|
||||
through the device's PCI bar. Writing to this file generates a
|
||||
write transaction while reading from the file generates a read
|
||||
|
@ -85,7 +85,7 @@ Description: Allows the root user to read or write 64 bit data directly
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/device
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Enables the root user to set the device to specific state.
|
||||
Valid values are "disable", "enable", "suspend", "resume".
|
||||
User can read this property to see the valid values
|
||||
|
@ -93,28 +93,28 @@ Description: Enables the root user to set the device to specific state.
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/engines
|
||||
Date: Jul 2019
|
||||
KernelVersion: 5.3
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the status registers values of the device engines and
|
||||
their derived idle status
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_addr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets I2C device address for I2C transaction that is generated
|
||||
by the device's CPU
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_bus
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets I2C bus address for I2C transaction that is generated by
|
||||
the device's CPU
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_data
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Triggers an I2C transaction that is generated by the device's
|
||||
CPU. Writing to this file generates a write transaction while
|
||||
reading from the file generates a read transcation
|
||||
|
@ -122,32 +122,32 @@ Description: Triggers an I2C transaction that is generated by the device's
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets I2C register id for I2C transaction that is generated by
|
||||
the device's CPU
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/led0
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the state of the first S/W led on the device
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/led1
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the state of the second S/W led on the device
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/led2
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the state of the third S/W led on the device
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/mmu
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the hop values and physical address for a given ASID
|
||||
and virtual address. The user should write the ASID and VA into
|
||||
the file and then read the file to get the result.
|
||||
|
@ -157,14 +157,14 @@ Description: Displays the hop values and physical address for a given ASID
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/set_power_state
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the PCI power state. Valid values are "1" for D0 and "2"
|
||||
for D3Hot
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/userptr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays a list with information about the currently user
|
||||
pointers (user virtual addresses) that are pinned and mapped
|
||||
to DMA addresses
|
||||
|
@ -172,13 +172,21 @@ Description: Displays a list with information about the currently user
|
|||
What: /sys/kernel/debug/habanalabs/hl<n>/vm
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays a list with information about all the active virtual
|
||||
address mappings per ASID
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
|
||||
Date: Mar 2020
|
||||
KernelVersion: 5.6
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Sets the stop-on_error option for the device engines. Value of
|
||||
"0" is for disable, otherwise enable.
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/dump_security_violations
|
||||
Date: Jan 2021
|
||||
KernelVersion: 5.12
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Dumps all security violations to dmesg. This will also ack
|
||||
all security violations meanings those violations will not be
|
||||
dumped next time user calls this API
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
What: /sys/class/habanalabs/hl<n>/armcp_kernel_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the Linux kernel running on the device's CPU.
|
||||
Will be DEPRECATED in Linux kernel version 5.10, and be
|
||||
replaced with cpucp_kernel_ver
|
||||
|
@ -9,7 +9,7 @@ Description: Version of the Linux kernel running on the device's CPU.
|
|||
What: /sys/class/habanalabs/hl<n>/armcp_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the application running on the device's CPU
|
||||
Will be DEPRECATED in Linux kernel version 5.10, and be
|
||||
replaced with cpucp_ver
|
||||
|
@ -17,7 +17,7 @@ Description: Version of the application running on the device's CPU
|
|||
What: /sys/class/habanalabs/hl<n>/clk_max_freq_mhz
|
||||
Date: Jun 2019
|
||||
KernelVersion: not yet upstreamed
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum clock frequency, in MHz.
|
||||
The device clock might be set to lower value than the maximum.
|
||||
The user should read the clk_cur_freq_mhz to see the actual
|
||||
|
@ -27,52 +27,52 @@ Description: Allows the user to set the maximum clock frequency, in MHz.
|
|||
What: /sys/class/habanalabs/hl<n>/clk_cur_freq_mhz
|
||||
Date: Jun 2019
|
||||
KernelVersion: not yet upstreamed
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the current frequency, in MHz, of the device clock.
|
||||
This property is valid only for the Gaudi ASIC family
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/cpld_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the Device's CPLD F/W
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/cpucp_kernel_ver
|
||||
Date: Oct 2020
|
||||
KernelVersion: 5.10
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the Linux kernel running on the device's CPU
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/cpucp_ver
|
||||
Date: Oct 2020
|
||||
KernelVersion: 5.10
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the application running on the device's CPU
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/device_type
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the code name of the device according to its type.
|
||||
The supported values are: "GOYA"
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/eeprom
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: A binary file attribute that contains the contents of the
|
||||
on-board EEPROM
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/fuse_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the device's version from the eFuse
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/hard_reset
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Interface to trigger a hard-reset operation for the device.
|
||||
Hard-reset will reset ALL internal components of the device
|
||||
except for the PCI interface and the internal PLLs
|
||||
|
@ -80,14 +80,14 @@ Description: Interface to trigger a hard-reset operation for the device.
|
|||
What: /sys/class/habanalabs/hl<n>/hard_reset_cnt
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays how many times the device have undergone a hard-reset
|
||||
operation since the driver was loaded
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/high_pll
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum clock frequency for MME, TPC
|
||||
and IC when the power management profile is set to "automatic".
|
||||
This property is valid only for the Goya ASIC family
|
||||
|
@ -95,7 +95,7 @@ Description: Allows the user to set the maximum clock frequency for MME, TPC
|
|||
What: /sys/class/habanalabs/hl<n>/ic_clk
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum clock frequency, in Hz, of
|
||||
the Interconnect fabric. Writes to this parameter affect the
|
||||
device only when the power management profile is set to "manual"
|
||||
|
@ -107,27 +107,27 @@ Description: Allows the user to set the maximum clock frequency, in Hz, of
|
|||
What: /sys/class/habanalabs/hl<n>/ic_clk_curr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the current clock frequency, in Hz, of the Interconnect
|
||||
fabric. This property is valid only for the Goya ASIC family
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/infineon_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the Device's power supply F/W code
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/max_power
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum power consumption of the
|
||||
device in milliwatts.
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/mme_clk
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum clock frequency, in Hz, of
|
||||
the MME compute engine. Writes to this parameter affect the
|
||||
device only when the power management profile is set to "manual"
|
||||
|
@ -139,21 +139,21 @@ Description: Allows the user to set the maximum clock frequency, in Hz, of
|
|||
What: /sys/class/habanalabs/hl<n>/mme_clk_curr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the current clock frequency, in Hz, of the MME compute
|
||||
engine. This property is valid only for the Goya ASIC family
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/pci_addr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the PCI address of the device. This is needed so the
|
||||
user would be able to open a device based on its PCI address
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/pm_mng_profile
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Power management profile. Values are "auto", "manual". In "auto"
|
||||
mode, the driver will set the maximum clock frequency to a high
|
||||
value when a user-space process opens the device's file (unless
|
||||
|
@ -167,13 +167,13 @@ Description: Power management profile. Values are "auto", "manual". In "auto"
|
|||
What: /sys/class/habanalabs/hl<n>/preboot_btl_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the device's preboot F/W code
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/soft_reset
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Interface to trigger a soft-reset operation for the device.
|
||||
Soft-reset will reset only the compute and DMA engines of the
|
||||
device
|
||||
|
@ -181,26 +181,26 @@ Description: Interface to trigger a soft-reset operation for the device.
|
|||
What: /sys/class/habanalabs/hl<n>/soft_reset_cnt
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays how many times the device have undergone a soft-reset
|
||||
operation since the driver was loaded
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/status
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Status of the card: "Operational", "Malfunction", "In reset".
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/thermal_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the Device's thermal daemon
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/tpc_clk
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Allows the user to set the maximum clock frequency, in Hz, of
|
||||
the TPC compute engines. Writes to this parameter affect the
|
||||
device only when the power management profile is set to "manual"
|
||||
|
@ -212,12 +212,12 @@ Description: Allows the user to set the maximum clock frequency, in Hz, of
|
|||
What: /sys/class/habanalabs/hl<n>/tpc_clk_curr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Displays the current clock frequency, in Hz, of the TPC compute
|
||||
engines. This property is valid only for the Goya ASIC family
|
||||
|
||||
What: /sys/class/habanalabs/hl<n>/uboot_ver
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
Contact: oded.gabbay@gmail.com
|
||||
Contact: ogabbay@kernel.org
|
||||
Description: Version of the u-boot running on the device's CPU
|
|
@ -1,7 +1,13 @@
|
|||
# SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
include $(src)/common/mmu/Makefile
|
||||
habanalabs-y += $(HL_COMMON_MMU_FILES)
|
||||
|
||||
include $(src)/common/pci/Makefile
|
||||
habanalabs-y += $(HL_COMMON_PCI_FILES)
|
||||
|
||||
HL_COMMON_FILES := common/habanalabs_drv.o common/device.o common/context.o \
|
||||
common/asid.o common/habanalabs_ioctl.o \
|
||||
common/command_buffer.o common/hw_queue.o common/irq.o \
|
||||
common/sysfs.o common/hwmon.o common/memory.o \
|
||||
common/command_submission.o common/mmu.o common/mmu_v1.o \
|
||||
common/firmware_if.o common/pci.o
|
||||
common/command_submission.o common/firmware_if.o
|
||||
|
|
|
@ -50,8 +50,10 @@ unsigned long hl_asid_alloc(struct hl_device *hdev)
|
|||
|
||||
void hl_asid_free(struct hl_device *hdev, unsigned long asid)
|
||||
{
|
||||
if (WARN((asid == 0 || asid >= hdev->asic_prop.max_asid),
|
||||
"Invalid ASID %lu", asid))
|
||||
if (asid == HL_KERNEL_ASID_ID || asid >= hdev->asic_prop.max_asid) {
|
||||
dev_crit(hdev->dev, "Invalid ASID %lu", asid);
|
||||
return;
|
||||
}
|
||||
|
||||
clear_bit(asid, hdev->asid_bitmap);
|
||||
}
|
||||
|
|
|
@ -635,10 +635,12 @@ struct hl_cb *hl_cb_kernel_create(struct hl_device *hdev, u32 cb_size,
|
|||
|
||||
cb_handle >>= PAGE_SHIFT;
|
||||
cb = hl_cb_get(hdev, &hdev->kernel_cb_mgr, (u32) cb_handle);
|
||||
/* hl_cb_get should never fail here so use kernel WARN */
|
||||
WARN(!cb, "Kernel CB handle invalid 0x%x\n", (u32) cb_handle);
|
||||
if (!cb)
|
||||
/* hl_cb_get should never fail here */
|
||||
if (!cb) {
|
||||
dev_crit(hdev->dev, "Kernel CB handle invalid 0x%x\n",
|
||||
(u32) cb_handle);
|
||||
goto destroy_cb;
|
||||
}
|
||||
|
||||
return cb;
|
||||
|
||||
|
|
|
@ -48,8 +48,8 @@ void hl_sob_reset_error(struct kref *ref)
|
|||
struct hl_device *hdev = hw_sob->hdev;
|
||||
|
||||
dev_crit(hdev->dev,
|
||||
"SOB release shouldn't be called here, q_idx: %d, sob_id: %d\n",
|
||||
hw_sob->q_idx, hw_sob->sob_id);
|
||||
"SOB release shouldn't be called here, q_idx: %d, sob_id: %d\n",
|
||||
hw_sob->q_idx, hw_sob->sob_id);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -149,9 +149,10 @@ void hl_fence_get(struct hl_fence *fence)
|
|||
kref_get(&fence->refcount);
|
||||
}
|
||||
|
||||
static void hl_fence_init(struct hl_fence *fence)
|
||||
static void hl_fence_init(struct hl_fence *fence, u64 sequence)
|
||||
{
|
||||
kref_init(&fence->refcount);
|
||||
fence->cs_sequence = sequence;
|
||||
fence->error = 0;
|
||||
fence->timestamp = ktime_set(0, 0);
|
||||
init_completion(&fence->completion);
|
||||
|
@ -184,6 +185,28 @@ static void cs_job_put(struct hl_cs_job *job)
|
|||
kref_put(&job->refcount, cs_job_do_release);
|
||||
}
|
||||
|
||||
bool cs_needs_completion(struct hl_cs *cs)
|
||||
{
|
||||
/* In case this is a staged CS, only the last CS in sequence should
|
||||
* get a completion, any non staged CS will always get a completion
|
||||
*/
|
||||
if (cs->staged_cs && !cs->staged_last)
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
bool cs_needs_timeout(struct hl_cs *cs)
|
||||
{
|
||||
/* In case this is a staged CS, only the first CS in sequence should
|
||||
* get a timeout, any non staged CS will always get a timeout
|
||||
*/
|
||||
if (cs->staged_cs && !cs->staged_first)
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool is_cb_patched(struct hl_device *hdev, struct hl_cs_job *job)
|
||||
{
|
||||
/*
|
||||
|
@ -225,6 +248,7 @@ static int cs_parser(struct hl_fpriv *hpriv, struct hl_cs_job *job)
|
|||
parser.queue_type = job->queue_type;
|
||||
parser.is_kernel_allocated_cb = job->is_kernel_allocated_cb;
|
||||
job->patched_cb = NULL;
|
||||
parser.completion = cs_needs_completion(job->cs);
|
||||
|
||||
rc = hdev->asic_funcs->cs_parser(hdev, &parser);
|
||||
|
||||
|
@ -290,13 +314,153 @@ static void complete_job(struct hl_device *hdev, struct hl_cs_job *job)
|
|||
|
||||
hl_debugfs_remove_job(hdev, job);
|
||||
|
||||
if (job->queue_type == QUEUE_TYPE_EXT ||
|
||||
job->queue_type == QUEUE_TYPE_HW)
|
||||
/* We decrement reference only for a CS that gets completion
|
||||
* because the reference was incremented only for this kind of CS
|
||||
* right before it was scheduled.
|
||||
*
|
||||
* In staged submission, only the last CS marked as 'staged_last'
|
||||
* gets completion, hence its release function will be called from here.
|
||||
* As for all the rest CS's in the staged submission which do not get
|
||||
* completion, their CS reference will be decremented by the
|
||||
* 'staged_last' CS during the CS release flow.
|
||||
* All relevant PQ CI counters will be incremented during the CS release
|
||||
* flow by calling 'hl_hw_queue_update_ci'.
|
||||
*/
|
||||
if (cs_needs_completion(cs) &&
|
||||
(job->queue_type == QUEUE_TYPE_EXT ||
|
||||
job->queue_type == QUEUE_TYPE_HW))
|
||||
cs_put(cs);
|
||||
|
||||
cs_job_put(job);
|
||||
}
|
||||
|
||||
/*
|
||||
* hl_staged_cs_find_first - locate the first CS in this staged submission
|
||||
*
|
||||
* @hdev: pointer to device structure
|
||||
* @cs_seq: staged submission sequence number
|
||||
*
|
||||
* @note: This function must be called under 'hdev->cs_mirror_lock'
|
||||
*
|
||||
* Find and return a CS pointer with the given sequence
|
||||
*/
|
||||
struct hl_cs *hl_staged_cs_find_first(struct hl_device *hdev, u64 cs_seq)
|
||||
{
|
||||
struct hl_cs *cs;
|
||||
|
||||
list_for_each_entry_reverse(cs, &hdev->cs_mirror_list, mirror_node)
|
||||
if (cs->staged_cs && cs->staged_first &&
|
||||
cs->sequence == cs_seq)
|
||||
return cs;
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* is_staged_cs_last_exists - returns true if the last CS in sequence exists
|
||||
*
|
||||
* @hdev: pointer to device structure
|
||||
* @cs: staged submission member
|
||||
*
|
||||
*/
|
||||
bool is_staged_cs_last_exists(struct hl_device *hdev, struct hl_cs *cs)
|
||||
{
|
||||
struct hl_cs *last_entry;
|
||||
|
||||
last_entry = list_last_entry(&cs->staged_cs_node, struct hl_cs,
|
||||
staged_cs_node);
|
||||
|
||||
if (last_entry->staged_last)
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* staged_cs_get - get CS reference if this CS is a part of a staged CS
|
||||
*
|
||||
* @hdev: pointer to device structure
|
||||
* @cs: current CS
|
||||
* @cs_seq: staged submission sequence number
|
||||
*
|
||||
* Increment CS reference for every CS in this staged submission except for
|
||||
* the CS which get completion.
|
||||
*/
|
||||
static void staged_cs_get(struct hl_device *hdev, struct hl_cs *cs)
|
||||
{
|
||||
/* Only the last CS in this staged submission will get a completion.
|
||||
* We must increment the reference for all other CS's in this
|
||||
* staged submission.
|
||||
* Once we get a completion we will release the whole staged submission.
|
||||
*/
|
||||
if (!cs->staged_last)
|
||||
cs_get(cs);
|
||||
}
|
||||
|
||||
/*
|
||||
* staged_cs_put - put a CS in case it is part of staged submission
|
||||
*
|
||||
* @hdev: pointer to device structure
|
||||
* @cs: CS to put
|
||||
*
|
||||
* This function decrements a CS reference (for a non completion CS)
|
||||
*/
|
||||
static void staged_cs_put(struct hl_device *hdev, struct hl_cs *cs)
|
||||
{
|
||||
/* We release all CS's in a staged submission except the last
|
||||
* CS which we have never incremented its reference.
|
||||
*/
|
||||
if (!cs_needs_completion(cs))
|
||||
cs_put(cs);
|
||||
}
|
||||
|
||||
static void cs_handle_tdr(struct hl_device *hdev, struct hl_cs *cs)
|
||||
{
|
||||
bool next_entry_found = false;
|
||||
struct hl_cs *next;
|
||||
|
||||
if (!cs_needs_timeout(cs))
|
||||
return;
|
||||
|
||||
spin_lock(&hdev->cs_mirror_lock);
|
||||
|
||||
/* We need to handle tdr only once for the complete staged submission.
|
||||
* Hence, we choose the CS that reaches this function first which is
|
||||
* the CS marked as 'staged_last'.
|
||||
*/
|
||||
if (cs->staged_cs && cs->staged_last)
|
||||
cs = hl_staged_cs_find_first(hdev, cs->staged_sequence);
|
||||
|
||||
spin_unlock(&hdev->cs_mirror_lock);
|
||||
|
||||
/* Don't cancel TDR in case this CS was timedout because we might be
|
||||
* running from the TDR context
|
||||
*/
|
||||
if (cs && (cs->timedout ||
|
||||
hdev->timeout_jiffies == MAX_SCHEDULE_TIMEOUT))
|
||||
return;
|
||||
|
||||
if (cs && cs->tdr_active)
|
||||
cancel_delayed_work_sync(&cs->work_tdr);
|
||||
|
||||
spin_lock(&hdev->cs_mirror_lock);
|
||||
|
||||
/* queue TDR for next CS */
|
||||
list_for_each_entry(next, &hdev->cs_mirror_list, mirror_node)
|
||||
if (cs_needs_timeout(next)) {
|
||||
next_entry_found = true;
|
||||
break;
|
||||
}
|
||||
|
||||
if (next_entry_found && !next->tdr_active) {
|
||||
next->tdr_active = true;
|
||||
schedule_delayed_work(&next->work_tdr,
|
||||
hdev->timeout_jiffies);
|
||||
}
|
||||
|
||||
spin_unlock(&hdev->cs_mirror_lock);
|
||||
}
|
||||
|
||||
static void cs_do_release(struct kref *ref)
|
||||
{
|
||||
struct hl_cs *cs = container_of(ref, struct hl_cs, refcount);
|
||||
|
@ -346,36 +510,37 @@ static void cs_do_release(struct kref *ref)
|
|||
|
||||
hdev->asic_funcs->hw_queues_unlock(hdev);
|
||||
|
||||
/* Need to update CI for internal queues */
|
||||
hl_int_hw_queue_update_ci(cs);
|
||||
/* Need to update CI for all queue jobs that does not get completion */
|
||||
hl_hw_queue_update_ci(cs);
|
||||
|
||||
/* remove CS from CS mirror list */
|
||||
spin_lock(&hdev->cs_mirror_lock);
|
||||
list_del_init(&cs->mirror_node);
|
||||
spin_unlock(&hdev->cs_mirror_lock);
|
||||
|
||||
/* Don't cancel TDR in case this CS was timedout because we might be
|
||||
* running from the TDR context
|
||||
*/
|
||||
if (!cs->timedout && hdev->timeout_jiffies != MAX_SCHEDULE_TIMEOUT) {
|
||||
struct hl_cs *next;
|
||||
cs_handle_tdr(hdev, cs);
|
||||
|
||||
if (cs->tdr_active)
|
||||
cancel_delayed_work_sync(&cs->work_tdr);
|
||||
if (cs->staged_cs) {
|
||||
/* the completion CS decrements reference for the entire
|
||||
* staged submission
|
||||
*/
|
||||
if (cs->staged_last) {
|
||||
struct hl_cs *staged_cs, *tmp;
|
||||
|
||||
spin_lock(&hdev->cs_mirror_lock);
|
||||
|
||||
/* queue TDR for next CS */
|
||||
next = list_first_entry_or_null(&hdev->cs_mirror_list,
|
||||
struct hl_cs, mirror_node);
|
||||
|
||||
if (next && !next->tdr_active) {
|
||||
next->tdr_active = true;
|
||||
schedule_delayed_work(&next->work_tdr,
|
||||
hdev->timeout_jiffies);
|
||||
list_for_each_entry_safe(staged_cs, tmp,
|
||||
&cs->staged_cs_node, staged_cs_node)
|
||||
staged_cs_put(hdev, staged_cs);
|
||||
}
|
||||
|
||||
spin_unlock(&hdev->cs_mirror_lock);
|
||||
/* A staged CS will be a member in the list only after it
|
||||
* was submitted. We used 'cs_mirror_lock' when inserting
|
||||
* it to list so we will use it again when removing it
|
||||
*/
|
||||
if (cs->submitted) {
|
||||
spin_lock(&hdev->cs_mirror_lock);
|
||||
list_del(&cs->staged_cs_node);
|
||||
spin_unlock(&hdev->cs_mirror_lock);
|
||||
}
|
||||
}
|
||||
|
||||
out:
|
||||
|
@ -461,7 +626,8 @@ static void cs_timedout(struct work_struct *work)
|
|||
}
|
||||
|
||||
static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
|
||||
enum hl_cs_type cs_type, struct hl_cs **cs_new)
|
||||
enum hl_cs_type cs_type, u64 user_sequence,
|
||||
struct hl_cs **cs_new)
|
||||
{
|
||||
struct hl_cs_counters_atomic *cntr;
|
||||
struct hl_fence *other = NULL;
|
||||
|
@ -478,6 +644,9 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
|
|||
return -ENOMEM;
|
||||
}
|
||||
|
||||
/* increment refcnt for context */
|
||||
hl_ctx_get(hdev, ctx);
|
||||
|
||||
cs->ctx = ctx;
|
||||
cs->submitted = false;
|
||||
cs->completed = false;
|
||||
|
@ -507,6 +676,18 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
|
|||
(hdev->asic_prop.max_pending_cs - 1)];
|
||||
|
||||
if (other && !completion_done(&other->completion)) {
|
||||
/* If the following statement is true, it means we have reached
|
||||
* a point in which only part of the staged submission was
|
||||
* submitted and we don't have enough room in the 'cs_pending'
|
||||
* array for the rest of the submission.
|
||||
* This causes a deadlock because this CS will never be
|
||||
* completed as it depends on future CS's for completion.
|
||||
*/
|
||||
if (other->cs_sequence == user_sequence)
|
||||
dev_crit_ratelimited(hdev->dev,
|
||||
"Staged CS %llu deadlock due to lack of resources",
|
||||
user_sequence);
|
||||
|
||||
dev_dbg_ratelimited(hdev->dev,
|
||||
"Rejecting CS because of too many in-flights CS\n");
|
||||
atomic64_inc(&ctx->cs_counters.max_cs_in_flight_drop_cnt);
|
||||
|
@ -525,7 +706,7 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
|
|||
}
|
||||
|
||||
/* init hl_fence */
|
||||
hl_fence_init(&cs_cmpl->base_fence);
|
||||
hl_fence_init(&cs_cmpl->base_fence, cs_cmpl->cs_seq);
|
||||
|
||||
cs->sequence = cs_cmpl->cs_seq;
|
||||
|
||||
|
@ -549,6 +730,7 @@ free_fence:
|
|||
kfree(cs_cmpl);
|
||||
free_cs:
|
||||
kfree(cs);
|
||||
hl_ctx_put(ctx);
|
||||
return rc;
|
||||
}
|
||||
|
||||
|
@ -556,6 +738,8 @@ static void cs_rollback(struct hl_device *hdev, struct hl_cs *cs)
|
|||
{
|
||||
struct hl_cs_job *job, *tmp;
|
||||
|
||||
staged_cs_put(hdev, cs);
|
||||
|
||||
list_for_each_entry_safe(job, tmp, &cs->job_list, cs_node)
|
||||
complete_job(hdev, job);
|
||||
}
|
||||
|
@ -565,7 +749,9 @@ void hl_cs_rollback_all(struct hl_device *hdev)
|
|||
int i;
|
||||
struct hl_cs *cs, *tmp;
|
||||
|
||||
/* flush all completions */
|
||||
/* flush all completions before iterating over the CS mirror list in
|
||||
* order to avoid a race with the release functions
|
||||
*/
|
||||
for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
|
||||
flush_workqueue(hdev->cq_wq[i]);
|
||||
|
||||
|
@ -574,12 +760,24 @@ void hl_cs_rollback_all(struct hl_device *hdev)
|
|||
cs_get(cs);
|
||||
cs->aborted = true;
|
||||
dev_warn_ratelimited(hdev->dev, "Killing CS %d.%llu\n",
|
||||
cs->ctx->asid, cs->sequence);
|
||||
cs->ctx->asid, cs->sequence);
|
||||
cs_rollback(hdev, cs);
|
||||
cs_put(cs);
|
||||
}
|
||||
}
|
||||
|
||||
void hl_pending_cb_list_flush(struct hl_ctx *ctx)
|
||||
{
|
||||
struct hl_pending_cb *pending_cb, *tmp;
|
||||
|
||||
list_for_each_entry_safe(pending_cb, tmp,
|
||||
&ctx->pending_cb_list, cb_node) {
|
||||
list_del(&pending_cb->cb_node);
|
||||
hl_cb_put(pending_cb->cb);
|
||||
kfree(pending_cb);
|
||||
}
|
||||
}
|
||||
|
||||
static void job_wq_completion(struct work_struct *work)
|
||||
{
|
||||
struct hl_cs_job *job = container_of(work, struct hl_cs_job,
|
||||
|
@ -734,6 +932,12 @@ static int hl_cs_sanity_checks(struct hl_fpriv *hpriv, union hl_cs_args *args)
|
|||
return -EBUSY;
|
||||
}
|
||||
|
||||
if ((args->in.cs_flags & HL_CS_FLAGS_STAGED_SUBMISSION) &&
|
||||
!hdev->supports_staged_submission) {
|
||||
dev_err(hdev->dev, "staged submission not supported");
|
||||
return -EPERM;
|
||||
}
|
||||
|
||||
cs_type_flags = args->in.cs_flags & HL_CS_FLAGS_TYPE_MASK;
|
||||
|
||||
if (unlikely(cs_type_flags && !is_power_of_2(cs_type_flags))) {
|
||||
|
@ -805,10 +1009,38 @@ static int hl_cs_copy_chunk_array(struct hl_device *hdev,
|
|||
return 0;
|
||||
}
|
||||
|
||||
static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
||||
u32 num_chunks, u64 *cs_seq, bool timestamp)
|
||||
static int cs_staged_submission(struct hl_device *hdev, struct hl_cs *cs,
|
||||
u64 sequence, u32 flags)
|
||||
{
|
||||
bool int_queues_only = true;
|
||||
if (!(flags & HL_CS_FLAGS_STAGED_SUBMISSION))
|
||||
return 0;
|
||||
|
||||
cs->staged_last = !!(flags & HL_CS_FLAGS_STAGED_SUBMISSION_LAST);
|
||||
cs->staged_first = !!(flags & HL_CS_FLAGS_STAGED_SUBMISSION_FIRST);
|
||||
|
||||
if (cs->staged_first) {
|
||||
/* Staged CS sequence is the first CS sequence */
|
||||
INIT_LIST_HEAD(&cs->staged_cs_node);
|
||||
cs->staged_sequence = cs->sequence;
|
||||
} else {
|
||||
/* User sequence will be validated in 'hl_hw_queue_schedule_cs'
|
||||
* under the cs_mirror_lock
|
||||
*/
|
||||
cs->staged_sequence = sequence;
|
||||
}
|
||||
|
||||
/* Increment CS reference if needed */
|
||||
staged_cs_get(hdev, cs);
|
||||
|
||||
cs->staged_cs = true;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
||||
u32 num_chunks, u64 *cs_seq, u32 flags)
|
||||
{
|
||||
bool staged_mid, int_queues_only = true;
|
||||
struct hl_device *hdev = hpriv->hdev;
|
||||
struct hl_cs_chunk *cs_chunk_array;
|
||||
struct hl_cs_counters_atomic *cntr;
|
||||
|
@ -816,9 +1048,11 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
|||
struct hl_cs_job *job;
|
||||
struct hl_cs *cs;
|
||||
struct hl_cb *cb;
|
||||
u64 user_sequence;
|
||||
int rc, i;
|
||||
|
||||
cntr = &hdev->aggregated_cs_counters;
|
||||
user_sequence = *cs_seq;
|
||||
*cs_seq = ULLONG_MAX;
|
||||
|
||||
rc = hl_cs_copy_chunk_array(hdev, &cs_chunk_array, chunks, num_chunks,
|
||||
|
@ -826,20 +1060,26 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
|||
if (rc)
|
||||
goto out;
|
||||
|
||||
/* increment refcnt for context */
|
||||
hl_ctx_get(hdev, hpriv->ctx);
|
||||
if ((flags & HL_CS_FLAGS_STAGED_SUBMISSION) &&
|
||||
!(flags & HL_CS_FLAGS_STAGED_SUBMISSION_FIRST))
|
||||
staged_mid = true;
|
||||
else
|
||||
staged_mid = false;
|
||||
|
||||
rc = allocate_cs(hdev, hpriv->ctx, CS_TYPE_DEFAULT, &cs);
|
||||
if (rc) {
|
||||
hl_ctx_put(hpriv->ctx);
|
||||
rc = allocate_cs(hdev, hpriv->ctx, CS_TYPE_DEFAULT,
|
||||
staged_mid ? user_sequence : ULLONG_MAX, &cs);
|
||||
if (rc)
|
||||
goto free_cs_chunk_array;
|
||||
}
|
||||
|
||||
cs->timestamp = !!timestamp;
|
||||
cs->timestamp = !!(flags & HL_CS_FLAGS_TIMESTAMP);
|
||||
*cs_seq = cs->sequence;
|
||||
|
||||
hl_debugfs_add_cs(cs);
|
||||
|
||||
rc = cs_staged_submission(hdev, cs, user_sequence, flags);
|
||||
if (rc)
|
||||
goto free_cs_object;
|
||||
|
||||
/* Validate ALL the CS chunks before submitting the CS */
|
||||
for (i = 0 ; i < num_chunks ; i++) {
|
||||
struct hl_cs_chunk *chunk = &cs_chunk_array[i];
|
||||
|
@ -899,8 +1139,9 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
|||
* Only increment for JOB on external or H/W queues, because
|
||||
* only for those JOBs we get completion
|
||||
*/
|
||||
if (job->queue_type == QUEUE_TYPE_EXT ||
|
||||
job->queue_type == QUEUE_TYPE_HW)
|
||||
if (cs_needs_completion(cs) &&
|
||||
(job->queue_type == QUEUE_TYPE_EXT ||
|
||||
job->queue_type == QUEUE_TYPE_HW))
|
||||
cs_get(cs);
|
||||
|
||||
hl_debugfs_add_job(hdev, job);
|
||||
|
@ -916,11 +1157,14 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
|
|||
}
|
||||
}
|
||||
|
||||
if (int_queues_only) {
|
||||
/* We allow a CS with any queue type combination as long as it does
|
||||
* not get a completion
|
||||
*/
|
||||
if (int_queues_only && cs_needs_completion(cs)) {
|
||||
atomic64_inc(&ctx->cs_counters.validation_drop_cnt);
|
||||
atomic64_inc(&cntr->validation_drop_cnt);
|
||||
dev_err(hdev->dev,
|
||||
"Reject CS %d.%llu because only internal queues jobs are present\n",
|
||||
"Reject CS %d.%llu since it contains only internal queues jobs and needs completion\n",
|
||||
cs->ctx->asid, cs->sequence);
|
||||
rc = -EINVAL;
|
||||
goto free_cs_object;
|
||||
|
@ -954,6 +1198,129 @@ out:
|
|||
return rc;
|
||||
}
|
||||
|
||||
static int pending_cb_create_job(struct hl_device *hdev, struct hl_ctx *ctx,
|
||||
struct hl_cs *cs, struct hl_cb *cb, u32 size, u32 hw_queue_id)
|
||||
{
|
||||
struct hw_queue_properties *hw_queue_prop;
|
||||
struct hl_cs_counters_atomic *cntr;
|
||||
struct hl_cs_job *job;
|
||||
|
||||
hw_queue_prop = &hdev->asic_prop.hw_queues_props[hw_queue_id];
|
||||
cntr = &hdev->aggregated_cs_counters;
|
||||
|
||||
job = hl_cs_allocate_job(hdev, hw_queue_prop->type, true);
|
||||
if (!job) {
|
||||
atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
|
||||
atomic64_inc(&cntr->out_of_mem_drop_cnt);
|
||||
dev_err(hdev->dev, "Failed to allocate a new job\n");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
job->id = 0;
|
||||
job->cs = cs;
|
||||
job->user_cb = cb;
|
||||
atomic_inc(&job->user_cb->cs_cnt);
|
||||
job->user_cb_size = size;
|
||||
job->hw_queue_id = hw_queue_id;
|
||||
job->patched_cb = job->user_cb;
|
||||
job->job_cb_size = job->user_cb_size;
|
||||
|
||||
/* increment refcount as for external queues we get completion */
|
||||
cs_get(cs);
|
||||
|
||||
cs->jobs_in_queue_cnt[job->hw_queue_id]++;
|
||||
|
||||
list_add_tail(&job->cs_node, &cs->job_list);
|
||||
|
||||
hl_debugfs_add_job(hdev, job);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int hl_submit_pending_cb(struct hl_fpriv *hpriv)
|
||||
{
|
||||
struct hl_device *hdev = hpriv->hdev;
|
||||
struct hl_ctx *ctx = hpriv->ctx;
|
||||
struct hl_pending_cb *pending_cb, *tmp;
|
||||
struct list_head local_cb_list;
|
||||
struct hl_cs *cs;
|
||||
struct hl_cb *cb;
|
||||
u32 hw_queue_id;
|
||||
u32 cb_size;
|
||||
int process_list, rc = 0;
|
||||
|
||||
if (list_empty(&ctx->pending_cb_list))
|
||||
return 0;
|
||||
|
||||
process_list = atomic_cmpxchg(&ctx->thread_pending_cb_token, 1, 0);
|
||||
|
||||
/* Only a single thread is allowed to process the list */
|
||||
if (!process_list)
|
||||
return 0;
|
||||
|
||||
if (list_empty(&ctx->pending_cb_list))
|
||||
goto free_pending_cb_token;
|
||||
|
||||
/* move all list elements to a local list */
|
||||
INIT_LIST_HEAD(&local_cb_list);
|
||||
spin_lock(&ctx->pending_cb_lock);
|
||||
list_for_each_entry_safe(pending_cb, tmp, &ctx->pending_cb_list,
|
||||
cb_node)
|
||||
list_move_tail(&pending_cb->cb_node, &local_cb_list);
|
||||
spin_unlock(&ctx->pending_cb_lock);
|
||||
|
||||
rc = allocate_cs(hdev, ctx, CS_TYPE_DEFAULT, ULLONG_MAX, &cs);
|
||||
if (rc)
|
||||
goto add_list_elements;
|
||||
|
||||
hl_debugfs_add_cs(cs);
|
||||
|
||||
/* Iterate through pending cb list, create jobs and add to CS */
|
||||
list_for_each_entry(pending_cb, &local_cb_list, cb_node) {
|
||||
cb = pending_cb->cb;
|
||||
cb_size = pending_cb->cb_size;
|
||||
hw_queue_id = pending_cb->hw_queue_id;
|
||||
|
||||
rc = pending_cb_create_job(hdev, ctx, cs, cb, cb_size,
|
||||
hw_queue_id);
|
||||
if (rc)
|
||||
goto free_cs_object;
|
||||
}
|
||||
|
||||
rc = hl_hw_queue_schedule_cs(cs);
|
||||
if (rc) {
|
||||
if (rc != -EAGAIN)
|
||||
dev_err(hdev->dev,
|
||||
"Failed to submit CS %d.%llu (%d)\n",
|
||||
ctx->asid, cs->sequence, rc);
|
||||
goto free_cs_object;
|
||||
}
|
||||
|
||||
/* pending cb was scheduled successfully */
|
||||
list_for_each_entry_safe(pending_cb, tmp, &local_cb_list, cb_node) {
|
||||
list_del(&pending_cb->cb_node);
|
||||
kfree(pending_cb);
|
||||
}
|
||||
|
||||
cs_put(cs);
|
||||
|
||||
goto free_pending_cb_token;
|
||||
|
||||
free_cs_object:
|
||||
cs_rollback(hdev, cs);
|
||||
cs_put(cs);
|
||||
add_list_elements:
|
||||
spin_lock(&ctx->pending_cb_lock);
|
||||
list_for_each_entry_safe_reverse(pending_cb, tmp, &local_cb_list,
|
||||
cb_node)
|
||||
list_move(&pending_cb->cb_node, &ctx->pending_cb_list);
|
||||
spin_unlock(&ctx->pending_cb_lock);
|
||||
free_pending_cb_token:
|
||||
atomic_set(&ctx->thread_pending_cb_token, 1);
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
static int hl_cs_ctx_switch(struct hl_fpriv *hpriv, union hl_cs_args *args,
|
||||
u64 *cs_seq)
|
||||
{
|
||||
|
@ -1003,7 +1370,7 @@ static int hl_cs_ctx_switch(struct hl_fpriv *hpriv, union hl_cs_args *args,
|
|||
rc = 0;
|
||||
} else {
|
||||
rc = cs_ioctl_default(hpriv, chunks, num_chunks,
|
||||
cs_seq, false);
|
||||
cs_seq, 0);
|
||||
}
|
||||
|
||||
mutex_unlock(&hpriv->restore_phase_mutex);
|
||||
|
@ -1275,15 +1642,11 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
|
|||
}
|
||||
}
|
||||
|
||||
/* increment refcnt for context */
|
||||
hl_ctx_get(hdev, ctx);
|
||||
|
||||
rc = allocate_cs(hdev, ctx, cs_type, &cs);
|
||||
rc = allocate_cs(hdev, ctx, cs_type, ULLONG_MAX, &cs);
|
||||
if (rc) {
|
||||
if (cs_type == CS_TYPE_WAIT ||
|
||||
cs_type == CS_TYPE_COLLECTIVE_WAIT)
|
||||
hl_fence_put(sig_fence);
|
||||
hl_ctx_put(ctx);
|
||||
goto free_cs_chunk_array;
|
||||
}
|
||||
|
||||
|
@ -1346,7 +1709,7 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
|
|||
enum hl_cs_type cs_type;
|
||||
u64 cs_seq = ULONG_MAX;
|
||||
void __user *chunks;
|
||||
u32 num_chunks;
|
||||
u32 num_chunks, flags;
|
||||
int rc;
|
||||
|
||||
rc = hl_cs_sanity_checks(hpriv, args);
|
||||
|
@ -1357,10 +1720,20 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
|
|||
if (rc)
|
||||
goto out;
|
||||
|
||||
rc = hl_submit_pending_cb(hpriv);
|
||||
if (rc)
|
||||
goto out;
|
||||
|
||||
cs_type = hl_cs_get_cs_type(args->in.cs_flags &
|
||||
~HL_CS_FLAGS_FORCE_RESTORE);
|
||||
chunks = (void __user *) (uintptr_t) args->in.chunks_execute;
|
||||
num_chunks = args->in.num_chunks_execute;
|
||||
flags = args->in.cs_flags;
|
||||
|
||||
/* In case this is a staged CS, user should supply the CS sequence */
|
||||
if ((flags & HL_CS_FLAGS_STAGED_SUBMISSION) &&
|
||||
!(flags & HL_CS_FLAGS_STAGED_SUBMISSION_FIRST))
|
||||
cs_seq = args->in.seq;
|
||||
|
||||
switch (cs_type) {
|
||||
case CS_TYPE_SIGNAL:
|
||||
|
@ -1371,7 +1744,7 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
|
|||
break;
|
||||
default:
|
||||
rc = cs_ioctl_default(hpriv, chunks, num_chunks, &cs_seq,
|
||||
args->in.cs_flags & HL_CS_FLAGS_TIMESTAMP);
|
||||
args->in.cs_flags);
|
||||
break;
|
||||
}
|
||||
|
||||
|
|
|
@ -12,9 +12,14 @@
|
|||
static void hl_ctx_fini(struct hl_ctx *ctx)
|
||||
{
|
||||
struct hl_device *hdev = ctx->hdev;
|
||||
u64 idle_mask = 0;
|
||||
u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
|
||||
int i;
|
||||
|
||||
/* Release all allocated pending cb's, those cb's were never
|
||||
* scheduled so it is safe to release them here
|
||||
*/
|
||||
hl_pending_cb_list_flush(ctx);
|
||||
|
||||
/*
|
||||
* If we arrived here, there are no jobs waiting for this context
|
||||
* on its queues so we can safely remove it.
|
||||
|
@ -50,12 +55,15 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
|
|||
|
||||
if ((!hdev->pldm) && (hdev->pdev) &&
|
||||
(!hdev->asic_funcs->is_device_idle(hdev,
|
||||
&idle_mask, NULL)))
|
||||
idle_mask,
|
||||
HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL)))
|
||||
dev_notice(hdev->dev,
|
||||
"device not idle after user context is closed (0x%llx)\n",
|
||||
idle_mask);
|
||||
"device not idle after user context is closed (0x%llx, 0x%llx)\n",
|
||||
idle_mask[0], idle_mask[1]);
|
||||
} else {
|
||||
dev_dbg(hdev->dev, "closing kernel context\n");
|
||||
hdev->asic_funcs->ctx_fini(ctx);
|
||||
hl_vm_ctx_fini(ctx);
|
||||
hl_mmu_ctx_fini(ctx);
|
||||
}
|
||||
}
|
||||
|
@ -140,8 +148,11 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
|
|||
kref_init(&ctx->refcount);
|
||||
|
||||
ctx->cs_sequence = 1;
|
||||
INIT_LIST_HEAD(&ctx->pending_cb_list);
|
||||
spin_lock_init(&ctx->pending_cb_lock);
|
||||
spin_lock_init(&ctx->cs_lock);
|
||||
atomic_set(&ctx->thread_ctx_switch_token, 1);
|
||||
atomic_set(&ctx->thread_pending_cb_token, 1);
|
||||
ctx->thread_ctx_switch_wait_token = 0;
|
||||
ctx->cs_pending = kcalloc(hdev->asic_prop.max_pending_cs,
|
||||
sizeof(struct hl_fence *),
|
||||
|
@ -151,11 +162,18 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
|
|||
|
||||
if (is_kernel_ctx) {
|
||||
ctx->asid = HL_KERNEL_ASID_ID; /* Kernel driver gets ASID 0 */
|
||||
rc = hl_mmu_ctx_init(ctx);
|
||||
rc = hl_vm_ctx_init(ctx);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "Failed to init mmu ctx module\n");
|
||||
dev_err(hdev->dev, "Failed to init mem ctx module\n");
|
||||
rc = -ENOMEM;
|
||||
goto err_free_cs_pending;
|
||||
}
|
||||
|
||||
rc = hdev->asic_funcs->ctx_init(ctx);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "ctx_init failed\n");
|
||||
goto err_vm_ctx_fini;
|
||||
}
|
||||
} else {
|
||||
ctx->asid = hl_asid_alloc(hdev);
|
||||
if (!ctx->asid) {
|
||||
|
@ -194,7 +212,8 @@ err_cb_va_pool_fini:
|
|||
err_vm_ctx_fini:
|
||||
hl_vm_ctx_fini(ctx);
|
||||
err_asid_free:
|
||||
hl_asid_free(hdev, ctx->asid);
|
||||
if (ctx->asid != HL_KERNEL_ASID_ID)
|
||||
hl_asid_free(hdev, ctx->asid);
|
||||
err_free_cs_pending:
|
||||
kfree(ctx->cs_pending);
|
||||
|
||||
|
|
|
@ -310,8 +310,8 @@ static int mmu_show(struct seq_file *s, void *data)
|
|||
struct hl_dbg_device_entry *dev_entry = entry->dev_entry;
|
||||
struct hl_device *hdev = dev_entry->hdev;
|
||||
struct hl_ctx *ctx;
|
||||
struct hl_mmu_hop_info hops_info;
|
||||
u64 virt_addr = dev_entry->mmu_addr;
|
||||
struct hl_mmu_hop_info hops_info = {0};
|
||||
u64 virt_addr = dev_entry->mmu_addr, phys_addr;
|
||||
int i;
|
||||
|
||||
if (!hdev->mmu_enable)
|
||||
|
@ -333,8 +333,19 @@ static int mmu_show(struct seq_file *s, void *data)
|
|||
return 0;
|
||||
}
|
||||
|
||||
seq_printf(s, "asid: %u, virt_addr: 0x%llx\n",
|
||||
dev_entry->mmu_asid, dev_entry->mmu_addr);
|
||||
phys_addr = hops_info.hop_info[hops_info.used_hops - 1].hop_pte_val;
|
||||
|
||||
if (hops_info.scrambled_vaddr &&
|
||||
(dev_entry->mmu_addr != hops_info.scrambled_vaddr))
|
||||
seq_printf(s,
|
||||
"asid: %u, virt_addr: 0x%llx, scrambled virt_addr: 0x%llx,\nphys_addr: 0x%llx, scrambled_phys_addr: 0x%llx\n",
|
||||
dev_entry->mmu_asid, dev_entry->mmu_addr,
|
||||
hops_info.scrambled_vaddr,
|
||||
hops_info.unscrambled_paddr, phys_addr);
|
||||
else
|
||||
seq_printf(s,
|
||||
"asid: %u, virt_addr: 0x%llx, phys_addr: 0x%llx\n",
|
||||
dev_entry->mmu_asid, dev_entry->mmu_addr, phys_addr);
|
||||
|
||||
for (i = 0 ; i < hops_info.used_hops ; i++) {
|
||||
seq_printf(s, "hop%d_addr: 0x%llx\n",
|
||||
|
@ -403,7 +414,7 @@ static int engines_show(struct seq_file *s, void *data)
|
|||
return 0;
|
||||
}
|
||||
|
||||
hdev->asic_funcs->is_device_idle(hdev, NULL, s);
|
||||
hdev->asic_funcs->is_device_idle(hdev, NULL, 0, s);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -865,6 +876,17 @@ static ssize_t hl_stop_on_err_write(struct file *f, const char __user *buf,
|
|||
return count;
|
||||
}
|
||||
|
||||
static ssize_t hl_security_violations_read(struct file *f, char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct hl_dbg_device_entry *entry = file_inode(f)->i_private;
|
||||
struct hl_device *hdev = entry->hdev;
|
||||
|
||||
hdev->asic_funcs->ack_protection_bits_errors(hdev);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static const struct file_operations hl_data32b_fops = {
|
||||
.owner = THIS_MODULE,
|
||||
.read = hl_data_read32,
|
||||
|
@ -922,6 +944,11 @@ static const struct file_operations hl_stop_on_err_fops = {
|
|||
.write = hl_stop_on_err_write
|
||||
};
|
||||
|
||||
static const struct file_operations hl_security_violations_fops = {
|
||||
.owner = THIS_MODULE,
|
||||
.read = hl_security_violations_read
|
||||
};
|
||||
|
||||
static const struct hl_info_list hl_debugfs_list[] = {
|
||||
{"command_buffers", command_buffers_show, NULL},
|
||||
{"command_submission", command_submission_show, NULL},
|
||||
|
@ -1071,6 +1098,12 @@ void hl_debugfs_add_device(struct hl_device *hdev)
|
|||
dev_entry,
|
||||
&hl_stop_on_err_fops);
|
||||
|
||||
debugfs_create_file("dump_security_violations",
|
||||
0644,
|
||||
dev_entry->root,
|
||||
dev_entry,
|
||||
&hl_security_violations_fops);
|
||||
|
||||
for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
|
||||
|
||||
ent = debugfs_create_file(hl_debugfs_list[i].name,
|
||||
|
|
|
@ -142,6 +142,9 @@ static int hl_mmap(struct file *filp, struct vm_area_struct *vma)
|
|||
switch (vm_pgoff & HL_MMAP_TYPE_MASK) {
|
||||
case HL_MMAP_TYPE_CB:
|
||||
return hl_cb_mmap(hpriv, vma);
|
||||
|
||||
case HL_MMAP_TYPE_BLOCK:
|
||||
return hl_hw_block_mmap(hpriv, vma);
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
|
@ -373,7 +376,6 @@ static int device_early_init(struct hl_device *hdev)
|
|||
|
||||
mutex_init(&hdev->send_cpu_message_lock);
|
||||
mutex_init(&hdev->debug_lock);
|
||||
mutex_init(&hdev->mmu_cache_lock);
|
||||
INIT_LIST_HEAD(&hdev->cs_mirror_list);
|
||||
spin_lock_init(&hdev->cs_mirror_lock);
|
||||
INIT_LIST_HEAD(&hdev->fpriv_list);
|
||||
|
@ -414,7 +416,6 @@ static void device_early_fini(struct hl_device *hdev)
|
|||
{
|
||||
int i;
|
||||
|
||||
mutex_destroy(&hdev->mmu_cache_lock);
|
||||
mutex_destroy(&hdev->debug_lock);
|
||||
mutex_destroy(&hdev->send_cpu_message_lock);
|
||||
|
||||
|
@ -1314,11 +1315,16 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
|
|||
|
||||
hdev->compute_ctx = NULL;
|
||||
|
||||
hl_debugfs_add_device(hdev);
|
||||
|
||||
/* debugfs nodes are created in hl_ctx_init so it must be called after
|
||||
* hl_debugfs_add_device.
|
||||
*/
|
||||
rc = hl_ctx_init(hdev, hdev->kernel_ctx, true);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed to initialize kernel context\n");
|
||||
kfree(hdev->kernel_ctx);
|
||||
goto mmu_fini;
|
||||
goto remove_device_from_debugfs;
|
||||
}
|
||||
|
||||
rc = hl_cb_pool_init(hdev);
|
||||
|
@ -1327,8 +1333,6 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
|
|||
goto release_ctx;
|
||||
}
|
||||
|
||||
hl_debugfs_add_device(hdev);
|
||||
|
||||
/*
|
||||
* From this point, in case of an error, add char devices and create
|
||||
* sysfs nodes as part of the error flow, to allow debugging.
|
||||
|
@ -1417,6 +1421,8 @@ release_ctx:
|
|||
if (hl_ctx_put(hdev->kernel_ctx) != 1)
|
||||
dev_err(hdev->dev,
|
||||
"kernel ctx is still alive on initialization failure\n");
|
||||
remove_device_from_debugfs:
|
||||
hl_debugfs_remove_device(hdev);
|
||||
mmu_fini:
|
||||
hl_mmu_fini(hdev);
|
||||
eq_fini:
|
||||
|
@ -1482,7 +1488,8 @@ void hl_device_fini(struct hl_device *hdev)
|
|||
usleep_range(50, 200);
|
||||
rc = atomic_cmpxchg(&hdev->in_reset, 0, 1);
|
||||
if (ktime_compare(ktime_get(), timeout) > 0) {
|
||||
WARN(1, "Failed to remove device because reset function did not finish\n");
|
||||
dev_crit(hdev->dev,
|
||||
"Failed to remove device because reset function did not finish\n");
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
@ -1515,8 +1522,6 @@ void hl_device_fini(struct hl_device *hdev)
|
|||
|
||||
device_late_fini(hdev);
|
||||
|
||||
hl_debugfs_remove_device(hdev);
|
||||
|
||||
/*
|
||||
* Halt the engines and disable interrupts so we won't get any more
|
||||
* completions from H/W and we won't have any accesses from the
|
||||
|
@ -1548,6 +1553,8 @@ void hl_device_fini(struct hl_device *hdev)
|
|||
if ((hdev->kernel_ctx) && (hl_ctx_put(hdev->kernel_ctx) != 1))
|
||||
dev_err(hdev->dev, "kernel ctx is still alive\n");
|
||||
|
||||
hl_debugfs_remove_device(hdev);
|
||||
|
||||
hl_vm_fini(hdev);
|
||||
|
||||
hl_mmu_fini(hdev);
|
||||
|
|
|
@@ -279,8 +279,74 @@ int hl_fw_send_heartbeat(struct hl_device *hdev)
return rc;
}
static int fw_read_errors(struct hl_device *hdev, u32 boot_err0_reg,
u32 cpu_security_boot_status_reg)
{
u32 err_val, security_val;
/* Some of the firmware status codes are deprecated in newer f/w
* versions. In those versions, the errors are reported
* in different registers. Therefore, we need to check those
* registers and print the exact errors. Moreover, there
* may be multiple errors, so we need to report on each error
* separately. Some of the error codes might indicate a state
* that is not an error per-se, but it is an error in production
* environment
*/
err_val = RREG32(boot_err0_reg);
if (!(err_val & CPU_BOOT_ERR0_ENABLED))
return 0;
if (err_val & CPU_BOOT_ERR0_DRAM_INIT_FAIL)
dev_err(hdev->dev,
"Device boot error - DRAM initialization failed\n");
if (err_val & CPU_BOOT_ERR0_FIT_CORRUPTED)
dev_err(hdev->dev, "Device boot error - FIT image corrupted\n");
if (err_val & CPU_BOOT_ERR0_TS_INIT_FAIL)
dev_err(hdev->dev,
"Device boot error - Thermal Sensor initialization failed\n");
if (err_val & CPU_BOOT_ERR0_DRAM_SKIPPED)
dev_warn(hdev->dev,
"Device boot warning - Skipped DRAM initialization\n");
if (err_val & CPU_BOOT_ERR0_BMC_WAIT_SKIPPED) {
if (hdev->bmc_enable)
dev_warn(hdev->dev,
"Device boot error - Skipped waiting for BMC\n");
else
err_val &= ~CPU_BOOT_ERR0_BMC_WAIT_SKIPPED;
}
if (err_val & CPU_BOOT_ERR0_NIC_DATA_NOT_RDY)
dev_err(hdev->dev,
"Device boot error - Serdes data from BMC not available\n");
if (err_val & CPU_BOOT_ERR0_NIC_FW_FAIL)
dev_err(hdev->dev,
"Device boot error - NIC F/W initialization failed\n");
if (err_val & CPU_BOOT_ERR0_SECURITY_NOT_RDY)
dev_warn(hdev->dev,
"Device boot warning - security not ready\n");
if (err_val & CPU_BOOT_ERR0_SECURITY_FAIL)
dev_err(hdev->dev, "Device boot error - security failure\n");
if (err_val & CPU_BOOT_ERR0_EFUSE_FAIL)
dev_err(hdev->dev, "Device boot error - eFuse failure\n");
if (err_val & CPU_BOOT_ERR0_PLL_FAIL)
dev_err(hdev->dev, "Device boot error - PLL failure\n");
security_val = RREG32(cpu_security_boot_status_reg);
if (security_val & CPU_BOOT_DEV_STS0_ENABLED)
dev_dbg(hdev->dev, "Device security status %#x\n",
security_val);
if (err_val & ~CPU_BOOT_ERR0_ENABLED)
return -EIO;
return 0;
}
int hl_fw_cpucp_info_get(struct hl_device *hdev,
u32 cpu_security_boot_status_reg)
u32 cpu_security_boot_status_reg,
u32 boot_err0_reg)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct cpucp_packet pkt = {};

@@ -314,6 +380,12 @@ int hl_fw_cpucp_info_get(struct hl_device *hdev,
goto out;
}
rc = fw_read_errors(hdev, boot_err0_reg, cpu_security_boot_status_reg);
if (rc) {
dev_err(hdev->dev, "Errors in device boot\n");
goto out;
}
memcpy(&prop->cpucp_info, cpucp_info_cpu_addr,
sizeof(prop->cpucp_info));
@@ -483,58 +555,6 @@ int hl_fw_cpucp_pll_info_get(struct hl_device *hdev, u16 pll_index,
return rc;
}
static void fw_read_errors(struct hl_device *hdev, u32 boot_err0_reg,
u32 cpu_security_boot_status_reg)
{
u32 err_val, security_val;
/* Some of the firmware status codes are deprecated in newer f/w
* versions. In those versions, the errors are reported
* in different registers. Therefore, we need to check those
* registers and print the exact errors. Moreover, there
* may be multiple errors, so we need to report on each error
* separately. Some of the error codes might indicate a state
* that is not an error per-se, but it is an error in production
* environment
*/
err_val = RREG32(boot_err0_reg);
if (!(err_val & CPU_BOOT_ERR0_ENABLED))
return;
if (err_val & CPU_BOOT_ERR0_DRAM_INIT_FAIL)
dev_err(hdev->dev,
"Device boot error - DRAM initialization failed\n");
if (err_val & CPU_BOOT_ERR0_FIT_CORRUPTED)
dev_err(hdev->dev, "Device boot error - FIT image corrupted\n");
if (err_val & CPU_BOOT_ERR0_TS_INIT_FAIL)
dev_err(hdev->dev,
"Device boot error - Thermal Sensor initialization failed\n");
if (err_val & CPU_BOOT_ERR0_DRAM_SKIPPED)
dev_warn(hdev->dev,
"Device boot warning - Skipped DRAM initialization\n");
if (err_val & CPU_BOOT_ERR0_BMC_WAIT_SKIPPED)
dev_warn(hdev->dev,
"Device boot error - Skipped waiting for BMC\n");
if (err_val & CPU_BOOT_ERR0_NIC_DATA_NOT_RDY)
dev_err(hdev->dev,
"Device boot error - Serdes data from BMC not available\n");
if (err_val & CPU_BOOT_ERR0_NIC_FW_FAIL)
dev_err(hdev->dev,
"Device boot error - NIC F/W initialization failed\n");
if (err_val & CPU_BOOT_ERR0_SECURITY_NOT_RDY)
dev_warn(hdev->dev,
"Device boot warning - security not ready\n");
if (err_val & CPU_BOOT_ERR0_SECURITY_FAIL)
dev_err(hdev->dev, "Device boot error - security failure\n");
if (err_val & CPU_BOOT_ERR0_EFUSE_FAIL)
dev_err(hdev->dev, "Device boot error - eFuse failure\n");
security_val = RREG32(cpu_security_boot_status_reg);
if (security_val & CPU_BOOT_DEV_STS0_ENABLED)
dev_dbg(hdev->dev, "Device security status %#x\n",
security_val);
}
static void detect_cpu_boot_status(struct hl_device *hdev, u32 status)
{
/* Some of the status codes below are deprecated in newer f/w

@@ -659,6 +679,9 @@ int hl_fw_read_preboot_status(struct hl_device *hdev, u32 cpu_boot_status_reg,
prop->fw_security_disabled = true;
}
dev_dbg(hdev->dev, "Firmware preboot security status %#x\n",
security_status);
dev_dbg(hdev->dev, "Firmware preboot hard-reset is %s\n",
prop->hard_reset_done_by_fw ? "enabled" : "disabled");

@@ -753,6 +776,10 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
if (prop->fw_boot_cpu_security_map &
CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
prop->hard_reset_done_by_fw = true;
dev_dbg(hdev->dev,
"Firmware boot CPU security status %#x\n",
prop->fw_boot_cpu_security_map);
}
dev_dbg(hdev->dev, "Firmware boot CPU hard-reset is %s\n",

@@ -826,6 +853,10 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
goto out;
}
rc = fw_read_errors(hdev, boot_err0_reg, cpu_security_boot_status_reg);
if (rc)
return rc;
/* Clear reset status since we need to read again from app */
prop->hard_reset_done_by_fw = false;

@@ -837,6 +868,10 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
if (prop->fw_app_security_map &
CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
prop->hard_reset_done_by_fw = true;
dev_dbg(hdev->dev,
"Firmware application CPU security status %#x\n",
prop->fw_app_security_map);
}
dev_dbg(hdev->dev, "Firmware application CPU hard-reset is %s\n",

@@ -844,6 +879,8 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
dev_info(hdev->dev, "Successfully loaded firmware to device\n");
return 0;
out:
fw_read_errors(hdev, boot_err0_reg, cpu_security_boot_status_reg);
@@ -28,17 +28,18 @@
#define HL_NAME "habanalabs"
/* Use upper bits of mmap offset to store habana driver specific information.
* bits[63:62] - Encode mmap type
* bits[63:61] - Encode mmap type
* bits[45:0] - mmap offset value
*
* NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
* defines are w.r.t to PAGE_SIZE
*/
#define HL_MMAP_TYPE_SHIFT (62 - PAGE_SHIFT)
#define HL_MMAP_TYPE_MASK (0x3ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_TYPE_SHIFT (61 - PAGE_SHIFT)
#define HL_MMAP_TYPE_MASK (0x7ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_TYPE_BLOCK (0x4ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_TYPE_CB (0x2ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_OFFSET_VALUE_MASK (0x3FFFFFFFFFFFull >> PAGE_SHIFT)
#define HL_MMAP_OFFSET_VALUE_MASK (0x1FFFFFFFFFFFull >> PAGE_SHIFT)
#define HL_MMAP_OFFSET_VALUE_GET(off) (off & HL_MMAP_OFFSET_VALUE_MASK)
#define HL_PENDING_RESET_PER_SEC 10

@@ -408,6 +409,8 @@ struct hl_mmu_properties {
* @sync_stream_first_mon: first monitor available for sync stream use
* @first_available_user_sob: first sob available for the user
* @first_available_user_mon: first monitor available for the user
* @first_available_user_msix_interrupt: first available msix interrupt
* reserved for the user
* @tpc_enabled_mask: which TPCs are enabled.
* @completion_queues_count: number of completion queues.
* @fw_security_disabled: true if security measures are disabled in firmware,

@@ -416,6 +419,7 @@ struct hl_mmu_properties {
* from BOOT_DEV_STS0
* @dram_supports_virtual_memory: is there an MMU towards the DRAM
* @hard_reset_done_by_fw: true if firmware is handling hard reset flow
* @num_functional_hbms: number of functional HBMs in each DCORE.
*/
struct asic_fixed_properties {
struct hw_queue_properties *hw_queues_props;

@@ -468,18 +472,21 @@ struct asic_fixed_properties {
u16 sync_stream_first_mon;
u16 first_available_user_sob[HL_MAX_DCORES];
u16 first_available_user_mon[HL_MAX_DCORES];
u16 first_available_user_msix_interrupt;
u8 tpc_enabled_mask;
u8 completion_queues_count;
u8 fw_security_disabled;
u8 fw_security_status_valid;
u8 dram_supports_virtual_memory;
u8 hard_reset_done_by_fw;
u8 num_functional_hbms;
};
/**
* struct hl_fence - software synchronization primitive
* @completion: fence is implemented using completion
* @refcount: refcount for this fence
* @cs_sequence: sequence of the corresponding command submission
* @error: mark this fence with error
* @timestamp: timestamp upon completion
*

@@ -487,6 +494,7 @@ struct asic_fixed_properties {
struct hl_fence {
struct completion completion;
struct kref refcount;
u64 cs_sequence;
int error;
ktime_t timestamp;
};

@@ -846,6 +854,13 @@ enum div_select_defs {
* @collective_wait_init_cs: Generate collective master/slave packets
* and place them in the relevant cs jobs
* @collective_wait_create_jobs: allocate collective wait cs jobs
* @scramble_addr: Routine to scramble the address prior of mapping it
* in the MMU.
* @descramble_addr: Routine to de-scramble the address prior of
* showing it to users.
* @ack_protection_bits_errors: ack and dump all security violations
* @get_hw_block_id: retrieve a HW block id to be used by the user to mmap it.
* @hw_block_mmap: mmap a HW block with a given id.
*/
struct hl_asic_funcs {
int (*early_init)(struct hl_device *hdev);

@@ -918,8 +933,8 @@ struct hl_asic_funcs {
void (*set_clock_gating)(struct hl_device *hdev);
void (*disable_clock_gating)(struct hl_device *hdev);
int (*debug_coresight)(struct hl_device *hdev, void *data);
bool (*is_device_idle)(struct hl_device *hdev, u64 *mask,
struct seq_file *s);
bool (*is_device_idle)(struct hl_device *hdev, u64 *mask_arr,
u8 mask_len, struct seq_file *s);
int (*soft_reset_late_init)(struct hl_device *hdev);
void (*hw_queues_lock)(struct hl_device *hdev);
void (*hw_queues_unlock)(struct hl_device *hdev);

@@ -955,6 +970,13 @@ struct hl_asic_funcs {
int (*collective_wait_create_jobs)(struct hl_device *hdev,
struct hl_ctx *ctx, struct hl_cs *cs, u32 wait_queue_id,
u32 collective_engine_id);
u64 (*scramble_addr)(struct hl_device *hdev, u64 addr);
u64 (*descramble_addr)(struct hl_device *hdev, u64 addr);
void (*ack_protection_bits_errors)(struct hl_device *hdev);
int (*get_hw_block_id)(struct hl_device *hdev, u64 block_addr,
u32 *block_id);
int (*hw_block_mmap)(struct hl_device *hdev, struct vm_area_struct *vma,
u32 block_id, u32 block_size);
};
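The widened type field above now encodes the mmap type in bits [63:61] of the byte offset (three bits instead of two, making room for the new HW-block mapping type). Below is a minimal user-space sketch of the decoding the driver performs on vm_pgoff; PAGE_SHIFT = 12 is an assumed value here and the constants simply mirror the defines shown above.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT		12
#define HL_MMAP_TYPE_SHIFT	(61 - PAGE_SHIFT)
#define HL_MMAP_TYPE_MASK	(0x7ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_TYPE_BLOCK	(0x4ull << HL_MMAP_TYPE_SHIFT)
#define HL_MMAP_TYPE_CB		(0x2ull << HL_MMAP_TYPE_SHIFT)

int main(void)
{
	/* A byte offset with the CB type encoded in bits [63:61] */
	uint64_t byte_off = (0x2ull << 61) | 0x1000;
	/* The kernel sees the offset in pages, exactly like vm_pgoff */
	uint64_t vm_pgoff = byte_off >> PAGE_SHIFT;

	if ((vm_pgoff & HL_MMAP_TYPE_MASK) == HL_MMAP_TYPE_CB)
		printf("CB mapping\n");
	else if ((vm_pgoff & HL_MMAP_TYPE_MASK) == HL_MMAP_TYPE_BLOCK)
		printf("HW block mapping\n");
	else
		printf("invalid mapping type\n");

	return 0;
}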
@@ -1011,6 +1033,20 @@ struct hl_cs_counters_atomic {
atomic64_t validation_drop_cnt;
};
/**
* struct hl_pending_cb - pending command buffer structure
* @cb_node: cb node in pending cb list
* @cb: command buffer to send in next submission
* @cb_size: command buffer size
* @hw_queue_id: destination queue id
*/
struct hl_pending_cb {
struct list_head cb_node;
struct hl_cb *cb;
u32 cb_size;
u32 hw_queue_id;
};
/**
* struct hl_ctx - user/kernel context.
* @mem_hash: holds mapping from virtual address to virtual memory area

@@ -1026,6 +1062,8 @@ struct hl_cs_counters_atomic {
* @mmu_lock: protects the MMU page tables. Any change to the PGT, modifying the
* MMU hash or walking the PGT requires talking this lock.
* @debugfs_list: node in debugfs list of contexts.
* pending_cb_list: list of pending command buffers waiting to be sent upon
* next user command submission context.
* @cs_counters: context command submission counters.
* @cb_va_pool: device VA pool for command buffers which are mapped to the
* device's MMU.

@@ -1034,11 +1072,17 @@ struct hl_cs_counters_atomic {
* index to cs_pending array.
* @dram_default_hops: array that holds all hops addresses needed for default
* DRAM mapping.
* @pending_cb_lock: spinlock to protect pending cb list
* @cs_lock: spinlock to protect cs_sequence.
* @dram_phys_mem: amount of used physical DRAM memory by this context.
* @thread_ctx_switch_token: token to prevent multiple threads of the same
* context from running the context switch phase.
* Only a single thread should run it.
* @thread_pending_cb_token: token to prevent multiple threads from processing
* the pending CB list. Only a single thread should
* process the list since it is protected by a
* spinlock and we don't want to halt the entire
* command submission sequence.
* @thread_ctx_switch_wait_token: token to prevent the threads that didn't run
* the context switch phase from moving to their
* execution phase before the context switch phase

@@ -1057,13 +1101,16 @@ struct hl_ctx {
struct mutex mem_hash_lock;
struct mutex mmu_lock;
struct list_head debugfs_list;
struct list_head pending_cb_list;
struct hl_cs_counters_atomic cs_counters;
struct gen_pool *cb_va_pool;
u64 cs_sequence;
u64 *dram_default_hops;
spinlock_t pending_cb_lock;
spinlock_t cs_lock;
atomic64_t dram_phys_mem;
atomic_t thread_ctx_switch_token;
atomic_t thread_pending_cb_token;
u32 thread_ctx_switch_wait_token;
u32 asid;
u32 handle;

@@ -1122,8 +1169,11 @@ struct hl_userptr {
* @finish_work: workqueue object to run when CS is completed by H/W.
* @work_tdr: delayed work node for TDR.
* @mirror_node : node in device mirror list of command submissions.
* @staged_cs_node: node in the staged cs list.
* @debugfs_list: node in debugfs list of command submissions.
* @sequence: the sequence number of this CS.
* @staged_sequence: the sequence of the staged submission this CS is part of,
* relevant only if staged_cs is set.
* @type: CS_TYPE_*.
* @submitted: true if CS was submitted to H/W.
* @completed: true if CS was completed by device.

@@ -1131,7 +1181,11 @@ struct hl_userptr {
* @tdr_active: true if TDR was activated for this CS (to prevent
* double TDR activation).
* @aborted: true if CS was aborted due to some device error.
* @timestamp: true if a timestmap must be captured upon completion
* @timestamp: true if a timestmap must be captured upon completion.
* @staged_last: true if this is the last staged CS and needs completion.
* @staged_first: true if this is the first staged CS and we need to receive
* timeout for this CS.
* @staged_cs: true if this CS is part of a staged submission.
*/
struct hl_cs {
u16 *jobs_in_queue_cnt;

@@ -1144,8 +1198,10 @@ struct hl_cs {
struct work_struct finish_work;
struct delayed_work work_tdr;
struct list_head mirror_node;
struct list_head staged_cs_node;
struct list_head debugfs_list;
u64 sequence;
u64 staged_sequence;
enum hl_cs_type type;
u8 submitted;
u8 completed;

@@ -1153,6 +1209,9 @@ struct hl_cs {
u8 tdr_active;
u8 aborted;
u8 timestamp;
u8 staged_last;
u8 staged_first;
u8 staged_cs;
};

/**
@@ -1223,6 +1282,7 @@ struct hl_cs_job {
* MSG_PROT packets. Relevant only for GAUDI as GOYA doesn't
* have streams so the engine can't be busy by another
* stream.
* @completion: true if we need completion for this CS.
*/
struct hl_cs_parser {
struct hl_cb *user_cb;

@@ -1237,6 +1297,7 @@ struct hl_cs_parser {
u8 job_id;
u8 is_kernel_allocated_cb;
u8 contains_dma_pkt;
u8 completion;
};
/*

@@ -1686,12 +1747,20 @@ struct hl_mmu_per_hop_info {
* struct hl_mmu_hop_info - A structure describing the TLB hops and their
* hop-entries that were created in order to translate a virtual address to a
* physical one.
* @scrambled_vaddr: The value of the virtual address after scrambling. This
* address replaces the original virtual-address when mapped
* in the MMU tables.
* @unscrambled_paddr: The un-scrambled physical address.
* @hop_info: Array holding the per-hop information used for the translation.
* @used_hops: The number of hops used for the translation.
* @range_type: virtual address range type.
*/
struct hl_mmu_hop_info {
u64 scrambled_vaddr;
u64 unscrambled_paddr;
struct hl_mmu_per_hop_info hop_info[MMU_ARCH_5_HOPS];
u32 used_hops;
enum hl_va_range_type range_type;
};

/**

@@ -1764,7 +1833,6 @@ struct hl_mmu_funcs {
* @asic_funcs: ASIC specific functions.
* @asic_specific: ASIC specific information to use only from ASIC files.
* @vm: virtual memory manager for MMU.
* @mmu_cache_lock: protects MMU cache invalidation as it can serve one context.
* @hwmon_dev: H/W monitor device.
* @pm_mng_profile: current power management profile.
* @hl_chip_info: ASIC's sensors information.

@@ -1842,6 +1910,7 @@ struct hl_mmu_funcs {
* user processes
* @device_fini_pending: true if device_fini was called and might be
* waiting for the reset thread to finish
* @supports_staged_submission: true if staged submissions are supported
*/
struct hl_device {
struct pci_dev *pdev;

@@ -1879,7 +1948,6 @@ struct hl_device {
const struct hl_asic_funcs *asic_funcs;
void *asic_specific;
struct hl_vm vm;
struct mutex mmu_cache_lock;
struct device *hwmon_dev;
enum hl_pm_mng_profile pm_mng_profile;
struct hwmon_chip_info *hl_chip_info;

@@ -1948,6 +2016,7 @@ struct hl_device {
u8 needs_reset;
u8 process_kill_trial_cnt;
u8 device_fini_pending;
u8 supports_staged_submission;
/* Parameters for bring-up */
u64 nic_ports_mask;

@@ -2065,7 +2134,7 @@ int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id,
int hl_hw_queue_schedule_cs(struct hl_cs *cs);
u32 hl_hw_queue_add_ptr(u32 ptr, u16 val);
void hl_hw_queue_inc_ci_kernel(struct hl_device *hdev, u32 hw_queue_id);
void hl_int_hw_queue_update_ci(struct hl_cs *cs);
void hl_hw_queue_update_ci(struct hl_cs *cs);
void hl_hw_queue_reset(struct hl_device *hdev, bool hard_reset);
#define hl_queue_inc_ptr(p) hl_hw_queue_add_ptr(p, 1)

@@ -2121,6 +2190,7 @@ int hl_cb_create(struct hl_device *hdev, struct hl_cb_mgr *mgr,
bool map_cb, u64 *handle);
int hl_cb_destroy(struct hl_device *hdev, struct hl_cb_mgr *mgr, u64 cb_handle);
int hl_cb_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma);
int hl_hw_block_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma);
struct hl_cb *hl_cb_get(struct hl_device *hdev, struct hl_cb_mgr *mgr,
u32 handle);
void hl_cb_put(struct hl_cb *cb);

@@ -2134,6 +2204,7 @@ int hl_cb_va_pool_init(struct hl_ctx *ctx);
void hl_cb_va_pool_fini(struct hl_ctx *ctx);
void hl_cs_rollback_all(struct hl_device *hdev);
void hl_pending_cb_list_flush(struct hl_ctx *ctx);
struct hl_cs_job *hl_cs_allocate_job(struct hl_device *hdev,
enum hl_queue_type queue_type, bool is_kernel_allocated_cb);
void hl_sob_reset_error(struct kref *ref);

@@ -2141,6 +2212,10 @@ int hl_gen_sob_mask(u16 sob_base, u8 sob_mask, u8 *mask);
void hl_fence_put(struct hl_fence *fence);
void hl_fence_get(struct hl_fence *fence);
void cs_get(struct hl_cs *cs);
bool cs_needs_completion(struct hl_cs *cs);
bool cs_needs_timeout(struct hl_cs *cs);
bool is_staged_cs_last_exists(struct hl_device *hdev, struct hl_cs *cs);
struct hl_cs *hl_staged_cs_find_first(struct hl_device *hdev, u64 cs_seq);
void goya_set_asic_funcs(struct hl_device *hdev);
void gaudi_set_asic_funcs(struct hl_device *hdev);

@@ -2182,6 +2257,8 @@ void hl_mmu_v1_set_funcs(struct hl_device *hdev, struct hl_mmu_funcs *mmu);
int hl_mmu_va_to_pa(struct hl_ctx *ctx, u64 virt_addr, u64 *phys_addr);
int hl_mmu_get_tlb_info(struct hl_ctx *ctx, u64 virt_addr,
struct hl_mmu_hop_info *hops);
u64 hl_mmu_scramble_addr(struct hl_device *hdev, u64 addr);
u64 hl_mmu_descramble_addr(struct hl_device *hdev, u64 addr);
bool hl_is_dram_va(struct hl_device *hdev, u64 virt_addr);
int hl_fw_load_fw_to_device(struct hl_device *hdev, const char *fw_name,

@@ -2199,7 +2276,8 @@ void hl_fw_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
void *vaddr);
int hl_fw_send_heartbeat(struct hl_device *hdev);
int hl_fw_cpucp_info_get(struct hl_device *hdev,
u32 cpu_security_boot_status_reg);
u32 cpu_security_boot_status_reg,
u32 boot_err0_reg);
int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size);
int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
struct hl_info_pci_counters *counters);
@@ -57,12 +57,23 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
hw_ip.device_id = hdev->asic_funcs->get_pci_id(hdev);
hw_ip.sram_base_address = prop->sram_user_base_address;
hw_ip.dram_base_address = prop->dram_user_base_address;
hw_ip.dram_base_address =
hdev->mmu_enable && prop->dram_supports_virtual_memory ?
prop->dmmu.start_addr : prop->dram_user_base_address;
hw_ip.tpc_enabled_mask = prop->tpc_enabled_mask;
hw_ip.sram_size = prop->sram_size - sram_kmd_size;
hw_ip.dram_size = prop->dram_size - dram_kmd_size;
if (hdev->mmu_enable)
hw_ip.dram_size =
DIV_ROUND_DOWN_ULL(prop->dram_size - dram_kmd_size,
prop->dram_page_size) *
prop->dram_page_size;
else
hw_ip.dram_size = prop->dram_size - dram_kmd_size;
if (hw_ip.dram_size > PAGE_SIZE)
hw_ip.dram_enabled = 1;
hw_ip.dram_page_size = prop->dram_page_size;
hw_ip.num_of_events = prop->num_of_events;
memcpy(hw_ip.cpucp_version, prop->cpucp_info.cpucp_version,

@@ -79,6 +90,8 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
hw_ip.psoc_pci_pll_od = prop->psoc_pci_pll_od;
hw_ip.psoc_pci_pll_div_factor = prop->psoc_pci_pll_div_factor;
hw_ip.first_available_interrupt_id =
prop->first_available_user_msix_interrupt;
return copy_to_user(out, &hw_ip,
min((size_t)size, sizeof(hw_ip))) ? -EFAULT : 0;
}

@@ -132,9 +145,10 @@ static int hw_idle(struct hl_device *hdev, struct hl_info_args *args)
return -EINVAL;
hw_idle.is_idle = hdev->asic_funcs->is_device_idle(hdev,
&hw_idle.busy_engines_mask_ext, NULL);
hw_idle.busy_engines_mask_ext,
HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL);
hw_idle.busy_engines_mask =
lower_32_bits(hw_idle.busy_engines_mask_ext);
lower_32_bits(hw_idle.busy_engines_mask_ext[0]);
return copy_to_user(out, &hw_idle,
min((size_t) max_size, sizeof(hw_idle))) ? -EFAULT : 0;
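The hw_ip_info change above rounds the reported DRAM size down to a whole number of DRAM pages when the MMU is enabled, so user space never sees a partial page. A small sketch of that arithmetic with illustrative numbers (the sizes below are assumptions for the example, not values taken from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t dram_size      = 48ULL << 30;	/* hypothetical 48 GB device DRAM */
	uint64_t dram_kmd_size  = 512ULL << 20;	/* hypothetical 512 MB reserved by the driver */
	uint64_t dram_page_size = 2ULL << 20;	/* hypothetical 2 MB DRAM page */

	uint64_t usable = dram_size - dram_kmd_size;

	/* Equivalent of DIV_ROUND_DOWN_ULL(usable, page) * page */
	uint64_t reported = (usable / dram_page_size) * dram_page_size;

	printf("reported DRAM size: %llu bytes\n",
	       (unsigned long long)reported);
	return 0;
}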
@@ -38,7 +38,7 @@ static inline int queue_free_slots(struct hl_hw_queue *q, u32 queue_len)
return (abs(delta) - queue_len);
}
void hl_int_hw_queue_update_ci(struct hl_cs *cs)
void hl_hw_queue_update_ci(struct hl_cs *cs)
{
struct hl_device *hdev = cs->ctx->hdev;
struct hl_hw_queue *q;

@@ -53,8 +53,13 @@ void hl_int_hw_queue_update_ci(struct hl_cs *cs)
if (!hdev->asic_prop.max_queues || q->queue_type == QUEUE_TYPE_HW)
return;
/* We must increment CI for every queue that will never get a
* completion, there are 2 scenarios this can happen:
* 1. All queues of a non completion CS will never get a completion.
* 2. Internal queues never gets completion.
*/
for (i = 0 ; i < hdev->asic_prop.max_queues ; i++, q++) {
if (q->queue_type == QUEUE_TYPE_INT)
if (!cs_needs_completion(cs) || q->queue_type == QUEUE_TYPE_INT)
atomic_add(cs->jobs_in_queue_cnt[i], &q->ci);
}
}

@@ -292,6 +297,10 @@ static void ext_queue_schedule_job(struct hl_cs_job *job)
len = job->job_cb_size;
ptr = cb->bus_address;
/* Skip completion flow in case this is a non completion CS */
if (!cs_needs_completion(job->cs))
goto submit_bd;
cq_pkt.data = cpu_to_le32(
((q->pi << CQ_ENTRY_SHADOW_INDEX_SHIFT)
& CQ_ENTRY_SHADOW_INDEX_MASK) |

@@ -318,6 +327,7 @@ static void ext_queue_schedule_job(struct hl_cs_job *job)
cq->pi = hl_cq_inc_ptr(cq->pi);
submit_bd:
ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
}

@@ -525,6 +535,7 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
struct hl_cs_job *job, *tmp;
struct hl_hw_queue *q;
int rc = 0, i, cq_cnt;
bool first_entry;
u32 max_queues;
cntr = &hdev->aggregated_cs_counters;

@@ -548,7 +559,9 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
switch (q->queue_type) {
case QUEUE_TYPE_EXT:
rc = ext_queue_sanity_checks(hdev, q,
cs->jobs_in_queue_cnt[i], true);
cs->jobs_in_queue_cnt[i],
cs_needs_completion(cs) ?
true : false);
break;
case QUEUE_TYPE_INT:
rc = int_queue_sanity_checks(hdev, q,

@@ -583,12 +596,38 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
hdev->asic_funcs->collective_wait_init_cs(cs);
spin_lock(&hdev->cs_mirror_lock);
/* Verify staged CS exists and add to the staged list */
if (cs->staged_cs && !cs->staged_first) {
struct hl_cs *staged_cs;
staged_cs = hl_staged_cs_find_first(hdev, cs->staged_sequence);
if (!staged_cs) {
dev_err(hdev->dev,
"Cannot find staged submission sequence %llu",
cs->staged_sequence);
rc = -EINVAL;
goto unlock_cs_mirror;
}
if (is_staged_cs_last_exists(hdev, staged_cs)) {
dev_err(hdev->dev,
"Staged submission sequence %llu already submitted",
cs->staged_sequence);
rc = -EINVAL;
goto unlock_cs_mirror;
}
list_add_tail(&cs->staged_cs_node, &staged_cs->staged_cs_node);
}
list_add_tail(&cs->mirror_node, &hdev->cs_mirror_list);
/* Queue TDR if the CS is the first entry and if timeout is wanted */
first_entry = list_first_entry(&hdev->cs_mirror_list,
struct hl_cs, mirror_node) == cs;
if ((hdev->timeout_jiffies != MAX_SCHEDULE_TIMEOUT) &&
(list_first_entry(&hdev->cs_mirror_list,
struct hl_cs, mirror_node) == cs)) {
first_entry && cs_needs_timeout(cs)) {
cs->tdr_active = true;
schedule_delayed_work(&cs->work_tdr, hdev->timeout_jiffies);

@@ -623,6 +662,8 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
goto out;
unlock_cs_mirror:
spin_unlock(&hdev->cs_mirror_lock);
unroll_cq_resv:
q = &hdev->kernel_queues[0];
for (i = 0 ; (i < max_queues) && (cq_cnt > 0) ; i++, q++) {

[File diff suppressed because it is too large]
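The hw_queue changes above lean on two new helpers, cs_needs_completion() and cs_needs_timeout(), whose bodies sit in the command submission code that is not visible in this excerpt. The following is a hedged, self-contained sketch of what they plausibly check, inferred only from the staged_* flags added to struct hl_cs and from how the helpers are used in the scheduling path; it is not the driver's actual implementation.

#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-in for the relevant hl_cs flags (hypothetical layout) */
struct hl_cs_sketch {
	bool staged_cs;     /* CS is part of a staged submission */
	bool staged_first;  /* first CS of the staged submission */
	bool staged_last;   /* last CS of the staged submission */
};

/* Only the last stage of a staged submission generates a completion */
static bool cs_needs_completion(const struct hl_cs_sketch *cs)
{
	return !cs->staged_cs || cs->staged_last;
}

/* Only the first stage of a staged submission arms the TDR timeout */
static bool cs_needs_timeout(const struct hl_cs_sketch *cs)
{
	return !cs->staged_cs || cs->staged_first;
}

int main(void)
{
	struct hl_cs_sketch middle_stage = { .staged_cs = true };

	printf("middle stage: completion=%d timeout=%d\n",
	       cs_needs_completion(&middle_stage),
	       cs_needs_timeout(&middle_stage));
	return 0;
}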
@@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
HL_COMMON_MMU_FILES := common/mmu/mmu.o common/mmu/mmu_v1.o

@@ -7,7 +7,7 @@
#include <linux/slab.h>
#include "habanalabs.h"
#include "../habanalabs.h"
bool hl_is_dram_va(struct hl_device *hdev, u64 virt_addr)
{

@@ -166,7 +166,6 @@ int hl_mmu_unmap_page(struct hl_ctx *ctx, u64 virt_addr, u32 page_size,
mmu_prop = &prop->pmmu;
pgt_residency = mmu_prop->host_resident ? MMU_HR_PGT : MMU_DR_PGT;
/*
* The H/W handles mapping of specific page sizes. Hence if the page
* size is bigger, we break it to sub-pages and unmap them separately.

@@ -174,11 +173,21 @@ int hl_mmu_unmap_page(struct hl_ctx *ctx, u64 virt_addr, u32 page_size,
if ((page_size % mmu_prop->page_size) == 0) {
real_page_size = mmu_prop->page_size;
} else {
dev_err(hdev->dev,
"page size of %u is not %uKB aligned, can't unmap\n",
page_size, mmu_prop->page_size >> 10);
/*
* MMU page size may differ from DRAM page size.
* In such case work with the DRAM page size and let the MMU
* scrambling routine to handle this mismatch when
* calculating the address to remove from the MMU page table
*/
if (is_dram_addr && ((page_size % prop->dram_page_size) == 0)) {
real_page_size = prop->dram_page_size;
} else {
dev_err(hdev->dev,
"page size of %u is not %uKB aligned, can't unmap\n",
page_size, mmu_prop->page_size >> 10);
return -EFAULT;
return -EFAULT;
}
}
npages = page_size / real_page_size;

@@ -253,6 +262,17 @@ int hl_mmu_map_page(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
*/
if ((page_size % mmu_prop->page_size) == 0) {
real_page_size = mmu_prop->page_size;
} else if (is_dram_addr && ((page_size % prop->dram_page_size) == 0) &&
(prop->dram_page_size < mmu_prop->page_size)) {
/*
* MMU page size may differ from DRAM page size.
* In such case work with the DRAM page size and let the MMU
* scrambling routine handle this mismatch when calculating
* the address to place in the MMU page table. (in that case
* also make sure that the dram_page_size smaller than the
* mmu page size)
*/
real_page_size = prop->dram_page_size;
} else {
dev_err(hdev->dev,
"page size of %u is not %uKB aligned, can't map\n",

@@ -261,9 +281,21 @@ int hl_mmu_map_page(struct hl_ctx *ctx, u64 virt_addr, u64 phys_addr,
return -EFAULT;
}
WARN_ONCE((phys_addr & (real_page_size - 1)),
"Mapping 0x%llx with page size of 0x%x is erroneous! Address must be divisible by page size",
phys_addr, real_page_size);
/*
* Verify that the phys and virt addresses are aligned with the
* MMU page size (in dram this means checking the address and MMU
* after scrambling)
*/
if ((is_dram_addr &&
((hdev->asic_funcs->scramble_addr(hdev, phys_addr) &
(mmu_prop->page_size - 1)) ||
(hdev->asic_funcs->scramble_addr(hdev, virt_addr) &
(mmu_prop->page_size - 1)))) ||
(!is_dram_addr && ((phys_addr & (real_page_size - 1)) ||
(virt_addr & (real_page_size - 1)))))
dev_crit(hdev->dev,
"Mapping address 0x%llx with virtual address 0x%llx and page size of 0x%x is erroneous! Addresses must be divisible by page size",
phys_addr, virt_addr, real_page_size);
npages = page_size / real_page_size;
real_virt_addr = virt_addr;

@@ -444,19 +476,53 @@ void hl_mmu_swap_in(struct hl_ctx *ctx)
hdev->mmu_func[MMU_HR_PGT].swap_in(ctx);
}
static void hl_mmu_pa_page_with_offset(struct hl_ctx *ctx, u64 virt_addr,
struct hl_mmu_hop_info *hops,
u64 *phys_addr)
{
struct hl_device *hdev = ctx->hdev;
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 offset_mask, addr_mask, hop_shift, tmp_phys_addr;
u32 hop0_shift_off;
void *p;
/* last hop holds the phys address and flags */
if (hops->unscrambled_paddr)
tmp_phys_addr = hops->unscrambled_paddr;
else
tmp_phys_addr = hops->hop_info[hops->used_hops - 1].hop_pte_val;
if (hops->range_type == HL_VA_RANGE_TYPE_HOST_HUGE)
p = &prop->pmmu_huge;
else if (hops->range_type == HL_VA_RANGE_TYPE_HOST)
p = &prop->pmmu;
else /* HL_VA_RANGE_TYPE_DRAM */
p = &prop->dmmu;
/*
* find the correct hop shift field in hl_mmu_properties structure
* in order to determine the right maks for the page offset.
*/
hop0_shift_off = offsetof(struct hl_mmu_properties, hop0_shift);
p = (char *)p + hop0_shift_off;
p = (char *)p + ((hops->used_hops - 1) * sizeof(u64));
hop_shift = *(u64 *)p;
offset_mask = (1 << hop_shift) - 1;
addr_mask = ~(offset_mask);
*phys_addr = (tmp_phys_addr & addr_mask) |
(virt_addr & offset_mask);
}
int hl_mmu_va_to_pa(struct hl_ctx *ctx, u64 virt_addr, u64 *phys_addr)
{
struct hl_mmu_hop_info hops;
u64 tmp_addr;
int rc;
rc = hl_mmu_get_tlb_info(ctx, virt_addr, &hops);
if (rc)
return rc;
/* last hop holds the phys address and flags */
tmp_addr = hops.hop_info[hops.used_hops - 1].hop_pte_val;
*phys_addr = (tmp_addr & HOP_PHYS_ADDR_MASK) | (virt_addr & FLAGS_MASK);
hl_mmu_pa_page_with_offset(ctx, virt_addr, &hops, phys_addr);
return 0;
}

@@ -473,6 +539,8 @@ int hl_mmu_get_tlb_info(struct hl_ctx *ctx, u64 virt_addr,
if (!hdev->mmu_enable)
return -EOPNOTSUPP;
hops->scrambled_vaddr = virt_addr; /* assume no scrambling */
is_dram_addr = hl_mem_area_inside_range(virt_addr, prop->dmmu.page_size,
prop->dmmu.start_addr,
prop->dmmu.end_addr);

@@ -491,6 +559,11 @@ int hl_mmu_get_tlb_info(struct hl_ctx *ctx, u64 virt_addr,
mutex_unlock(&ctx->mmu_lock);
/* add page offset to physical address */
if (hops->unscrambled_paddr)
hl_mmu_pa_page_with_offset(ctx, virt_addr, hops,
&hops->unscrambled_paddr);
return rc;
}

@@ -512,3 +585,28 @@ int hl_mmu_if_set_funcs(struct hl_device *hdev)
return 0;
}
/**
* hl_mmu_scramble_addr() - The generic mmu address scrambling routine.
* @hdev: pointer to device data.
* @addr: The address to scramble.
*
* Return: The scrambled address.
*/
u64 hl_mmu_scramble_addr(struct hl_device *hdev, u64 addr)
{
return addr;
}
/**
* hl_mmu_descramble_addr() - The generic mmu address descrambling
* routine.
* @hdev: pointer to device data.
* @addr: The address to descramble.
*
* Return: The un-scrambled address.
*/
u64 hl_mmu_descramble_addr(struct hl_device *hdev, u64 addr)
{
return addr;
}

@@ -5,8 +5,8 @@
* All Rights Reserved.
*/
#include "habanalabs.h"
#include "../include/hw_ip/mmu/mmu_general.h"
#include "../habanalabs.h"
#include "../../include/hw_ip/mmu/mmu_general.h"
#include <linux/slab.h>
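hl_mmu_pa_page_with_offset(), added in the mmu.c hunk above, merges the page frame taken from the last hop's PTE with the in-page offset taken from the virtual address, using the hop shift of the level that terminated the walk. Below is a standalone sketch of that bit manipulation; the 21-bit shift (2 MB pages) and the address values are illustrative assumptions, not values from the patch.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t hop_pte_val = 0x00000009f4200003ULL;	/* frame bits plus low flag bits */
	uint64_t virt_addr   = 0x0000200000123456ULL;	/* address being translated */
	uint64_t hop_shift   = 21;			/* last-hop page size: 2 MB */

	/* Split point between page frame and in-page offset */
	uint64_t offset_mask = (1ULL << hop_shift) - 1;
	uint64_t addr_mask   = ~offset_mask;

	/* Frame comes from the PTE, offset comes from the virtual address */
	uint64_t phys_addr = (hop_pte_val & addr_mask) |
			     (virt_addr & offset_mask);

	printf("phys addr: 0x%llx\n", (unsigned long long)phys_addr);
	return 0;
}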
@@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
HL_COMMON_PCI_FILES := common/pci/pci.o

@@ -5,8 +5,8 @@
* All Rights Reserved.
*/
#include "habanalabs.h"
#include "../include/hw_ip/pci/pci_general.h"
#include "../habanalabs.h"
#include "../../include/hw_ip/pci/pci_general.h"
#include <linux/pci.h>

@@ -307,40 +307,6 @@ int hl_pci_set_outbound_region(struct hl_device *hdev,
return rc;
}
/**
* hl_pci_set_dma_mask() - Set DMA masks for the device.
* @hdev: Pointer to hl_device structure.
*
* This function sets the DMA masks (regular and consistent) for a specified
* value. If it doesn't succeed, it tries to set it to a fall-back value
*
* Return: 0 on success, non-zero for failure.
*/
static int hl_pci_set_dma_mask(struct hl_device *hdev)
{
struct pci_dev *pdev = hdev->pdev;
int rc;
/* set DMA mask */
rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(hdev->dma_mask));
if (rc) {
dev_err(hdev->dev,
"Failed to set pci dma mask to %d bits, error %d\n",
hdev->dma_mask, rc);
return rc;
}
rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(hdev->dma_mask));
if (rc) {
dev_err(hdev->dev,
"Failed to set pci consistent dma mask to %d bits, error %d\n",
hdev->dma_mask, rc);
return rc;
}
return 0;
}
/**
* hl_pci_init() - PCI initialization code.
* @hdev: Pointer to hl_device structure.

@@ -377,9 +343,14 @@ int hl_pci_init(struct hl_device *hdev)
goto unmap_pci_bars;
}
rc = hl_pci_set_dma_mask(hdev);
if (rc)
rc = dma_set_mask_and_coherent(&pdev->dev,
DMA_BIT_MASK(hdev->dma_mask));
if (rc) {
dev_err(hdev->dev,
"Failed to set dma mask to %d bits, error %d\n",
hdev->dma_mask, rc);
goto unmap_pci_bars;
}
return 0;
@@ -225,6 +225,12 @@ gaudi_qman_arb_error_cause[GAUDI_NUM_OF_QM_ARB_ERR_CAUSE] = {
"MSG AXI LBW returned with error"
};
enum gaudi_sm_sei_cause {
GAUDI_SM_SEI_SO_OVERFLOW,
GAUDI_SM_SEI_LBW_4B_UNALIGNED,
GAUDI_SM_SEI_AXI_RESPONSE_ERR
};
static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_0 */
QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_1 */

@@ -354,6 +360,10 @@ static int gaudi_send_job_on_qman0(struct hl_device *hdev,
struct hl_cs_job *job);
static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
u32 size, u64 val);
static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
u32 num_regs, u32 val);
static int gaudi_schedule_register_memset(struct hl_device *hdev,
u32 hw_queue_id, u64 reg_base, u32 num_regs, u32 val);
static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel,
u32 tpc_id);
static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev);

@@ -517,6 +527,8 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
prop->sync_stream_first_mon +
(num_sync_stream_queues * HL_RSVD_MONS);
prop->first_available_user_msix_interrupt = USHRT_MAX;
/* disable fw security for now, set it in a later stage */
prop->fw_security_disabled = true;
prop->fw_security_status_valid = false;

@@ -913,11 +925,17 @@ static void gaudi_sob_group_hw_reset(struct kref *ref)
struct gaudi_hw_sob_group *hw_sob_group =
container_of(ref, struct gaudi_hw_sob_group, kref);
struct hl_device *hdev = hw_sob_group->hdev;
int i;
u64 base_addr;
int rc;
for (i = 0 ; i < NUMBER_OF_SOBS_IN_GRP ; i++)
WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
(hw_sob_group->base_sob_id + i) * 4, 0);
base_addr = CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
hw_sob_group->base_sob_id * 4;
rc = gaudi_schedule_register_memset(hdev, hw_sob_group->queue_id,
base_addr, NUMBER_OF_SOBS_IN_GRP, 0);
if (rc)
dev_err(hdev->dev,
"failed resetting sob group - sob base %u, count %u",
hw_sob_group->base_sob_id, NUMBER_OF_SOBS_IN_GRP);
kref_init(&hw_sob_group->kref);
}

@@ -1008,6 +1026,8 @@ static void gaudi_collective_master_init_job(struct hl_device *hdev,
cprop->hw_sob_group[sob_group_offset].base_sob_id;
master_monitor = prop->collective_mstr_mon_id[0];
cprop->hw_sob_group[sob_group_offset].queue_id = queue_id;
dev_dbg(hdev->dev,
"Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
master_sob_base, cprop->mstr_sob_mask[0],

@@ -1248,7 +1268,7 @@ static int gaudi_collective_wait_create_jobs(struct hl_device *hdev,
u32 queue_id, collective_queue, num_jobs;
u32 stream, nic_queue, nic_idx = 0;
bool skip;
int i, rc;
int i, rc = 0;
/* Verify wait queue id is configured as master */
hw_queue_prop = &hdev->asic_prop.hw_queues_props[wait_queue_id];

@@ -1607,6 +1627,7 @@ static int gaudi_sw_init(struct hl_device *hdev)
hdev->supports_sync_stream = true;
hdev->supports_coresight = true;
hdev->supports_staged_submission = true;
return 0;

@@ -4518,7 +4539,6 @@ static int gaudi_scrub_device_mem(struct hl_device *hdev, u64 addr, u64 size)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct gaudi_device *gaudi = hdev->asic_specific;
u64 idle_mask = 0;
int rc = 0;
u64 val = 0;

@@ -4531,8 +4551,8 @@ static int gaudi_scrub_device_mem(struct hl_device *hdev, u64 addr, u64 size)
hdev,
mmDMA0_CORE_STS0/* dummy */,
val/* dummy */,
(hdev->asic_funcs->is_device_idle(hdev,
&idle_mask, NULL)),
(hdev->asic_funcs->is_device_idle(hdev, NULL,
0, NULL)),
1000,
HBM_SCRUBBING_TIMEOUT_US);
if (rc) {

@@ -5060,7 +5080,8 @@ static int gaudi_validate_cb(struct hl_device *hdev,
* 1. A packet that will act as a completion packet
* 2. A packet that will generate MSI-X interrupt
*/
parser->patched_cb_size += sizeof(struct packet_msg_prot) * 2;
if (parser->completion)
parser->patched_cb_size += sizeof(struct packet_msg_prot) * 2;
return rc;
}

@@ -5287,8 +5308,11 @@ static int gaudi_parse_cb_mmu(struct hl_device *hdev,
* 1. A packet that will act as a completion packet
* 2. A packet that will generate MSI interrupt
*/
parser->patched_cb_size = parser->user_cb_size +
sizeof(struct packet_msg_prot) * 2;
if (parser->completion)
parser->patched_cb_size = parser->user_cb_size +
sizeof(struct packet_msg_prot) * 2;
else
parser->patched_cb_size = parser->user_cb_size;
rc = hl_cb_create(hdev, &hdev->kernel_cb_mgr, hdev->kernel_ctx,
parser->patched_cb_size, false, false,

@@ -5304,10 +5328,10 @@ static int gaudi_parse_cb_mmu(struct hl_device *hdev,
patched_cb_handle >>= PAGE_SHIFT;
parser->patched_cb = hl_cb_get(hdev, &hdev->kernel_cb_mgr,
(u32) patched_cb_handle);
/* hl_cb_get should never fail here so use kernel WARN */
WARN(!parser->patched_cb, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
/* hl_cb_get should never fail */
if (!parser->patched_cb) {
dev_crit(hdev->dev, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
rc = -EFAULT;
goto out;
}

@@ -5376,10 +5400,10 @@ static int gaudi_parse_cb_no_mmu(struct hl_device *hdev,
patched_cb_handle >>= PAGE_SHIFT;
parser->patched_cb = hl_cb_get(hdev, &hdev->kernel_cb_mgr,
(u32) patched_cb_handle);
/* hl_cb_get should never fail here so use kernel WARN */
WARN(!parser->patched_cb, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
/* hl_cb_get should never fail here */
if (!parser->patched_cb) {
dev_crit(hdev->dev, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
rc = -EFAULT;
goto out;
}
|
|||
return rc;
|
||||
}
|
||||
|
||||
static void gaudi_restore_sm_registers(struct hl_device *hdev)
|
||||
static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
|
||||
u32 num_regs, u32 val)
|
||||
{
|
||||
struct packet_msg_long *pkt;
|
||||
struct hl_cs_job *job;
|
||||
u32 cb_size, ctl;
|
||||
struct hl_cb *cb;
|
||||
int i, rc;
|
||||
|
||||
cb_size = (sizeof(*pkt) * num_regs) + sizeof(struct packet_msg_prot);
|
||||
|
||||
if (cb_size > SZ_2M) {
|
||||
dev_err(hdev->dev, "CB size must be smaller than %uMB", SZ_2M);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
cb = hl_cb_kernel_create(hdev, cb_size, false);
|
||||
if (!cb)
|
||||
return -EFAULT;
|
||||
|
||||
pkt = cb->kernel_address;
|
||||
|
||||
ctl = FIELD_PREP(GAUDI_PKT_LONG_CTL_OP_MASK, 0); /* write the value */
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_LONG);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
|
||||
|
||||
for (i = 0; i < num_regs ; i++, pkt++) {
|
||||
pkt->ctl = cpu_to_le32(ctl);
|
||||
pkt->value = cpu_to_le32(val);
|
||||
pkt->addr = cpu_to_le64(reg_base + (i * 4));
|
||||
}
|
||||
|
||||
job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
|
||||
if (!job) {
|
||||
dev_err(hdev->dev, "Failed to allocate a new job\n");
|
||||
rc = -ENOMEM;
|
||||
goto release_cb;
|
||||
}
|
||||
|
||||
job->id = 0;
|
||||
job->user_cb = cb;
|
||||
atomic_inc(&job->user_cb->cs_cnt);
|
||||
job->user_cb_size = cb_size;
|
||||
job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
|
||||
job->patched_cb = job->user_cb;
|
||||
job->job_cb_size = cb_size;
|
||||
|
||||
hl_debugfs_add_job(hdev, job);
|
||||
|
||||
rc = gaudi_send_job_on_qman0(hdev, job);
|
||||
hl_debugfs_remove_job(hdev, job);
|
||||
kfree(job);
|
||||
atomic_dec(&cb->cs_cnt);
|
||||
|
||||
release_cb:
|
||||
hl_cb_put(cb);
|
||||
hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
|
||||
|
||||
return rc;
|
||||
}
|
||||
|
||||
static int gaudi_schedule_register_memset(struct hl_device *hdev,
|
||||
u32 hw_queue_id, u64 reg_base, u32 num_regs, u32 val)
|
||||
{
|
||||
struct hl_ctx *ctx = hdev->compute_ctx;
|
||||
struct hl_pending_cb *pending_cb;
|
||||
struct packet_msg_long *pkt;
|
||||
u32 cb_size, ctl;
|
||||
struct hl_cb *cb;
|
||||
int i;
|
||||
|
||||
for (i = 0 ; i < NUM_OF_SOB_IN_BLOCK << 2 ; i += 4) {
|
||||
WREG32(mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0 + i, 0);
|
||||
WREG32(mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + i, 0);
|
||||
WREG32(mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_SOB_OBJ_0 + i, 0);
|
||||
/* If no compute context available or context is going down
|
||||
* memset registers directly
|
||||
*/
|
||||
if (!ctx || kref_read(&ctx->refcount) == 0)
|
||||
return gaudi_memset_registers(hdev, reg_base, num_regs, val);
|
||||
|
||||
cb_size = (sizeof(*pkt) * num_regs) +
|
||||
sizeof(struct packet_msg_prot) * 2;
|
||||
|
||||
if (cb_size > SZ_2M) {
|
||||
dev_err(hdev->dev, "CB size must be smaller than %uMB", SZ_2M);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
for (i = 0 ; i < NUM_OF_MONITORS_IN_BLOCK << 2 ; i += 4) {
|
||||
WREG32(mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0 + i, 0);
|
||||
WREG32(mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_MON_STATUS_0 + i, 0);
|
||||
WREG32(mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_MON_STATUS_0 + i, 0);
|
||||
pending_cb = kzalloc(sizeof(*pending_cb), GFP_KERNEL);
|
||||
if (!pending_cb)
|
||||
return -ENOMEM;
|
||||
|
||||
cb = hl_cb_kernel_create(hdev, cb_size, false);
|
||||
if (!cb) {
|
||||
kfree(pending_cb);
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
i = GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT * 4;
|
||||
pkt = cb->kernel_address;
|
||||
|
||||
for (; i < NUM_OF_SOB_IN_BLOCK << 2 ; i += 4)
|
||||
WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + i, 0);
|
||||
ctl = FIELD_PREP(GAUDI_PKT_LONG_CTL_OP_MASK, 0); /* write the value */
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_LONG);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
|
||||
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
|
||||
|
||||
i = GAUDI_FIRST_AVAILABLE_W_S_MONITOR * 4;
|
||||
for (i = 0; i < num_regs ; i++, pkt++) {
|
||||
pkt->ctl = cpu_to_le32(ctl);
|
||||
pkt->value = cpu_to_le32(val);
|
||||
pkt->addr = cpu_to_le64(reg_base + (i * 4));
|
||||
}
|
||||
|
||||
for (; i < NUM_OF_MONITORS_IN_BLOCK << 2 ; i += 4)
|
||||
WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0 + i, 0);
|
||||
hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
|
||||
|
||||
pending_cb->cb = cb;
|
||||
pending_cb->cb_size = cb_size;
|
||||
/* The queue ID MUST be an external queue ID. Otherwise, we will
|
||||
* have undefined behavior
|
||||
*/
|
||||
pending_cb->hw_queue_id = hw_queue_id;
|
||||
|
||||
spin_lock(&ctx->pending_cb_lock);
|
||||
list_add_tail(&pending_cb->cb_node, &ctx->pending_cb_list);
|
||||
spin_unlock(&ctx->pending_cb_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int gaudi_restore_sm_registers(struct hl_device *hdev)
|
||||
{
|
||||
u64 base_addr;
|
||||
u32 num_regs;
|
||||
int rc;
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
|
||||
num_regs = NUM_OF_SOB_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_SOB_OBJ_0;
|
||||
num_regs = NUM_OF_SOB_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
|
||||
num_regs = NUM_OF_SOB_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0;
|
||||
num_regs = NUM_OF_MONITORS_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_MON_STATUS_0;
|
||||
num_regs = NUM_OF_MONITORS_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_MON_STATUS_0;
|
||||
num_regs = NUM_OF_MONITORS_IN_BLOCK;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
|
||||
(GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT * 4);
|
||||
num_regs = NUM_OF_SOB_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
base_addr = CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0 +
|
||||
(GAUDI_FIRST_AVAILABLE_W_S_MONITOR * 4);
|
||||
num_regs = NUM_OF_MONITORS_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_MONITOR;
|
||||
rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
|
||||
if (rc) {
|
||||
dev_err(hdev->dev, "failed resetting SM registers");
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void gaudi_restore_dma_registers(struct hl_device *hdev)
|
||||
|
@ -5660,18 +5859,23 @@ static void gaudi_restore_qm_registers(struct hl_device *hdev)
|
|||
}
|
||||
}
|
||||
|
||||
static void gaudi_restore_user_registers(struct hl_device *hdev)
|
||||
static int gaudi_restore_user_registers(struct hl_device *hdev)
|
||||
{
|
||||
gaudi_restore_sm_registers(hdev);
|
||||
int rc;
|
||||
|
||||
rc = gaudi_restore_sm_registers(hdev);
|
||||
if (rc)
|
||||
return rc;
|
||||
|
||||
gaudi_restore_dma_registers(hdev);
|
||||
gaudi_restore_qm_registers(hdev);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int gaudi_context_switch(struct hl_device *hdev, u32 asid)
|
||||
{
|
||||
gaudi_restore_user_registers(hdev);
|
||||
|
||||
return 0;
|
||||
return gaudi_restore_user_registers(hdev);
|
||||
}
|
||||
|
||||
static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev)
|
||||
|
@ -5730,8 +5934,6 @@ static int gaudi_debugfs_read32(struct hl_device *hdev, u64 addr, u32 *val)
|
|||
}
|
||||
if (hbm_bar_addr == U64_MAX)
|
||||
rc = -EIO;
|
||||
} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
|
||||
*val = *(u32 *) phys_to_virt(addr - HOST_PHYS_BASE);
|
||||
} else {
|
||||
rc = -EFAULT;
|
||||
}
|
||||
|
@ -5777,8 +5979,6 @@ static int gaudi_debugfs_write32(struct hl_device *hdev, u64 addr, u32 val)
|
|||
}
|
||||
if (hbm_bar_addr == U64_MAX)
|
||||
rc = -EIO;
|
||||
} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
|
||||
*(u32 *) phys_to_virt(addr - HOST_PHYS_BASE) = val;
|
||||
} else {
|
||||
rc = -EFAULT;
|
||||
}
|
||||
|
@ -5828,8 +6028,6 @@ static int gaudi_debugfs_read64(struct hl_device *hdev, u64 addr, u64 *val)
|
|||
}
|
||||
if (hbm_bar_addr == U64_MAX)
|
||||
rc = -EIO;
|
||||
} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
|
||||
*val = *(u64 *) phys_to_virt(addr - HOST_PHYS_BASE);
|
||||
} else {
|
||||
rc = -EFAULT;
|
||||
}
|
||||
|
@ -5878,8 +6076,6 @@ static int gaudi_debugfs_write64(struct hl_device *hdev, u64 addr, u64 val)
|
|||
}
|
||||
if (hbm_bar_addr == U64_MAX)
|
||||
rc = -EIO;
|
||||
} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
|
||||
*(u64 *) phys_to_virt(addr - HOST_PHYS_BASE) = val;
|
||||
} else {
|
||||
rc = -EFAULT;
|
||||
}
|
||||
|
@ -5924,7 +6120,7 @@ static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
|
|||
return;
|
||||
|
||||
if (asid & ~DMA0_QM_GLBL_NON_SECURE_PROPS_0_ASID_MASK) {
|
||||
WARN(1, "asid %u is too big\n", asid);
|
||||
dev_crit(hdev->dev, "asid %u is too big\n", asid);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -6227,7 +6423,7 @@ static int gaudi_send_job_on_qman0(struct hl_device *hdev,
|
|||
else
|
||||
timeout = HL_DEVICE_TIMEOUT_USEC;
|
||||
|
||||
if (!hdev->asic_funcs->is_device_idle(hdev, NULL, NULL)) {
|
||||
if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
|
||||
dev_err_ratelimited(hdev->dev,
|
||||
"Can't send driver job on QMAN0 because the device is not idle\n");
|
||||
return -EBUSY;
|
||||
|
@ -6658,6 +6854,34 @@ static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
|
|||
}
|
||||
}
|
||||
|
||||
static void gaudi_print_sm_sei_info(struct hl_device *hdev, u16 event_type,
struct hl_eq_sm_sei_data *sei_data)
{
u32 index = event_type - GAUDI_EVENT_DMA_IF_SEI_0;

switch (sei_data->sei_cause) {
case SM_SEI_SO_OVERFLOW:
dev_err(hdev->dev,
"SM %u SEI Error: SO %u overflow/underflow",
index, le32_to_cpu(sei_data->sei_log));
break;
case SM_SEI_LBW_4B_UNALIGNED:
dev_err(hdev->dev,
"SM %u SEI Error: Unaligned 4B LBW access, monitor agent address low - %#x",
index, le32_to_cpu(sei_data->sei_log));
break;
case SM_SEI_AXI_RESPONSE_ERR:
dev_err(hdev->dev,
"SM %u SEI Error: AXI ID %u response error",
index, le32_to_cpu(sei_data->sei_log));
break;
default:
dev_err(hdev->dev, "Unknown SM SEI cause %u",
le32_to_cpu(sei_data->sei_log));
break;
}
}

static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
struct hl_eq_ecc_data *ecc_data)
{

@@ -7153,6 +7377,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
gaudi_hbm_read_interrupts(hdev,
gaudi_hbm_event_to_dev(event_type),
&eq_entry->hbm_ecc_data);
hl_fw_unmask_irq(hdev, event_type);
break;

case GAUDI_EVENT_TPC0_DEC:

@@ -7281,6 +7506,13 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
hl_fw_unmask_irq(hdev, event_type);
break;

case GAUDI_EVENT_DMA_IF_SEI_0 ... GAUDI_EVENT_DMA_IF_SEI_3:
gaudi_print_irq_info(hdev, event_type, false);
gaudi_print_sm_sei_info(hdev, event_type,
&eq_entry->sm_sei_data);
hl_fw_unmask_irq(hdev, event_type);
break;

case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
gaudi_print_clk_change_info(hdev, event_type);
hl_fw_unmask_irq(hdev, event_type);

@@ -7330,8 +7562,6 @@ static int gaudi_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
else
timeout_usec = MMU_CONFIG_TIMEOUT_USEC;

mutex_lock(&hdev->mmu_cache_lock);

/* L0 & L1 invalidation */
WREG32(mmSTLB_INV_PS, 3);
WREG32(mmSTLB_CACHE_INV, gaudi->mmu_cache_inv_pi++);

@@ -7347,8 +7577,6 @@ static int gaudi_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,

WREG32(mmSTLB_INV_SET, 0);

mutex_unlock(&hdev->mmu_cache_lock);

if (rc) {
dev_err_ratelimited(hdev->dev,
"MMU cache invalidation timeout\n");

@@ -7371,8 +7599,6 @@ static int gaudi_mmu_invalidate_cache_range(struct hl_device *hdev,
hdev->hard_reset_pending)
return 0;

mutex_lock(&hdev->mmu_cache_lock);

if (hdev->pldm)
timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
else

@@ -7400,8 +7626,6 @@ static int gaudi_mmu_invalidate_cache_range(struct hl_device *hdev,
1000,
timeout_usec);

mutex_unlock(&hdev->mmu_cache_lock);

if (rc) {
dev_err_ratelimited(hdev->dev,
"MMU cache invalidation timeout\n");

@@ -7463,7 +7687,7 @@ static int gaudi_cpucp_info_get(struct hl_device *hdev)
if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
return 0;

rc = hl_fw_cpucp_info_get(hdev, mmCPU_BOOT_DEV_STS0);
rc = hl_fw_cpucp_info_get(hdev, mmCPU_BOOT_DEV_STS0, mmCPU_BOOT_ERR0);
if (rc)
return rc;

@@ -7483,13 +7707,14 @@ static int gaudi_cpucp_info_get(struct hl_device *hdev)
return 0;
}

static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
struct seq_file *s)
static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask_arr,
u8 mask_len, struct seq_file *s)
{
struct gaudi_device *gaudi = hdev->asic_specific;
const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
unsigned long *mask = (unsigned long *)mask_arr;
u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
bool is_idle = true, is_eng_idle, is_slave;
u64 offset;

@@ -7515,9 +7740,8 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
IS_DMA_IDLE(dma_core_sts0);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GAUDI_ENGINE_ID_DMA_0 + dma_id);
if (mask && !is_eng_idle)
set_bit(GAUDI_ENGINE_ID_DMA_0 + dma_id, mask);
if (s)
seq_printf(s, fmt, dma_id,
is_eng_idle ? "Y" : "N", qm_glbl_sts0,

@@ -7538,9 +7762,8 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
IS_TPC_IDLE(tpc_cfg_sts);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GAUDI_ENGINE_ID_TPC_0 + i);
if (mask && !is_eng_idle)
set_bit(GAUDI_ENGINE_ID_TPC_0 + i, mask);
if (s)
seq_printf(s, fmt, i,
is_eng_idle ? "Y" : "N",

@@ -7567,9 +7790,8 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,

is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GAUDI_ENGINE_ID_MME_0 + i);
if (mask && !is_eng_idle)
set_bit(GAUDI_ENGINE_ID_MME_0 + i, mask);
if (s) {
if (!is_slave)
seq_printf(s, fmt, i,

@@ -7595,9 +7817,8 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GAUDI_ENGINE_ID_NIC_0 + port);
if (mask && !is_eng_idle)
set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
if (s)
seq_printf(s, nic_fmt, port,
is_eng_idle ? "Y" : "N",

@@ -7611,9 +7832,8 @@ static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask,
is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GAUDI_ENGINE_ID_NIC_0 + port);
if (mask && !is_eng_idle)
set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
if (s)
seq_printf(s, nic_fmt, port,
is_eng_idle ? "Y" : "N",
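
Note on the idle-mask change above: is_device_idle() now takes the caller's u64 array plus its length instead of a single u64, and marks busy engines with set_bit() on that array viewed as an unsigned long bitmap. A minimal caller-side sketch, hypothetical and not taken from this patch set (the busy_mask sizing via ARRAY_SIZE and the dev_info() report are assumptions for illustration; HL_BUSY_ENGINES_MASK_EXT_SIZE is the uapi define added later in this diff):

    u64 busy_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
    bool idle;

    /* ask the ASIC whether all engines are idle; busy ones set a bit */
    idle = hdev->asic_funcs->is_device_idle(hdev, busy_mask,
                                            ARRAY_SIZE(busy_mask), NULL);
    if (!idle)
        dev_info(hdev->dev, "busy engines: %#llx %#llx\n",
                 busy_mask[0], busy_mask[1]);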
@@ -7876,18 +8096,16 @@ static void gaudi_internal_cb_pool_fini(struct hl_device *hdev,

static int gaudi_ctx_init(struct hl_ctx *ctx)
{
if (ctx->asid == HL_KERNEL_ASID_ID)
return 0;

gaudi_mmu_prepare(ctx->hdev, ctx->asid);
return gaudi_internal_cb_pool_init(ctx->hdev, ctx);
}

static void gaudi_ctx_fini(struct hl_ctx *ctx)
{
struct hl_device *hdev = ctx->hdev;

/* Gaudi will NEVER support more than a single compute context.
* Therefore, don't clear anything unless it is the compute context
*/
if (hdev->compute_ctx != ctx)
if (ctx->asid == HL_KERNEL_ASID_ID)
return;

gaudi_internal_cb_pool_fini(ctx->hdev, ctx);

@@ -7928,10 +8146,10 @@ static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 3); /* W_S SOB base */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_EB_MASK, eb);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_MB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, eb);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);

pkt->value = cpu_to_le32(value);
pkt->ctl = cpu_to_le32(ctl);

@@ -7948,10 +8166,10 @@ static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,

ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_MB_MASK, 0); /* last pkt MB */
ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 0); /* last pkt MB */

pkt->value = cpu_to_le32(value);
pkt->ctl = cpu_to_le32(ctl);

@@ -7997,10 +8215,10 @@ static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_MB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);

pkt->value = cpu_to_le32(value);
pkt->ctl = cpu_to_le32(ctl);

@@ -8018,10 +8236,10 @@ static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_ID_MASK, 2);

ctl = FIELD_PREP(GAUDI_PKT_FENCE_CTL_OPCODE_MASK, PACKET_FENCE);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_MB_MASK, 1);
ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);

pkt->cfg = cpu_to_le32(cfg);
pkt->ctl = cpu_to_le32(ctl);

@@ -8217,12 +8435,16 @@ static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
static void gaudi_reset_sob(struct hl_device *hdev, void *data)
{
struct hl_hw_sob *hw_sob = (struct hl_hw_sob *) data;
int rc;

dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx,
hw_sob->sob_id);

WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + hw_sob->sob_id * 4,
0);
rc = gaudi_schedule_register_memset(hdev, hw_sob->q_idx,
CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
hw_sob->sob_id * 4, 1, 0);
if (rc)
dev_err(hdev->dev, "failed resetting sob %u", hw_sob->sob_id);

kref_init(&hw_sob->kref);
}

@@ -8246,6 +8468,19 @@ static u64 gaudi_get_device_time(struct hl_device *hdev)
return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
}

static int gaudi_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
u32 *block_id)
{
return -EPERM;
}

static int gaudi_block_mmap(struct hl_device *hdev,
struct vm_area_struct *vma,
u32 block_id, u32 block_size)
{
return -EPERM;
}

static const struct hl_asic_funcs gaudi_funcs = {
.early_init = gaudi_early_init,
.early_fini = gaudi_early_fini,

@@ -8322,7 +8557,12 @@ static const struct hl_asic_funcs gaudi_funcs = {
.set_dma_mask_from_fw = gaudi_set_dma_mask_from_fw,
.get_device_time = gaudi_get_device_time,
.collective_wait_init_cs = gaudi_collective_wait_init_cs,
.collective_wait_create_jobs = gaudi_collective_wait_create_jobs
.collective_wait_create_jobs = gaudi_collective_wait_create_jobs,
.scramble_addr = hl_mmu_scramble_addr,
.descramble_addr = hl_mmu_descramble_addr,
.ack_protection_bits_errors = gaudi_ack_protection_bits_errors,
.get_hw_block_id = gaudi_get_hw_block_id,
.hw_block_mmap = gaudi_block_mmap
};

/**
@@ -251,11 +251,13 @@ enum gaudi_nic_mask {
* @hdev: habanalabs device structure.
* @kref: refcount of this SOB group. group will reset once refcount is zero.
* @base_sob_id: base sob id of this SOB group.
* @queue_id: id of the queue that waits on this sob group
*/
struct gaudi_hw_sob_group {
struct hl_device *hdev;
struct kref kref;
u32 base_sob_id;
u32 queue_id;
};

#define NUM_SOB_GROUPS (HL_RSVD_SOBS * QMAN_STREAMS)

@@ -333,6 +335,7 @@ struct gaudi_device {
};

void gaudi_init_security(struct hl_device *hdev);
void gaudi_ack_protection_bits_errors(struct hl_device *hdev);
void gaudi_add_device_attr(struct hl_device *hdev,
struct attribute_group *dev_attr_grp);
void gaudi_set_pll_profile(struct hl_device *hdev, enum hl_pll_frequency freq);
@@ -634,9 +634,21 @@ static int gaudi_config_etr(struct hl_device *hdev,
WREG32(mmPSOC_ETR_BUFWM, 0x3FFC);
WREG32(mmPSOC_ETR_RSZ, input->buffer_size);
WREG32(mmPSOC_ETR_MODE, input->sink_mode);
/* Workaround for H3 #HW-2075 bug: use small data chunks */
WREG32(mmPSOC_ETR_AXICTL, (is_host ? 0 : 0x700) |
PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT);
if (hdev->asic_prop.fw_security_disabled) {
/* make ETR not privileged */
val = FIELD_PREP(
PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK, 0);
/* make ETR non-secured (inverted logic) */
val |= FIELD_PREP(
PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK, 1);
/*
* Workaround for H3 #HW-2075 bug: use small data
* chunks
*/
val |= FIELD_PREP(PSOC_ETR_AXICTL_WRBURSTLEN_MASK,
is_host ? 0 : 7);
WREG32(mmPSOC_ETR_AXICTL, val);
}
WREG32(mmPSOC_ETR_DBALO,
lower_32_bits(input->buffer_address));
WREG32(mmPSOC_ETR_DBAHI,
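
Note: FIELD_PREP() (from <linux/bitfield.h>) shifts a value into the field named by the mask, so the hunk above composes the AXICTL word explicitly instead of open-coding it; the removed line OR'ed in PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT (the shift count, 1) rather than the ProtCtrl bit itself. A standalone sketch of the composition, using the mask values that appear in the gaudi_masks.h hunk later in this diff (the helper name is hypothetical):

    #include <linux/types.h>
    #include <linux/bitfield.h>

    #define PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK 0x1
    #define PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK 0x2
    #define PSOC_ETR_AXICTL_WRBURSTLEN_MASK   0xF00

    static u32 etr_axictl_value(bool is_host)
    {
        u32 val;

        val = FIELD_PREP(PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK, 0);  /* not privileged */
        val |= FIELD_PREP(PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK, 1); /* non-secured */
        val |= FIELD_PREP(PSOC_ETR_AXICTL_WRBURSTLEN_MASK, is_host ? 0 : 7);

        return val; /* 0x702 for device-memory traces, 0x2 for host */
    }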
@@ -13052,3 +13052,8 @@ void gaudi_init_security(struct hl_device *hdev)

gaudi_init_protection_bits(hdev);
}

void gaudi_ack_protection_bits_errors(struct hl_device *hdev)
{

}
@@ -455,6 +455,8 @@ int goya_get_fixed_properties(struct hl_device *hdev)

prop->max_pending_cs = GOYA_MAX_PENDING_CS;

prop->first_available_user_msix_interrupt = USHRT_MAX;

/* disable fw security for now, set it in a later stage */
prop->fw_security_disabled = true;
prop->fw_security_status_valid = false;

@@ -2914,7 +2916,7 @@ static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
else
timeout = HL_DEVICE_TIMEOUT_USEC;

if (!hdev->asic_funcs->is_device_idle(hdev, NULL, NULL)) {
if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
dev_err_ratelimited(hdev->dev,
"Can't send driver job on QMAN0 because the device is not idle\n");
return -EBUSY;

@@ -3876,10 +3878,10 @@ static int goya_parse_cb_mmu(struct hl_device *hdev,
patched_cb_handle >>= PAGE_SHIFT;
parser->patched_cb = hl_cb_get(hdev, &hdev->kernel_cb_mgr,
(u32) patched_cb_handle);
/* hl_cb_get should never fail here so use kernel WARN */
WARN(!parser->patched_cb, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
/* hl_cb_get should never fail here */
if (!parser->patched_cb) {
dev_crit(hdev->dev, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
rc = -EFAULT;
goto out;
}

@@ -3948,10 +3950,10 @@ static int goya_parse_cb_no_mmu(struct hl_device *hdev,
patched_cb_handle >>= PAGE_SHIFT;
parser->patched_cb = hl_cb_get(hdev, &hdev->kernel_cb_mgr,
(u32) patched_cb_handle);
/* hl_cb_get should never fail here so use kernel WARN */
WARN(!parser->patched_cb, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
/* hl_cb_get should never fail here */
if (!parser->patched_cb) {
dev_crit(hdev->dev, "DMA CB handle invalid 0x%x\n",
(u32) patched_cb_handle);
rc = -EFAULT;
goto out;
}

@@ -4122,9 +4124,6 @@ static int goya_debugfs_read32(struct hl_device *hdev, u64 addr, u32 *val)
if (ddr_bar_addr == U64_MAX)
rc = -EIO;

} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
*val = *(u32 *) phys_to_virt(addr - HOST_PHYS_BASE);

} else {
rc = -EFAULT;
}

@@ -4178,9 +4177,6 @@ static int goya_debugfs_write32(struct hl_device *hdev, u64 addr, u32 val)
if (ddr_bar_addr == U64_MAX)
rc = -EIO;

} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
*(u32 *) phys_to_virt(addr - HOST_PHYS_BASE) = val;

} else {
rc = -EFAULT;
}

@@ -4223,9 +4219,6 @@ static int goya_debugfs_read64(struct hl_device *hdev, u64 addr, u64 *val)
if (ddr_bar_addr == U64_MAX)
rc = -EIO;

} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
*val = *(u64 *) phys_to_virt(addr - HOST_PHYS_BASE);

} else {
rc = -EFAULT;
}

@@ -4266,9 +4259,6 @@ static int goya_debugfs_write64(struct hl_device *hdev, u64 addr, u64 val)
if (ddr_bar_addr == U64_MAX)
rc = -EIO;

} else if (addr >= HOST_PHYS_BASE && !iommu_present(&pci_bus_type)) {
*(u64 *) phys_to_virt(addr - HOST_PHYS_BASE) = val;

} else {
rc = -EFAULT;
}

@@ -4877,8 +4867,6 @@ int goya_context_switch(struct hl_device *hdev, u32 asid)

WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);

goya_mmu_prepare(hdev, asid);

goya_clear_sm_regs(hdev);

return 0;

@@ -5044,7 +5032,7 @@ static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
return;

if (asid & ~MME_QM_GLBL_SECURE_PROPS_ASID_MASK) {
WARN(1, "asid %u is too big\n", asid);
dev_crit(hdev->dev, "asid %u is too big\n", asid);
return;
}

@@ -5073,8 +5061,6 @@ static int goya_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
else
timeout_usec = MMU_CONFIG_TIMEOUT_USEC;

mutex_lock(&hdev->mmu_cache_lock);

/* L0 & L1 invalidation */
WREG32(mmSTLB_INV_ALL_START, 1);

@@ -5086,8 +5072,6 @@ static int goya_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
1000,
timeout_usec);

mutex_unlock(&hdev->mmu_cache_lock);

if (rc) {
dev_err_ratelimited(hdev->dev,
"MMU cache invalidation timeout\n");

@@ -5117,8 +5101,6 @@ static int goya_mmu_invalidate_cache_range(struct hl_device *hdev,
else
timeout_usec = MMU_CONFIG_TIMEOUT_USEC;

mutex_lock(&hdev->mmu_cache_lock);

/*
* TODO: currently invalidate entire L0 & L1 as in regular hard
* invalidation. Need to apply invalidation of specific cache lines with

@@ -5141,8 +5123,6 @@ static int goya_mmu_invalidate_cache_range(struct hl_device *hdev,
1000,
timeout_usec);

mutex_unlock(&hdev->mmu_cache_lock);

if (rc) {
dev_err_ratelimited(hdev->dev,
"MMU cache invalidation timeout\n");

@@ -5172,7 +5152,7 @@ int goya_cpucp_info_get(struct hl_device *hdev)
if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
return 0;

rc = hl_fw_cpucp_info_get(hdev, mmCPU_BOOT_DEV_STS0);
rc = hl_fw_cpucp_info_get(hdev, mmCPU_BOOT_DEV_STS0, mmCPU_BOOT_ERR0);
if (rc)
return rc;

@@ -5207,11 +5187,12 @@ static void goya_disable_clock_gating(struct hl_device *hdev)
/* clock gating not supported in Goya */
}

static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask,
struct seq_file *s)
static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask_arr,
u8 mask_len, struct seq_file *s)
{
const char *fmt = "%-5d%-9s%#-14x%#-16x%#x\n";
const char *dma_fmt = "%-5d%-9s%#-14x%#x\n";
unsigned long *mask = (unsigned long *)mask_arr;
u32 qm_glbl_sts0, cmdq_glbl_sts0, dma_core_sts0, tpc_cfg_sts,
mme_arch_sts;
bool is_idle = true, is_eng_idle;

@@ -5231,9 +5212,8 @@ static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask,
IS_DMA_IDLE(dma_core_sts0);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GOYA_ENGINE_ID_DMA_0 + i);
if (mask && !is_eng_idle)
set_bit(GOYA_ENGINE_ID_DMA_0 + i, mask);
if (s)
seq_printf(s, dma_fmt, i, is_eng_idle ? "Y" : "N",
qm_glbl_sts0, dma_core_sts0);

@@ -5255,9 +5235,8 @@ static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask,
IS_TPC_IDLE(tpc_cfg_sts);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) <<
(GOYA_ENGINE_ID_TPC_0 + i);
if (mask && !is_eng_idle)
set_bit(GOYA_ENGINE_ID_TPC_0 + i, mask);
if (s)
seq_printf(s, fmt, i, is_eng_idle ? "Y" : "N",
qm_glbl_sts0, cmdq_glbl_sts0, tpc_cfg_sts);

@@ -5276,8 +5255,8 @@ static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask,
IS_MME_IDLE(mme_arch_sts);
is_idle &= is_eng_idle;

if (mask)
*mask |= ((u64) !is_eng_idle) << GOYA_ENGINE_ID_MME_0;
if (mask && !is_eng_idle)
set_bit(GOYA_ENGINE_ID_MME_0, mask);
if (s) {
seq_printf(s, fmt, 0, is_eng_idle ? "Y" : "N", qm_glbl_sts0,
cmdq_glbl_sts0, mme_arch_sts);

@@ -5321,6 +5300,9 @@ static int goya_get_eeprom_data(struct hl_device *hdev, void *data,

static int goya_ctx_init(struct hl_ctx *ctx)
{
if (ctx->asid != HL_KERNEL_ASID_ID)
goya_mmu_prepare(ctx->hdev, ctx->asid);

return 0;
}

@@ -5399,6 +5381,18 @@ static void goya_ctx_fini(struct hl_ctx *ctx)

}

static int goya_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
u32 *block_id)
{
return -EPERM;
}

static int goya_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
u32 block_id, u32 block_size)
{
return -EPERM;
}

static const struct hl_asic_funcs goya_funcs = {
.early_init = goya_early_init,
.early_fini = goya_early_fini,

@@ -5475,7 +5469,12 @@ static const struct hl_asic_funcs goya_funcs = {
.set_dma_mask_from_fw = goya_set_dma_mask_from_fw,
.get_device_time = goya_get_device_time,
.collective_wait_init_cs = goya_collective_wait_init_cs,
.collective_wait_create_jobs = goya_collective_wait_create_jobs
.collective_wait_create_jobs = goya_collective_wait_create_jobs,
.scramble_addr = hl_mmu_scramble_addr,
.descramble_addr = hl_mmu_descramble_addr,
.ack_protection_bits_errors = goya_ack_protection_bits_errors,
.get_hw_block_id = goya_get_hw_block_id,
.hw_block_mmap = goya_block_mmap
};

/*
@@ -173,6 +173,7 @@ void goya_init_mme_qmans(struct hl_device *hdev);
void goya_init_tpc_qmans(struct hl_device *hdev);
int goya_init_cpu_queues(struct hl_device *hdev);
void goya_init_security(struct hl_device *hdev);
void goya_ack_protection_bits_errors(struct hl_device *hdev);
int goya_late_init(struct hl_device *hdev);
void goya_late_fini(struct hl_device *hdev);
@@ -434,8 +434,15 @@ static int goya_config_etr(struct hl_device *hdev,
WREG32(mmPSOC_ETR_BUFWM, 0x3FFC);
WREG32(mmPSOC_ETR_RSZ, input->buffer_size);
WREG32(mmPSOC_ETR_MODE, input->sink_mode);
WREG32(mmPSOC_ETR_AXICTL,
0x700 | PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT);
if (hdev->asic_prop.fw_security_disabled) {
/* make ETR not privileged */
val = FIELD_PREP(PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK, 0);
/* make ETR non-secured (inverted logic) */
val |= FIELD_PREP(PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK, 1);
/* burst size 8 */
val |= FIELD_PREP(PSOC_ETR_AXICTL_WRBURSTLEN_MASK, 7);
WREG32(mmPSOC_ETR_AXICTL, val);
}
WREG32(mmPSOC_ETR_DBALO,
lower_32_bits(input->buffer_address));
WREG32(mmPSOC_ETR_DBAHI,
@@ -3120,3 +3120,8 @@ void goya_init_security(struct hl_device *hdev)

goya_init_protection_bits(hdev);
}

void goya_ack_protection_bits_errors(struct hl_device *hdev)
{

}
@@ -58,11 +58,25 @@ struct hl_eq_ecc_data {
__u8 pad[7];
};

enum hl_sm_sei_cause {
SM_SEI_SO_OVERFLOW,
SM_SEI_LBW_4B_UNALIGNED,
SM_SEI_AXI_RESPONSE_ERR
};

struct hl_eq_sm_sei_data {
__le32 sei_log;
/* enum hl_sm_sei_cause */
__u8 sei_cause;
__u8 pad[3];
};

struct hl_eq_entry {
struct hl_eq_header hdr;
union {
struct hl_eq_ecc_data ecc_data;
struct hl_eq_hbm_ecc_data hbm_ecc_data;
struct hl_eq_sm_sei_data sm_sei_data;
__le64 data[7];
};
};
@@ -70,6 +70,9 @@
* checksum. Trying to program image again
* might solve this.
*
* CPU_BOOT_ERR0_PLL_FAIL PLL settings failed, meaning that one
* of the PLLs remains in REF_CLK
*
* CPU_BOOT_ERR0_ENABLED Error registers enabled.
* This is a main indication that the
* running FW populates the error

@@ -88,6 +91,7 @@
#define CPU_BOOT_ERR0_EFUSE_FAIL (1 << 9)
#define CPU_BOOT_ERR0_PRI_IMG_VER_FAIL (1 << 10)
#define CPU_BOOT_ERR0_SEC_IMG_VER_FAIL (1 << 11)
#define CPU_BOOT_ERR0_PLL_FAIL (1 << 12)
#define CPU_BOOT_ERR0_ENABLED (1 << 31)

/*
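
Note: CPU_BOOT_ERR0_PLL_FAIL is the newly documented bit 12 of the firmware boot-error word. A rough driver-side sketch of how such a bit could be checked, hypothetical and not part of this patch (the ERR0 register name is the one already passed to hl_fw_cpucp_info_get() elsewhere in this series):

    u32 err0 = RREG32(mmCPU_BOOT_ERR0);

    /* the error word is meaningful only when the FW sets the ENABLED bit */
    if ((err0 & CPU_BOOT_ERR0_ENABLED) && (err0 & CPU_BOOT_ERR0_PLL_FAIL))
        dev_err(hdev->dev, "firmware reports a PLL stuck on REF_CLK\n");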
@@ -150,10 +154,18 @@
* CPU_BOOT_DEV_STS0_PLL_INFO_EN FW retrieval of PLL info is enabled.
* Initialized in: linux
*
* CPU_BOOT_DEV_STS0_SP_SRAM_EN SP SRAM is initialized and available
* for use.
* Initialized in: preboot
*
* CPU_BOOT_DEV_STS0_CLK_GATE_EN Clock Gating enabled.
* FW initialized Clock Gating.
* Initialized in: preboot
*
* CPU_BOOT_DEV_STS0_HBM_ECC_EN HBM ECC handling Enabled.
* FW handles HBM ECC indications.
* Initialized in: linux
*
* CPU_BOOT_DEV_STS0_ENABLED Device status register enabled.
* This is a main indication that the
* running FW populates the device status

@@ -175,7 +187,9 @@
#define CPU_BOOT_DEV_STS0_DRAM_SCR_EN (1 << 9)
#define CPU_BOOT_DEV_STS0_FW_HARD_RST_EN (1 << 10)
#define CPU_BOOT_DEV_STS0_PLL_INFO_EN (1 << 11)
#define CPU_BOOT_DEV_STS0_SP_SRAM_EN (1 << 12)
#define CPU_BOOT_DEV_STS0_CLK_GATE_EN (1 << 13)
#define CPU_BOOT_DEV_STS0_HBM_ECC_EN (1 << 14)
#define CPU_BOOT_DEV_STS0_ENABLED (1 << 31)

enum cpu_boot_status {
@@ -212,6 +212,10 @@ enum gaudi_async_event_id {
GAUDI_EVENT_NIC_SEI_2 = 266,
GAUDI_EVENT_NIC_SEI_3 = 267,
GAUDI_EVENT_NIC_SEI_4 = 268,
GAUDI_EVENT_DMA_IF_SEI_0 = 277,
GAUDI_EVENT_DMA_IF_SEI_1 = 278,
GAUDI_EVENT_DMA_IF_SEI_2 = 279,
GAUDI_EVENT_DMA_IF_SEI_3 = 280,
GAUDI_EVENT_PCIE_FLR = 290,
GAUDI_EVENT_TPC0_BMON_SPMU = 300,
GAUDI_EVENT_TPC0_KRN_ERR = 301,
@@ -388,7 +388,10 @@ enum axi_id {
#define RAZWI_INITIATOR_ID_X_Y_TPC6 RAZWI_INITIATOR_ID_X_Y(7, 6)
#define RAZWI_INITIATOR_ID_X_Y_TPC7_NIC4_NIC5 RAZWI_INITIATOR_ID_X_Y(8, 6)

#define PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT 1
#define PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT 1
#define PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK 0x1
#define PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK 0x2
#define PSOC_ETR_AXICTL_WRBURSTLEN_MASK 0xF00

/* STLB_CACHE_INV */
#define STLB_CACHE_INV_PRODUCER_INDEX_SHIFT 0
@@ -78,6 +78,9 @@ struct packet_wreg_bulk {
__le64 values[0]; /* data starts here */
};

#define GAUDI_PKT_LONG_CTL_OP_SHIFT 20
#define GAUDI_PKT_LONG_CTL_OP_MASK 0x00300000

struct packet_msg_long {
__le32 value;
__le32 ctl;

@@ -111,18 +114,6 @@ struct packet_msg_long {
#define GAUDI_PKT_SHORT_CTL_BASE_SHIFT 22
#define GAUDI_PKT_SHORT_CTL_BASE_MASK 0x00C00000

#define GAUDI_PKT_SHORT_CTL_OPCODE_SHIFT 24
#define GAUDI_PKT_SHORT_CTL_OPCODE_MASK 0x1F000000

#define GAUDI_PKT_SHORT_CTL_EB_SHIFT 29
#define GAUDI_PKT_SHORT_CTL_EB_MASK 0x20000000

#define GAUDI_PKT_SHORT_CTL_RB_SHIFT 30
#define GAUDI_PKT_SHORT_CTL_RB_MASK 0x40000000

#define GAUDI_PKT_SHORT_CTL_MB_SHIFT 31
#define GAUDI_PKT_SHORT_CTL_MB_MASK 0x80000000

struct packet_msg_short {
__le32 value;
__le32 ctl;

@@ -146,18 +137,6 @@ struct packet_msg_prot {
#define GAUDI_PKT_FENCE_CTL_PRED_SHIFT 0
#define GAUDI_PKT_FENCE_CTL_PRED_MASK 0x0000001F

#define GAUDI_PKT_FENCE_CTL_OPCODE_SHIFT 24
#define GAUDI_PKT_FENCE_CTL_OPCODE_MASK 0x1F000000

#define GAUDI_PKT_FENCE_CTL_EB_SHIFT 29
#define GAUDI_PKT_FENCE_CTL_EB_MASK 0x20000000

#define GAUDI_PKT_FENCE_CTL_RB_SHIFT 30
#define GAUDI_PKT_FENCE_CTL_RB_MASK 0x40000000

#define GAUDI_PKT_FENCE_CTL_MB_SHIFT 31
#define GAUDI_PKT_FENCE_CTL_MB_MASK 0x80000000

struct packet_fence {
__le32 cfg;
__le32 ctl;
@@ -259,6 +259,9 @@
#define DMA_QM_3_GLBL_CFG1_DMA_STOP_SHIFT DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT
#define DMA_QM_4_GLBL_CFG1_DMA_STOP_SHIFT DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT

#define PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT 1
#define PSOC_ETR_AXICTL_PROTCTRLBIT1_SHIFT 1
#define PSOC_ETR_AXICTL_PROTCTRLBIT0_MASK 0x1
#define PSOC_ETR_AXICTL_PROTCTRLBIT1_MASK 0x2
#define PSOC_ETR_AXICTL_WRBURSTLEN_MASK 0xF00

#endif /* ASIC_REG_GOYA_MASKS_H_ */
@@ -309,7 +309,9 @@ struct hl_info_hw_ip_info {
__u32 num_of_events;
__u32 device_id; /* PCI Device ID */
__u32 module_id; /* For mezzanine cards in servers (From OCP spec.) */
__u32 reserved[2];
__u32 reserved;
__u16 first_available_interrupt_id;
__u16 reserved2;
__u32 cpld_version;
__u32 psoc_pci_pll_nr;
__u32 psoc_pci_pll_nf;

@@ -320,6 +322,8 @@ struct hl_info_hw_ip_info {
__u8 pad[2];
__u8 cpucp_version[HL_INFO_VERSION_MAX_LEN];
__u8 card_name[HL_INFO_CARD_NAME_MAX_LEN];
__u64 reserved3;
__u64 dram_page_size;
};

struct hl_info_dram_usage {

@@ -327,6 +331,8 @@ struct hl_info_dram_usage {
__u64 ctx_dram_mem;
};

#define HL_BUSY_ENGINES_MASK_EXT_SIZE 2

struct hl_info_hw_idle {
__u32 is_idle;
/*
@@ -339,7 +345,7 @@ struct hl_info_hw_idle {
* Extended Bitmask of busy engines.
* Bits definition is according to `enum <chip>_engine_id'.
*/
__u64 busy_engines_mask_ext;
__u64 busy_engines_mask_ext[HL_BUSY_ENGINES_MASK_EXT_SIZE];
};

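Note on the hunk above: busy_engines_mask_ext grows from a single __u64 into an array of HL_BUSY_ENGINES_MASK_EXT_SIZE (2) words, so more than 64 engine bits can be reported. A hypothetical userspace check (hw_idle is assumed to be a struct hl_info_hw_idle already filled through the driver's INFO ioctl, which is not shown):

    int eng = 30;                          /* example engine id from the chip's engine-id enum */
    int word = eng / 64, bit = eng % 64;

    if (!hw_idle.is_idle &&
        (hw_idle.busy_engines_mask_ext[word] & (1ULL << bit)))
        printf("engine %d is busy\n", eng);
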
struct hl_info_device_status {

@@ -604,11 +610,14 @@ struct hl_cs_chunk {
};

/* SIGNAL and WAIT/COLLECTIVE_WAIT flags are mutually exclusive */
#define HL_CS_FLAGS_FORCE_RESTORE 0x1
#define HL_CS_FLAGS_SIGNAL 0x2
#define HL_CS_FLAGS_WAIT 0x4
#define HL_CS_FLAGS_COLLECTIVE_WAIT 0x8
#define HL_CS_FLAGS_TIMESTAMP 0x20
#define HL_CS_FLAGS_FORCE_RESTORE 0x1
#define HL_CS_FLAGS_SIGNAL 0x2
#define HL_CS_FLAGS_WAIT 0x4
#define HL_CS_FLAGS_COLLECTIVE_WAIT 0x8
#define HL_CS_FLAGS_TIMESTAMP 0x20
#define HL_CS_FLAGS_STAGED_SUBMISSION 0x40
#define HL_CS_FLAGS_STAGED_SUBMISSION_FIRST 0x80
#define HL_CS_FLAGS_STAGED_SUBMISSION_LAST 0x100

#define HL_CS_STATUS_SUCCESS 0
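
Note: the three new HL_CS_FLAGS_STAGED_SUBMISSION* bits encode where a command submission sits inside a staged pass, and the seq member added to hl_cs_in below is valid whenever HL_CS_FLAGS_STAGED_SUBMISSION is set, presumably tying the stages of one pass together. A hypothetical illustration of the per-stage flag combinations a user would pass (illustration only, not uapi code; actual submission still goes through the driver's CS ioctl):

    __u32 first_stage  = HL_CS_FLAGS_STAGED_SUBMISSION |
                         HL_CS_FLAGS_STAGED_SUBMISSION_FIRST;
    __u32 middle_stage = HL_CS_FLAGS_STAGED_SUBMISSION;
    __u32 last_stage   = HL_CS_FLAGS_STAGED_SUBMISSION |
                         HL_CS_FLAGS_STAGED_SUBMISSION_LAST;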
@@ -622,10 +631,17 @@ struct hl_cs_in {
/* holds address of array of hl_cs_chunk for execution phase */
__u64 chunks_execute;

/* this holds address of array of hl_cs_chunk for store phase -
* Currently not in use
*/
__u64 chunks_store;
union {
/* this holds address of array of hl_cs_chunk for store phase -
* Currently not in use
*/
__u64 chunks_store;

/* Sequence number of a staged submission CS
* valid only if HL_CS_FLAGS_STAGED_SUBMISSION is set
*/
__u64 seq;
};

/* Number of chunks in restore phase array. Maximum number is
* HL_MAX_JOBS_PER_CS

@@ -704,6 +720,8 @@ union hl_wait_cs_args {
#define HL_MEM_OP_MAP 2
/* Opcode to unmap previously mapped host and device memory */
#define HL_MEM_OP_UNMAP 3
/* Opcode to map a hw block */
#define HL_MEM_OP_MAP_BLOCK 4

/* Memory flags */
#define HL_MEM_CONTIGUOUS 0x1

@@ -758,6 +776,17 @@ struct hl_mem_in {
__u64 mem_size;
} map_host;

/* HL_MEM_OP_MAP_BLOCK - map a hw block */
struct {
/*
* HW block address to map, a handle will be returned
* to the user and will be used to mmap the relevant
* block. Only addresses from configuration space are
* allowed.
*/
__u64 block_addr;
} map_block;

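Note: per the comment above, HL_MEM_OP_MAP_BLOCK returns a handle (in hl_mem_out.handle, see the last hunk of this diff) that is subsequently used to mmap the block. A hypothetical userspace helper showing the flow (fd is the open habanalabs device, cfg_addr an allowed configuration-space address; the HL_IOCTL_MEMORY name is assumed here, and <sys/ioctl.h>, <string.h> plus the habanalabs uapi header are taken as included):

    static int map_hw_block(int fd, __u64 cfg_addr, __u64 *handle)
    {
        union hl_mem_args args;

        memset(&args, 0, sizeof(args));
        args.in.op = HL_MEM_OP_MAP_BLOCK;
        args.in.map_block.block_addr = cfg_addr;

        if (ioctl(fd, HL_IOCTL_MEMORY, &args))
            return -1;

        *handle = args.out.handle;   /* later used to mmap() the block */
        return 0;
    }
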
/* HL_MEM_OP_UNMAP - unmap host memory */
struct {
/* Virtual address returned from HL_MEM_OP_MAP */

@@ -784,8 +813,9 @@ struct hl_mem_out {
__u64 device_virt_addr;

/*
* Used for HL_MEM_OP_ALLOC. This is the assigned
* handle for the allocated memory
* Used for HL_MEM_OP_ALLOC and HL_MEM_OP_MAP_BLOCK.
* This is the assigned handle for the allocated memory
* or mapped block
*/
__u64 handle;
};