habanalabs: add an option to control watchdog timeout via debugfs

Add an option to control the timeout value for the driver's watchdog
of the reset process. The timeout represents the amount of the user
has to close his process once he gets a device reset notification from
the driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit is contained in:
Tomer Tayar 2022-09-30 16:37:41 +03:00 committed by Oded Gabbay
parent a88a6f5f5c
commit 11669b58fa
2 changed files with 12 additions and 0 deletions

View File

@ -91,6 +91,13 @@ Description: Enables the root user to set the device to specific state.
Valid values are "disable", "enable", "suspend", "resume".
User can read this property to see the valid values
What: /sys/kernel/debug/habanalabs/hl<n>/device_release_watchdog_timeout
Date: Oct 2022
KernelVersion: 6.2
Contact: ttayar@habana.ai
Description: The watchdog timeout value in seconds for a device relese upon
certain error cases, after which the device is reset.
What: /sys/kernel/debug/habanalabs/hl<n>/dma_size
Date: Apr 2021
KernelVersion: 5.13

View File

@ -1769,6 +1769,11 @@ void hl_debugfs_add_device(struct hl_device *hdev)
dev_entry,
&hl_timeout_locked_fops);
debugfs_create_u32("device_release_watchdog_timeout",
0644,
dev_entry->root,
&hdev->device_release_watchdog_timeout_sec);
for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
debugfs_create_file(hl_debugfs_list[i].name,
0444,