Commit Graph

154 Commits

Author SHA1 Message Date
Ofir Bitton bb677d527e habanalabs/gaudi2: allow user to flush PCIE by read
In order for the user to flush PCIE, they need to read a register
from the PCIE block. The chosen register is SPECIAL_GLBL_SPARE_0,
which therefore needs to be unsecured.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:10:01 +03:00
Dani Liberman 0c88760f8f habanalabs/gaudi2: add secured attestation info uapi
The user provides a nonce via the ioctl and retrieves the secured
attestation data of the boot, generated using the given nonce.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:40 +03:00
farah kassabri 04d53cd2a6 habanalabs/gaudi2: get f/w reset status register dynamically
Get the firmware reset status address from the dynamic registers
we read from the firmware instead of using a define.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:39 +03:00
Ofir Bitton 0626fa1a4d habanalabs: add support for new cpucp return codes
Firmware now responds with more detailed cpucp return codes.
The driver can now distinguish between error and debug return codes.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:38 +03:00
farah kassabri 4745b2f0d0 habanalabs: send device active message to f/w
As part of the RAS that is done by the f/w, we should send a message
to the f/w when a user either acquires or releases the device.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:37 +03:00
Ofir Bitton d155df4f62 habanalabs: ignore EEPROM errors during boot
EEPROM errors reported by firmware are basically warnings and
should not fail the boot process.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 13:29:53 +03:00
Tal Cohen 194e515c79 habanalabs/gaudi2: new API to control engine cores running mode
The current flow of halting the engine cores is implemented by command
buffers built by the user space and sent to the driver.

This flow is broken because the user space does not know when the
cores actually halt, as sending a workload is an asynchronous
operation.

Therefore, the application cannot free the memory that is mapped
to the engine cores.

This new API allows the user space to control the running mode. The
API call is synchronous (it returns after the cores are set to the
requested mode).

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 13:29:51 +03:00
Oded Gabbay 07056f58e4 habanalabs: remove left-over code from bring-up
There is some left-over code from the gaudi2 bring-up that wasn't
removed so far.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 13:29:51 +03:00
Ofir Bitton ae937492ec habanalabs/gaudi2: remove old interrupt mappings
Interrupt enumeration changed some time ago, but the old mapping
was accidentally left in the driver.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 13:29:50 +03:00
Tomer Tayar 168fc71857 habanalabs/gaudi2: map virtual MSI-X doorbell memory for user
Upon the initialization of a user context, map the host memory page of
the virtual MSI-X doorbell in the device MMU.
A reserved VA is used for this purpose, so the user can use it directly
without any allocation/map operation.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:31 +03:00
Tomer Tayar 3f043b3192 habanalabs/gaudi2: modify decoder to use virtual MSI-X doorbell
Modify the decoder wrapper blocks to generate interrupts using the
virtual MSI-X doorbell.

As a decoder wrapper block cannot write directly to HBW upon completion,
it instead writes to a SOB, which is monitored by a master monitor.
When resolved, this monitor is the one that actually writes to the
virtual MSI-X doorbell.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:31 +03:00
Ofir Bitton a85e389a84 habanalabs/gaudi2: reset device upon critical ECC event
Correctable ECC events are not fatal, but as they accumulate, the f/w
can decide that a hard-reset is required. This indication is
propagated to the host using the existing ECC event interface.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:28 +03:00
Oded Gabbay d7bb1ac89b habanalabs: add gaudi2 asic-specific code
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code needs to initialize,
finalize and submit workloads to the Gaudi2 ASIC.

It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.

It contains a new debugfs entry to dump razwi events. A razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:27 +03:00
Oded Gabbay 01d9ccf865 habanalabs/gaudi2: add asic registers header files
Add the relevant GAUDI2 ASIC registers header files. These files are
generated automatically by a tool maintained by the VLSI engineers.

There are more files which are not upstreamed because only very few
defines from those files are used in the driver. For those files, I
copied the relevant defines into gaudi2_regs.h and gaudi2_masks.h, to
reduce the size of this patch.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:27 +03:00
Oded Gabbay be882e534f habanalabs/gaudi: enable error interrupt on ARB WDT
We want to receive an error interrupt in case the watchdog timer
expires on arbitration event in the queues.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:26 +03:00
Oded Gabbay 5125aa3368 habanalabs/goya: move dma direction enum to uapi file
The values in this enum are not used by h/w but are a contract
between userspace and the kernel driver so they must be defined
in the uapi file.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:25 +03:00
ran shalit e41c641856 habanalabs: add critical indication in sram ecc
Multiple SRAM SERR events are treated as critical events,
and the host should be notified about them. Therefore, add an
is_critical indication to the SRAM ECC failure packet.

Signed-off-by: ran shalit <rshalit@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:23 +03:00
Oded Gabbay 368b0b4fd6 habanalabs: update firmware header
Update cpucp_if.h to latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22 21:01:20 +02:00
Ohad Sharabi d0b59cf68c habanalabs/gaudi: add debugfs to fetch internal sync status
When the Gaudi device is secured, the monitors' data in the
configuration space is blocked from PCI access.
As we need to enable the user to get the sync-manager monitor
registers when debugging, this patch adds a debugfs entry that dumps
the information to a binary file (blob).
When a root user triggers the dump, the driver sends a request to
the f/w to fill a data structure containing a dump of all monitor
registers.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22 20:57:37 +02:00
Ohad Sharabi 06926dbed2 habanalabs: convert all MMU masks/shifts to arrays
There is no need to hold each MMU mask/shift as a separate, named
structure member (e.g. hop0_mask).

Converting them to arrays instead results in smaller and more
readable code.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22 20:57:34 +02:00
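A minimal sketch of the array-based layout described above, assuming a hypothetical 4-hop page table with 9 index bits per hop. The struct, field and function names are illustrative, not the driver's actual definitions; the point is that per-hop logic becomes a loop instead of one code path per named member.

```c
#include <stdint.h>

/* Hypothetical hop count; the real value depends on the MMU version. */
#define HYP_MMU_HOPS 4

/* Before: hop0_mask, hop1_mask, ... as separate members.
 * After: masks and shifts indexed by hop number (illustrative names). */
struct hyp_mmu_props {
	uint64_t hop_masks[HYP_MMU_HOPS];
	uint32_t hop_shifts[HYP_MMU_HOPS];
};

/* Extract the page-table index used at a given hop for a virtual address. */
static uint64_t hyp_hop_idx(const struct hyp_mmu_props *p, int hop, uint64_t va)
{
	return (va & p->hop_masks[hop]) >> p->hop_shifts[hop];
}

/* With arrays, walking all hops is a single loop. */
static void hyp_all_hop_idx(const struct hyp_mmu_props *p, uint64_t va,
			    uint64_t out[HYP_MMU_HOPS])
{
	for (int hop = 0; hop < HYP_MMU_HOPS; hop++)
		out[hop] = hyp_hop_idx(p, hop, va);
}
```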
Linus Torvalds 02e2af20f4 Char/Misc and other driver updates for 5.18-rc1
Here is the big set of char/misc and other small driver subsystem
 updates for 5.18-rc1.
 
 Included in here are merges from driver subsystems which contain:
 	- iio driver updates and new drivers
 	- fsi driver updates
 	- fpga driver updates
 	- habanalabs driver updates and support for new hardware
 	- soundwire driver updates and new drivers
 	- phy driver updates and new drivers
 	- coresight driver updates
 	- icc driver updates
 
 Individual changes include:
 	- mei driver updates
 	- interconnect driver updates
 	- new PECI driver subsystem added
 	- vmci driver updates
 	- lots of tiny misc/char driver updates
 
 There will be two merge conflicts with your tree, one in MAINTAINERS
 which is obvious to fix up, and one in drivers/phy/freescale/Kconfig
 which also should be easy to resolve.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG3fQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykNEgCfaRG8CRxewDXOO4+GSeA3NGK+AIoAnR89donC
 R4bgCjfg8BWIBcVVXg3/
 =WWXC
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other small driver subsystem
  updates for 5.18-rc1.

  Included in here are merges from driver subsystems which contain:

   - iio driver updates and new drivers

   - fsi driver updates

   - fpga driver updates

   - habanalabs driver updates and support for new hardware

   - soundwire driver updates and new drivers

   - phy driver updates and new drivers

   - coresight driver updates

   - icc driver updates

  Individual changes include:

   - mei driver updates

   - interconnect driver updates

   - new PECI driver subsystem added

   - vmci driver updates

   - lots of tiny misc/char driver updates

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
  firmware: google: Properly state IOMEM dependency
  kgdbts: fix return value of __setup handler
  firmware: sysfb: fix platform-device leak in error path
  firmware: stratix10-svc: add missing callback parameter on RSU
  arm64: dts: qcom: add non-secure domain property to fastrpc nodes
  misc: fastrpc: Add dma handle implementation
  misc: fastrpc: Add fdlist implementation
  misc: fastrpc: Add helper function to get list and page
  misc: fastrpc: Add support to secure memory map
  dt-bindings: misc: add fastrpc domain vmid property
  misc: fastrpc: check before loading process to the DSP
  misc: fastrpc: add secure domain support
  dt-bindings: misc: add property to support non-secure DSP
  misc: fastrpc: Add support to get DSP capabilities
  misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
  misc: fastrpc: separate fastrpc device from channel context
  dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
  dt-bindings: nvmem: make "reg" property optional
  nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
  nvmem: dt-bindings: Fix the error of dt-bindings check
  ...
2022-03-28 12:27:35 -07:00
Oded Gabbay 100fcf1e11 habanalabs/gaudi: add missing handling of NIC related events
There are a few events that can arrive from the f/w and, without proper
handling, cause errors to appear in the kernel log for no reason.

Add the relevant handling that was missing.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28 14:22:05 +02:00
Oded Gabbay 008255ec3d habanalabs: update to latest f/w specs
Copy the latest versions of the f/w specs files.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28 14:22:03 +02:00
Rajaravi Krishna Katta 4c01e524b2 habanalabs: sysfs support for fw os version
Add a new sysfs entry to display the firmware OS version:
/sys/class/habanalabs/hl<n>/fw_os_ver

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28 14:22:02 +02:00
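A minimal userspace sketch for reading the new attribute. The path comes from the commit above; the helper names are hypothetical, and the exact format of the version string is not specified here, so the sketch only returns the first line.

```c
#include <stdio.h>
#include <string.h>

/* Build /sys/class/habanalabs/hl<n>/fw_os_ver. Returns 0 on success. */
static int hyp_fw_os_ver_path(char *buf, size_t size, int n)
{
	int ret = snprintf(buf, size, "/sys/class/habanalabs/hl%d/fw_os_ver", n);

	return (ret < 0 || (size_t)ret >= size) ? -1 : 0;
}

/* Read the first line of the attribute into out. Returns 0 on success. */
static int hyp_read_fw_os_ver(int n, char *out, size_t size)
{
	char path[64];
	FILE *f;

	if (hyp_fw_os_ver_path(path, sizeof(path), n))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0'; /* drop the trailing newline, if any */
	return 0;
}
```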
Gustavo A. R. Silva 5224f79096 treewide: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-02-17 07:00:39 -06:00
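For illustration, a userspace C sketch of the pattern the Coccinelle rule produces, using a hypothetical struct: the trailing `data[0]` becomes `data[]`, and the allocation adds the payload size on top of `sizeof` of the struct, which does not count the flexible member. Kernel code would typically size such allocations with the `struct_size()` helper from `linux/overflow.h`; plain `malloc` is used here only to keep the example self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with a variable-length trailing payload. */
struct hyp_pkt {
	size_t len;
	unsigned char data[]; /* flexible array member (formerly data[0]) */
};

/* Allocate a packet and copy len bytes of payload into it. */
static struct hyp_pkt *hyp_pkt_alloc(const unsigned char *src, size_t len)
{
	/* sizeof(*p) excludes the flexible member, so add the payload size. */
	struct hyp_pkt *p = malloc(sizeof(*p) + len);

	if (!p)
		return NULL;
	p->len = len;
	memcpy(p->data, src, len);
	return p;
}
```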
Tomer Tayar f297a0e9fe habanalabs: add CPU-CP packet for engine core ASID cfg
In some cases the driver cannot configure ASID of some engines due to
the security level of the relevant registers.
For this purpose, a new CPU-CP packet is introduced, which allows the
driver to ask the F/W to do this configuration instead.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:39:53 +02:00
Ohad Sharabi d636a932b3 habanalabs: clean MMU headers definitions
During the MMU development the MMU header files were left with unclean
definitions:

- MMU "version specific" definitions that were left in the mmu_general
  file
- unused definitions

This patch attempts, where possible, to keep definitions that can serve
multiple MMU versions (but that are not tightly bound with specific MMU
arch) in the mmu_general header file (e.g. different definitions for
number of HOPs).

Otherwise, move MMU version specific definitions (e.g. HOPs masks and
shifts) to the specific MMU version file.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton b5c92b8882 habanalabs: sysfs support for two infineon versions
Currently, sysfs supports dumping a single Infineon version; future
ASICs will have two Infineon versions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton e2637fdca7 habanalabs: handle device TPM boot error as warning
As a TPM error indication is not fatal, the driver should dump a
warning and continue booting.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:05 +02:00
Ofir Bitton 3eb7754ff4 habanalabs: debugfs support for larger I2C transactions
I2C debugfs support is limited to 1 byte. We extend the functionality
to more than 1 byte by using one of the pad fields as a length.
There are no backward-compatibility issues, as new F/W versions treat
a length of 0 as a 1-byte transaction.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:05 +02:00
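A sketch of why reusing a pad byte as the length stays backward compatible. The packet layout and names below are hypothetical, not the actual cpucp_if.h definitions: an older driver leaves the former pad byte zeroed, and the newer F/W interprets that zero as a 1-byte transaction.

```c
#include <stdint.h>

/* Hypothetical I2C request layout; field names and widths are illustrative. */
struct hyp_i2c_pkt {
	uint8_t bus;
	uint8_t addr;
	uint8_t reg;
	uint8_t len; /* formerly a pad byte; older drivers leave it 0 */
	uint64_t value;
};

/* F/W-side interpretation: 0 means a legacy 1-byte transaction. */
static unsigned int hyp_i2c_effective_len(const struct hyp_i2c_pkt *pkt)
{
	return pkt->len ? pkt->len : 1;
}
```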
farah kassabri 49c052dad6 habanalabs: add new opcodes for INFO IOCTL
Add implementation for new opcodes in the INFO IOCTL:
1. Retrieve the replaced DRAM rows from f/w.
2. Retrieve the pending DRAM rows from f/w.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:05 +02:00
Rajaravi Krishna Katta e84e31a912 habanalabs: add dedicated message towards f/w to set power
CPUCP_PACKET_POWER_GET packet type was used for both
hl_get_power() and hl_set_power().

To align with other sensor functions hl_set_power()
should use CPUCP_PACKET_POWER_SET.

This packet will only be used with newer ASICs, so we need to add
a compatibility flag to the asic properties to indicate whether to use
this packet or the GET packet.

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:04 +02:00
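A minimal sketch of the compatibility-flag selection described above. The opcode values, the flag name and the property struct are hypothetical stand-ins; only the GET/SET packet names come from the commit.

```c
#include <stdbool.h>

/* Hypothetical opcode values; the real ones are defined in cpucp_if.h. */
enum hyp_cpucp_opcode {
	HYP_PACKET_POWER_GET = 1,
	HYP_PACKET_POWER_SET = 2,
};

/* Hypothetical subset of the ASIC properties. */
struct hyp_asic_prop {
	bool use_power_set_packet; /* set only for newer ASICs */
};

/* Pick the opcode that a set-power request should carry. */
static enum hyp_cpucp_opcode hyp_set_power_opcode(const struct hyp_asic_prop *prop)
{
	/* Older ASICs keep using the GET packet type for set operations. */
	return prop->use_power_set_packet ? HYP_PACKET_POWER_SET : HYP_PACKET_POWER_GET;
}
```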
Oded Gabbay efc6b04b86 habanalabs: update firmware files
Update the firmware headers to the latest version

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:47 +03:00
Rajaravi Krishna Katta 2b28485d0a habanalabs: enable power info via HWMON framework
Add support for retrieving the following power info via HWMON:
- instantaneous power value
- highest value since the last reset
- resetting the highest-value placeholder

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:46 +03:00
Rajaravi Krishna Katta 7457269136 habanalabs: create static map of f/w hwmon enums
Instead of using the Linux kernel HWMON enum definitions when
communicating with the firmware, use proprietary HWMON-based enums,
i.e. map the hwmon.h enums to cpucp_if.h-based enums.

This is needed because the HWMON enums do not guarantee backward
compatibility, and therefore changes can break compatibility between
a newer driver and older firmware.

The driver checks the CPU_BOOT_DEV_STS0_MAP_HWMON_EN bit to determine
whether the f/w supports the cpucp->hwmon enum mapping, so that older
firmware, where this mapping isn't available, is still supported.

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:46 +03:00
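A sketch of the static-map approach, assuming hypothetical stand-in enums rather than the real hwmon.h and cpucp_if.h definitions (their actual values are not reproduced here). The `map_supported` argument stands in for the result of the CPU_BOOT_DEV_STS0_MAP_HWMON_EN check; when it is false, older firmware keeps receiving the raw kernel-side value.

```c
#include <stdbool.h>

/* Stand-in for the kernel-side hwmon sensor enum (illustrative values,
 * which are exactly the kind of values that may change between kernels). */
enum hyp_hwmon_type {
	HYP_HWMON_TEMP,
	HYP_HWMON_IN,
	HYP_HWMON_CURR,
	HYP_HWMON_POWER,
	HYP_HWMON_FAN,
	HYP_HWMON_PWM,
	HYP_HWMON_TYPE_MAX
};

/* Hypothetical firmware-facing enum whose values are frozen. */
enum hyp_cpucp_sensor_type {
	HYP_CPUCP_SENSOR_TEMP = 0,
	HYP_CPUCP_SENSOR_IN = 1,
	HYP_CPUCP_SENSOR_CURR = 2,
	HYP_CPUCP_SENSOR_FAN = 3,
	HYP_CPUCP_SENSOR_PWM = 4,
	HYP_CPUCP_SENSOR_POWER = 5,
};

/* Static map from the kernel-side enum to the stable f/w enum. */
static const enum hyp_cpucp_sensor_type hyp_hwmon_to_cpucp[HYP_HWMON_TYPE_MAX] = {
	[HYP_HWMON_TEMP] = HYP_CPUCP_SENSOR_TEMP,
	[HYP_HWMON_IN] = HYP_CPUCP_SENSOR_IN,
	[HYP_HWMON_CURR] = HYP_CPUCP_SENSOR_CURR,
	[HYP_HWMON_POWER] = HYP_CPUCP_SENSOR_POWER,
	[HYP_HWMON_FAN] = HYP_CPUCP_SENSOR_FAN,
	[HYP_HWMON_PWM] = HYP_CPUCP_SENSOR_PWM,
};

/* Value to send to the f/w; raw hwmon value if the f/w lacks the mapping. */
static int hyp_fw_sensor_type(enum hyp_hwmon_type type, bool map_supported)
{
	return map_supported ? (int)hyp_hwmon_to_cpucp[type] : (int)type;
}
```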
Omer Shpigelman 3e08f157c2 habanalabs/gaudi: use direct MSI in single mode
Due to an FLR scenario when running inside a VM, we must not use
indirect MSI, because it might cause issues when the VM is destroyed.
In a VM we use single MSI mode, as opposed to the multi MSI mode that
is used on bare metal.
Hence, direct MSI should be used in single MSI mode only.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-14 15:00:03 +03:00
Oded Gabbay c2aa713618 habanalabs: update to latest firmware headers
Add several new packets between driver and firmware.
Add matching compatibility bits for backward compatibility.
Add support for 4K event types.
Add information about pcie errors.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01 18:38:24 +03:00
Oded Gabbay 5dc9ffaff1 habanalabs: expose server type in INFO IOCTL
Add the server type property to the hl_info_hw_ip_info structure
that is exposed to the user via the INFO IOCTL.

This is needed by the userspace s/w stack to know the connection map
of the internal links that connect the ASICs to one another inside the
server.

The F/W will tell us, as part of the NIC information, the server type
that the GAUDI is located in.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01 18:38:24 +03:00
Oded Gabbay 2a2c4b7403 habanalabs: update firmware header to latest version
Add two new fields regarding interrupts communication between driver
and f/w.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:47 +03:00
Yuri Nudelman 77977ac875 habanalabs/gaudi: implement state dump
As a first stage, only the Gaudi core dump is implemented, not
including the status registers.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:46 +03:00
Ofir Bitton c67b0579b8 habanalabs: update firmware header files
Update recent changes made in firmware header files, which contain
a minor COMMS protocol change and new error status definitions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-08-29 09:47:44 +03:00
Ofir Bitton 6c31f494d8 habanalabs/gaudi: add support for NIC DERR
Add support for NIC DERR ECC error events. If this error is received,
a device reset is performed.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-21 10:21:28 +03:00
Ofir Bitton 7d5ba005cf habanalabs/gaudi: correct driver events numbering
Currently, the driver sends the fc interrupt ID to the FW instead of
the cpu interrupt ID. We intend to fix that while keeping backward
compatibility by using the same interrupt values.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:42 +03:00
Ohad Sharabi e1222c2794 habanalabs: report EQ fault during heartbeat
If an EQ fault occurs, we want to know about it.
For this purpose, a status bitmask was added in which the EQ_FAULT bit
is set by the FW upon an EQ fault.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
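A minimal sketch of checking the heartbeat status bitmask on the driver side. The bit position and names are hypothetical; the real EQ_FAULT definition lives in the f/w interface header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the EQ_FAULT flag. */
#define HYP_STATUS_EQ_FAULT (1u << 0)

/* Returns true if the heartbeat status bitmask reports an EQ fault. */
static bool hyp_heartbeat_has_eq_fault(uint32_t status_mask)
{
	return (status_mask & HYP_STATUS_EQ_FAULT) != 0;
}
```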
Ofir Bitton 254fac6d1a habanalabs/gaudi: add FW alive event support
In order for the driver to be aware of process or thread crashes inside
GAUDI's CPU, we introduce a new event which contains all relevant
information. Upon receiving the event, the driver dumps the information
and resets the device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Oded Gabbay 5a967fb3a7 habanalabs/gaudi: update to latest f/w specs
Update the firmware interface files to their latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton 5bc691d849 habanalabs/gaudi: split host irq interfaces towards FW
The current implementation uses a single interrupt interface towards
the FW, which causes races between interrupt types.
We split this interface into one interface per interrupt type.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Tomer Tayar ae151bcfab habanalabs/gaudi: add ARB to QM stop on error masks
Update the QM stop on error masks to also stop on ARB errors.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
Oded Gabbay 1242e9f0f4 habanalabs: check running index in eqe control
To harden the event queue mechanism, we add a running index to the
control header of the entry.

The firmware writes the index in each entry and the driver verifies
that the index of the current entry is larger by 1 than the index of
the previous entry.

In case it isn't, the driver will treat the entry as if it wasn't
valid (it won't process it but won't skip it).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
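A sketch of the running-index check described above, using hypothetical structures that model only the index from the control header. How the very first entry is accepted is not described in the commit; here it is assumed to be accepted unconditionally. An entry that fails the check is neither processed nor skipped: the consumer index stays put so the entry is re-examined on the next pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical EQ entry; only the running index is modeled. */
struct hyp_eqe {
	uint32_t index;
};

/* Hypothetical event-queue state. */
struct hyp_eq {
	const struct hyp_eqe *entries;
	unsigned int len;    /* number of entries in the ring */
	unsigned int ci;     /* consumer index */
	uint32_t prev_index; /* running index of the last accepted entry */
	bool started;        /* assumption: first entry is accepted as-is */
};

/* Accept the current entry only if its index is previous + 1. */
static bool hyp_eq_consume(struct hyp_eq *eq, uint32_t *out_index)
{
	const struct hyp_eqe *e = &eq->entries[eq->ci];

	if (eq->started && e->index != eq->prev_index + 1)
		return false; /* not valid: don't process, don't advance */

	*out_index = e->index;
	eq->prev_index = e->index;
	eq->started = true;
	eq->ci = (eq->ci + 1) % eq->len;
	return true;
}
```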
Koby Elbaz e591a49cb5 habanalabs/gaudi: read GIC sts after FW is loaded
The GIC privileged status is read after the F/W is loaded, because the
privileged GIC capability is only available with the correct ARMCP
version, and only once it is loaded.
Such versions necessarily support COMMS, so the GIC alternatives (SP
regs) are read directly from the dynamic regs.

Likewise, the DMA QMANs are initialized after the F/W is loaded, since
their initialization depends on the GIC configuration.

If the F/W isn't loaded, there is no problem, since there won't be any
GIC IRQ handling either way.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00