Commit Graph

118 Commits

Author SHA1 Message Date
Oded Gabbay f83f3a31b2 habanalabs/gaudi: mask WDT error in QMAN
This interrupt cause is not relevant because of how the user use the
QMAN arbitration mechanism. We must mask it as the log explodes with it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04 08:56:07 +02:00
Oded Gabbay f279e5cd95 habanalabs: update scratchpad register map
Our firmware use some scratchpad registers in the device for different
roles. Update the file to the latest version of the firmware code.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Oded Gabbay 219b8f2ff0 habanalabs: update firmware interface file
Add new packet to fetch PLL information from firmware. This will be needed
in the future when the driver won't be able to access the PLL registers
directly

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Moti Haimovski 7edf341b9e habanalabs: add num_hops to hl_mmu_properties
This commit adds the number of HOPs supported by the device to the
device MMU properties.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:53 +03:00
Oded Gabbay 5a1b861daa habanalabs: increase PQ COMP_OFFSET by one nibble
For future ASICs, we increase this field by one nibble. This field was not
used by the current ASICs so this change doesn't break anything.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
Ofir Bitton 763a0b4d81 habanalabs: Fix alignment issue in cpucp_info structure
Because the device CPU compiler aligns structures to 8 bytes,
struct cpucp_info has an alignment issue as some parts
in the structure are not aligned to 8 bytes.
It is preferred that we explicitly insert placeholders inside
the structure to avoid confusion

in order to validate this scenario, we printed both pointers:

__u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc)
__le64 dram_size;                    (0xffff899c67ed4d40)

we see difference of 132 bytes although the first array
is only 128 bytes long, Meaning compiler added a 4 byte padding.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
Oded Gabbay 2f55342c5e habanalabs: replace armcp with the generic cpucp
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.

Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove yet the armcp entries.
Those will be deprecated next year. Add the documentation about it in sysfs
documentation.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Oded Gabbay 42b0698add habanalabs: update GAUDI hardware specs
Add define for the 2 MME slave engines.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
farah kassabri 9f3064913e habanalabs: add support for getting device total energy
Add driver implementation for reading the total energy consumption
from the device ARM FW.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Tomer Tayar 56004701f5 habanalabs: Include linux/bitfield.h only in habanalabs.h
Include linux/bitfield.h only in habanalabs.h, instead of in each and
every file that needs it, as habanalabs.h is already included by all.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Oded Gabbay 65887291c6 habanalabs: use FIELD_PREP() instead of <<
Use the standard FIELD_PREP() macro instead of << operator to perform
bitmask operations. This ensures type check safety and eliminate compiler
warnings.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Ofir Bitton 0a068adde5 habanalabs: add information about PCIe controller
Update firmware header with new API for getting pcie info
such as tx/rx throughput and replay counter.
These counters are needed by customers for monitor and maintenance
of multiple devices.
Add new opcodes to the INFO ioctl to retrieve these counters.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton 69c6e18d0c habanalabs: fix report of RAZWI initiator coordinates
All initiator coordinates received upon an 'MMU page fault RAZWI
event' should be the routers coordinates, the only exception is the
DMA initiators for which the reported coordinates correspond to
their actual location.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-31 15:10:27 +03:00
Oded Gabbay eb8b293e79 habanalabs: update hl_boot_if.h from firmware
Update the boot interface file from the latest version from firmware.
Defines for secure boot were added.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-24 20:31:37 +03:00
Oded Gabbay 70b2f993ea habanalabs: create common folder
For internal needs of our CI we need to move all the common code into a
common folder instead of putting them in the root folder of the driver.

Same applies to the common header files under include/

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-24 20:31:37 +03:00
Oded Gabbay c83c417193 habanalabs: halt device CPU only upon certain reset
Currently the driver halts the device CPU in the halt engines function,
which halts all the engines of the ASIC. The problem is that if later on we
stop the reset process (due to inability to clean memory mappings in time),
the CPU will remain in halt mode. This creates many issues, such as
thermal/power control and FLR handling.

Therefore, move the halting of the device CPU to the very end of the reset
process, just before writing to the registers to initiate the reset. In
addition, the driver now needs to send a message to the device F/W to
disable it from sending interrupts to the host machine because during halt
engines function the driver disables the MSI/MSI-X interrupts.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-07-24 20:31:36 +03:00
Oded Gabbay fcc6a4e606 habanalabs: Extract ECC information from FW
ECC (Error Correcting Code) interrupts are going to be handled
by the FW. Hence, we define an interface in which the driver can
obtain the relevant ECC information.
This information is needed for monitoring and can also lead
to a hard reset if ECC error is not correctable.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:36 +03:00
Adam Aharon e8edded693 habanalabs: calculate trace frequency from PLL
The profiler needs to know the PLL values for correctly showing the
profiling data. Because our firmware can use different PLL configurations,
we need to read the PLL values from the ASIC to pass them to the profiler.

Signed-off-by: Adam Aharon <aaharon@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:35 +03:00
Oded Gabbay 6ced91170d habanalabs: align armcp_packet structure to 8 bytes
Once there is a 64-bit field in a structure, GCC compiler for ARM aligns
the structure to 8 bytes. In order to avoid confusion when these
structures are being passed between CPUs from different architectures, we
explicitly align the structure to 8 bytes.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:35 +03:00
Ofir Bitton 6c07bab34b habanalabs: Use mask instead of shift in sync stream registers
Use proper bitfield masks instead of shifting values when configuring
packets sent to device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24 20:31:34 +03:00
Oded Gabbay 64536abc62 habanalabs: block scalar load_and_exe on external queue
In Gaudi, the user can't execute scalar load_and_exe on external queue
because it can be a security hole. The driver doesn't parse the commands
being loaded and it can be msg_prot, which the user isn't allowed to use.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-06-24 09:09:10 +03:00
Ofir Bitton ebd8d12251 habanalabs: move event handling to common firmware file
Instead of writing similar event handling code for each ASIC, move the code
to the common firmware file. This code will be used for GAUDI and all
future ASICs.

In addition, add two new fields to the auto-generated events file: valid
and description. This will save the need to manually write the events
description in the source code and simplify the code.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Oded Gabbay ac0ae6a96a habanalabs: add gaudi asic-dependent code
Add the ASIC-dependent code for GAUDI. Supply (almost) all of the function
callbacks that the driver's common code need to initialize, finalize and
submit workloads to the GAUDI ASIC.

It also contains the code to initialize the F/W of the GAUDI ASIC and to
receive events from the F/W.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Oded Gabbay 2aad2bf81c habanalabs: add gaudi asic registers header files
Add the relevant GAUDI ASIC registers header files. These files are
generated automatically from a tool maintained by the VLSI engineers.

There are more files which are not upstreamed because only very few defines
from those files are used in the driver. For those files, we copied the
relevant defines into gaudi_regs.h and gaudi_masks.h, to reduce the size of
this patch.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Omer Shpigelman fca72fbb66 habanalabs: get card type, location from F/W
For Gaudi the driver gets two new additional properties from the F/W:
1. The card's type - PCI or PMC
2. The card's location in the Gaudi's box (relevant only for PMC).

The card's location is also passed to the user in the HW IP info structure
as it needs this property for establishing communication between Gaudis.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Oded Gabbay 010a118cfe habanalabs: update F/W register map
Update the mapping to the latest one used by the Firmware. No impact on the
driver in this update.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Ofir Bitton 47f6b41cdd habanalabs: load CPU device boot loader from host
Load CPU device boot loader during driver boot time in order to avoid flash
write for every boot loader update.

To preserve backward-compatibility, skip the device boot load if the device
doesn't request it.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Christine Gharzuzi 8e708af284 habanalabs: support hwmon_reset_history attribute
Support hwmon_temp_reset_histroy, hwmon_in_reset_history and
hwmon_curr_reset attribute which resets the historical highest value.

Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Tomer Tayar 25e7aeba60 habanalabs: Add INFO IOCTL opcode for time sync information
Add a new opcode to the INFO IOCTL that retrieves the device time
alongside the host time, to allow a user application that want to measure
device time together with host time (such as a profiler) to synchronize
these times.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19 14:48:41 +03:00
Oded Gabbay cb056b9fd5 habanalabs: retrieve DMA mask indication from firmware
Retrieve from the firmware the DMA mask value we need to set according to
the device's PCI controller configuration. This is needed when working on
POWER9 machines, as the device's PCI controller is configured in a
different way in those machines.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-17 12:06:22 +03:00
Oded Gabbay c8aee597bb habanalabs: update firmware definitions
Add comments for the various errors and states of the firmware during boot.
Add a mapping of a new register that will tell the driver whether the
firmware executed the request from the driver or if it has encountered an
error.
Add a new enum for the possible values of this register.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-17 12:06:22 +03:00
Oded Gabbay 7e1c07dd35 habanalabs: unify and improve device cpu init
Move the code of device CPU initialization from being ASIC-Dependent to
common code. In addition, add support for the new error reporting feature
of the firmware boot code.

Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-17 12:06:22 +03:00
Omer Shpigelman 76cedc739d habanalabs: remove stop-on-error flag from DMA
Stop-on-error mode in DMA is useful as it stops the transaction
immediately upon error e.g. page fault.
But it may cause the next command submission to fail as is leaves the DMA
in unstable state.
Therefore we remove the stop-on-error configuration from the DMA.
Stop-on-err is still available for debug.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-17 12:06:22 +03:00
Omer Shpigelman 4f0e6ab78a habanalabs: add print upon clock change
Add print upon clock slow down due to power consumption or overheating.
In addition, add print when back to optimal clock.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Oded Gabbay bc6ed3aa92 habanalabs: update goya firmware register map
Use specific values in enum of register map to be able to deprecate old
values.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:17 +02:00
Christine Gharzuzi 0da10e683e habanalabs: provide historical maximum of various sensors
Add support for hwmon_in_highest, hwmon_temp_highest and hwmon_curr_highest
attributes. These attributes retrieve the historical maximum voltage,
temperature and current that were sampled, respectively.

Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Moti Haimovski 5557b138dc habanalabs: support temperature offset via sysfs
This commit adds support for offsetting the temperatures reading
by a specified value as defined in
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
using the standard sysfs defined for hwmon.
This is required by system administrators to inject errors to test
their monitoring applications in data centers.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-03-24 10:54:16 +02:00
Omer Shpigelman 54bb67444e habanalabs: split MMU properties to PCI/DRAM
Split the properties used for MMU mappings to DRAM and PCI (host) types.
This is a prerequisite for future ASICs support.
Note that in Goya ASIC, the PMMU and DMMU are the same (except of page
sizes) as only one MMU mechanism is used for both of the mapping types.
Hence this patch should not have any effect on current behavior.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:46 +02:00
Omer Shpigelman 30919edef2 habanalabs: re-factor MMU masks and documentation
Some cosmetics around the MMU code to make it more self-explanatory.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:46 +02:00
Oded Gabbay 6476b47243 habanalabs: set ETR as non-secured
ETR should always be non-secured as it is used by the users to record
profiling/trace data.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-11-21 11:35:45 +02:00
Oded Gabbay e1a84d56fc habanalabs: use registers name defines for ETR block
We have a single ETR block in the SOC, so use explicit register
name defines for initializing this block. This makes it more readable and
maintainable.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-11-21 11:35:45 +02:00
Tomer Tayar cb596aee88 habanalabs: Add a new H/W queue type
This patch adds a support for a new H/W queue type.
This type of queue is for DMA and compute engines jobs, for which
completion notification are sent by H/W.
Command buffer for this queue can be created either through the CB
IOCTL and using the retrieved CB handle, or by preparing a buffer on the
host or device SRAM/DRAM, and using the device address to that buffer.
The patch includes the handling of the 2 options, as well as the
initialization of the H/W queue and its jobs scheduling.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:45 +02:00
Oded Gabbay abb7e16fb6 habanalabs: handle F/W failure for sensor initialization
In case the F/W fails to initialize the thermal sensors, print an
appropriate error message to kernel log and fail the device
initialization.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-11-21 11:35:44 +02:00
Oded Gabbay 4c172bbfaa habanalabs: stop using the acronym KMD
We want to stop using the acronym KMD. Therefore, replace all locations
(except for register names we can't modify) where KMD is written to other
terms such as "Linux kernel driver" or "Host kernel driver", etc.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay 0996bd1c74 habanalabs: display card name as sensors header
To allow the user to use a custom file for the HWMON lm-sensors library
per card type, the driver needs to register the HWMON sensors with the
specific card type name.

The card name is supplied by the F/W running on the device. If the F/W is
old and doesn't supply a card name, a default card name is displayed as
the sensors group name.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Oded Gabbay 75b3cb2bb0 habanalabs: add uapi to retrieve device utilization
Users and sysadmins usually want to know what is the device utilization as
a level 0 indication if they are efficiently using the device.

Add a new opcode to the INFO IOCTL that will return the device utilization
over the last period of 100-1000ms. The return value is 0-100,
representing as percentage the total utilization rate.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05 14:55:27 +03:00
Tomer Tayar 10d7de2cdb habanalabs: Add descriptive name to PSOC app status register
Add a meaningful name to the general PSOC application status register
which better describes its usage in keeping the HW state.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:26 +03:00
Tomer Tayar 4095a17657 habanalabs: Add descriptive names to PSOC scratch-pad registers
The PSOC scratch-pad registers are used for communication with the
device CPU. This patch adds new definitions for these registers which
are more descriptive than their general names.

The new set of definitions also gathers and documents the current usage
of the scratch-pad registers by the driver and the device CPU.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05 14:55:26 +03:00
Ben Segal 213ad5ad01 habanalabs: fix endianness handling for packets from user
Packets that arrive from the user and need to be parsed by the driver are
assumed to be in LE format.

This patch fix all the places where the code handles these packets and use
the correct endianness macros to handle them, as the driver handles the
packets in CPU format (LE or BE depending on the arch).

Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-08-12 09:01:10 +03:00
Tomer Tayar ac6183ae4b habanalabs: Update the device idle check
The patch updates the device idle check:
- Add reading the DMA core status register, because it is possible that
  a QMAN has finished its work but the DMA itself is still running.
- Remove the MME shadow status check, as the MME ARCH status register
  includes the status of all MME shadows.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-07-01 13:59:44 +00:00
Tomer Tayar 03d5f641dc habanalabs: Use single pool for CPU accessible host memory
The device's CPU accessible memory on host is managed in a dedicated
pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) -
which are allocated from generic DMA pools.
Due to address length limitations of the CPU, the addresses of all these
memory regions must have the same MSBs starting at bit 40.
This patch modifies the allocation of the PQ and EQ to be also from the
dedicated pool, to ensure compliance with the limitation.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-28 19:17:38 +03:00
Omer Shpigelman 8ba2876ddf habanalabs: add goya implementation for debug configuration
This patch adds the ASIC-specific function for GOYA to configure the
coresight components.

Most of the components have an enabled/disabled flag, depending on whether
the user wants to enable the component or disable it.

For some of the components, such as ETR and SPMU, the user can also
request to read values from them. Those values are needed for the user to
parse the trace data.

The ETR configuration is also checked for security purposes, to make sure
the trace data is written to the device's DRAM.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-04-01 22:23:02 +03:00
Oded Gabbay 9336c02167 habanalabs: remove trailing blank line from EOF
GIT does not like extra blank lines at the end of the file, so this patch
removes those lines.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
2019-03-31 11:29:53 +03:00
Omer Shpigelman 66542c3b9d habanalabs: add MMU shadow mapping
This patch adds shadow mapping to the MMU module. The shadow mapping
allows traversing the page table in host memory rather reading each PTE
from the device memory.
It brings better performance and avoids reading from invalid device
address upon PCI errors.
Only at the end of map/unmap flow, writings to the device are performed in
order to sync the H/W page tables with the shadow ones.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-02-24 09:17:55 +02:00
Tomer Tayar b6f897d75d habanalabs: Move PCI code into common file
Move duplicated PCI-related code from ASIC-specific files into the common
pci.c file.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-05 16:48:42 +02:00
Oded Gabbay c535bfdd0f habanalabs: use EQ MSI/X ID per chip
The Event Queue MSI/X ID is different per ASIC. This patch renames the
current define to have the GOYA_ prefix to mark it only for Goya. It also
moves it from the common armcp_if.h file to the ASIC specific goya_fw_if.h
file.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-04 15:51:30 +02:00
Tomer Tayar 3110c60fdc habanalabs: Move device CPU code into common file
This patch moves the code that is responsible of the communication
vs. the F/W to a dedicated file. This will allow us to share the code
between different ASICs.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-03-04 10:22:09 +02:00
Dotan Barak 5eb420446a habanalabs: remove implicit include from header files
This will prevent unneeded include of header files, which may increase
compilation time.

Signed-off-by: Dotan Barak <dbarak@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-02-27 09:55:57 +02:00
Oded Gabbay b24ca4587e habanalabs: rename goya_non_fatal_events array to all events
The goya_non_fatal_events array actually contains all the possible events
the driver can receive from the F/W. Therefore, use a proper name
for the array.

The patch also adds missing event Ids to the goya_async_event_id enum.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-02-24 15:50:53 +02:00
Igor Grinberg 0ca3b1b7b9 habanalabs: add new device CPU boot status
This patch adds a definition of a new status in the device CPU boot stages
and add the handling of the new status.

Signed-off-by: Igor Grinberg <igrinberg@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-02-24 11:20:02 +02:00
Oded Gabbay 887f7d38e4 habanalabs: set DMA0 completion to SOB 1007
This patch fix a bug where DMA channel 0 completion address wasn't
initialized by the driver.

The patch sets the address to Sync Object no. 1007

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-28 13:04:59 +01:00
Omer Shpigelman 27ca384cb7 habanalabs: add MMU DRAM default page mapping
This patch provides a workaround for a H/W bug in Goya, where access to
RAZWI from TPC can cause PCI completion timeout.

The WA is to use the device MMU to map any unmapped DRAM memory to a
default page in the DRAM. That way, the TPC will never reach RAZWI upon
accessing a bad address in the DRAM.

When a DRAM page is mapped by the user, its default mapping is
overwritten. Once that page is unmapped, the MMU driver will map that page
to the default page.

To help debugging, the driver will set the default page area to 0x99 on
device initialization.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-28 13:04:59 +01:00
Omer Shpigelman 0feaf86d4e habanalabs: add virtual memory and MMU modules
This patch adds the Virtual Memory and MMU modules.

Goya has an internal MMU which provides process isolation on the internal
DDR. The internal MMU also performs translations for transactions that go
from Goya to the Host.

The driver is responsible for allocating and freeing memory on the DDR
upon user request. It also provides an interface to map and unmap DDR and
Host memory to the device address space.

The MMU in Goya supports 3-level and 4-level page tables. With 3-level, the
size of each page is 2MB, while with 4-level the size of each page is 4KB.

In the DDR, the physical pages are always 2MB.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:46 +01:00
Oded Gabbay 1251f23ae8 habanalabs: add event queue and interrupts
This patch adds support for receiving events from Goya's control CPU and
for receiving MSI-X interrupts from Goya's DMA engines and CPU.

Goya's PCI controller supports up to 8 MSI-X interrupts, which only 6 of
them are currently used. The first 5 interrupts are dedicated for Goya's
DMA engine queues. The 6th interrupt is dedicated for Goya's control CPU.

The DMA queue will signal its MSI-X entry upon each completion of a command
buffer that was placed on its primary queue. The driver will then mark that
CB as completed and free the related resources. It will also update the
command submission object which that CB belongs to.

There is a dedicated event queue (EQ) between the driver and Goya's control
CPU. The EQ is located on the Host memory. The control CPU writes a new
entry to the EQ for various reasons, such as ECC error, MMU page fault, Hot
temperature. After writing the new entry to the EQ, the control CPU will
trigger its dedicated MSI-X entry to signal the driver that there is a new
entry in the EQ. The driver will then read the entry and act accordingly.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:45 +01:00
Oded Gabbay 9494a8dd8d habanalabs: add h/w queues module
This patch adds the H/W queues module and the code to initialize Goya's
various compute and DMA engines and their queues.

Goya has 5 DMA channels, 8 TPC engines and a single MME engine. For each
channel/engine, there is a H/W queue logic which is used to pass commands
from the user to the H/W. That logic is called QMAN.

There are two types of QMANs: external and internal. The DMA QMANs are
considered external while the TPC and MME QMANs are considered internal.
For each external queue there is a completion queue, which is located on
the Host memory.

The differences between external and internal QMANs are:

1. The location of the queue's memory. External QMANs are located on the
   Host memory while internal QMANs are located on the on-chip memory.

2. The external QMAN write an entry to a completion queue and sends an
   MSI-X interrupt upon completion of a command buffer that was given to
   it. The internal QMAN doesn't do that.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:45 +01:00
Oded Gabbay 839c48030d habanalabs: add basic Goya h/w initialization
This patch adds the basic part of Goya's H/W initialization. It adds code
that initializes Goya's internal CPU, various registers that are related to
internal routing, scrambling, workarounds for H/W bugs, etc.

It also initializes Goya's security scheme that prevents the user from
abusing Goya to steal data from the host, crash the host, change
Goya's F/W, etc.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:44 +01:00
Oded Gabbay 99b9d7b497 habanalabs: add basic Goya support
This patch adds a basic support for the Goya device. The code initializes
the device's PCI controller and PCI bars. It also initializes various S/W
structures and adds some basic helper functions.

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:44 +01:00
Oded Gabbay 1ea2a20e91 habanalabs: add Goya registers header files
This patch just adds a lot of header files that contain description of
Goya's registers.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-18 09:46:44 +01:00