Commit Graph

89 Commits

Author SHA1 Message Date
farah kassabri eb10b897e4 habanalabs: reset device upon fw read failure
failure in reading pre-boot verion is not handled correctly,
upon failure we need to reset the device in order to be able
to reinstall the driver.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:32 +02:00
Tomer Tayar ba7e389c30 habanalabs: Move repeatedly included headers to habanalabs.h
Several header files are repeatedly included in many files.
Move these files to habanalabs.h which is included by all.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:32 +02:00
Ofir Bitton 1cbca899fa habanalabs/gaudi: fetch PLL info from FW
Once FW security is enabled there is no access to PLL registers,
need to read values from FW using a dedicated interface.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
farah kassabri 03df136bc5 habanalabs/gaudi: scrub all memory upon closing FD
In cases of multi-tenants, administrators may want to prevent data
leakage between users running on the same device one after another.

To do that the driver can scrub the internal memory (both SRAM and
DRAM) after a user finish to use the memory.

Because in GAUDI the driver allows only one application to use the
device at a time, it can scrub the memory when user app close FD.

In future devices where we have MMU on the DRAM, we can scrub the DRAM
memory with a finer granularity (page granularity) when the user
allocates the memory.

This feature is not supported in Goya.

To allow users that want to debug their applications, we add a kernel
module parameter to load the driver with this feature disabled.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Ofir Bitton c692dec703 habanalabs/gaudi: add support for FW security
Skip relevant HW configurations once FW security is enabled
because these configurations are being performed by FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Ofir Bitton 323b726706 habanalabs: fetch security indication from FW
Add support for fetching security indication from FW.
This indication is needed in order to skip unnecessary
initializations done by FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
farah kassabri e753643d51 habanalabs: fix cs counters structure
Fix cs counters structure in uapi to be one flat structure instead
of two instances of the same other structure.
use atomic read/increment for context counters so we could use
one structure for both aggregated and context counters.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Ofir Bitton 9bb86b63d8 habanalabs: advanced FW loading
Today driver is able to load a whole FW binary into a specific
location on ASIC. We add support for loading sections from the
same FW binary into different loactions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Ofir Bitton 71a984f9ae habanalabs/gaudi: remove unreachable code
Remove unreachable code in gaudi collective flow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:30 +02:00
Ofir Bitton 5de406c0b5 habanalabs: sync stream collective support
Implement sync stream collective for GAUDI. Need to allocate additional
resources for that and add ctx_fini() to clean up those resources.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:30 +02:00
Ofir Bitton 0940cabafd habanalabs/gaudi: Set DMA5 QMAN internal
DMA5 QMAN is designated to be used for reduction process, hence it will
be no longer configured as external queue.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:30 +02:00
Ofir Bitton 5fe1c17ddf habanalabs: sync stream collective infrastructure
Define new API for collective wait support and modify sync stream
common flow. In addition add kernel CB allocation support for
internal queues.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:30 +02:00
Tal Cohen 4bb1f2f3fb habanalabs: use enum for CB allocation options
In the future there will be situations where queues can accept either
kernel allocated CBs or user allocated CBs, depending on different
states.

Therefore, instead of using a boolean variable of kernel/user allocated
CB, we need to use a bitmask to indicate that, which will allow to
combine the two options.

Add a flag to the uapi so the user will be able to indicate whether
the CB was allocated by kernel or by user. Of course the driver
validates that.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:29 +02:00
Oded Gabbay 3c68157fb8 habanalabs/gaudi: add support for NIC QMANs
Initialize the QMANs that are responsible to submit doorbells to the NIC
engines. Add support for stopping and disabling them, and reset them as
part of the hard-reset procedure of GAUDI. This will allow the user to
submit work to the NICs.

Add support for receiving events on QMAN errors from the firmware.

However, the nic_ports_mask is still initialized to 0. That means this code
won't initialize the QMANs just yet. That will be in a later patch.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:29 +02:00
Oded Gabbay 11dcb8c712 habanalabs/gaudi: add NIC security configuration
Configure the security properties of the NIC IP. This is to prevent the
user process from doing something with the NIC that he shouldn't do. e.g.
crash the server, steal data, etc.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:29 +02:00
Ofir Bitton 2992c1dcd3 habanalabs: add support for multiple SOBs per monitor
Support advanced monitor functionality to monitor more than a
single SOB. In addition expand all CB generation functions
with buffer offset in order to put in them multiple packets that are
generated by different functions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:28 +02:00
Ofir Bitton 3cf74b3656 habanalabs: sync stream structures refactor
Refactor sync stream implementation by adding more structures for
better readability. In addition reducing allocated resources.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:28 +02:00
Oded Gabbay 8f50314674 habanalabs: minimize prints when everything is fine
No need to print when the driver starts to initialize the H/W. Drivers
should be silent when everything is OK.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:28 +02:00
Ofir Bitton 20b7525dc4 habanalabs/gaudi: move mmu_prepare to context init
Currently mmu_prepare is located at context switch.
Since we support a single context, no reason to reconfigure
the MMU registers every context switch.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:27 +02:00
Oded Gabbay 652b44453e habanalabs/gaudi: fix missing code in ECC handling
There is missing statement and missing "break;" in the ECC handling
code in gaudi.c
This will cause a wrong behavior upon certain ECC interrupts.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-23 20:08:43 +02:00
Ofir Bitton 1137e1ead9 habanalabs/gaudi: move coresight mmu config
We must relocate the coresight mmu configuration to the coresight
flow to make it work in case the first submission is to configure
the profiler.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04 08:56:06 +02:00
Arnd Bergmann 82948e6e1d habanalabs: fix kernel pointer type
All throughout the driver, normal kernel pointers are
stored as 'u64' struct members, which is kind of silly
and requires casting through a uintptr_t to void* every
time they are used.

There is one line that missed the intermediate uintptr_t
case, which leads to a compiler warning:

drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap':
drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  512 |  rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address,

Rather than adding one more cast, just fix the type and
remove all the other casts.

Fixes: 0db575350c ("habanalabs: make use of dma_mmap_coherent")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04 08:56:06 +02:00
Oded Gabbay 5b94d6e476 habanalabs/gaudi: use correct define for qman init
There was a copy-paste error, and the wrong define was used for
initializing the QMAN.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Link: https://lore.kernel.org/r/20200925171415.25663-1-oded.gabbay@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-30 08:38:18 +02:00
Ofir Bitton 25121d9804 habanalabs/gaudi: configure QMAN LDMA registers properly
LDMA registers are configured with a fixed value.
We add new define set which gives the configuration
a proper meaning.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:21 +03:00
Oded Gabbay 57799ce9f8 habanalabs: add indication of security-enabled F/W
Future F/W versions will have enhanced security measures and the driver
won't be able to do certain configurations that it always did and those
configurations will be done by the firmware.

We use the firmware's preboot version to determine whether security
measures are enabled or not. Because we need this very early in our code,
the read of the preboot version is moved to the earliest possible place,
right after the device's PCI initialization.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Oded Gabbay d1f3633599 habanalabs/gaudi: fix DMA completions max outstanding to 15
This is a workaround for H/W bug H3-2116, where if there are more than 16
outstanding completions in the DMA transpose engine, there can be a
deadlock in the engine.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Oded Gabbay dbf053c429 habanalabs/gaudi: remove axi drain support
AXI drain is broken in GAUDI so remove support for enabling it.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Tomer Tayar ef6a0f6caa habanalabs: Add an option to map CB to device MMU
There are cases in which the device should access the host memory of a
CB through the device MMU, and thus this memory should be mapped.
The patch adds a flag to the CB IOCTL, in which a user can ask the
driver to perform the mapping when creating a CB.
The mapping is allowed only if a dedicated VA range was allocated for
the specific ASIC.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Tomer Tayar fa8641a14f habanalabs: Save context in a command buffer object
Future changes require using a context while handling a command buffer,
and thus need to save the context in the command buffer object.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Oded Gabbay 230b9b7d45 habanalabs/gaudi: increase timeout for boot fit load
The firmware running in the boot stage takes more time to execute due to
increased security mechanisms. Therefore, we need to increase the timeout
we wait for the boot fit to finish loading.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:53 +03:00
Moti Haimovski 7edf341b9e habanalabs: add num_hops to hl_mmu_properties
This commit adds the number of HOPs supported by the device to the
device MMU properties.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:53 +03:00
Oded Gabbay ae926514dd habanalabs: remove unused define
Cleanup the code.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
Oded Gabbay b01a971f80 habanalabs: remove unused ASIC function pointer
Old function pointer that was left when the call to this function pointer
was removed.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
Oded Gabbay 6138bbe911 habanalabs: rename ArmCP to CPU-CP
There were a couple of comments where the name ArmCP was still used. Rename
it to CPU-CP.

In addition, rename ArmCP or ARM in log messages to "device CPU".

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
Hillf Danton 0db575350c habanalabs: make use of dma_mmap_coherent
Add dma_mmap_coherent() for goya and gaudi to match their use of
dma_alloc_coherent(), see the Link tag for why.

Link: https://lore.kernel.org/lkml/20200609091727.GA23814@lst.de/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Zhang Li <li.zhang@bitmain.com>
Cc: Ding Z Nan <oshack@hotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Oded Gabbay f763946aef habanalabs: cast to u64 before shift > 31 bits
When shifting a boolean variable by more than 31 bits and putting the
result into a u64 variable, we need to cast the boolean into unsigned 64
bits to prevent possible overflow.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Oded Gabbay 2f55342c5e habanalabs: replace armcp with the generic cpucp
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.

Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove yet the armcp entries.
Those will be deprecated next year. Add the documentation about it in sysfs
documentation.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
Tomer Tayar 56004701f5 habanalabs: Include linux/bitfield.h only in habanalabs.h
Include linux/bitfield.h only in habanalabs.h, instead of in each and
every file that needs it, as habanalabs.h is already included by all.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00
farah kassabri d90416c84d habanalabs: extend busy engines mask to 64 bits
change busy engines bitmask to 64 bits in order to represent
more engines, needed for future ASIC support.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay 107dd31465 habanalabs: use 1U when shifting bits
Eliminate following warning:
warning: Shifting signed 32-bit value by 31 bits is undefined behavior

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay 31ac1f1a57 habanalabs: check TPC vector pipe is empty
The driver waits for the TPC vector pipe to be empty before checking if the
TPC kernel has finished executing, but the code doesn't validate that the
pipe was indeed empty, it just wait for it without checking the return
value.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay 0358372bbe habanalabs: remove redundant assignment to variable
new_dma_pkt->ctl is assigned a value and then is reassigned a new value
without the first value ever being used.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay 65887291c6 habanalabs: use FIELD_PREP() instead of <<
Use the standard FIELD_PREP() macro instead of << operator to perform
bitmask operations. This ensures type check safety and eliminate compiler
warnings.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Oded Gabbay a0e072f5a1 habanalabs: use standard BIT() and GENMASK()
Use the standard macros to define bitmasks.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:50 +03:00
Dotan Barak d6b045c083 habanalabs: print the queue id in case of an error
If there is a failure during the testing of a queue,
to ease up debugging - print the queue id.

Signed-off-by: Dotan Barak <dbarak@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
farah kassabri acd330c141 habanalabs: remove security from ARB_MST_QUIET register
Allow user application to write to this register in order
to be able to configure the quiet period of the QMAN between grants.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton 843839bec3 habanalabs: expose sync manager resources allocation in INFO IOCTL
Although the driver defines the first user-available sync manager object
and monitor in habanalabs.h, we would like to also expose this information
via the INFO IOCTL so the runtime can get this information dynamically.
This is because in future ASICs we won't need to define it statically.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Ofir Bitton 0a068adde5 habanalabs: add information about PCIe controller
Update firmware header with new API for getting pcie info
such as tx/rx throughput and replay counter.
These counters are needed by customers for monitor and maintenance
of multiple devices.
Add new opcodes to the INFO ioctl to retrieve these counters.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:49 +03:00
Oded Gabbay 58361aae4b habanalabs: set max power according to card type
In Gaudi, the default max power setting is different between PCI and PMC
cards. Therefore, the driver need to set the default after knowing what is
the card type.

The current code has a bug where it limits the maximum power of the PMC
card to 200W after a reset occurs.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00
Ofir Bitton 36545279f0 habanalabs: proper handling of alloc size in coresight
Allocation size can go up to 64bit but truncated to 32bit,
we should make sure it is not truncated and validate no address
overflow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22 12:47:57 +03:00