OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ofir Bitton	2795c88915	habanalabs: staged submission support We introduce a new mechanism named Staged Submission. This mechanism allows the user to send a whole CS in pieces. Each CS will not require completion rather than the last CS. Timeout timer will be triggered upon reception of the first CS in group. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ohad Sharabi	cf30339d3f	habanalabs: modify device_idle interface Currently this API uses single 64 bits mask for engines idle indication. Recently, it was observed that more bits are needed for some ASICs. This patch modifies the use of the idle mask and the idle_extensions mask. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ofir Bitton	0811b39146	habanalabs: add CS completion and timeout properties In order to support staged submission feature, we need to distinguish on which command submission we want to receive timeout and for which we want to receive completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ofir Bitton	d00697fbe1	habanalabs: add new mem ioctl op for mapping hw blocks For future ASIC support the driver allows user to map certain regions in the device's configuration space for direct access from userspace. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
farah kassabri	89473a1fc3	habanalabs: fix MMU debugfs related nodes In mmu debugfs node show un-scrambled physical addresses. before read/write through data nodes, need to unscramble the physical address before using it for pci transaction. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	e1fa724dd1	habanalabs: add user available interrupt to hw_ip In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
farah kassabri	8d79ce162e	habanalabs: always try to use the hint address Currently hint address is ignored in case va block page size is not power of 2. We need to support th user hint address also in this case, but only if the hint address is aligned to page size. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	d2b980f329	habanalabs: add security violations dump to debugfs In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	eea4c2557c	habanalabs: ignore F/W BMC errors in case no BMC present In order to support operation mode in which BMC is not active, driver must not take BMC errors into consideration. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	f8bc7f091c	habanalabs/gaudi: print sync manager SEI interrupt info Driver must print sync manager SEI information upon receiving interrupt from FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Christophe JAILLET	825b30c4f3	habanalabs: Use 'dma_set_mask_and_coherent()' Axe 'hl_pci_set_dma_mask()' and replace it with an equivalent 'dma_set_mask_and_coherent()' call. This makes the code a bit less verbose. It also removes an erroneous comment, because 'hl_pci_set_dma_mask()' does not try to use a fall-back value. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	423815bf02	habanalabs/gaudi: remove PCI access to SM block Due to HW limitation we must remove all direct access to SM registers, in order to do that we will access SM registers using the HW QMANS. When possible and no user context is present, we can directly access the HW QMANS. Whenever there is an active user, driver will prepare a pending command buffer list which will be sent upon user submissions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	d3f139c462	habanalabs: add driver support for internal cb scheduling In order to support scnenarios in which driver needs access to HW components but it cannot access them directly, we add support for scheduling command buffers internally. These command buffers will be transmitted upon next user command submission context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	1e3f2536a8	habanalabs: increment ctx ref from within a cs allocation A CS must increment the relevant context reference count. We want to increment the reference inside the CS allocation function as opposed for today where we increment it outside. This is logical since we want to avoid explicitly incrementing the context every time we call the CS allocate function. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	8563e19159	habanalabs: separate common code to dedicated folders We separate some of the common code source files to different folders for a better maintainability and testability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	edb07cb69c	habanalabs: read device boot errors after cpucp is up Boot cpu can report errors in various boot stages. Current implementaion does not take into consideration errors reported in late stages, hence we will check for errors at the most late stage when fetching cpucp information. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	6769cea8de	habanalabs: report correct dram size in info ioctl In case MMU is enabled, we must take MMU page size into consideration when reporting dram size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needed to be rounded-down. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Moti Haimovski	b19dc67aa8	habanalabs: support non power-of-2 DRAM phys page sizes DRAM physical page sizes depend of the amount of HBMs available in the device. this number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to to be faulty. Such a configuration may lead to partitioning the DRAM to non-power-of-2 pages. To support this feature we also need to add infrastructure of address scarmbling. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	a1f8533269	habanalabs: remove access to kernel memory using debugfs Accessing kernel allocated memory through debugfs should not be allowed as it introduces a security vulnerability. We remove the option to read/write kernel memory for all asics. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	266cdfa2b7	habanalabs/gaudi: set uninitialized symbol Initialize local variable that is returned by the function, in case it is never assigned. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Alon Mizrahi	9402a33624	habanalabs: return dram virtual address in info ioctl When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Oded Gabbay	3abe1040ba	habanalabs: update to latest hl_boot_if.h Update the latest version of this file that the F/W exports Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Oded Gabbay	1530d46817	habanalabs: add ASIC property of functional HBMs The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	2e36856008	habanalabs/gaudi: add debug prints for security status In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Omer Shpigelman	f19040ce41	habanalabs: modify memory functions signatures For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Omer Shpigelman	3b762f55aa	habanalabs: kernel doc format in memory functions Change all memory functions documentation according to kernel doc format. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Alon Mizrahi	75d9a2a0aa	habanalabs: replace WARN/WARN_ON with dev_crit in driver Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Moti Haimovski	0eda23d77e	habanalabs: report dram_page_size in hw_ip_info ioctl Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ohad Sharabi	e1b85dbaf0	habanalabs/goya: move mmu_prepare to context init Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	f8b0f2ecc5	habanalabs/gaudi: remove duplicated gaudi packets masks As all packets use the same CTL register masks, we remove duplicated masks and use common masks instead. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	c209e74214	habanalabs: allow user to pass a staged submission seq In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	ac6fdbfe2e	habanalabs/gaudi: support CS with no completion As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	8e39e75a13	habanalabs: Init the VM module for kernel context In order for reserving VA ranges for kernel memory, we need to allow the VM module to be initiated with kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ohad Sharabi	cb6ef0ee6d	habanalabs: refactor MMU locks code remove mmu_cache_lock as it protects a section which is already protected by mmu_lock. in addition, wrap mmu cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Oded Gabbay	4c998836d4	habanalabs: update firmware boot interface Update to latest firmware hl_boot_if.h file. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Oded Gabbay	2dc4a6d791	habanalabs: disable FW events on device removal When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Oded Gabbay	f8abaf379b	habanalabs: fix backward compatibility of idle check Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Ofir Bitton	9354f1b421	habanalabs: zero pci counters packet before submit to FW Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Daniel Vetter	d88a0c169b	misc/habana: Use FOLL_LONGTERM for userptr These are persistent, not just for the duration of a dma operation. Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-5-daniel.vetter@ffwll.ch	2021-01-12 14:15:09 +01:00
Daniel Vetter	d4cb19250a	misc/habana: Stop using frame_vector helpers All we need are a pages array, pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames. Note that pin_user_pages_fast is a safe replacement despite the seeming lack of checking for vma->vm_flasg & (VM_IO \| VM_PFNMAP). Such ptes are marked with pte_mkspecial (which pup_fast rejects in the fastpath), and only architectures supporting that support the pin_user_pages_fast fastpath. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-4-daniel.vetter@ffwll.ch	2021-01-12 14:14:53 +01:00
Oded Gabbay	9488307a55	habanalabs: prevent soft lockup during unmap When using Deep learning framework such as tensorflow or pytorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-12 15:00:10 +02:00
Oded Gabbay	aa6df6533b	habanalabs: fix reset process in case of failures There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-12 14:59:52 +02:00
Oded Gabbay	a9d4ef6434	habanalabs: fix dma_addr passed to dma_mmap_coherent When doing dma_alloc_coherent in the driver, we add a certain hard-coded offset to the DMA address before returning to the callee function. This offset is needed when our device use this DMA address to perform outbound transactions to the host. However, if we want to map the DMA'able memory to the user via dma_mmap_coherent(), we need to pass the original dma address, without this offset. Otherwise, we will get erronouos mapping. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-12 14:59:36 +02:00
Dinghao Liu	b000700d6d	habanalabs: Fix memleak in hl_device_reset When kzalloc() fails, we should execute hl_mmu_fini() to release the MMU module. It's the same when hl_ctx_init() fails. Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-29 23:23:12 +02:00
Oded Gabbay	097c62b6f0	habanalabs: fix order of status check When the device is in reset or needs to be reset, the disabled property is don't-care. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:39 +02:00
Oded Gabbay	fcaebc7354	habanalabs: register to pci shutdown callback We need to make sure our device is idle when rebooting a virtual machine. This is done in the driver level. The firmware will later handle FLR but we want to be extra safe and stop the devices until the FLR is handled. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:39 +02:00
Alon Mizrahi	a3fd283063	habanalabs: add validation cs counter, fix misplaced counters Up until now validation errors were counted in the parsing field of the cs_counters struct, so we added a new counter and increased it when needed. In addition, there were some locations where only one of the counters was updated (ctx or aggregate) so add the second one to be updated as well. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:39 +02:00
Oded Gabbay	98e8781f00	habanalabs/gaudi: retry loading TPC f/w on -EINTR If loading the firmware file for the TPC f/w was interrupted, try to do it again, up to 5 times. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:39 +02:00
Oded Gabbay	377182a3cc	habanalabs: adjust pci controller init to new firmware When the firmware security is enabled, the pcie_aux_dbi_reg_addr register in the PCI controller is blocked. Therefore, ignore the result of writing to this register and assume it worked. Also remove the prints on errors in the internal ELBI write function. If the security is enabled, the firmware is responsible for setting this register correctly so we won't have any problem. If the security is disabled, the write will work (unless something is totally broken at the PCI level and then the whole sequence will fail). In addition, remove a write to register pcie_aux_dbi_reg_addr+4, which was never actually needed. Moreover, PCIE_DBI registers are blocked to access from host when firmware security is enabled. Use a different register to flush the writes. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:39 +02:00
Oded Gabbay	90ffe170a3	habanalabs: update comment in hl_boot_if.h Hard-reset flag is updated in many stages of the boot sequence of the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Oded Gabbay	13d0ee10b5	habanalabs/gaudi: enhance reset message Print the initiator who performs the hard-reset for easier debugging. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Ofir Bitton	6bbb77b9e6	habanalabs: full FW hard reset support Driver must fetch FW hard reset capability at every FW boot stage: preboot, CPU boot, CPU application. If hard reset is triggered, driver will take into consideration only the last capability received. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Oded Gabbay	0024c09485	habanalabs/gaudi: disable CGM at HW initialization In case the clock gating was enabled in preboot we need to disable it at the H/W initialization stage before touching the MME/TPC registers. Otherwise, the ASIC can get stuck. If the security is enabled in the firmware level, the CGM is always disabled and the driver can't enable it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Tomer Tayar	7a585dfc32	habanalabs: Revise comment to align with mirror list name hw_queues_mirror was renamed to cs_mirror, so revise accordingly a comment that refers to this list. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Alon Mizrahi	72ab9ca52d	habanalabs/gaudi: do not set EB in collective slave queues We don't need to set EB on signal packets from collective slave queues as it degrades performance. Because the slaves are the network queues, the engine barrier doesn't actually guarantee that the packet has been sent. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Ofir Bitton	9c9013cbd8	habanalabs: preboot hard reset support FW hard reset capability indication is now moved to preboot stage. Driver will check if HW is dirty only after it validated preboot is up. If HW is dirty, driver will perform a hard reset according to the FW capability. In addition, FW defines a new message which driver need to send in order to initiate a hard reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Alon Mizrahi	6585489e80	habanalabs: remove generic gaudi get_pll_freq function As we only fetch the CPU_PLL frequency in gaudi, we don't need a generic get_pll_frequency function which takes a pll index as input Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:38 +02:00
Alon Mizrahi	4783489951	habanalabs: fetch PSOC PLL frequency from F/W in goya When the F/W security is enabled, goya needs to fetch the PSOC pll frequency through a dedicated interface Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:37 +02:00
Tomer Tayar	105b5ca9b1	habanalabs: Fix a missing-braces warning Fix a compilation "missing braces around initializer" warning. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-12-28 08:47:37 +02:00
Greg Kroah-Hartman	a3ab07c642	Merge 5.10-rc7 into char-misc-next We want the fixes in here, and this resolves a merge issue with drivers/misc/habanalabs/common/memory.c. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-07 10:08:14 +01:00
Tomer Tayar	f44afb5b5a	habanalabs: Add CB IOCTL opcode to retrieve CB information Add a new CB IOCTL opcode that enables a user to query about a CB and get its usage count. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:38 +02:00
Tomer Tayar	f074867454	habanalabs: Modify the cs_cnt of a CB to be atomic Modify the CS counter of a CB to be atomic, so no locking is required when it is being modified or read. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:38 +02:00
Tomer Tayar	3e438b42a5	habanalabs: Add mask for CS type bits in CS flags hl_cs_sanity_checks() extracts the CS type bits of the CS flags, by masking out the non-type bits. To save the need for updating the function whenever new bits for non-type flags are added, add an explicit mask for the CS type bits. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:38 +02:00
Oded Gabbay	3b82c34f06	habanalabs: change messages to debug level Some messages should be changed to debug mode as we want to keep minimal prints during normal operation of the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Ofir Bitton	8e718f2eda	habanalabs: free host huge va_range if not used If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Oded Gabbay	a63c3fb37b	habanalabs/gaudi: handle reset when f/w is in preboot Currently, if the f/w is in preboot/u-boot they don't perform the new reset mechanism. Therefore, the driver needs to reset the device. To prevent reset of PCI_IF, the driver needs to first configure the reset units. If the security is enabled, the driver can't configure the reset units. In that situation, don't reset the card. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Oded Gabbay	ee3287798d	habanalabs: add missing counter update The global CS drop-on-reset counter wasn't updated together with the context counter. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Alon Mizrahi	d2bbf2ca33	habanalabs: add ull to PLL masks These defines are 64-bit defines so they need ull suffix. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Ofir Bitton	bd2f477f20	habanalabs: add support for cs with timestamp add support for user to request a timestamp upon cs completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:37 +02:00
Ofir Bitton	9d127ad571	habanalabs: indicate to user that a cs is gone We want to indicate to the user that a certain command submission is finished long time ago and it is no longer in database. This means no further information regarding this cs can be obtained. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Oded Gabbay	64a9d5ab2c	habanalabs/gaudi: print ECC type field We have the ECC type field from the firmware but the driver didn't print it, so we need to add that field to the ECC print message. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Oded Gabbay	051504d9f6	habanalabs: update firmware files Update various firmware header files with new defines. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
kernel test robot	293744d92c	habanalabs: gaudi_ctx_fini() can be static Make a function in gaudi.c to be static Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
kernel test robot	2a570736ef	habanalabs: goya_reset_sob_group() can be static Make some functions static Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Alon Mizrahi	4147864e8d	habanalabs: fetch pll frequency from firmware Once firmware security is enabled, driver must fetch pll frequencies through the firmware message interface instead of reading the registers directly. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Ofir Bitton	5c05487f15	habanalabs: mmu map wrapper for sizes larger than a page We introduce a new wrapper which allows us to mmu map any size to any host va_range available. In addition we remove duplicated code from various places in driver and using this new wrapper instead. This wrapper supports mapping only contiguous physical memory blocks and will be used for mappings that are done to the driver ASID. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Oded Gabbay	5e5867e51d	habanalabs: print CS type when it is stuck We have several types of command submissions and the user wants to know which type of command submission has not finished in time when that event occurs. This is very helpful for debug. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:36 +02:00
Ofir Bitton	b90c894434	habanalabs/gaudi: align to new FW reset scheme As part of the security effort in which FW will be handling sensitive HW registers, hard reset flow will be done by FW and will be triggered by driver. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Alon Mizrahi	439bc47b8e	habanalabs: firmware returns 64bit argument F/W message returns 64bit value but up until now we casted it to a 32bit variable, instead of receiving 64bit in the first place. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Moti Haimovski	00e1b59c8b	habanalabs: fix MMU debugfs operations After the MMU-code refactoring, the existing MMU debugfs operations are no longer working so we need to fix them. In addition, remove the duplicate code that was in the debugfs code and use the already existing MMU-code. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Moti Haimovski	fe2bc2d249	habanalabs: share a single ctx-mutex between all MMUs Multiple locks are usually a source of problems, which in the MMU case can be avoided since it is relatively rare that both MMU tables are updated at the same time. Therefore, use a single shared lock instead of two separate ones. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Ofir Bitton	412c41fcd5	habanalabs: support reserving aligned va block Add support for reserving va block with alignment different than page size. This is a pre-requisite for allocations needed in future ASICs Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Guy Nisan	b2d09622be	habanalabs: add boot errors prints Add log prints for security and eFuse boot error bits Signed-off-by: Guy Nisan <gnisan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Oded Gabbay	92ede12a07	habanalabs: print message with correct device During hard-reset, the driver rejects further IOCTL calls and prints an error message. That error message should be printed with the correct device instead of using only the control device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:35 +02:00
Ofir Bitton	5a2998f46c	habanalabs/gaudi: fetch HBM ecc info from FW Once FW security is enabled there is no access to HBM ecc registers, need to read values from FW using a dedicated interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Ofir Bitton	d611b9f0b1	habanalabs: fetch hard reset capability from FW Driver must fetch FW hard reset capability during boot time, in order to skip the hard reset flow if necessary. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Oded Gabbay	7f070c913c	habanalabs: move asic property to correct structure Whether an ASIC has MMU towards its DRAM is an ASIC property, so move it to the asic fixed properties structure. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Ofir Bitton	be91b91fa4	habanalabs: use host va range for internal pools Instead of using a dedicated va range for each internal pool, we introduce a new way for reserving a va block from an existing va range. This is a more generic way of reserving va blocks for future use. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Ofir Bitton	adb51298fd	habanalabs: improve hard reset procedure We want to handle the scenario in which the driver was not able to kill all user processes due to many memory mappings. We need to retry again after some period while releasing the cores. The devices will be unusable and "in-reset" status during that time. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Tomer Tayar	804a72276c	habanalabs: Rename hw_queues_mirror to cs_mirror Future command submission types might be submitted to HW not via the QMAN queues path. However, it would be still required to have the TDR mechanism for these CS, and thus the patch renames the TDR fields and replaces the hw_queues_ prefix with cs_. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Ofir Bitton	784b916dad	habanalabs: refactor mmu va_range db structure Use an array of va_ranges instead of keeping each va_range separately, we do this for better readability and in order to support access to a specific range in a much elegant manner. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:34 +02:00
Ofir Bitton	d1ddd90551	habanalabs: move HW dirty check to a proper location Driver must verify if HW is dirty before trying to fetch preboot information. Hence, we move this validation to a prior stage of the boot sequence. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Oded Gabbay	28e052c952	habanalabs: restore vm_pgoff after mmap Due to using dma_mmap_coherent() to perform mmap of dma memory, we had to clear the vm_pgoff field before calling that function. However, that broke the userspace (profiler tool) as they relied on searching the /proc/self/maps for these values to correctly "disassemble" the topology recipe. To re-enable that functionality, the driver can simply restore the value of vm_pgoff before returning to userspace but after calling dma_mmap_coherent(). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Ofir Bitton	66a76401c5	habanalabs: add 'needs reset' state in driver The new state indicates that device should be reset in order to re-gain funcionality. This unique state can occur if reset_on_lockup is disabled and an actual lockup has occurred. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Omer Shpigelman	f2d032ee13	habanalabs: fix hard reset print and comment One of the first steps of a hard reset flow is to close all open user contexts. This user process teradown might take some time due to long cleanup in our driver or some other reason even before our cleanup flow. Hence fix the relevant print and comment to be more accurate. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Igor Grinberg	b726a2f7c0	habanalabs/gaudi: remove pcie_en strap toggle Since the very large grace period is over and this functionality prevents us to implement the new reset sequence and apply security settings, we need to remove the code toggling the PCIE_EN bit in the straps register. Remove it for good. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Oded Gabbay	66bfcccdb8	habanalabs: remove duplicate print We print twice the firmware status regarding security, once in common code and once in asic code. Remove the print in asic code and leave the common code print. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Tomer Tayar	649c459212	habanalabs: Separate CS job completion from its deallocation Current CS jobs are no longer needed after their completion. However, jobs of future workload might be in use even after they are completed. To allow that, the patch adds a refcount to the job object, and decouples its completion handling from its deallocation. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:33 +02:00
Oded Gabbay	0da5698bf4	habanalabs/gaudi: increase MAX CS to 16K We need to have the MAX CS be much larger than the size of the different queues. In GAUDI we have around 8 groups of queues, and each group has 1K queue size. To prevent head-of-the-line blocking, we need to make sure there is sufficient number of available CS allocations even if one or more of those queues are full. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
farah kassabri	eb10b897e4	habanalabs: reset device upon fw read failure failure in reading pre-boot verion is not handled correctly, upon failure we need to reset the device in order to be able to reinstall the driver. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Tomer Tayar	ba7e389c30	habanalabs: Move repeatedly included headers to habanalabs.h Several header files are repeatedly included in many files. Move these files to habanalabs.h which is included by all. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Ofir Bitton	c1d505a922	habanalabs: release signal if collective wait was dropped As in standard wait cs, we must release a signal fence once a collective wait cs was dropped and not submitted. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Tomer Tayar	4ba1b227b6	habanalabs: Skip updating CI of internal queues if not in use There are no internal queues if H/W queues are being used. In this case we can skip the redundant traversal over the queues array, looking for internal queues. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Tomer Tayar	ea6ee260cb	habanalabs: Small refactoring of cs_do_release() Slightly refactor the cs_do_release() function, to reduce nesting level and to ease the handling of future CS types. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Tomer Tayar	6de3d769fd	habanalabs: Small refactoring of CS IOCTL handling Refactor the CS IOCTL handling by gathering common code into sub-functions, in order to ease future additions of new CS types. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:32 +02:00
Ofir Bitton	1cbca899fa	habanalabs/gaudi: fetch PLL info from FW Once FW security is enabled there is no access to PLL registers, need to read values from FW using a dedicated interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Moti Haimovski	ccf979ee33	habanalabs: refactor MMU to support dual residency MMU This commit refactors the MMU code to support PCI MMU page tables residing on host and DCORE MMU residing on the device DRAM at the same time. This is needed for future devices as on GAUDI and GOYA we have a single MMU where its page tables always reside on DRAM. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Moti Haimovski	a6722d6a97	habanalabs: fix MMU print message This commit fixes an incorrect error message Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
farah kassabri	03df136bc5	habanalabs/gaudi: scrub all memory upon closing FD In cases of multi-tenants, administrators may want to prevent data leakage between users running on the same device one after another. To do that the driver can scrub the internal memory (both SRAM and DRAM) after a user finish to use the memory. Because in GAUDI the driver allows only one application to use the device at a time, it can scrub the memory when user app close FD. In future devices where we have MMU on the DRAM, we can scrub the DRAM memory with a finer granularity (page granularity) when the user allocates the memory. This feature is not supported in Goya. To allow users that want to debug their applications, we add a kernel module parameter to load the driver with this feature disabled. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Ofir Bitton	c692dec703	habanalabs/gaudi: add support for FW security Skip relevant HW configurations once FW security is enabled because these configurations are being performed by FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Ofir Bitton	323b726706	habanalabs: fetch security indication from FW Add support for fetching security indication from FW. This indication is needed in order to skip unnecessary initializations done by FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
farah kassabri	e753643d51	habanalabs: fix cs counters structure Fix cs counters structure in uapi to be one flat structure instead of two instances of the same other structure. use atomic read/increment for context counters so we could use one structure for both aggregated and context counters. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Ofir Bitton	9bb86b63d8	habanalabs: advanced FW loading Today driver is able to load a whole FW binary into a specific location on ASIC. We add support for loading sections from the same FW binary into different loactions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:31 +02:00
Oded Gabbay	977d53a614	habanalabs: initialize variable before use GCC 7.3.1 20180303 (Red Hat 7.3.1-5) complains that collective_engine_id might be used uninitialized. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Ofir Bitton	71a984f9ae	habanalabs/gaudi: remove unreachable code Remove unreachable code in gaudi collective flow. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Oded Gabbay	e716ad3c76	habanalabs: make sure cs type is valid in cs_ioctl_signal_wait Although we get a valid cs type from the callee, in case new values will be added in the future, it is best to check the expected values in that function. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Oded Gabbay	3e62299657	habanalabs/gaudi: monitor device memory usage In GAUDI we don't have an MMU towards the HBM device memory. Therefore, the user access that memory directly through physical address (via the different engines) without the need to go through the driver to allocate/free memory on the HBM. For system monitoring purposes, the driver will keep track of the HBM usage. This can be done as long as the user accurately reports the allocations and releases of HBM memory, through the existing MEMORY IOCTL uapi. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Ofir Bitton	5de406c0b5	habanalabs: sync stream collective support Implement sync stream collective for GAUDI. Need to allocate additional resources for that and add ctx_fini() to clean up those resources. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Ofir Bitton	0940cabafd	habanalabs/gaudi: Set DMA5 QMAN internal DMA5 QMAN is designated to be used for reduction process, hence it will be no longer configured as external queue. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Ofir Bitton	5fe1c17ddf	habanalabs: sync stream collective infrastructure Define new API for collective wait support and modify sync stream common flow. In addition add kernel CB allocation support for internal queues. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:30 +02:00
Tal Cohen	4bb1f2f3fb	habanalabs: use enum for CB allocation options In the future there will be situations where queues can accept either kernel allocated CBs or user allocated CBs, depending on different states. Therefore, instead of using a boolean variable of kernel/user allocated CB, we need to use a bitmask to indicate that, which will allow to combine the two options. Add a flag to the uapi so the user will be able to indicate whether the CB was allocated by kernel or by user. Of course the driver validates that. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:29 +02:00
Oded Gabbay	3c68157fb8	habanalabs/gaudi: add support for NIC QMANs Initialize the QMANs that are responsible to submit doorbells to the NIC engines. Add support for stopping and disabling them, and reset them as part of the hard-reset procedure of GAUDI. This will allow the user to submit work to the NICs. Add support for receiving events on QMAN errors from the firmware. However, the nic_ports_mask is still initialized to 0. That means this code won't initialize the QMANs just yet. That will be in a later patch. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:29 +02:00
Oded Gabbay	11dcb8c712	habanalabs/gaudi: add NIC security configuration Configure the security properties of the NIC IP. This is to prevent the user process from doing something with the NIC that he shouldn't do. e.g. crash the server, steal data, etc. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:29 +02:00
Oded Gabbay	b3a9c0bd2f	habanalabs/gaudi: add NIC firmware-related definitions Add new structures and messages that the driver use to interact with the firmware to receive information and events (errors) about GAUDI's NIC. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:29 +02:00
Oded Gabbay	16ac365045	habanalabs/gaudi: add NIC QMAN H/W and registers definitions Add auto-generated header files that describe the NIC QMANs registers used by the driver. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Oded Gabbay	becce5f994	habanalabs: remove duplicate check We already check if queue index is smaller than max queues a few lines above this check so no need to check this again. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Ofir Bitton	06f791f74f	habanalabs: sync stream refactor functions Refactor sync stream implementation by reducing function length for better readability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Ofir Bitton	2992c1dcd3	habanalabs: add support for multiple SOBs per monitor Support advanced monitor functionality to monitor more than a single SOB. In addition expand all CB generation functions with buffer offset in order to put in them multiple packets that are generated by different functions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Ofir Bitton	3cf74b3656	habanalabs: sync stream structures refactor Refactor sync stream implementation by adding more structures for better readability. In addition reducing allocated resources. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Oded Gabbay	f3a965c250	habanalabs: don't init vm module if no MMU In case we are running without MMU enabled (debug mode), no need to initialize the VM module in the driver. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Oded Gabbay	8f50314674	habanalabs: minimize prints when everything is fine No need to print when the driver starts to initialize the H/W. Drivers should be silent when everything is OK. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:28 +02:00
Oded Gabbay	596553dbf9	habanalabs: support multiple types of firmwares The driver now loads the firmware in two stages. For debugging purposes we need to support situations where only the first stage firmware is loaded. Therefore, use a bitmask to determine which F/W is loaded Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:27 +02:00
Oded Gabbay	28958207e9	habanalabs: we need CPU queues for hwmon F/W can be loaded but device CPU queues disabled. In that case, HWMON should be disabled. This is only relevant when debugging Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:27 +02:00
Ofir Bitton	20b7525dc4	habanalabs/gaudi: move mmu_prepare to context init Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:27 +02:00
Oded Gabbay	23c15ae615	habanalabs: change aggregate cs counters to atomic In case we will have multiple contexts/processes, we can't just increment aggregated counters. We need to make them atomic as they can be incremented by multiple processes Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:47:27 +02:00
Ofir Bitton	5555b7c56b	habanalabs: put devices before driver removal Driver never puts its device and control_device objects, hence a memory leak is introduced every driver removal. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:30:16 +02:00
Ofir Bitton	c8c39fbd01	habanalabs: free host huge va_range if not used If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-30 10:30:16 +02:00
Oded Gabbay	652b44453e	habanalabs/gaudi: fix missing code in ECC handling There is missing statement and missing "break;" in the ECC handling code in gaudi.c This will cause a wrong behavior upon certain ECC interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-23 20:08:43 +02:00
Oded Gabbay	f83f3a31b2	habanalabs/gaudi: mask WDT error in QMAN This interrupt cause is not relevant because of how the user use the QMAN arbitration mechanism. We must mask it as the log explodes with it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-04 08:56:07 +02:00
Ofir Bitton	1137e1ead9	habanalabs/gaudi: move coresight mmu config We must relocate the coresight mmu configuration to the coresight flow to make it work in case the first submission is to configure the profiler. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-04 08:56:06 +02:00
Arnd Bergmann	82948e6e1d	habanalabs: fix kernel pointer type All throughout the driver, normal kernel pointers are stored as 'u64' struct members, which is kind of silly and requires casting through a uintptr_t to void* every time they are used. There is one line that missed the intermediate uintptr_t case, which leads to a compiler warning: drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap': drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 512 \| rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address, Rather than adding one more cast, just fix the type and remove all the other casts. Fixes: `0db575350c` ("habanalabs: make use of dma_mmap_coherent") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2020-11-04 08:56:06 +02:00
Oded Gabbay	5b94d6e476	habanalabs/gaudi: use correct define for qman init There was a copy-paste error, and the wrong define was used for initializing the QMAN. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200925171415.25663-1-oded.gabbay@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-09-30 08:38:18 +02:00
Ofir Bitton	25121d9804	habanalabs/gaudi: configure QMAN LDMA registers properly LDMA registers are configured with a fixed value. We add new define set which gives the configuration a proper meaning. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-25 14:44:21 +03:00
Oded Gabbay	eab1f6e7b0	habanalabs: add notice of device not idle The device should be idle after a context is closed. If not, print a notice. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-25 14:44:21 +03:00
Oded Gabbay	3c3aa5dbd6	habanalabs: add debug messages for opening/closing context During debugging of error we sometimes need to know whether the error happened when a user context was open. Add debug prints when opening and closing user contexts. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-25 14:44:20 +03:00
Oded Gabbay	9e2e8fc7d6	habanalabs: release kernel context after hw_fini Some engines use resources that belong to the kernel context (e.g. MMU mappings). In case the halt-engines doesn't work properly due to H/W restriction, we need to make sure the kernel context lives on until after the hw_fini. The hw_fini resets the ASIC after that no engine is alive and we can safely close the kernel context. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-25 14:44:20 +03:00
Oded Gabbay	fc6121e961	habanalabs: correct an error message We don't try to allocate huge pages here so remove the huge word. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-25 14:44:20 +03:00
Oded Gabbay	f279e5cd95	habanalabs: update scratchpad register map Our firmware use some scratchpad registers in the device for different roles. Update the file to the latest version of the firmware code. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Oded Gabbay	57799ce9f8	habanalabs: add indication of security-enabled F/W Future F/W versions will have enhanced security measures and the driver won't be able to do certain configurations that it always did and those configurations will be done by the firmware. We use the firmware's preboot version to determine whether security measures are enabled or not. Because we need this very early in our code, the read of the preboot version is moved to the earliest possible place, right after the device's PCI initialization. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Oded Gabbay	d1f3633599	habanalabs/gaudi: fix DMA completions max outstanding to 15 This is a workaround for H/W bug H3-2116, where if there are more than 16 outstanding completions in the DMA transpose engine, there can be a deadlock in the engine. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Oded Gabbay	dbf053c429	habanalabs/gaudi: remove axi drain support AXI drain is broken in GAUDI so remove support for enabling it. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Oded Gabbay	219b8f2ff0	habanalabs: update firmware interface file Add new packet to fetch PLL information from firmware. This will be needed in the future when the driver won't be able to access the PLL registers directly Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Tomer Tayar	ef6a0f6caa	habanalabs: Add an option to map CB to device MMU There are cases in which the device should access the host memory of a CB through the device MMU, and thus this memory should be mapped. The patch adds a flag to the CB IOCTL, in which a user can ask the driver to perform the mapping when creating a CB. The mapping is allowed only if a dedicated VA range was allocated for the specific ASIC. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Tomer Tayar	fa8641a14f	habanalabs: Save context in a command buffer object Future changes require using a context while handling a command buffer, and thus need to save the context in the command buffer object. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:54 +03:00
Oded Gabbay	448f63badc	habanalabs: no need for DMA_SHARED_BUFFER Now that the driver no longer uses dma_buf, we can remove the select of DMA_SHARED_BUFFER from kconfig. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Oded Gabbay	681a22f55f	habanalabs: allow to wait on CS without sleep The user sometimes wants to check if a CS has completed to clean resources. In that case, the user doesn't want to sleep but just to check if the CS has finished and continue with his code. Add a new definition to the API of the wait on CS. The new definition says that if the timeout is 0, the driver won't sleep at all but return immediately after checking if the CS has finished. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Oded Gabbay	230b9b7d45	habanalabs/gaudi: increase timeout for boot fit load The firmware running in the boot stage takes more time to execute due to increased security mechanisms. Therefore, we need to increase the timeout we wait for the boot fit to finish loading. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Moti Haimovski	214afa974d	habanalabs: add debugfs support for MMU with 6 HOPs This commit modify the existing debugfs code to support future devices that have a 6 HOPs MMU implementation instead of 5 HOPs implementation. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Moti Haimovski	7edf341b9e	habanalabs: add num_hops to hl_mmu_properties This commit adds the number of HOPs supported by the device to the device MMU properties. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Moti Haimovski	d83fe66928	habanalabs: refactor MMU as device-oriented As preparation to MMU v2, rework MMU to be device oriented instantiated according to the device in hand. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Moti Haimovski	c91324f41b	habanalabs: rename mmu.c to mmu_v1.c In the future we will have MMU v2 code, so we need to prepare the driver for it. The first step is to rename the current MMU file to mmu_v1.c. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Omer Shpigelman	7c52fb0a09	habanalabs: use smallest possible alignment for virtual addresses Change the acquiring of a device virtual address for mapping by using the smallest possible alignment, rather than the biggest, depending on the page size used by the user for allocating the memory. This will lower the virtual space memory consumption. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:53 +03:00
Oded Gabbay	1fb2f37437	habanalabs: check flag before reset because of f/w event For consistency with GAUDI code, add check of the relevant flag in the device structure before resetting the GOYA device in case of firmware event. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Oded Gabbay	5a1b861daa	habanalabs: increase PQ COMP_OFFSET by one nibble For future ASICs, we increase this field by one nibble. This field was not used by the current ASICs so this change doesn't break anything. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Ofir Bitton	763a0b4d81	habanalabs: Fix alignment issue in cpucp_info structure Because the device CPU compiler aligns structures to 8 bytes, struct cpucp_info has an alignment issue as some parts in the structure are not aligned to 8 bytes. It is preferred that we explicitly insert placeholders inside the structure to avoid confusion in order to validate this scenario, we printed both pointers: __u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc) __le64 dram_size; (0xffff899c67ed4d40) we see difference of 132 bytes although the first array is only 128 bytes long, Meaning compiler added a 4 byte padding. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Oded Gabbay	ae926514dd	habanalabs: remove unused define Cleanup the code. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Oded Gabbay	b01a971f80	habanalabs: remove unused ASIC function pointer Old function pointer that was left when the call to this function pointer was removed. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Oded Gabbay	6138bbe911	habanalabs: rename ArmCP to CPU-CP There were a couple of comments where the name ArmCP was still used. Rename it to CPU-CP. In addition, rename ArmCP or ARM in log messages to "device CPU". Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Oded Gabbay	975ab7b32b	habanalabs: count dropped CS because max CS in-flight There is a case where the user reaches the maximum number of CS in-flight. In that case, the driver rejects the new CS of the user with EAGAIN. Count that event so the user can query the driver later to see if it happened. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:52 +03:00
Hillf Danton	0db575350c	habanalabs: make use of dma_mmap_coherent Add dma_mmap_coherent() for goya and gaudi to match their use of dma_alloc_coherent(), see the Link tag for why. Link: https://lore.kernel.org/lkml/20200609091727.GA23814@lst.de/ Cc: Christoph Hellwig <hch@lst.de> Cc: Zhang Li <li.zhang@bitmain.com> Cc: Ding Z Nan <oshack@hotmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Oded Gabbay	c5e0ec66f0	habanalabs: clear vm_pgoff before doing the mmap The driver use vm_pgoff to hold the CB idr handle. Before we actually call the mapping function, we need to clear the handle so there won't be any garbage left in vm_pgoff. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Oded Gabbay	3174ac9bb1	habanalabs: restructure hl_mmap Arrange the hl_mmap code to be more structured and expandable for the future. Add better defines that describe our usage of the vm_pgoff. Note that I shamelessly took the code and defines from the amdkfd driver (my previous driver). Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Oded Gabbay	f763946aef	habanalabs: cast to u64 before shift > 31 bits When shifting a boolean variable by more than 31 bits and putting the result into a u64 variable, we need to cast the boolean into unsigned 64 bits to prevent possible overflow. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Oded Gabbay	2f55342c5e	habanalabs: replace armcp with the generic cpucp ArmCP mandates that the device CPU is always an ARM processor, which might be wrong in the future. Most of this change is an internal renaming of variables, functions and defines but there are two entries in sysfs which have armcp in their names. Add identical cpucp entries but don't remove yet the armcp entries. Those will be deprecated next year. Add the documentation about it in sysfs documentation. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Oded Gabbay	42b0698add	habanalabs: update GAUDI hardware specs Add define for the 2 MME slave engines. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
farah kassabri	9f3064913e	habanalabs: add support for getting device total energy Add driver implementation for reading the total energy consumption from the device ARM FW. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
Tomer Tayar	56004701f5	habanalabs: Include linux/bitfield.h only in habanalabs.h Include linux/bitfield.h only in habanalabs.h, instead of in each and every file that needs it, as habanalabs.h is already included by all. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:51 +03:00
farah kassabri	d90416c84d	habanalabs: extend busy engines mask to 64 bits change busy engines bitmask to 64 bits in order to represent more engines, needed for future ASIC support. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	107dd31465	habanalabs: use 1U when shifting bits Eliminate following warning: warning: Shifting signed 32-bit value by 31 bits is undefined behavior Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	31ac1f1a57	habanalabs: check TPC vector pipe is empty The driver waits for the TPC vector pipe to be empty before checking if the TPC kernel has finished executing, but the code doesn't validate that the pipe was indeed empty, it just wait for it without checking the return value. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	0358372bbe	habanalabs: remove redundant assignment to variable new_dma_pkt->ctl is assigned a value and then is reassigned a new value without the first value ever being used. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	65887291c6	habanalabs: use FIELD_PREP() instead of << Use the standard FIELD_PREP() macro instead of << operator to perform bitmask operations. This ensures type check safety and eliminate compiler warnings. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	a0e072f5a1	habanalabs: use standard BIT() and GENMASK() Use the standard macros to define bitmasks. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	bd4ef37292	habanalabs: eliminate redundant else condition If both parts of if-else are goto statements, we can remove the else and put the else goto statement after the if statement. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:50 +03:00
Oded Gabbay	f907af183b	habanalabs: cast int to u32 before printing it with %u %u is used for unsigned so we need to cast the int variable to u32 to avoid compiler warning. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Oded Gabbay	f5b9c8cf25	habanalabs: change CB's ID to be 64 bits Although the possible values for CB's ID are only 32 bits, there are a few places in the code where this field is shifted and passed into a function which expects 64 bits. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Dotan Barak	d6b045c083	habanalabs: print the queue id in case of an error If there is a failure during the testing of a queue, to ease up debugging - print the queue id. Signed-off-by: Dotan Barak <dbarak@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
farah kassabri	acd330c141	habanalabs: remove security from ARB_MST_QUIET register Allow user application to write to this register in order to be able to configure the quiet period of the QMAN between grants. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Ofir Bitton	2e5eda4681	habanalabs: PCIe Advanced Error Reporting support driver will now get notified upon any PCI error occurred and will respond according to the severity of the error. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Ofir Bitton	843839bec3	habanalabs: expose sync manager resources allocation in INFO IOCTL Although the driver defines the first user-available sync manager object and monitor in habanalabs.h, we would like to also expose this information via the INFO IOCTL so the runtime can get this information dynamically. This is because in future ASICs we won't need to define it statically. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Ofir Bitton	0a068adde5	habanalabs: add information about PCIe controller Update firmware header with new API for getting pcie info such as tx/rx throughput and replay counter. These counters are needed by customers for monitor and maintenance of multiple devices. Add new opcodes to the INFO ioctl to retrieve these counters. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:49 +03:00
Ofir Bitton	a98d73c7fa	habanalabs: Replace dma-fence mechanism with completions habanalabs driver uses dma-fence mechanism for synchronization. dma-fence mechanism was designed solely for GPUs, hence we purpose a simpler mechanism based on completions to replace current dma-fence objects. Signed-off-by: Ofir Bitton <obitton@habana.ai> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-09-22 18:49:48 +03:00
Oded Gabbay	b71590efb2	habanalabs: increase length of ASIC name Future ASIC names are longer than 15 chars so increase the variable length to 32 chars. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-09-22 18:49:48 +03:00
Ofir Bitton	69c6e18d0c	habanalabs: fix report of RAZWI initiator coordinates All initiator coordinates received upon an 'MMU page fault RAZWI event' should be the routers coordinates, the only exception is the DMA initiators for which the reported coordinates correspond to their actual location. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-31 15:10:27 +03:00
Moti Haimovski	6396feabf7	habanalabs: prevent user buff overflow This commit fixes a potential debugfs issue that may occur when reading the clock gating mask into the user buffer since the user buffer size was not taken into consideration. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-31 15:10:27 +03:00
Ofir Bitton	5aba368893	habanalabs: correctly report inbound pci region cfg error During inbound iATU configuration we can get errors while configuring PCI registers, there is a certain scenario in which these errors are not reflected and driver is loaded with wrong configuration. Signed-off-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:58 +03:00
Ofir Bitton	0839152f8c	habanalabs: check correct vmalloc return code vmalloc can return different return code than NULL and a valid pointer. We must validate it in order to dereference a non valid pointer. Signed-off-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:58 +03:00
Ofir Bitton	bce382a8bb	habanalabs: validate FW file size We must validate FW size in order not to corrupt memory in case a malicious FW file will be present in system. Signed-off-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:58 +03:00
Colin Ian King	804d057cfa	habanalabs: fix incorrect check on failed workqueue create The null check on a failed workqueue create is currently null checking hdev->cq_wq rather than the pointer hdev->cq_wq[i] and so the test will never be true on a failed workqueue create. Fix this by checking hdev->cq_wq[i]. Addresses-Coverity: ("Dereference before null check") Fixes: `5574cb2194` ("habanalabs: Assign each CQ with its own work queue") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:58 +03:00
Oded Gabbay	58361aae4b	habanalabs: set max power according to card type In Gaudi, the default max power setting is different between PCI and PMC cards. Therefore, the driver need to set the default after knowing what is the card type. The current code has a bug where it limits the maximum power of the PMC card to 200W after a reset occurs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Ofir Bitton	36545279f0	habanalabs: proper handling of alloc size in coresight Allocation size can go up to 64bit but truncated to 32bit, we should make sure it is not truncated and validate no address overflow. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Ofir Bitton	f44d23b909	habanalabs: set clock gating according to mask Once clock gating is set we enable clock gating according to mask, we should also disable clock gating according to relevant bits. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Ofir Bitton	1cff119740	habanalabs: verify user input in cs_ioctl_signal_wait User input must be validated before using it to access internal structures. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Dan Carpenter	b0353540ff	habanalabs: Fix a loop in gaudi_extract_ecc_info() The condition was reversed. It should have been less than instead of greater than. The result is that we never enter the loop. Fixes: `fcc6a4e606` ("habanalabs: Extract ECC information from FW") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Dan Carpenter	eeec23cd32	habanalabs: Fix memory corruption in debugfs This has to be a long instead of a u32 because we write a long value. On 64 bit systems, this will cause memory corruption. Fixes: `c216477363` ("habanalabs: add debugfs support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Ofir Bitton	bc75be24fa	habanalabs: validate packet id during CB parse During command buffer parsing, driver extracts packet id from user buffer. Driver must validate this packet id, since it is being used in order to extract information from internal structures. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:57 +03:00
Ofir Bitton	bf6d10963e	habanalabs: Validate user address before mapping User address must be validated before driver performs address map. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:56 +03:00
Ofir Bitton	f1aae40e8d	habanalabs: unmap PCI bars upon iATU failure In case the driver fails to configure the PCI controller iATU, it needs to unmap the PCI bars before exiting so if the driver is removed, the bars won't be left mapped. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-08-22 12:47:56 +03:00
Wei Yongjun	22362aa30b	habanalabs: remove unused but set variable 'ctx_asid' Gcc report warning as follows: drivers/misc/habanalabs/common/command_submission.c:373:6: warning: variable 'ctx_asid' set but not used [-Wunused-but-set-variable] 373 \| int ctx_asid, rc; \| ^~~~~~~~ This variable is not used in function cs_timedout(), this commit remove it to fix the warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20200729155902.33976-1-weiyongjun1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-29 18:02:21 +02:00
kernel test robot	bb34bf798c	habanalabs: goya_ctx_init() can be static Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200729000313.GA14680@e442e3f624c4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-29 09:46:05 +02:00
Greg Kroah-Hartman	7b16a15524	habanalabs: fix up absolute include instructions There's no need to try to be cute with the include file locations in the Makefile, so just specify exactly where the files are. Bonus is this fixes the problem of building with O= as well as trying to just build the subdirectory alone. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Ben Segal <bpsegal20@gmail.com> Cc: Christine Gharzuzi <cgharzuzi@habana.ai> Cc: Pawel Piskorski <ppiskorski@habana.ai> Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-29 08:15:50 +02:00
Greg Kroah-Hartman	65a9bde6ed	Linux 5.8-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8d8h4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGd0sH/2iktYhMwPxzzpnb eI3OuTX/mRn4vUFOfpx9dmGVleMfKkpbvnn3IY7wA62Qfv7J7lkFRa1Bd1DlqXfW yyGTGDSKG5chiRCOU3s9ni92M4xIzFlrojyt/dIK2lUGMzUPI9FGlZRGQLKqqwLh 2syOXRWbcQ7e52IHtDSy3YBNveKRsP4NyqV+GxGiex18SMB/M3Pw9EMH614eDPsE QAGQi5uGv4hPJtFHgXgUyBPLFHIyFAiVxhFRIj7u2DSEKY79+wO1CGWFiFvdTY4B CbqKXLffY3iQdFsLJkj9Dl8cnOQnoY44V0EBzhhORxeOp71StUVaRwQMFa5tp48G 171s5Hs= =BQIl -----END PGP SIGNATURE----- Merge 5.8-rc7 into char-misc-next This should resolve the merge/build issues reported when trying to create linux-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-27 11:49:37 +02:00
Tomer Tayar	94f8be9eb0	habanalabs: Fix memory leak in error flow of context initialization Add a missing free of the cs_pending array in the error flow of context initialization. Fixes: `c16d45f42b` ("habanalabs: Use pending CS amount per ASIC") Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:40:06 +03:00
Tomer Tayar	644883ef1a	habanalabs: use no flags on MMU cache invalidation gaudi_mmu_invalidate_cache() doesn't use the flags parameter, and thus it can be set to 0 when the function is called in the gaudi only files. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:37 +03:00
Oded Gabbay	8df8cb1efc	habanalabs: enable device before hw_init() Device is now enabled before the hw_init() because part of the initialization requires communication with the device firmware to get information that is required for the initialization itself Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-24 20:31:37 +03:00
Ofir Bitton	a04b7cd97e	habanalabs: create internal CB pool Create a device MMU-mapped internal command buffer pool, in order to allow the driver to allocate CBs for the signal/wait operations that are fetched by the queues when they are configured with the user's address space ID. We must pre-map this internal pool due to performance issues. This pool is needed for future ASIC support and it is currently unused in GOYA and GAUDI. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:37 +03:00
Oded Gabbay	eb8b293e79	habanalabs: update hl_boot_if.h from firmware Update the boot interface file from the latest version from firmware. Defines for secure boot were added. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>	2020-07-24 20:31:37 +03:00
Oded Gabbay	70b2f993ea	habanalabs: create common folder For internal needs of our CI we need to move all the common code into a common folder instead of putting them in the root folder of the driver. Same applies to the common header files under include/ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>	2020-07-24 20:31:37 +03:00
Moti Haimovski	a9855a2d91	habanalabs: check for DMA errors when clearing memory In GAUDI we use QMAN0 DMA for clearing the MMU memory region at initialization. if this operation fails it places the DMA in an error state and then when trying to initialize QMAN0 we fail and erroneously assume its the QMAN that failed. This commit adds a check and clear of such DMA errors at initialization so we will have a better understanding of what went wrong. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:37 +03:00
Ofir Bitton	22cb855598	habanalabs: verify queue can contain all cs jobs In order for the user to be aware of wrong inputs, we must return error in case the amount of jobs per cs exceeds the corresponding queue size. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:37 +03:00
Ofir Bitton	5574cb2194	habanalabs: Assign each CQ with its own work queue We identified a possible race during job completion when working with a single multi-threaded work queue. In order to overcome this race we suggest using a single threaded work queue per completion queue, hence we guarantee jobs completion in order. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:37 +03:00
Oded Gabbay	c83c417193	habanalabs: halt device CPU only upon certain reset Currently the driver halts the device CPU in the halt engines function, which halts all the engines of the ASIC. The problem is that if later on we stop the reset process (due to inability to clean memory mappings in time), the CPU will remain in halt mode. This creates many issues, such as thermal/power control and FLR handling. Therefore, move the halting of the device CPU to the very end of the reset process, just before writing to the registers to initiate the reset. In addition, the driver now needs to send a message to the device F/W to disable it from sending interrupts to the host machine because during halt engines function the driver disables the MSI/MSI-X interrupts. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-24 20:31:36 +03:00
Omer Shpigelman	9158c47e20	habanalabs: remove unused hash Remove an old hash that is not in use anymore. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Ofir Bitton	79b1894c41	habanalabs: use queue pi/ci in order to determine queue occupancy Instead of using the free slots amount on the compute CQ to determine whether we can submit work to queues, use the queues pi/ci. This is needed in future ASICs where we don't have CQ per queue. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Ofir Bitton	3abc99bb7d	habanalabs: configure maximum queues per asic Currently the amount of maximum queues is statically configured. Using a static value is causing redundunt cycles when traversing all queues and consumes more memory than actually needed. In this patch we configure each asic with the exact number of queues needed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Oded Gabbay	12ae3133d2	habanalabs: remove soft-reset support from GAUDI Soft-reset isn't supported in GAUDI. Remove the code that performs it and print error in case the user wants to do it via sysfs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-24 20:31:36 +03:00
Ofir Bitton	f4cbfd2445	habanalabs: PCIe iATU refactoring Divide iATU initialization into inbound/outbound methods. We must separate it in order to enable different match mode per PCIe region. In addition, added support for PCI address match mode. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Oded Gabbay	fcc6a4e606	habanalabs: Extract ECC information from FW ECC (Error Correcting Code) interrupts are going to be handled by the FW. Hence, we define an interface in which the driver can obtain the relevant ECC information. This information is needed for monitoring and can also lead to a hard reset if ECC error is not correctable. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Ofir Bitton	db491e4f08	habanalabs: Add dropped cs statistics info struct Add command submission statistics structure which can be obtained through the info ioctl. Each drop counter describes the reason for which the command submission was dropped. This information is needed for the user to be aware of the specific reason for which the submitted work was dropped. The user can then utilize the driver more efficiently. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:36 +03:00
Christine Gharzuzi	c8f9b49d2d	habanalabs: extract cpu boot status lookup Extract detection of the cpu boot status to a function to allow code reuse Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:35 +03:00
Oded Gabbay	0eab4f89d6	habanalabs: rephrase error messages rephrase some error/warning/notice messages to make them more accessible to ordinary users. There is no need to print context ASID as the driver currently doesn't support multiple contexts. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-24 20:31:35 +03:00
Ofir Bitton	dd9efabd0a	habanalabs: Increase queues depth After recent concurrent cs amount increase, we must also increase queues depth since much more concurrent work can be done. All external queue depths were increased to 4096 as gaudi's internal queue depths were also increased to 1024. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:35 +03:00
Omer Shpigelman	917b79b096	habanalabs: rephrase error message Rephrase F/W error message to make it more understandable to ordinary users. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:35 +03:00
Adam Aharon	e8edded693	habanalabs: calculate trace frequency from PLL The profiler needs to know the PLL values for correctly showing the profiling data. Because our firmware can use different PLL configurations, we need to read the PLL values from the ASIC to pass them to the profiler. Signed-off-by: Adam Aharon <aaharon@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:35 +03:00
Oded Gabbay	6ced91170d	habanalabs: align armcp_packet structure to 8 bytes Once there is a 64-bit field in a structure, GCC compiler for ARM aligns the structure to 8 bytes. In order to avoid confusion when these structures are being passed between CPUs from different architectures, we explicitly align the structure to 8 bytes. Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:35 +03:00
Ofir Bitton	6c07bab34b	habanalabs: Use mask instead of shift in sync stream registers Use proper bitfield masks instead of shifting values when configuring packets sent to device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:34 +03:00
Ofir Bitton	21e7a34634	habanalabs: sync stream generic functionality Currently sync stream is limited only for external queues. We want to remove this constraint by adding a new queue property dedicated for sync stream. In addition we move the initialization and reset methods to the common code since we can re-use them with slight changes. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:34 +03:00
Ofir Bitton	c16d45f42b	habanalabs: Use pending CS amount per ASIC Training schemes requires much more concurrent command submissions than inference does. In addition, training command submissions can be completed in a non serialized manner. Hence, we add support in which each ASIC will be able to configure the amount of concurrent pending command submissions, rather than use a predefined amount. This change will enhance performance by allowing the user to add more concurrent work without waiting for the previous work to be completed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:34 +03:00
Oded Gabbay	0b168c8f1d	habanalabs: remove rate limiters from GAUDI We no longer need to initialize the rate limiters in GAUDI A1. Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>	2020-07-24 20:31:34 +03:00
Oded Gabbay	cea7a0449e	habanalabs: prevent possible out-of-bounds array access Queue index is received from the user. Therefore, we must validate it before using it to access the queue props array. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-19 08:15:36 +03:00
Oded Gabbay	788cacf308	habanalabs: set 4s timeout for message to device CPU We see that sometimes the CPU in GOYA and GAUDI is occupied by the power/thermal loop and can't answer requests from the driver fast enough. Therefore, to avoid false notifications on timeouts, increase the timeout to 4 seconds on each message sent to the device CPU. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>	2020-07-10 19:53:03 +03:00
Oded Gabbay	e38bfd30e0	habanalabs: set clock gating per engine For debugging purposes, we need to allow the root user better control of the clock gating feature of the DMA and compute engines. Therefore, change the clock gating debugfs interface to be bitmask instead of true/false. Each bit represents a different engine, according to gaudi_engine_id enum. See debugfs documentation for more details. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>	2020-07-10 19:53:03 +03:00
Oded Gabbay	2edc66e22b	habanalabs: block WREG_BULK packet on PDMA WREG_BULK is a special packet that has a variable length. Therefore, we can't parse it when validating CBs that go to the PCI DMA queue. In case the user needs to use it, it can put multiple WREG32 packets instead. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>	2020-07-10 19:53:03 +03:00
Lee Jones	14395c6fb1	misc: habanalabs: gaudi: gaudi_security: Repair incorrectly named function arg gaudi_pb_set_block()'s argument 'base' was incorrectly named 'block' in its function header. Fixes the following W=1 kernel build warning(s): drivers/misc/habanalabs/gaudi/gaudi_security.c:454: warning: Function parameter or member 'base' not described in 'gaudi_pb_set_block' drivers/misc/habanalabs/gaudi/gaudi_security.c:454: warning: Excess function parameter 'block' description in 'gaudi_pb_set_block' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-11-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:37 +02:00
Lee Jones	f7d227c306	misc: habanalabs: gaudi: Remove ill placed asterisk from kerneldoc header W=1 kernel builds report a lack of description of gaudi_set_asic_funcs()'s 'hdev' argument. In reality it is documented, but the formatting was not as expected '@.*:'. Instead, there was a misplaced asterisk which was confusing the kerneldoc validator. Squashes the following W=1 warning: drivers/misc/habanalabs/gaudi/gaudi.c:6746: warning: Function parameter or member 'hdev' not described in 'gaudi_set_asic_funcs' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-10-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:37 +02:00
Lee Jones	67db05cea6	misc: habanalabs: goya: goya_coresight: Remove set but unused variable 'val' No attempt to check the return value of RREG32() has been made since the call was introduced a year ago. Fixes W=1 kernel build warning: drivers/misc/habanalabs/goya/goya_coresight.c: In function ‘goya_debug_coresight’: drivers/misc/habanalabs/goya/goya_coresight.c:643:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] 643 \| u32 val; \| ^~~ Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-9-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:37 +02:00
Lee Jones	e0712c600e	misc: habanalabs: pci: Scrub documentation for non-present function argument 'dma_mask' is not passed directly into hl_pci_set_dma_mask() as an argument. Instead, it is pulled from struct hl_device *hdev. Fixed the following W=1 warning: drivers/misc/habanalabs/pci.c:328: warning: Excess function parameter 'dma_mask' description in 'hl_pci_set_dma_mask Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-8-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:36 +02:00
Lee Jones	2557f27fd4	misc: habanalabs: goya: Omit pointless check ensuring addr is >=0 Seeing as 'addr' is unsigned, it would be impossible for the assigned value to be anything other than zero or positive. Squashes the following W=1 warnings: drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_read32’: drivers/misc/habanalabs/goya/goya.c:3945:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 3945 \| } else if ((addr >= DRAM_PHYS_BASE) && \| ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_write32’: drivers/misc/habanalabs/goya/goya.c:4002:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4002 \| } else if ((addr >= DRAM_PHYS_BASE) && \| ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_read64’: drivers/misc/habanalabs/goya/goya.c:4047:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4047 \| } else if ((addr >= DRAM_PHYS_BASE) && \| ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_write64’: drivers/misc/habanalabs/goya/goya.c:4091:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4091 \| } else if ((addr >= DRAM_PHYS_BASE) && \| ^~ drivers/misc/habanalabs/pci.c:328: warning: Excess function parameter 'dma_mask' description in 'hl_pci_set_dma_mask' drivers/misc/habanalabs/goya/goya_coresight.c: In function ‘goya_debug_coresight’: drivers/misc/habanalabs/goya/goya_coresight.c:643:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] 643 \| u32 val; \| ^~~ Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-7-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:36 +02:00
Lee Jones	3db99f000b	misc: habanalabs: irq: Repair kerneldoc formatting issues W=1 kernel builds report a lack of descriptions for various function arguments. In reality they are documented, but the formatting was not as expected '@.:'. Instead, '-'s were used as separators. While we're here, the headers for functions various functions were written in kerneldoc format, but lack the kerneldoc identifier '/*'. Let's promote them so they can gain access to the checker. This change fixes the following W=1 warnings: drivers/misc/habanalabs/irq.c:24: warning: Function parameter or member 'eq_work' not described in 'hl_eqe_work' drivers/misc/habanalabs/irq.c:24: warning: Function parameter or member 'hdev' not described in 'hl_eqe_work' drivers/misc/habanalabs/irq.c:24: warning: Function parameter or member 'eq_entry' not described in 'hl_eqe_work' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-6-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:36 +02:00
Lee Jones	df123c9dcd	misc: habanalabs: pci: Fix a variety of kerneldoc issues hl_pci_bars_map() has a miss-typed argument name in the function description. hl_pci_elbi_write() was missing documented arguments. The headers for functions hl_pci_bars_unmap(), hl_pci_elbi_write() and hl_pci_reset_link_through_bridge() were written in kerneldoc format, but lack the kerneldoc identifier '/**'. Let's promote them so they can gain access to the checker. These changes fix the following W=1 kernel build warnings: drivers/misc/habanalabs/pci.c:27: warning: Function parameter or member 'name' not described in 'hl_pci_bars_map' drivers/misc/habanalabs/pci.c:27: warning: Excess function parameter 'bar_name' description in 'hl_pci_bars_map' drivers/misc/habanalabs/pci.c:147: warning: Function parameter or member 'addr' not described in 'hl_pci_iatu_write' drivers/misc/habanalabs/pci.c:147: warning: Function parameter or member 'data' not described in 'hl_pci_iatu_write' drivers/misc/habanalabs/pci.c:324: warning: Excess function parameter 'dma_mask' description in 'hl_pci_set_dma_mask' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-5-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-07-01 15:05:36 +02:00

... 3 4 5 6 7 ...

727 Commits