OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Oded Gabbay	9c7fde71a7	habanalabs: use %pa to print pci bar size PCI bar size is resource_size_t so we should use %pa to make it work correctly on all architectures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:26 +03:00
Dafna Hirschfeld	0407c155f1	habanalabs/gaudi: replace hl_poll_timeout with while loop in gaudi_scrub_device_mem, replace call to hl_poll_timeout with a while loop to avoid using dummy variables. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:26 +03:00
Ohad Sharabi	fce854e9bc	habanalabs: communicate supported page sizes to user Because in future ASICs the driver will allow the user to set the page size we need to make sure this data is propagated in all APIs. In addition, since this is already an ASIC property we no longer need ASIC function for it. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:26 +03:00
Ohad Sharabi	b2711ab2b0	habanalabs: page size can only be a power of 2 We dropped support for page sizes that are not power of 2. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:26 +03:00
Ohad Sharabi	1ef0c327e1	habanalabs: refactor dma asic-specific functions This is a pre-requisite patch for adding tracepoints to the DMA memory operations (allocation/free) in the driver. The main purpose is to be able to cross data with the map operations and determine whether memory violation occurred, for example free DMA allocation before unmapping it from device memory. To achieve this the DMA alloc/free code flows were refactored so that a single DMA tracepoint will catch many flows. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:26 +03:00
Oded Gabbay	c37d50e84e	habanalabs/gaudi: remove unused enum Also beautify code by preferring single line wherever possible. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Oded Gabbay	e3f49437a2	habanalabs/gaudi: mask constant value before cast This fixes a sparse warning of "cast truncates bits from constant value" Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Oded Gabbay	c74400f61e	habanalabs/gaudi: use correct type in assignment packets are defined as LE so we need to convert before assigning values to them. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Oded Gabbay	94f27905bd	habanalabs/gaudi: fix function name in comment function name in comment didn't match actual function name. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Dafna Hirschfeld	70852c95ac	habanalabs/gaudi: use memory_scrub_val from debugfs In the callback scrub_device_mem, use 'memory_scrub_val' from debugfs for the scrubbing value. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Dafna Hirschfeld	8c834a1442	habanalabs: don't send addr and size to scrub_device_mem cb We use scrub_device_mem only to scrub the entire SRAM and entire DRAM. Therefore there is no need to send addr and size args to the callback. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:25 +03:00
Yuri Nudelman	17ab47d2d6	habanalabs/gaudi: fix a race condition causing DMAR error There is a rare race condition in CB completion mechanism, that can occur under a very high pressure of command submissions. The preconditions for this to happen are: 1. There should be enough command submissions for the pre-allocated patched CB pool to run out of commands. At this stage we start allocating new patched CBs as they arrive. 2. CB size has to be exactly (128*n + 104)B for some n, i.e. 24B below a cache line end. The flow: 1. Two command buffers being completed on different streams, at the same time. Denote those CB1 and CB2. 2. Each command buffer is injected with two messages, 16B each - one for a HBW update of the completion queue, another to raise interrupt. 3. Assume CB1 updated the completion queue and raise the interrupt. 4. Assume CB2 updated the completion queue but did not raise the interrupt yet. 5. The host receives the interrupt. It goes over the completion queue and sees two completions - CB1 and CB2. Release them both. 6. CB2 performs the last command. The problem is that the last command is split between 2 cache lines. So to read the last 8B of the last command, it has to access the host again. Problem is - CB2 is already released. This causes a DMAR error. The solution to this problem is simply to make sure the last two commands in the CB are always in the same cache line, using NOP padding. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:24 +03:00
Koby Elbaz	0c584e192f	habanalabs/gaudi: fix warning: var might be used uninitialized kernel test robot: "warning: variable 'index' is used uninitialized whenever 'if' condition is false" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:24 +03:00
Tal Cohen	67a54d5de2	habanalabs/gaudi: notify user process on device unavailable When a device error occurs, user process would like to get some indication on the error by reading some device HW info. If the device is unavailable, user process can't perform any HW device reading. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:23 +03:00
Oded Gabbay	4cd213807b	habanalabs: remove unused get_dma_desc_list_size This asic callback function is not called anymore from the common code. The asic-specific function itself is called but from within the asic-specific code. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:23 +03:00
Ofir Bitton	01622098ae	habanalabs/gaudi: fix shift out of bounds When validating NIC queues, queue offset calculation must be performed only for NIC queues. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:23 +03:00
Koby Elbaz	70d25e96b6	habanalabs/gaudi: fix incorrect MME offset calculation Once FW raised an event following a MME2 QMAN error, the driver should have gone to the corresponding status registers, trying to gather more info on the error, yet it was accidentally accessing MME1 QMAN address space. Generally, we have x4 MMEs, while 0 & 2 are marked MASTER, and 1 & 3 are marked SLAVE. The former can be addressed, yet addressing the latter is considered an access violation, and will result in a hung system, which is what unintentionally happened above. Note that this cannot happen in a secured system, since these registers are protected with range registers. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:23 +03:00
Tal Cohen	969202e5cb	habanalabs/gaudi: send device reset notification Device reset event, indicates that the device shall be reset - after a short delay. In such case, the driver sends a notification towards the User process. This allows the User process to be able to take several debug actions for system diagnostic purposes. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:22 +03:00
Tal Cohen	be572e67da	habanalabs/gaudi: invoke device reset from one code block In order to prepare the driver code for device reset event notification, change the event handler function flow to call device reset from one code block. In addition, the commit fixes an issue that reset was performed w/o checking the 'hard_reset_on_fw_event' state and w/o setting the HL_DRV_RESET_DELAY flag. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:22 +03:00
Tal Cohen	a7d6c35bcd	habanalabs/gaudi: collect undefined opcode error info when an undefined opcode error occurres, the driver collects the relevant information from the Qman and stores it inside the hdev data structure. An event fd indication is sent towards the user space. Note: another commit shall be followed which will add support to read the error info by an ioctl. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:22 +03:00
Oded Gabbay	8742a75a1c	habanalabs/gaudi: fix comment to reflect current code Due to code changes in the past few years, the original comment of how parser->user_cb_size is checked was not correct anymore. Fix it to reflect current code and add more explanation as the code is more complex now. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:22 +03:00
Tal Cohen	d0c92afc0e	habanalabs: change the write flag name of error info structs positive flags naming will make more clear code while adding more 'error info' structures Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:21 +03:00
Tal Cohen	939ed076ea	habanalabs/gaudi: move tpc assert raise into internal func raising the tpc assert event in an internal function will make the code cleaner as we are going to be adding more events Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:21 +03:00
Dafna Hirschfeld	78d503087b	habanalabs: add terminating NULL to attrs arrays Arrays of struct attribute are expected to be NULL terminated. This is required by API methods such as device_add_groups. This fixes a crash when loading the driver for Goya device. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-07-12 09:09:21 +03:00
Tal Cohen	90de680526	habanalabs: use separate structure info for each error collect data Create separate info structure for each error type. The structures shall be used inside the large structure that contains the last session error. This is more scalable for adding more errors in the future. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:21 +02:00
Ohad Sharabi	9e495e2400	habanalabs: do MMU prefetch as deferred work When user requests to prefetch the MMU translations, the driver will not block the user until prefetch is done. Instead, the prefetch work will be delegated to a WQ which will do it in the background. This way, the prefetch may progress without blocking the user at all. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:21 +02:00
Tal Cohen	422ef17103	habanalabs: add support for notification via eventfd The driver will be able to send notification events towards a user process, using user's registered event file descriptor. The driver uses the notification mechanism to inform the user about an occurred event. A user thread can wait until a notification is received from the driver. The driver stores the occurred event until the user reads it, using HL_INFO_GET_EVENTS - new ioctl opcode in the INFO ioctl. Gaudi specific implementation includes sending a notification on a TPC assertion event that is received from f/w. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:20 +02:00
Dafna Hirschfeld	0688474eda	habanalabs: add device memory scrub ability through debugfs Add the ability to scrub the device memory with a given value. Add file 'dram_mem_scrub_val' to set the value and a file 'dram_mem_scrub' to scrub the dram. This is very important to help during automated tests, when you want the CI system to randomize the memory before training certain DL topologies. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:20 +02:00
Yuri Nudelman	829ec038c9	habanalabs: use unified memory manager for CB flow With the new code required for the flow added, we can now switch to using the new memory manager infrastructure, removing the old code. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:19 +02:00
Ofir Bitton	2db04a6826	habanalabs/gaudi: set arbitration timeout to a high value In certain workloads, arbitration timeout might expire although no actual issue present. Hence, we set timeout to a very high value. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:19 +02:00
Ohad Sharabi	ab4ea58728	habanalabs: use for_each_sgtable_dma_sg for dma sgt Instead of using for_each_sg when iterating sgt that contains dma entries, use the more proper for_each_sgtable_dma_sg macro. In addition, both Goya and Gaudi have the exact same implementation of the asic function that encapsulate the usage of this macro, so it is better to move that implementation to the common code. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:18 +02:00
Rajaravi Krishna Katta	b8d852add6	habanalabs/gaudi: use lower_32_bits() for casting Use standard kernel macro to take lower 32 bits of 64-bits variable. Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:18 +02:00
Oded Gabbay	738607f005	habanalabs: change a reset print to debug level Currently we have two reset prints per reset. One is in the common code and one in each asic-specific file. We can change the asic-specific message to be debug only as we can know the type of reset being done according to the print in the common code, which is also easier to maintain. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:17 +02:00
Oded Gabbay	fcadbf5688	habanalabs: remove redundant info print Halting compute engines is a print that doesn't add us any information because it is always done in the reset process and not used elsewhere. Even if it was, we don't use prints to mark functions we passed through. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:17 +02:00
Dafna Hirschfeld	799b9eb01a	habanalabs: remove debugfs read/write callbacks The debugfs memory access now uses the callback 'access_dev_mem' so there is no use of the callbacks 'debugfs_{read32,read64,write32,write6}'. Remove them. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:16 +02:00
Dafna Hirschfeld	234366d3b6	habanalabs: add callback and field to be used for debugfs refactor This is a preparation for unifying the code of accessing device memory through debugfs. Add struct fields and callbacks that will later be used in debugfs code and will reduce code duplication among the different read{32,64}/write{32,64} callbacks of every asic. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 21:01:16 +02:00
Ohad Sharabi	d0b59cf68c	habanalabs/gaudi: add debugfs to fetch internal sync status When Gaudi device is secured the monitors data in the configuration space is blocked from PCI access. As we need to enable user to get sync-manager monitors registers when debugging, this patch adds a debugfs that dumps the information to a binary file (blob). When a root user will trigger the dump, the driver will send request to the f/w to fill a data structure containing dump of all monitors registers. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:37 +02:00
Dafna Hirschfeld	c3712c1d7d	habanalabs/gaudi: Use correct sram size macro for debugfs We currently allow accessing the whole SRAM bar size with the macro SRAM_BAR_SIZE, but the actual size of the sram region is the macro SRAM_SIZE which is only a portion of the whole bar size. So when accessing the sram through debugfs, use the macro SRAM_SIZE for the sram size which is the correct macro. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:36 +02:00
Ohad Sharabi	acbabe63ef	habanalabs: add MMU prefetch to ASIC-specific code This is necessary pre-requisite for future ASIC support, where MMU TLB prefetch is supported. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:36 +02:00
Tomer Tayar	687c6b535e	habanalabs: modify dma_mask to be ASIC specific property The required DMA mask is no longer based on input from the F/W, but it is fixed per ASIC according to its address space. As such, the per-ASIC function to get this value can be replaced with a property variable. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:35 +02:00
Tomer Tayar	9d92689ca2	habanalabs/gaudi: avoid resetting max power in hard reset The default max power is deduced from the card type value in the CPU-CP info. This value is then set in the max power variable of the device structure. Getting the CPU-CP info is done as part of the late init phase which is called also during reset. This means that a max power value which is modified via sysfs will be reset during hard reset back to the default value. As the max power is updated in any case during device init in hl_sysfs_init(), this setting in late init can be removed, and the overriding during reset is thus avoided. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:35 +02:00
Ofir Bitton	b19768d81a	habanalabs/gaudi: increase submission resources In order to allow user to have larger amount of submissions, we increase the DMA and NIC queue depth to 4K. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:34 +02:00
Ohad Sharabi	050a6f349a	habanalabs: add user API to get valid DRAM page sizes Future devices will support multiple device memory page sizes. In addition, an API for the user was added for it to be able to control the device memory allocation page size. This patch is a complementary patch to inform the user of the available page size supported by the device. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:34 +02:00
Ohad Sharabi	06926dbed2	habanalabs: convert all MMU masks/shifts to arrays There is no need to hold each MMU mask/shift as a denoted structure member (e.g. hop0_mask). Instead converting it to array will result in smaller and more readable code. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:34 +02:00
Ohad Sharabi	2f8f0de878	habanalabs: change mmu_get_real_page_size to be ASIC-specific This patch breaks the cumbersome implementation of "get real page size" along with it's multiple inner conditions and implement each case (according to the real complexity) inside an ASIC function. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:33 +02:00
Ohad Sharabi	378b02dc01	habanalabs: set non-0 value in dram default page size Looking forward we will need to report to the user what is the default page size used. This will be done more conveniently by explicitly updating the property rather than to rely on a "0 meaning default" value. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-22 20:57:33 +02:00
Tomer Tayar	b0106bc6fe	habanalabs: add an option to delay a device reset Several H/W events can be sent adjacently, even due to a single error. If a hard-reset is triggered as part of handling one of these events, the following events won't be handled. The debug info from these missed events is important, sometimes even more important than the one that was handled. To allow handling these close events, add an option to delay a device reset and use it when resetting due to H/W events. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-02-28 14:22:06 +02:00
Oded Gabbay	100fcf1e11	habanalabs/gaudi: add missing handling of NIC related events There are a few events that can arrive from the f/w and without proper handling can cause errors to appear in the kernel log without reason. Add the relevant handling that was missing. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-02-28 14:22:05 +02:00
Oded Gabbay	26ef1c000b	habanalabs/gaudi: handle axi errors from NIC engines Various AXI errors can occur in the NIC engines and are reported to the driver by the f/w. Add code to print the errors and ack them to the f/w. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-02-28 14:22:05 +02:00
Ohad Sharabi	f23f280277	habanalabs: allow user to set allocation page size In future ASICs the MMU will be able to work with multiple page sizes, thus a new flag is added to allow the user to set the requested page size. This flag is added since the whole DRAM is allocated for the user and the user also should be familiar with the memory usage use case. As such, the user may choose to "over allocate" memory in favor of performance (for instance- large page allocations covers more memory in less TLB entries). For example: say available page sizes are of 1MB and 32MB. If user wants to allocate 40MB the user can either set page size to 1MB and allocate the exact amount of memory (but will result in 40 TLB entries) or the user can use 32MB pages, "waste" 8MB of physical memory but occupy only 2 TLB entries. Note that this feature will be available only to ASIC that supports multiple DRAM page sizes. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2022-02-28 14:22:05 +02:00

1 2 3 4 5 ...

314 Commits