OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Felix Kuehling	c98171ccf6	drm/amdgpu: Handle GPUVM fault storms When many wavefronts cause VM faults at the same time, it can overwhelm the interrupt handler and cause IH ring overflows before the driver can notify or kill the faulting application. As a workaround I'm introducing limited per-VM fault credit. After that number of VM faults have occurred, further VM faults are filtered out at the prescreen stage of processing. This depends on the PASID in the interrupt packet, so it currently only works for KFD contexts. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-28 16:03:30 -04:00
Andres Rodriguez	35161bbc13	drm/amdgpu: map compute rings by least recently used pipe This patch provides a guarantee that the first n queues allocated by an application will be on different pipes. Where n is the number of pipes available from the hardware. This helps avoid ring aliasing which can result in work executing in time-sliced mode instead of truly parallel mode. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-28 16:03:22 -04:00
Andres Rodriguez	4a75aefe3f	drm/amdgpu: add option for force enable multipipe policy for compute Useful for testing the effects of multipipe compute without recompiling. Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-28 16:03:21 -04:00
Andres Rodriguez	0f7607d484	drm/amdgpu: use multipipe compute policy on non PL11 asics A performance regression for OpenCL tests on Polaris11 had this feature disabled for all asics. Instead, disable it selectively on the affected asics. Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-28 16:03:21 -04:00
Alex Deucher	e23b74aab5	drm/amdgpu: fix vf error handling The error handling for virtual functions assumed a single vf per VM and didn't properly account for bare metal. Make the error arrays per device and add locking. Reviewed-by: Gavin Wan <gavin.wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-28 16:03:20 -04:00
Alex Deucher	6f87a89570	drm/amdgpu: clarify license in amdgpu_trace_points.c It was not clear. The rest of the driver is MIT/X11. Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:37 -04:00
Samuel Li	dfced2e4bc	drm/amdgpu: Add gem_prime_mmap support v2: drop hdp invalidate/flush. v3: honor pgoff during prime mmap. Add a barrier after cpu access. v4: drop begin/end_cpu_access() for now, revisit later. Signed-off-by: Samuel Li <Samuel.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:37 -04:00
Christian König	e9c7577c09	drm/amdgpu: simplify pinning into visible VRAM Just set the CPU access required flag when we pin it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:25 -04:00
Monk Liu	c833d8aa4d	drm/amdgpu:fix firmware memoryleak(v2) this fix memory leak due to request_firmware after driver unloaded v2: release gmc firmware for gmc6/7/8 as well Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:25 -04:00
Monk Liu	4ff184d70e	drm/amdgpu:fix uvd ring fini routine(v2) fix missing finish uvd enc_ring. v2: since the adev pointer check in already in ring_fini so drop the check outsider Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:24 -04:00
Monk Liu	beb8410284	drm/amdgpu/sriov:alloc KIQ MQD in VRAM(v2) this way after KIQ MQD released in drv unloading, CPC can still let KIQ access this MQD thus RLCV SAVE_VF will not fail v2: always use VRAM domain for KIQ MQD no matter BM or SRIOV Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:24 -04:00
Monk Liu	85f95ad629	drm/amdgpu:unmap KCQ in gfx hw_fini(v2) v2: move kcq_disable out of SRIOV, make it genearal Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:23 -04:00
Monk Liu	4bd9a67e17	drm/amdgpu:halt when vm fault only with this way we can debug the VMC page fault issue Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:22 -04:00
Yong Zhao	e6d921974a	drm/amdgpu: Add copy_pte_num_dw member in amdgpu_vm_pte_funcs Use it to replace the hard coded value in amdgpu_vm_bo_update_mapping(). Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:22 -04:00
Yong Zhao	7bdc53f925	drm/amdgpu: Fix a bug in amdgpu_fill_buffer() When max_bytes is not 8 bytes aligned and bo size is larger than max_bytes, the last 8 bytes in a ttm node may be left unchanged. For example, on pre SDMA 4.0, max_bytes = 0x1fffff, and the bo size is 0x200000, the problem will happen. In order to fix the problem, we separately store the max nums of PTEs/PDEs a single operation can set in amdgpu_vm_pte_funcs structure, rather than inferring it from bytes limit of SDMA constant fill, i.e. fill_max_bytes. Together with the fix, we replace the hard code value "10" in amdgpu_vm_bo_update_mapping() with the corresponding values from structure amdgpu_vm_pte_funcs. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:21 -04:00
Yong Zhao	dfe5c2b76b	drm/amdgpu: Correct bytes limit for SDMA 3.0 copy and fill Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:21 -04:00
Christian König	a8ffeac96d	drm/amdgpu: use 2MB fragment size for GFX6,7 and 8 Use 2MB fragment size by default for older hardware generations as well. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: John Bridgman <john.bridgman@amd.com> Reviewed-by: Roger He <Hongbo.He@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:20 -04:00
Xiangliang.Yu	fd4495e57c	drm/amdgpu: Fix driver reloading failure SRIOV doesn't implement PMC capability of PCIe, so it can't update power state by reading PMC register. Currently, amdgpu driver doesn't disable pci device when removing driver, the enable_cnt of pci device will not be decrease to 0. When reloading driver, pci_enable_device will do nothing as enable_cnt is not zero. And power state will not be updated as PMC is not support. So current_state of pci device is not D0 state and pci_enable_msi return fail. Add pci_disable_device when remmoving driver to fix the issue. Signed-off-by: Xiangliang.Yu <Xiangliang.Yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:19 -04:00
Evan Quan	5c58301856	drm/amd/amdgpu: add vega10/raven mmhub/athub golden settings Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:18 -04:00
Eric Huang	4d1f9fb721	drm/amdgpu: add cgs query info of pci bus devfn Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:16 -04:00
Tom St Denis	10cfafd62a	drm/amd/amdgpu: Partial revert of iova debugfs We discovered that on some devices even with iommu enabled you can access all of system memory through the iommu translation. Therefore, we revert the read method to the translation only service and drop the write method completely. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christan König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:15 -04:00
Evan Quan	a49ccdbd1d	drm/amd/amgpu: update vega10 sdma golden setting Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:15 -04:00
Evan Quan	6fe8542957	drm/amd/amgpu: update raven sdma golden setting Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:14 -04:00
Monk Liu	d59c026b7b	drm/amdgpu/sriov:fix memory leak after gpu reset GPU reset will require all hw doing hw_init thus ucode_init_bo will be invoked again, which lead to memory leak skip the fw_buf allocation during sriov gpu reset to avoid memory leak. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:13 -04:00
Monk Liu	eb01abc7c4	drm/amdgpu:make ctx_add_fence interruptible(v2) otherwise a gpu hang will make application couldn't be killed under timedout=0 mode v2: Fix memoryleak job/job->s_fence issue unlock mn remove the ERROR msg after waiting being interrupted Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:13 -04:00
Monk Liu	f840cc5f84	drm/amdgpu/sriov:init csb for gfxv9 RLC need CSB registers initiated under SRIOV during world switch otherwise the clear state buffer behav will not be recovered to current VF scheme after switch back Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:12 -04:00
Horace Chen	6e132ca0bb	drm/amdgpu/sriov:increate mailbox polling timeout increase timeout to 12 seconds,because there may have multiple FLR waiting for done, the waiting time of events may be long, increase to 12s to reduce timeout failure. Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:12 -04:00
Monk Liu	030308fcbd	drm/amdgpu/sriov:fix page fault issue of driver unload bo_free on csa is too late to put in amdgpu_fini because that time ttm is already finished, Move it earlier to avoid the page fault. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:11 -04:00
Monk Liu	6e2e216fad	drm/amdgpu:use formal register to trigger hdp invalidate Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:10 -04:00
Monk Liu	1d4e0a8c4f	drm/amdgpu:hdp flush should be put it initialized Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:10 -04:00
Monk Liu	2ea6ab2741	drm/amdgpu:insert TMZ_BEGIN FRAME_CONTROL(begin) is needed for vega10 due to ucode logic change, it can fix some CTS random fail under gfx preemption enabled mode. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:09 -04:00
Monk Liu	55981bd2e8	drm/amdgpu/sriov:don't load psp fw during gpu reset At least for SRIOV we found reload PSP fw during gpu reset cause PSP hang. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:09 -04:00
Monk Liu	3224a12b90	drm/amdgpu/sriov:move in_reset to adev and rename currently in_reset is only used in sriov gpu reset, and it will be used for other non-gfx hw component later, like PSP, so move it from gfx to adev and rename to in_sriov_reset make more sense. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:08 -04:00
Monk Liu	7c3f2167b4	drm/amdgpu:no kiq in IH Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:07 -04:00
Monk Liu	ab5d6227b7	drm/amdgpu/sriov:fix missing error handling Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:07 -04:00
Ken Wang	98512bb8c2	drm/amdgpu: Add GPU reset functionality for Vega10 V2 Signed-off-by: Ken Wang <Ken.Wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:06 -04:00
Tom St Denis	79ba280066	drm/amd/amdgpu: remove usage of ttm trace Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:05 -04:00
Tom St Denis	38290b2c45	drm/amd/amdgpu: add support for iova_to_phys to replace TTM trace (v5) Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (v2): Add domain to iova debugfs (v3): Add true read/write methods to access system memory of pages mapped to the device (v4): Move get_domain call out of loop and return on error (v5): Just use kmap/kunmap	2017-09-26 15:14:04 -04:00
Tom St Denis	a40cfa0bef	drm/amd/amdgpu: Fold TTM debugfs entries into array (v2) Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (v2): add domains and avoid strcmp	2017-09-26 15:14:04 -04:00
Rex Zhu	0b693f0b56	drm/amdgpu: fix checkpatch.pl warning to amdgpu_drv.c fix checkpatch.pl WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:03 -04:00
Leo Liu	f6e8b15af7	drm/amdgpu: remove the clearance of vce 4.0 interrupt mask Requested by SRIOV, the clearance of the bit moved into firmware Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:03 -04:00
Xiangliang.Yu	3e4b0bd960	drm/amdgpu/sdma3: set wptr shadow atomically Port it from sdma4 for wptr polling usage. Signed-off-by: Xiangliang.Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:02 -04:00
Xiangliang.Yu	e33dac39bc	drm/amdgpu/sdma3: Enable sdma wptr polling for SRIOV When hypervisor triggering FLR for one of VFs, need to enable sdma wptr polling to avoid missing wptr update if enabling doorbell. Signed-off-by: Xiangliang.Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 15:14:01 -04:00
Felix Kuehling	a2f14820e3	drm/amdgpu: Track pending retry faults in IH and VM (v2) IH tracks pending retry faults in a hash table for fast lookup in interrupt context. Each VM has a short FIFO of pending VM faults for processing in a bottom half. The IH prescreening stage adds retry faults and filters out repeated retry interrupts to minimize the impact of interrupt storms. It's the VM's responsibility remove pending faults once they are handled. For now this is only done when the VM is destroyed. v2: - Made the hash table smaller and the FIFO longer. I never want the FIFO to fill up, because that would make prescreen take longer. 128 pending page faults should be enough to keep migrations busy. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> (v1) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 14:53:20 -04:00
Felix Kuehling	00ecd8a27c	drm/amdgpu: Add prescreening stage in IH processing (v2) To filter out high-frequency interrupts that can be safely ignored. v2: squash in trivial typo fix for si (Alex) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 13:07:04 -04:00
Felix Kuehling	02208441cc	drm/amdgpu: Add PASID management Allows assigning a PASID to a VM for identifying VMs involved in page faults. The global PASID manager is also exported in the KFD interface so that AMDGPU and KFD can share the PASID space. PASIDs of different sizes can be requested. On APUs, the PASID size is deterined by the capabilities of the IOMMU. So KFD must be able to allocate PASIDs in a smaller range. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 13:07:02 -04:00
Felix Kuehling	ca290da8f6	drm/amdgpu: Fix error handling in amdgpu_vm_init Make sure vm->root.bo is not left reserved if amdgpu_bo_kmap fails. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 13:07:01 -04:00
Rex Zhu	780cffc599	drm/amdgpu: add powerplay support for CI asics currently, for CI asics, use dpm by default, amdgpu.dpm=-1. when set amdgpu.dpm=1, enable powplay. when set amdgpu.dpm=0, disable both dpm and powerplay. when powerplay is stable on CI asics, ci_dpm will be removed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-26 13:07:00 -04:00
Rex Zhu	6df9855fe2	drm/amdgpu: add support for request SI/CI firmware in CGS Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-18 23:30:36 -04:00
Rex Zhu	cd4d74648b	drm/amdgpu: unify the interface of amd_pm_funcs put amd_pm_funcs table in struct powerplay for all asics. Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-09-18 23:30:35 -04:00

1 2 3 4 5 ...

3170 Commits