amd/amdgpu: force to trigger a no-retry-fault after a retry-fault

Only for the debugger use case. [why] Avoid endless translation retries, after an invalid address access has been issued to the GPU. Instead, the trap handler is forced to enter by generating a no-retry-fault. A s_trap instruction is inserted in the debugger case to let the wave to enter trap handler to save context. [how] Intentionally using an invalid flag combination (F and P set at the same time) to trigger a no-retry-fault, after a retry-fault happens. This is only valid under compute context. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-18 15:33:07 -06:00 · 2019-11-18 15:33:07 -06:00 · b4672c8a84
parent f43ef951f6
commit b4672c8a84
1 changed files with 10 additions and 1 deletions
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@ -3197,11 +3197,20 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
 	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
 		AMDGPU_PTE_SYSTEM;

-	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
+	if (vm->is_compute_context) {
+		/* Intentionally setting invalid PTE flag
+		 * combination to force a no-retry-fault
+		 */
+		flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
+			AMDGPU_PTE_TF;
+		value = 0;
+
+	} else if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
 		/* Redirect the access to the dummy page */
 		value = adev->dummy_page_addr;
 		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
 			AMDGPU_PTE_WRITEABLE;
+
 	} else {
 		/* Let the hw retry silently on the PTE */
 		value = 0;