OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alex Deucher	5f152b5e69	drm/amdgpu: rename amdgpu_gpu_recover add device to the name for consistency. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-18 10:59:58 -05:00
Alex Deucher	9c3f2b5474	drm/amdgpu: rename amdgpu_program_register_sequence add device for consistency with other functions in this file. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-18 10:59:13 -05:00
Andrey Grodzovsky	8854695add	drm/amdgpu: Simplify amdgpu_lockup_timeout usage. With introduction of amdgpu_gpu_recovery we don't need any more to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-15 17:15:00 -05:00
Andrey Grodzovsky	dcebf026e6	drm/amdgpu: Add gpu_recovery parameter Add new parameter to control GPU recovery procedure. v2: Add auto logic where reset is disabled for bare metal and enabled for SR-IOV. Allow forced reset from debugfs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-15 17:14:50 -05:00
Christian König	c47b41a79a	drm/amdgpu: remove nonsense const u32 cast on ARRAY_SIZE result Not sure what that should originally been good for, but it doesn't seem to make any sense any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-06 12:47:21 -05:00
pding	f47110330c	drm/amdgpu: return error when sriov access requests get timeout Reported-by: Sun Gary <Gary.Sun@amd.com> Signed-off-by: pding <Pixel.Ding@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-04 16:41:41 -05:00
Monk Liu	5740682e66	drm/amdgpu:implement new GPU recover(v3) 1,new imple names amdgpu_gpu_recover which gives more hint on what it does compared with gpu_reset 2,gpu_recover unify bare-metal and SR-IOV, only the asic reset part is implemented differently 3,gpu_recover will increase hang job karma and mark its entity/context as guilty if exceeds limit V2: 4,in scheduler main routine the job from guilty context will be immedialy fake signaled after it poped from queue and its fence be set with "-ECANCELED" error 5,in scheduler recovery routine all jobs from the guilty entity would be dropped 6,in run_job() routine the real IB submission would be skipped if @skip parameter equales true or there was VRAM lost occured. V3: 7,replace deprecated gpu reset, use new gpu recover Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-04 16:41:30 -05:00
pding	b59142384e	drm/amdgpu/virt: implement wait_reset callbacks for vi/ai Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: pding <Pixel.Ding@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-12-04 16:33:14 -05:00
Gavin Wan	890419409a	drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox. This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-07-14 11:05:52 -04:00
Monk Liu	0c63e11340	drm/amdgpu:only call flr_work under infinite timeout Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-05-24 17:40:39 -04:00
Monk Liu	7225f8736c	drm/amdgpu:use job* to replace voluntary that way we can know which job cause hang and can do per sched reset/recovery instead of all sched. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-05-24 17:40:38 -04:00
Monk Liu	17b2e332a2	drm/amdgpu:need som change on vega10 mailbox if sriov gpu reset is invoked by job timeout, it is run in a global work-queue which is very slow and better not call msleep ortherwise it takes long time to get back CPU. so make below changes: 1: Change msleep 1 to mdelay 5 2: Ignore the ack fail from pf after time out, because VF FLR will clear ack, sometime VF FLR is done prior to the beginning of poll_ack so we can ignore this ack TODO: Put job_timedout (and the following gpu reset) in a driver thread, instead of the global work_struct. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-05-24 17:40:18 -04:00
Pixel Ding	ee73164a0d	drm/amdgpu/virt: don't check VALID bit for FLR completion message The interrupt after FLR is missed sometimes due to hardware reason, so guest driver get the notification of FLR completion via polling message. Then host doesn't write VALID bit to avoid sending interrupt, otherwise the completion will be handled twice. So there's a valid message without VALID bit for FLR completion, driver should handle it without checking. Signed-off-by: Pixel Ding <Pixel.Ding@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-04-28 17:32:40 -04:00
Alex Deucher	d766e6a393	drm/amdgpu: switch ih handling to two levels (v3) Newer asics have a two levels of irq ids now: client id - the IP src id - the interrupt src within the IP v2: integrated Christian's comments. v3: fix rebase fail in SI and CIK Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Ken Wang <Qingqing.Wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:37 -04:00
Xiangliang Yu	d1aad4d8a4	drm/amdgpu/virt: fix typo When send messages to hypervior, the messages format should be is idh_request, not idh_event. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian KÃ¶nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:16 -04:00
Monk Liu	2641e38b02	drm/amdgpu:RUNTIME flag should clr later this flag will get cleared by request gpu access Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:12 -04:00
Monk Liu	480da26260	drm/amdgpu:use work instead of delay-work no need to use a delay work since we don't know how much time hypervisor takes on FLR, so just polling and waiting in a work. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:11 -04:00
Monk Liu	4a370955ed	drm/amdgpu:no kiq for mailbox registers access Use no kiq version reg access due to: 1) better performance 2) INTR context consideration (some routine in mailbox is in INTR context e.g.xgpu_vi_mailbox_rcv_irq) Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:10 -04:00
Ken Xue	562fe45c05	drm/amdgpu:Refine handshake of mailbox Signed-off-by: Ken Xue <Ken.Xue@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-03-29 23:53:10 -04:00
Xiangliang Yu	ab71ac56f6	drm/amdgpu/virt: implement VI virt operation interfaces VI has asic specific virt support, which including mailbox and golden registers init. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2017-01-27 11:13:24 -05:00

20 Commits