OpenCloudOS-Kernel

Trigger Huang 85744e9c10 drm/scheduler: Fix bad job be re-processed in TDR

A bad job is the one triggered TDR(In the current amdgpu's implementation, actually all the jobs in the current joq-queue will be treated as bad jobs). In the recovery process, its fence will be fake signaled and as a result, the work behind will be scheduled to delete it from the mirror list, but if the TDR process is invoked before the work's execution, then this bad job might be processed again and the call dma_fence_set_error to its fence in TDR process will lead to kernel warning trace:

[ 143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
kernel: [ 143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi
[ 143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu
[ 143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 143.033653] Workqueue: events drm_sched_job_timedout [amd_sched]
[ 143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
[ 143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202
[ 143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08
[ 143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18
[ 143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6
[ 143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430
[ 143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00
[ 143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000
[ 143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0
[ 143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 143.033670] Call Trace:
[ 143.033744] amdgpu_device_gpu_recover+0x144/0x820 [amdgpu]
[ 143.033788] amdgpu_job_timedout+0x9b/0xa0 [amdgpu]
[ 143.033791] drm_sched_job_timedout+0xcc/0x150 [amd_sched]
[ 143.033795] process_one_work+0x1de/0x410
[ 143.033797] worker_thread+0x32/0x410
[ 143.033799] kthread+0x121/0x140
[ 143.033801] ? process_one_work+0x410/0x410
[ 143.033803] ? kthread_create_worker_on_cpu+0x70/0x70
[ 143.033806] ret_from_fork+0x35/0x40

So just delete the bad job from mirror list directly

Changes in v3:
- Add a helper function to delete the bad jobs from mirror list and call it directly before the job's fence is signaled

Changes in v2:
- delete the useless list node check
- also delete bad jobs in drm_sched_main because: kthread_unpark(ring->sched.thread) will be invoked very early before amdgpu_device_gpu_recover's return, then drm_sched_main will have chance to pick up a new job from the job queue. This new job will be added into the mirror list and processed by amdgpu_job_run, but may not be deleted from the mirror list on time due to the same reason. And finally re-processed by drm_sched_job_recovery

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Christian König <chrstian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>		2018-11-19 16:38:15 -05:00
..
amd	drm/amdgpu/gfx: use proper offset define for MEC doorbells	2018-11-19 16:38:14 -05:00
arc	drm/arc: Use drm_fbdev_generic_setup()	2018-11-01 15:23:21 +01:00
arm	drm: mali-dp: Enable Mali-DP tiled buffer formats	2018-11-02 09:57:27 +00:00
armada	drm: extract drm_atomic_uapi.c	2018-09-09 14:19:18 +02:00
ast	drm/ttm: initialize globals during device init (v2)	2018-11-05 14:21:21 -05:00
atmel-hlcdc	drm/atmel-hlcdc: Use drm_fbdev_generic_setup()	2018-11-01 15:24:22 +01:00
bochs	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
bridge	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
cirrus	drm/ttm: initialize globals during device init (v2)	2018-11-05 14:21:21 -05:00
etnaviv	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
exynos	drm/exynos/fbdev: do not skip fbdev init if there are no connectors	2018-11-05 16:37:24 +09:00
fsl-dcu	drm/fsl-dcu: Use drm_fbdev_generic_setup()	2018-11-01 15:23:58 +01:00
gma500	Merge drm/drm-next into drm-misc-next	2018-08-27 10:00:03 -04:00
hisilicon	drm/ttm: initialize globals during device init (v2)	2018-11-05 14:21:21 -05:00
i2c	Merge branch 'drm-tda9950-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-fixes	2018-10-04 10:32:14 +10:00
i810	…
i915	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
imx	drm/imx: fix build failure without CONFIG_DRM_FBDEV_EMULATION	2018-10-05 12:09:20 +02:00
lib	…
mediatek	drm pull for 4.20-rc1	2018-10-28 17:49:53 -07:00
meson	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
mga	…
mgag200	drm/ttm: initialize globals during device init (v2)	2018-11-05 14:21:21 -05:00
msm	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
mxsfb	drm/mxsfb: Switch to drm_atomic_helper_commit_tail_rpm	2018-09-26 22:07:40 +02:00
nouveau	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
omapdrm	drm/omap: dsi: Fix missing of_platform_depopulate()	2018-11-12 11:50:13 +02:00
panel	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
pl111	Merge drm/drm-next into drm-misc-next	2018-09-27 02:54:54 -04:00
qxl	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
r128	…
radeon	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
rcar-du	drm/rcar-du: Convert drm_atomic_helper_suspend/resume()	2018-10-23 15:59:01 +02:00
rockchip	drm/rockchip: dsi: add dual mipi support	2018-10-30 14:06:31 +01:00
savage	…
scheduler	drm/scheduler: Fix bad job be re-processed in TDR	2018-11-19 16:38:15 -05:00
selftests	drm/selftests: Fix build warning -Wframe-larger-than	2018-11-02 14:25:32 +01:00
shmobile	drm: shmobile: convert to SPDX identifiers	2018-09-14 13:54:02 +03:00
sis	…
sti	drm: sti: don't pass GFP_DMA32 to dma_alloc_wc	2018-10-18 13:50:22 +02:00
stm	drm/stm: Use drm_fbdev_generic_setup()	2018-10-25 17:00:28 +02:00
sun4i	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
tdfx	…
tegra	drm/tegra: Changes for v4.20-rc1	2018-09-28 09:47:31 +10:00
tilcdc	drm/tilcdc: Use drm_fbdev_generic_setup()	2018-11-01 15:25:41 +01:00
tinydrm	drm/tinydrm: Fix setting of the column/page end addresses.	2018-10-30 16:23:38 -07:00
ttm	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
tve200	drm/tve200: Use drm_fbdev_generic_setup()	2018-09-25 11:34:24 +02:00
udl	DRM: UDL: get rid of useless vblank initialization	2018-10-23 15:59:01 +02:00
v3d	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
vc4	dma-buf: allow reserving more than one shared fence slot	2018-10-25 13:45:07 +02:00
vgem	drm/vgem: Fix typo in driver feature flags	2018-11-05 15:31:51 +00:00
via	…
virtio	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
vkms	drm/vkms: provide a parent device to drm_dev_init()	2018-10-29 11:13:40 +00:00
vmwgfx	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
xen	drm: Replace NULL with error value in drm_prime_pages_to_sg	2018-07-23 11:47:35 +03:00
zte	drm/zte: Use drm_atomic_helper_shutdown	2018-10-05 18:04:10 +02:00
Kconfig	drm/fb_helper: Allow leaking fbdev smem_start	2018-10-03 21:08:21 +02:00
Makefile	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
ati_pcigart.c	…
drm_agpsupport.c	…
drm_atomic.c	drm pull for 4.20-rc1	2018-10-28 17:49:53 -07:00
drm_atomic_helper.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_atomic_state_helper.c	drm: Extract drm_atomic_state_helper.[hc]	2018-10-05 18:04:09 +02:00
drm_atomic_uapi.c	drm/atomic_helper: Stop modesets on unregistered connectors harder	2018-10-19 11:46:46 +03:00
drm_auth.c	…
drm_blend.c	drm: Clarify DRM_MODE_REFLECT_X/Y documentation	2018-09-11 11:21:30 +01:00
drm_bridge.c	drm: bridge: document bridge attach/detach imbalance	2018-09-13 11:28:12 +02:00
drm_bufs.c	drm/bufs: Fix Spectre v1 vulnerability	2018-10-17 09:17:33 +02:00
drm_cache.c	…
drm_client.c	drm pull for 4.20-rc1	2018-10-28 17:49:53 -07:00
drm_color_mgmt.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_connector.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_context.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_crtc.c	drm pull for 4.20-rc1	2018-10-28 17:49:53 -07:00
drm_crtc_helper.c	drm: Remove transitional helpers	2018-10-05 18:04:10 +02:00
drm_crtc_helper_internal.h	…
drm_crtc_internal.h	drm: refuse ADDFB2 ioctl for broken bigendian drivers	2018-09-10 07:10:36 +02:00
drm_debugfs.c	drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation	2018-09-17 19:24:37 -04:00
drm_debugfs_crc.c	Revert "drm: crc: Wait for a frame before returning from open()"	2018-08-22 09:50:16 -07:00
drm_dma.c	…
drm_dp_aux_dev.c	…
drm_dp_cec.c	drm: Do not call drm_dp_cec_set_edid() while registering DP connectors	2018-10-11 10:52:35 +02:00
drm_dp_dual_mode_helper.c	…
drm_dp_helper.c	drm: add LG eDP panel to quirk database	2018-09-19 16:44:12 +03:00
drm_dp_mst_topology.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_drv.c	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next	2018-11-19 11:07:52 +10:00
drm_dumb_buffers.c	…
drm_edid.c	drm, i915, amdgpu, bridge + core quirk	2018-11-02 10:58:20 -07:00
drm_edid_load.c	…
drm_encoder.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_encoder_slave.c	…
drm_fb_cma_helper.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_fb_helper.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_file.c	…
drm_flip_work.c	…
drm_fourcc.c	drm: Fix htmldocs warnings in drm_fourcc.c	2018-11-07 16:16:27 -05:00
drm_framebuffer.c	drm: Add macro to export functions only when CONFIG_DRM_DEBUG_SELFTEST is enabled	2018-11-02 09:58:10 +00:00
drm_gem.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_gem_cma_helper.c	drm: Replace NULL with error value in drm_prime_pages_to_sg	2018-07-23 11:47:35 +03:00
drm_gem_framebuffer_helper.c	drm/fourcc: Add char_per_block, block_w and block_h in drm_format_info	2018-11-02 09:55:27 +00:00
drm_hashtab.c	…
drm_info.c	…
drm_internal.h	drm: Drop drmP.h from drm_connector.c	2018-09-09 14:19:17 +02:00
drm_ioc32.c	…
drm_ioctl.c	drm: Return -EOPNOTSUPP in drm_setclientcap() when driver do not support KMS	2018-09-21 11:19:40 +02:00
drm_irq.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_kms_helper_common.c	…
drm_lease.c	drm-misc-next for v4.21, part 1:	2018-11-19 10:40:33 +10:00
drm_legacy.h	…
drm_lock.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_memory.c	drm: Shift * to be adjacent to pointer name	2018-10-16 14:39:25 +02:00
drm_mipi_dsi.c	drm: Add support for pps and compression mode command packet	2018-07-25 07:51:05 -04:00
drm_mm.c	…
drm_mode_config.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_mode_object.c	drm: Remove 80-column line in drm_mode_object.c	2018-11-01 18:54:05 +01:00
drm_modes.c	drm: Convert to using %pOFn instead of device_node.name	2018-10-01 10:16:39 +02:00
drm_modeset_helper.c	drm: Unexport primary plane helpers	2018-10-05 18:06:49 +02:00
drm_modeset_lock.c	…
drm_of.c	…
drm_panel.c	This is the 4.19-rc6 release	2018-10-04 11:03:34 +10:00
drm_panel_orientation_quirks.c	Merge drm/drm-next into drm-misc-next	2018-10-24 14:26:04 -04:00
drm_pci.c	drm/drm_pci.c: Use dma_zalloc_coherent	2018-10-23 15:59:01 +02:00
drm_plane.c	drm: Add drm_any_plane_has_format()	2018-11-06 21:34:22 +02:00
drm_plane_helper.c	drm: Unexport drm_plane_helper_check_update	2018-10-05 22:45:19 +02:00
drm_prime.c	drm: Remove defunct dma_buf_kmap stubs	2018-10-05 16:45:40 +01:00
drm_print.c	drm: Add puts callback for the coredump printer	2018-07-30 08:49:41 -04:00
drm_probe_helper.c	…
drm_property.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_rect.c	…
drm_scatter.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_scdc_helper.c	…
drm_simple_kms_helper.c	drm/tinydrm: Advertise that we can do only DRM_FORMAT_MOD_LINEAR.	2018-10-30 13:01:50 -07:00
drm_syncobj.c	drm/syncobj: Fix oops on drm_syncobj_find_fence(file_priv, 0, ...).	2018-11-06 12:53:02 +01:00
drm_sysfs.c	…
drm_trace.h	…
drm_trace_points.c	…
drm_vblank.c	drm: Differentiate the lack of an interface from invalid parameter	2018-09-14 17:29:47 +01:00
drm_vm.c	…
drm_vma_manager.c	drm: Remove "protection" around drm_vma_offset_manager_destroy()	2018-09-04 19:00:32 +01:00
drm_writeback.c	drm: writeback: Fix doc that says connector should be disconnected	2018-07-16 16:35:27 +01:00