OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Christian Gmeiner	f51d753f81	drm/etnaviv: print offender task information on hangcheck recovery Track the pid per submit, so we can print the name and cmdline of the task which submitted the batch that caused the gpu to hang. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2022-08-19 18:31:43 +02:00
Daniel Vetter	b827c84f5e	drm/etnaviv: Use scheduler dependency handling We need to pull the drm_sched_job_init much earlier, but that's very minor surgery. v2: Actually fix up cleanup paths by calling drm_sched_job_init, which I wanted to to in the previous round (and did, for all other drivers). Spotted by Lucas. v3: Rebase over renamed functions to add dependencies. v4: Rebase over patches from Christian. v5: More rebasing over work from Christian. Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: etnaviv@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20220331204651.2699107-2-daniel.vetter@ffwll.ch	2022-04-04 16:45:49 +02:00
Christian König	0941a4e3c6	drm/etnaviv: stop using dma_resv_excl_fence v2 We can get the excl fence together with the shared ones as well. v2: rename the member to fences as well Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: etnaviv@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-5-christian.koenig@amd.com	2022-03-24 10:09:20 +01:00
Jiawei Gu	8ab62eda17	drm/sched: Add device pointer to drm_gpu_scheduler Add device pointer so scheduler's printing can use DRM_DEV_ERROR() instead, which makes life easier under multiple GPU scenario. v2: amend all calls of drm_sched_init() v3: fill dev pointer for all drm_sched_init() calls Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com	2022-02-23 10:04:14 +01:00
Lucas Stach	cdd156955f	drm/etnaviv: consider completed fence seqno in hang check Some GPU heavy test programs manage to trigger the hangcheck quite often. If there are no other GPU users in the system and the test program exhibits a very regular structure in the commandstreams that are being submitted, we can end up with two distinct submits managing to trigger the hangcheck with the FE in a very similar address range. This leads the hangcheck to believe that the GPU is stuck, while in reality the GPU is already busy working on a different job. To avoid those spurious GPU resets, also remember and consider the last completed fence seqno in the hang check. Reported-by: Joerg Albert <joerg.albert@iav.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2021-12-23 20:21:33 +01:00
Daniel Vetter	0e10e9a1db	drm/sched: drop entity parameter from drm_sched_push_job Originally a job was only bound to the queue when we pushed this, but now that's done in drm_sched_job_init, making that parameter entirely redundant. Remove it. The same applies to the context parameter in lima_sched_context_queue_task, simplify that too. v2: Rebase on top of msm adopting drm/sched Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> (v1) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Emma Anholt <emma@anholt.net> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Chen Li <chenli@uniontech.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: Melissa Wen <mwen@igalia.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-6-daniel.vetter@ffwll.ch	2021-08-30 10:54:45 +02:00
Daniel Vetter	dbe48d030b	drm/sched: Split drm_sched_job_init This is a very confusingly named function, because not just does it init an object, it arms it and provides a point of no return for pushing a job into the scheduler. It would be nice if that's a bit clearer in the interface. But the real reason is that I want to push the dependency tracking helpers into the scheduler code, and that means drm_sched_job_init must be called a lot earlier, without arming the job. v2: - don't change .gitignore (Steven) - don't forget v3d (Emma) v3: Emma noticed that I leak the memory allocated in drm_sched_job_init if we bail out before the point of no return in subsequent driver patches. To be able to fix this change drm_sched_job_cleanup() so it can handle being called both before and after drm_sched_job_arm(). Also improve the kerneldoc for this. v4: - Fix the drm_sched_job_cleanup logic, I inverted the booleans, as usual (Melissa) - Christian pointed out that drm_sched_entity_select_rq() also needs to be moved into drm_sched_job_arm, which made me realize that the job->id definitely needs to be moved too. Shuffle things to fit between job_init and job_arm. v5: Reshuffle the split between init/arm once more, amdgpu abuses drm_sched.ready to signal gpu reset failures. Also document this somewhat. (Christian) v6: Rebase on top of the msm drm/sched support. Note that the drm_sched_job_init() call is completely misplaced, and hence also the split-out drm_sched_entity_push_job(). I've put in a FIXME which the next patch will address. v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't be a problem where it is now. Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Cc: Melissa Wen <melissa.srw@gmail.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Steven Price <steven.price@arm.com> (v2) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Nick Terrell <terrelln@fb.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Chen Li <chenli@uniontech.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Sonny Jiang <sonny.jiang@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Tian Tao <tiantao6@hisilicon.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Emma Anholt <emma@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch	2021-08-30 10:50:44 +02:00
Boris Brezillon	78efe21b6f	drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Emma Anholt <emma@anholt.net> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-3-boris.brezillon@collabora.com	2021-07-01 08:53:25 +02:00
Christian König	f2f12eb9c3	drm/scheduler: provide scheduler score externally Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com	2021-02-05 10:47:11 +01:00
Luben Tuikov	a6a1f036c7	drm/scheduler: Job timeout handler returns status (v3) This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. v2: Use enum as the status of a driver's job timeout callback method. v3: Return scheduler/device information, rather than task information. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Eric Anholt <eric@anholt.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Steven Price <steven.price@arm.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/415095/	2021-01-29 11:30:22 +01:00
Lucas Stach	50248a3ec0	drm/etnaviv: always start/stop scheduler in timeout processing The drm scheduler currently expects that the stop/start sequence is always executed in the timeout handling, as the job at the head of the hardware execution list is always removed from the ring mirror before the driver function is called and only inserted back into the list when starting the scheduler. This adds some unnecessary overhead if the timeout handler determines that the GPU is still executing jobs normally and just wished to extend the timeout, but a better solution requires a major rearchitecture of the scheduler, which is not applicable as a fix. Fixes: `135517d356` ("drm/scheduler: Avoid accessing freed bad job.") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Russell King <rmk+kernel@armlinux.org.uk>	2020-08-24 17:21:21 +02:00
Lucas Stach	9a1fdae587	drm/etnaviv: dump only failing submit Due to the tracking provided by the scheduler we know exactly which submit is failing. Only dump this single submit and the required auxiliary information. This cuts down the size of the devcoredumps by only including relevant information. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2019-08-15 10:48:51 +02:00
Lucas Stach	2e737e5205	drm/etnaviv: clean up includes Drop unused includes, move more includes from the generic etnaviv_drv.h to the units where they are actually used, sort includes. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org>	2019-08-02 19:17:33 +02:00
Christian König	5918045c4e	drm/scheduler: rework job destruction We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. v4: Move guilty job free into sched code. v5: Move sched->hw_rq_count to drm_sched_start to account for counter decrement in drm_sched_stop even when we don't call resubmit jobs if guily job did signal. v6: remove unused variable Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com	2019-05-02 15:45:48 -05:00
Dave Airlie	3a7d2f4f44	Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next "small fixes and a change to not restrict etnaviv to certain architectures." Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/4bc1a4c8447bb947d2fe8facd0ff09c5b8753087.camel@pengutronix.de	2019-03-12 15:20:57 +10:00
Andrey Grodzovsky	222b5f0441	drm/sched: Refactor ring mirror list handling. Decauple sched threads stop and start and ring mirror list handling from the policy of what to do about the guilty jobs. When stoppping the sched thread and detaching sched fences from non signaled HW fenes wait for all signaled HW fences to complete before rerunning the jobs. v2: Fix resubmission of guilty job into HW after refactoring. v4: Full restart for all the jobs, not only from guilty ring. Extract karma increase into standalone function. v5: Rework waiting for signaled jobs without relying on the job struct itself as those might already be freed for non 'guilty' job's schedulers. Expose karma increase to drivers. v6: Use list_for_each_entry_safe_continue and drm_sched_process_job in case fence already signaled. Call drm_sched_increase_karma only once for amdgpu and add documentation. v7: Wait only for the latest job's fence. Suggested-by: Christian Koenig <Christian.Koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2019-01-25 16:15:36 -05:00
Lucas Stach	72ac64b84b	drm/etnaviv: move job context pointer to etnaviv_gem_submit The context isn't really related to the cmdbuf, but is a property of the job. This has been missed when moving to a properly refcounted etnaviv_gem_submit. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-01-07 16:10:20 +01:00
Dave Airlie	9235dd441a	Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux into drm-next New features for 4.21: amdgpu: - Support for SDMA paging queue on vega - Put compute EOP buffers into vram for better performance - Share more code with amdkfd - Support for scanout with DCC on gfx9 - Initial kerneldoc for DC - Updated SMU firmware support for gfx8 chips - Rework CSA handling for eventual support for preemption - XGMI PSP support - Clean up RLC handling - Enable GPU reset by default on VI, SOC15 dGPUs - Ring and IB test cleanups amdkfd: - Share more code with amdgpu ttm: - Move global init out of the drivers scheduler: - Track if schedulers are ready for work - Timeout/fault handling changes to facilitate GPU recovery Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181114165113.3751-1-alexander.deucher@amd.com	2018-11-19 11:07:52 +10:00
Dave Airlie	d99de699ac	Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes Single etnaviv fence fix for GPU recovery. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/1541522424.2508.26.camel@pengutronix.de	2018-11-07 09:32:30 +10:00
Sharat Masetty	26efecf955	drm/scheduler: Add drm_sched_job_cleanup This patch adds a new API to clean up the scheduler job resources. This is primarliy needed in cases the job was created but was not queued to the scheduler queue. Additionally with this change, the layer which creates the scheduler job also gets to free up the job's resources and this entails moving the dma_fence_put(finished_fence) to the drivers ops free handler routines. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:27 -05:00
Christian König	19067e522d	drm/sched: make sure timer is restarted Make sure we always restart the timer after a timeout and remove the device specific workarounds. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-11-05 14:21:01 -05:00
Lucas Stach	6fce3a4061	drm/etnaviv: fix bogus fence complete check in timeout handler The GPU hardware fences and the job out-fences are on different timelines so it's wrong to compare them. Fix this by only looking at the out-fence. Cc: <stable@vger.kernel.org> Fixes: `2c83a726d6` (drm/etnaviv: bring back progress check in job timeout handler) Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-11-05 18:14:48 +01:00
Nayan Deshmukh	6a96243056	drm/scheduler: remove timeout work_struct from drm_sched_job (v3) having a delayed work item per job is redundant as we only need one per scheduler to track the time out the currently executing job. v2: the first element of the ring mirror list is the currently executing job so we don't need a additional variable for it v3: squash in fixes for v3d and etnaviv Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-09-27 09:55:45 -05:00
Dave Airlie	569f0a8694	Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next From: Lucas Stach <l.stach@pengutronix.de> "not much to de-stage this time. Changes from Philipp and Souptick to use memset32 more and switch the fault handler to the new vm_fault_t and two small fixes for issues that can be hit in rare corner cases from me." Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/1533563808.2809.7.camel@pengutronix.de	2018-08-08 06:07:30 +10:00
Lucas Stach	a0780bb1df	drm/etnaviv: protect sched job submission with fence mutex The documentation of drm_sched_job_init and drm_sched_entity_push_job has been clarified. Both functions should be called under a shared lock, to avoid jobs getting pushed into the scheduler queue in a different order than their sched_fence seqnos, which will confuse checks that are looking at the seqnos to infer information about completion order. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-08-06 15:24:05 +02:00
Dave Airlie	3fce461827	BackMerge v4.18-rc7 into drm-next rmk requested this for armada and I think we've had a few conflicts build up. Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-07-30 10:39:22 +10:00
Nayan Deshmukh	cdc5017659	drm/scheduler: modify API to avoid redundancy entity has a scheduler field and we don't need the sched argument in any of the functions where entity is provided. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2018-07-25 15:06:19 -05:00
Lucas Stach	2c83a726d6	drm/etnaviv: bring back progress check in job timeout handler When the hangcheck handler was replaced by the DRM scheduler timeout handling we dropped the forward progress check, as this might allow clients to hog the GPU for a long time with a big job. It turns out that even reasonably well behaved clients like the Armada Xorg driver occasionally trip over the 500ms timeout. Bring back the forward progress check to get rid of the userspace regression. We would still like to fix userspace to submit smaller batches if possible, but that is for another day. Cc: <stable@vger.kernel.org> Fixes: `6d7a20c077` (drm/etnaviv: replace hangcheck with scheduler timeout) Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-07-05 14:16:20 +02:00
Lucas Stach	f6ffbd4fc1	drm/etnaviv: replace license text with SPDX tags This replaces the repetitive GPL-2.0 license text in code and header files with the SPDX tags. Generated hardware headers aren't changed, as any changes there need to be done in the upstream rnndb repository. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-05-18 15:27:56 +02:00
Lucas Stach	4ed75c3e52	drm/etnaviv: bump HW job limit to 4 The current limit of 2 leads to some GPU idle times, as the usual IRQ latency leads to up to 3 jobs getting signaled at once with some standard workloads. A larger HW job limit might lead to slightly worse QoS, but we accept that to not sacrifice GPU throughput in the common case. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-03-22 11:08:48 +01:00
Fabio Estevam	fc0775da8e	drm/etnaviv: etnaviv_sched: Staticize functions when possible etnaviv_sched_dependency() and etnaviv_sched_run_job() are only used in this file, so make them static. This fixes the following sparse warnings: drivers/gpu/drm/etnaviv/etnaviv_sched.c:30:18: warning: symbol 'etnaviv_sched_dependency' was not declared. Should it be static? drivers/gpu/drm/etnaviv/etnaviv_sched.c:81:18: warning: symbol 'etnaviv_sched_run_job' was not declared. Should it be static? Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-03-09 12:25:01 +01:00
Lucas Stach	6d7a20c077	drm/etnaviv: replace hangcheck with scheduler timeout This replaces the etnaviv internal hangcheck logic with the job timeout handling provided by the DRM scheduler. This simplifies the driver further and allows to replay jobs after a GPU reset, so only minimal state is lost. This introduces a user-visible change in that we don't allow jobs to run indefinitely as long as they make progress anymore, as this introduces quality of service issues when multiple processes are using the GPU. Userspace is now responsible to flush jobs in a way that the finish in a reasonable time, where reasonable is currently defined as less than 500ms. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-02-12 16:31:01 +01:00
Lucas Stach	683da226f8	drm/etnaviv: move dependency handling to scheduler Move the fence dependency handling to the scheduler where it belongs. Jobs with unsignaled dependencies just get to sit in the scheduler queue without holding any locks. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-02-12 16:31:00 +01:00
Lucas Stach	e93b6deeb4	drm/etnaviv: hook up DRM GPU scheduler This hooks in the DRM GPU scheduler. No improvement yet, as all the dependency handling is still done in etnaviv_gem_submit. This just replaces the actual GPU submit by passing through the scheduler. Allows to get rid of the retire worker, as this is now driven by the scheduler. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-02-12 16:30:59 +01:00

34 Commits