Commit Graph

45 Commits

Author SHA1 Message Date
Dmitry Osipenko 03dec92c4f drm/scheduler: Fix lockup in drm_sched_entity_kill()
The drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy()
while userspace process is exiting or being killed. First time it's invoked
when sched entity is flushed and second time when entity is released. This
causes a lockup within wait_for_completion(entity_idle) due to how completion
API works.

Calling wait_for_completion() more times than complete() was invoked is a
error condition that causes lockup because completion internally uses
counter for complete/wait calls. The complete_all() must be used instead
in such cases.

This patch fixes lockup of Panfrost driver that is reproducible by killing
any application in a middle of 3d drawing operation.

Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com
2023-01-02 17:45:18 +03:00
Thomas Zimmermann 8e4e4c2f53 Merge drm/drm-next into drm-misc-next
Backmerging drm/drm-next to get the latest changes in the xlnx driver.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-11-05 16:08:36 +01:00
Christian König a82f30b04c drm/scheduler: rename dependency callback into prepare_job
This now matches much better what this is doing.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König 2fdb8a8f07 drm/scheduler: rework entity flush, kill and fini
This was buggy because when we had to wait for entities which were
killed as well we would just deadlock.

Instead move all the dependency handling into the callbacks so that
will all happen asynchronously.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-13-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Dave Airlie b837d3db9a drm-misc-next for 6.2:
UAPI Changes:
   - Documentation for page-flip flags
 
 Cross-subsystem Changes:
   - dma-buf: Add unlocked variant of vmapping and attachment-mapping
     functions
 
 Core Changes:
   - atomic-helpers: CRTC primary plane test fixes
   - connector: TV API consistency improvements, cmdline parsing
     improvements
   - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
   - edid: Fixes for HFVSDB parsing,
   - fourcc: Addition of the Vivante tiled modifier
   - makefile: Sort and reorganize the objects files
   - mode_config: Remove fb_base from drm_mode_config_funcs
   - sched: Add a module parameter to change the scheduling policy,
     refcounting fix for fences
   - tests: Sort the Kunit tests in the Makefile, improvements to the
     DP-MST tests
   - ttm: Remove unnecessary drm_mm_clean() call
 
 Driver Changes:
   - New driver: ofdrm
   - Move all drivers to a common dma-buf locking convention
   - bridge:
     - adv7533: Remove dynamic lane switching
     - it6505: Runtime PM support
     - ps8640: Handle AUX defer messages
     - tc358775: Drop soft-reset over I2C
   - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
     improvements
   - lcdif: Support for YUV planes
   - mgag200: Fix PLL Setup on some revisions
   - udl: Modesetting improvements, hot-unplug support
   - vc4: Fix support for PAL-M
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY1D3fQAKCRDj7w1vZxhR
 xZZgAPoCSfyU3+M3ALT0vQ/OpktE5tuwUBMac2Lxkqgx4dnQ0gEAnYeez0Dedod8
 HNdgBH7FTklYa/zT8n17SwHUOzJJ5gc=
 =YvuW
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.2:

UAPI Changes:
  - Documentation for page-flip flags

Cross-subsystem Changes:
  - dma-buf: Add unlocked variant of vmapping and attachment-mapping
    functions

Core Changes:
  - atomic-helpers: CRTC primary plane test fixes
  - connector: TV API consistency improvements, cmdline parsing
    improvements
  - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
  - edid: Fixes for HFVSDB parsing,
  - fourcc: Addition of the Vivante tiled modifier
  - makefile: Sort and reorganize the objects files
  - mode_config: Remove fb_base from drm_mode_config_funcs
  - sched: Add a module parameter to change the scheduling policy,
    refcounting fix for fences
  - tests: Sort the Kunit tests in the Makefile, improvements to the
    DP-MST tests
  - ttm: Remove unnecessary drm_mm_clean() call

Driver Changes:
  - New driver: ofdrm
  - Move all drivers to a common dma-buf locking convention
  - bridge:
    - adv7533: Remove dynamic lane switching
    - it6505: Runtime PM support
    - ps8640: Handle AUX defer messages
    - tc358775: Drop soft-reset over I2C
  - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
    improvements
  - lcdif: Support for YUV planes
  - mgag200: Fix PLL Setup on some revisions
  - udl: Modesetting improvements, hot-unplug support
  - vc4: Fix support for PAL-M

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020072405.g3o4hxuk75gmeumw@houat
2022-10-25 11:42:18 +10:00
Christian König 7b476affcc drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
2022-10-19 12:42:51 +02:00
Christian König 7c022f516f drm/scheduler: fix fence ref counting
We leaked dependency fences when processes were beeing killed.

Additional to that grab a reference to the last scheduled fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220929180151.139751-1-christian.koenig@amd.com
2022-10-05 12:36:20 +02:00
Andrey Grodzovsky 08fb97de03 drm/sched: Add FIFO sched policy to run queue
When many entities are competing for the same run queue
on the same scheduler, we observe an unusually long wait
times and some jobs get starved. This has been observed on GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stuck for very long time in it's entity's
queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.

Fix:
Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.

v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.

v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)

v6: Add missing drm_sched_rq_remove_fifo_locked

v7: Fix ts sampling bug and more cosmetic stuff (Luben)

v8: Fix module parameter string (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Tested-by: Yunxiang Li (Teddy) <Yunxiang.Li@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930041258.1050247-1-luben.tuikov@amd.com
2022-09-30 09:12:08 -04:00
Dmitry Osipenko 9b04369b06 drm/scheduler: Don't kill jobs in interrupt context
Interrupt context can't sleep. Drivers like Panfrost and MSM are taking
mutex when job is released, and thus, that code can sleep. This results
into "BUG: scheduling while atomic" if locks are contented while job is
freed. There is no good reason for releasing scheduler's jobs in IRQ
context, hence use normal context to fix the trouble.

Cc: stable@vger.kernel.org
Fixes: 542cff7893 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
2022-07-15 10:09:15 -04:00
Dmitry Osipenko 7d64c40a7d drm/scheduler: Don't kill jobs in interrupt context
Interrupt context can't sleep. Drivers like Panfrost and MSM are taking
mutex when job is released, and thus, that code can sleep. This results
into "BUG: scheduling while atomic" if locks are contented while job is
freed. There is no good reason for releasing scheduler's jobs in IRQ
context, hence use normal context to fix the trouble.

Cc: stable@vger.kernel.org
Fixes: 542cff7893 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
2022-05-17 10:06:41 -04:00
Andrey Grodzovsky 542cff7893 drm/sched: Avoid lockdep spalt on killing a processes
Probelm:
Singlaning one sched fence from within another's sched
fence singal callback generates lockdep splat because
the both have same lockdep class of their fence->lock

Fix:
Fix bellow stack by rescheduling to irq work of
signaling and killing of jobs that left when entity is killed.

[11176.741181]  dump_stack+0x10/0x12
[11176.741186] __lock_acquire.cold+0x208/0x2df
[11176.741197]  lock_acquire+0xc6/0x2d0
[11176.741204]  ? dma_fence_signal+0x28/0x80
[11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
[11176.741219]  ? dma_fence_signal+0x28/0x80
[11176.741225]  dma_fence_signal+0x28/0x80
[11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
[11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741254]  dma_fence_signal+0x3b/0x80
[11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
[11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
[11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741290]  dma_fence_signal+0x3b/0x80
[11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
[11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
[11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
[11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
[11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
[11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
[11176.742402] handle_irq_event_percpu+0x33/0x80
[11176.742408]  handle_irq_event+0x39/0x60
[11176.742414]  handle_edge_irq+0x93/0x1d0
[11176.742419]  __common_interrupt+0x50/0xe0
[11176.742426]  common_interrupt+0x80/0x90

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Daniel Vetter  <daniel.vetter@ffwll.ch>
Suggested-by: Christian König <christian.koenig@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/dri-devel/msg321250.html
2021-11-01 11:08:21 -04:00
Daniel Vetter 981b04d968 drm/sched: improve docs around drm_sched_entity
I found a few too many things that are tricky and not documented, so I
started typing.

I found a few more things that looked broken while typing, see the
varios FIXME in drm_sched_entity.

Also some of the usual logics:
- actually include sched_entity.c declarations, that was lost in the
  move here: 620e762f9a ("drm/scheduler: move entity handling into
  separate file")

- Ditch the kerneldoc for internal functions, keep the comments where
  they're describing more than what the function name already implies.

- Switch drm_sched_entity to inline docs.

Acked-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-7-daniel.vetter@ffwll.ch
2021-08-30 10:57:18 +02:00
Daniel Vetter 0e10e9a1db drm/sched: drop entity parameter from drm_sched_push_job
Originally a job was only bound to the queue when we pushed this, but
now that's done in drm_sched_job_init, making that parameter entirely
redundant.

Remove it.

The same applies to the context parameter in
lima_sched_context_queue_task, simplify that too.

v2:
Rebase on top of msm adopting drm/sched

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Steven Price <steven.price@arm.com> (v1)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: "Marek Olšák" <marek.olsak@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Melissa Wen <mwen@igalia.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-6-daniel.vetter@ffwll.ch
2021-08-30 10:54:45 +02:00
Daniel Vetter ebd5f74255 drm/sched: Add dependency tracking
Instead of just a callback we can just glue in the gem helpers that
panfrost, v3d and lima currently use. There's really not that many
ways to skin this cat.

v2/3: Rebased.

v4: Repaint this shed. The functions are now called _add_dependency()
and _add_implicit_dependency()

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v3)
Reviewed-by: Steven Price <steven.price@arm.com> (v1)
Acked-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-5-daniel.vetter@ffwll.ch
2021-08-30 10:54:12 +02:00
Daniel Vetter b0a5303d4e drm/sched: Barriers are needed for entity->last_scheduled
It might be good enough on x86 with just READ_ONCE, but the write side
should then at least be WRITE_ONCE because x86 has total store order.

It's definitely not enough on arm.

Fix this proplery, which means
- explain the need for the barrier in both places
- point at the other side in each comment

Also pull out the !sched_list case as the first check, so that the
code flow is clearer.

While at it sprinkle some comments around because it was very
non-obvious to me what's actually going on here and why.

Note that we really need full barriers here, at first I thought
store-release and load-acquire on ->last_scheduled would be enough,
but we actually requiring ordering between that and the queue state.

v2: Put smp_rmp() in the right place and fix up comment (Andrey)

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-4-daniel.vetter@ffwll.ch
2021-08-30 10:53:48 +02:00
Daniel Vetter dbe48d030b drm/sched: Split drm_sched_job_init
This is a very confusingly named function, because not just does it
init an object, it arms it and provides a point of no return for
pushing a job into the scheduler. It would be nice if that's a bit
clearer in the interface.

But the real reason is that I want to push the dependency tracking
helpers into the scheduler code, and that means drm_sched_job_init
must be called a lot earlier, without arming the job.

v2:
- don't change .gitignore (Steven)
- don't forget v3d (Emma)

v3: Emma noticed that I leak the memory allocated in
drm_sched_job_init if we bail out before the point of no return in
subsequent driver patches. To be able to fix this change
drm_sched_job_cleanup() so it can handle being called both before and
after drm_sched_job_arm().

Also improve the kerneldoc for this.

v4:
- Fix the drm_sched_job_cleanup logic, I inverted the booleans, as
  usual (Melissa)

- Christian pointed out that drm_sched_entity_select_rq() also needs
  to be moved into drm_sched_job_arm, which made me realize that the
  job->id definitely needs to be moved too.

  Shuffle things to fit between job_init and job_arm.

v5:
Reshuffle the split between init/arm once more, amdgpu abuses
drm_sched.ready to signal gpu reset failures. Also document this
somewhat. (Christian)

v6:
Rebase on top of the msm drm/sched support. Note that the
drm_sched_job_init() call is completely misplaced, and hence also the
split-out drm_sched_entity_push_job(). I've put in a FIXME which the next
patch will address.

v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't
be a problem where it is now.

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Melissa Wen <mwen@igalia.com>
Cc: Melissa Wen <melissa.srw@gmail.com>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Steven Price <steven.price@arm.com> (v2)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: "Marek Olšák" <marek.olsak@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Sonny Jiang <sonny.jiang@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Tian Tao <tiantao6@hisilicon.com>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: Emma Anholt <emma@anholt.net>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch
2021-08-30 10:50:44 +02:00
Andrey Grodzovsky 0b10ab8069 drm/sched: Avoid data corruptions
Wait for all dependencies of a job  to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com
2021-05-19 23:50:28 -04:00
Andrey Grodzovsky c61cdbdbff drm/scheduler: Fix hang when sched_entity released
Problem: If scheduler is already stopped by the time sched_entity
is released and entity's job_queue not empty I encountred
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.

Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wakeup all those processes stuck in sched_entity flushing
as the scheduler main thread which wakes them up is stopped by now.

v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinserion back to rq due
to race.

v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
2021-05-19 23:50:28 -04:00
Lee Jones 04be0c5b40 drm/scheduler/sched_entity: Fix some function name disparity
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
 drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
 drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2021-04-22 14:05:17 +02:00
Christian König ac4eb83ab2 drm/sched: select new rq even if there is only one v3
This is necessary when changing priorities of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com
2021-03-08 14:06:17 +01:00
Christian König f2f12eb9c3 drm/scheduler: provide scheduler score externally
Allow multiple schedulers to share the load balancing score.

This is useful when one engine has different hw rings.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
2021-02-05 10:47:11 +01:00
Lee Jones 00d44b966d gpu: drm: scheduler: sched_entity: Demote non-conformant kernel-doc headers
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep'
 drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep'
 drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup'
 drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup'

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-13 00:03:31 -05:00
Boris Brezillon 170fb58ee3 drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path
If we don't initialize the entity to idle and the entity is never
scheduled before being destroyed we end up with an infinite wait in the
destroy path.

v2:
- Add Steven's R-b

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/393486/
2020-10-05 15:06:33 +02:00
Nirmoy Das d41a39dda1 drm/scheduler: improve job distribution with multiple queues
This patch uses score to select a new drm scheduler for better
loadbalance between multiple drm schedulers instead of num_jobs.

Below are test results after running amdgpu_test for ~10 times.

Before this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          1463
sdma1          198
comp_1.0.1     280

After this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          925
sdma1          928
comp_1.0.1     177
comp_1.1.1     44
comp_1.2.1     43
comp_1.3.1     44

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/373000/
Signed-off-by: Christian König <christian.koenig@amd.com>
2020-06-26 14:16:29 +02:00
Nirmoy Das ec2edcc279 drm/sched: implement and export drm_sched_pick_best
Remove drm_sched_entity_get_free_sched() and use the logic of picking
the least loaded drm scheduler from a drm scheduler list to implement
drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so
that it can be utilized by other drm drivers.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
changzhu d164bebb95 Revert "drm/scheduler: improve job distribution with multiple queues"
It needs to revert this patch to avoid amdgpu_test compute hang problem
on picasso.

This reverts commit 56822db194.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
Nirmoy Das b37aced31e drm/scheduler: implement a function to modify sched list
Implement drm_sched_entity_modify_sched() which modifies existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then the driver can switch
to the corresponding HW scheduler list for that priority.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:35 -04:00
Nirmoy Das 2639f453f2 drm/amdgpu: fix doc by clarifying sched_list definition
expand sched_list definition for better understanding.
Also fix a typo atleast -> at least

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:44 -05:00
Nirmoy Das 9e3e90c50d drm/scheduler: fix documentation by replacing rq_list with sched_list
This also replaces old artifacts with a correct one in drm_sched_entity_init()
declaration

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:41:00 -05:00
Nirmoy Das 56822db194 drm/scheduler: improve job distribution with multiple queues
This patch uses score based logic to select a new rq for better
loadbalance between multiple rq/scheds instead of num_jobs.

Below are test results after running amdgpu_test from mesa drm

Before this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          314
sdma1          32
comp_1.0.0     56
comp_1.0.1     0
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0
After this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          216
sdma1          185
comp_1.0.0     39
comp_1.0.1     9
comp_1.1.0     12
comp_1.1.1     0
comp_1.2.0     12
comp_1.2.1     0
comp_1.3.0     12
comp_1.3.1     0

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:37:54 -05:00
Nirmoy Das 8c23056bdc drm/scheduler: do not keep a copy of sched list
entity should not keep copy and maintain sched list for
itself.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das b3ac17667f drm/scheduler: rework entity creation
Entity currently keeps a copy of run_queue list and modify it in
drm_sched_entity_set_priority(). Entities shouldn't modify run_queue
list. Use drm_gpu_scheduler list instead of drm_sched_rq list
in drm_sched_entity struct. In this way we can select a runqueue based
on entity/ctx's priority for a  drm scheduler.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Andrey Grodzovsky 83a7772ba2 drm/sched: Use completion to wait for sched->thread idle v2.
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

v2: Per sched entity completion.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-07 18:08:07 -05:00
Christian König 85cb9d5067 drm/scheduler: use job count instead of peek
The spsc_queue_peek function is accessing queue->head which belongs to
the consumer thread and shouldn't be accessed by the producer

This is fixing a rare race condition when destroying entities.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk.liu@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15 10:52:10 -05:00
Sam Ravnborg 7c1be93c8e drm/scheduler: drop use of drmP.h
Drop use of the deprecated drmP.h header file.
Fix fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Sharat Masetty <smasetty@codeaurora.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190630061922.7254-26-sam@ravnborg.org
2019-07-15 18:11:31 +02:00
Dave Airlie fbac3c48fa Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes

radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes

sched:
- Fix entities with 0 run queues

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com
2019-02-22 15:56:42 +10:00
Bas Nieuwenhuizen 1decbf6bb0 drm/sched: Fix entities with 0 rqs.
Some blocks in amdgpu can have 0 rqs.

Job creation already fails with -ENOENT when entity->rq is NULL,
so jobs cannot be pushed. Without a rq there is no scheduler to
pop jobs, and rq selection already does the right thing with a
list of length 0.

So the operations we need to fix are:
  - Creation, do not set rq to rq_list[0] if the list can have length 0.
  - Do not flush any jobs when there is no rq.
  - On entity destruction handle the rq = NULL case.
  - on set_priority, do not try to change the rq if it is NULL.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-15 11:15:08 -05:00
Eric Anholt 82abf33766 drm/sched: Always trace the dependencies we wait on, to fix a race.
The entity->dependency can go away completely once we've called
drm_sched_entity_add_dependency_cb() (if the cb is called before we
get around to tracing).  The tracepoint is more useful if we trace
every dependency instead of just ones that get callbacks installed,
anyway, so just do that.

Fixes any easy-to-produce OOPS when tracing the scheduler on V3D with
"perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-08 14:02:33 -05:00
Sharat Masetty 26efecf955 drm/scheduler: Add drm_sched_job_cleanup
This patch adds a new API to clean up the scheduler job resources. This
is primarliy needed in cases the job was created but was not queued to
the scheduler queue. Additionally with this change, the layer which
creates the scheduler job also gets to free up the job's resources and
this entails moving the dma_fence_put(finished_fence) to the drivers
ops free handler routines.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:27 -05:00
Andrey Grodzovsky faf6e1a87e drm/sched: Add boolean to mark if sched is ready to work v5
Problem:
A particular scheduler may become unsuable (underlying HW) after
some event (e.g. GPU reset). If it's later chosen by
the get free sched. policy a command will fail to be
submitted.

Fix:
Add a driver specific callback to report the sched status so
rq with bad sched can be avoided in favor of working one or
none in which case job init will fail.

v2: Switch from driver callback to flag in scheduler.

v3: rebase

v4: Remove ready paramter from drm_sched_init, set
uncoditionally to true once init done.

v5: fix missed change in v3d in v4 (Alex)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:22 -05:00
Nathan Chancellor c1f0320e03 drm/scheduler: Simplify spsc_queue_count check in drm_sched_entity_select_rq
Clang generates a warning when it sees a logical not followed by a
conditional operator like ==, >, or <.

drivers/gpu/drm/scheduler/sched_entity.c:470:6: warning: logical not is
only applied to the left hand side of this comparison
[-Wlogical-not-parentheses]
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^                                     ~~
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
after the '!' to evaluate the comparison first
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^
             (                                        )
drivers/gpu/drm/scheduler/sched_entity.c:470:6: note: add parentheses
around left hand side expression to silence this warning
        if (!spsc_queue_count(&entity->job_queue) == 0 ||
            ^
            (                                    )
1 warning generated.

It assumes the author might have made a mistake in their logic:

if (!a == b) -> if (!(a == b))

Sometimes that is the case; other times, it's just a super convoluted
way of saying 'if (a)' when b = 0:

if (!1 == 0) -> if (0 == 0) -> if (true)

Alternatively:

if (!1 == 0) -> if (!!1) -> if (1)

Simplify this comparison so that Clang doesn't complain.

Fixes: 35e160e781 ("drm/scheduler: change entities rq even earlier")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09 17:06:00 -05:00
Nayan Deshmukh c89677afb3 drm/scheduler: avoid redundant shifting of the entity v2
do not remove entity from the rq if the current rq is from
the least loaded scheduler.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:14 -05:00
Andrey Grodzovsky 62347a3300 drm/scheduler: Add stopped flag to drm_sched_entity
The flag will prevent another thread from same process to
reinsert the entity queue into scheduler's rq after it was already
removewd from there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:10 -05:00
Christian König 7b10574eac drm/scheduler: cleanup entity coding style
Cleanup coding style in sched_entity.c

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:43 -05:00
Christian König 620e762f9a drm/scheduler: move entity handling into separate file
This is complex enough on it's own. Move it into a separate C file.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:43 -05:00