Commit Graph

154 Commits

Author SHA1 Message Date
Chris Wilson 2871ea85c1 drm/i915/gt: Split intel_ring_submission
Split the legacy submission backend from the common CS ring buffer
handling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
2019-10-24 12:14:21 +01:00
Sebastian Andrzej Siewior 1dfffa0051 drm/i915: Don't disable interrupts independently of the lock
The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.

Enable/disable interrupts as part of the locking instruction.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191017161352.e5z3ugse7gxl5ari@linutronix.de
2019-10-17 21:51:51 +01:00
Chris Wilson 1930650240 drm/i915: Flush tasklet submission before sleeping on i915_request_wait
If the system is being slow and userspace is racing ahead of the GPU and
finds itself waiting for the GPU to catch up, before the process sleeps
give the tasklet a kick, bypassing ksoftirqd. If the system is
overloaded, then ksoftirqd may be delayed incurring additional latency
to our user.

This should not be a frequent problem, but in the past we have observed
several hundred millisecond delays before ksoftirqd services an
interrupt, so burn a few cycles to lend a helping hand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015132606.14349-1-chris@chris-wilson.co.uk
2019-10-15 16:42:47 +01:00
Chris Wilson 89b6d1831d drm/i915/execlists: Tweak virtual unsubmission
Since commit e2144503bf ("drm/i915: Prevent bonded requests from
overtaking each other on preemption") we have restricted requests to run
on their chosen engine across preemption events. We can take this
restriction into account to know that we will want to resubmit those
requests onto the same physical engine, and so can shortcircuit the
virtual engine selection process and keep the request on the same
engine during unwind.

References: e2144503bf ("drm/i915: Prevent bonded requests from overtaking each other on preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ramlingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk
2019-10-14 12:51:17 +01:00
Chris Wilson 6610197542 drm/i915: Move request runtime management onto gt
Requests are run from the gt and are tided into the gt runtime power
management, so pull the runtime request management under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk
2019-10-04 15:39:26 +01:00
Chris Wilson f33a8a5160 drm/i915: Merge wait_for_timelines with retire_request
wait_for_timelines is essentially the same loop as retiring requests
(with an extra timeout), so merge the two into one routine.

v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
2019-10-04 15:39:23 +01:00
Chris Wilson b1e3177bd1 drm/i915: Coordinate i915_active with its own mutex
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.

v2: Add some comments that barely elucidate anything :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
2019-10-04 15:39:12 +01:00
Chris Wilson c0bb487dc1 drm/i915: Only enqueue already completed requests
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
2019-09-23 16:21:37 +01:00
Chris Wilson 6a79d84840 drm/i915: Lock signaler timeline while navigating
As we need to take a walk back along the signaler timeline to find the
fence before upon which we want to wait, we need to lock that timeline
to prevent it being modified as we walk. Similarly, we also need to
acquire a reference to the earlier fence while it still exists!

Though we lack the correct locking today, we are saved by the
overarching struct_mutex -- but that protection is being removed.

v2: Tvrtko made me realise I was being lax and using annotations to
ignore the AB-BA deadlock from the timeline overlap. As it would be
possible to construct a second request that was using a semaphore from the
same timeline as ourselves, we could quite easily end up in a situation
where we deadlocked in our mutex waits. Avoid that by using a trylock
and falling back to a normal dma-fence await if contended.

v3: Eek, the signal->timeline is volatile and must be carefully
dereferenced to ensure it is valid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-2-chris@chris-wilson.co.uk
2019-09-20 10:24:12 +01:00
Chris Wilson d19d71fc2b drm/i915: Mark i915_request.timeline as a volatile, rcu pointer
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.

One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.

v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
2019-09-20 10:24:09 +01:00
Chris Wilson 37fa0de3c1 drm/i915: Verify the engine after acquiring the active.lock
When using virtual engines, the rq->engine is not stable until we hold
the engine->active.lock (as the virtual engine may be exchanged with the
sibling). Since commit 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
we may retire a request concurrently with resubmitting it to HW, we need
to be extra careful to verify we are holding the correct lock for the
request's active list. This is similar to the issue we saw with
rescheduling the virtual requests, see sched_lock_engine().

Or else:

<4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
<4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G     U            5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
<4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
<4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
<4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
<4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
<4> [876.736168] FS:  0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736171] PKRU: 55555554
<4> [876.736172] Call Trace:
<4> [876.736226]  __i915_request_submit+0x152/0x370 [i915]
<4> [876.736263]  __execlists_submission_tasklet+0x6da/0x1f50 [i915]
<4> [876.736293]  ? execlists_submission_tasklet+0x29/0x50 [i915]
<4> [876.736321]  execlists_submission_tasklet+0x34/0x50 [i915]
<4> [876.736325]  tasklet_action_common.isra.5+0x47/0xb0
<4> [876.736328]  __do_softirq+0xd8/0x4ae
<4> [876.736332]  ? smpboot_thread_fn+0x23/0x280
<4> [876.736334]  ? smpboot_thread_fn+0x6b/0x280
<4> [876.736336]  run_ksoftirqd+0x2b/0x50
<4> [876.736338]  smpboot_thread_fn+0x1d3/0x280
<4> [876.736341]  ? sort_range+0x20/0x20
<4> [876.736343]  kthread+0x119/0x130
<4> [876.736345]  ? kthread_park+0xa0/0xa0
<4> [876.736347]  ret_from_fork+0x24/0x50
<4> [876.736353] irq event stamp: 2290145
<4> [876.736356] hardirqs last  enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
<4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [876.736360] softirqs last  enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
<4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
<4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
<4> [876.736406] ------------[ cut here ]------------
<4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
<4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G     U  W         5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
<4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
<4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
<4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
<4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
<4> [876.736447] FS:  00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736450] PKRU: 55555554
<4> [876.736451] Call Trace:
<4> [876.736488]  i915_request_retire+0x224/0x8e0 [i915]
<4> [876.736521]  i915_request_create+0x4b/0x1b0 [i915]
<4> [876.736550]  nop_virtual_engine+0x230/0x4d0 [i915]

Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk
2019-09-19 11:50:36 +01:00
Chris Wilson c210e85b8f drm/i915/tgl: Extend MI_SEMAPHORE_WAIT
On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.

v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
2019-09-17 15:33:21 +01:00
Chris Wilson 0f7dc62068 drm/i915: Protect our local workers against I915_FENCE_TIMEOUT
Trust our own workers to not cause unnecessary delays and disable the
automatic timeout on their asynchronous fence waits. (Along the same
lines that we trust our own requests to complete eventually, if
necessary by force.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190826072149.9447-6-chris@chris-wilson.co.uk
2019-08-28 18:17:53 +01:00
Chris Wilson 7771590692 drm/i915: Keep drm_i915_file_private around under RCU
Ensure that the drm_i915_file_private continues to exist as we attempt
to remove a request from its list, which may race with the destruction
of the file.

<6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files
<0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7
<4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G     U            5.3.0-rc5-CI-Patchwork_14169+ #1
<4> [42.201391] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [42.201594] Workqueue: i915 retire_work_handler [i915]
<4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90
<4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7
<4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202
<4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b
<4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff
<4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b
<4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500
<4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250
<4> [42.201748] FS:  0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000
<4> [42.201765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0
<4> [42.201793] Call Trace:
<4> [42.201805]  do_raw_spin_lock+0x66/0xb0
<4> [42.201898]  i915_request_retire+0x548/0x7c0 [i915]
<4> [42.201989]  retire_requests+0x4d/0x60 [i915]
<4> [42.202078]  i915_retire_requests+0x144/0x2e0 [i915]
<4> [42.202169]  retire_work_handler+0x10/0x40 [i915]

Recently, in commit 44c22f3f1a ("drm/i915: Serialize insertion into the
file->mm.request_list"), we fixed a race on insertion. Now, it appears
we also have a race with destruction!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823181455.31910-1-chris@chris-wilson.co.uk
2019-08-23 22:13:17 +01:00
Chris Wilson 6dcb85a0ad drm/i915: Hold irq-off for the entire fake lock period
Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.

Hopefully this is the last tweak required!

v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.

Fixes: d67739268c ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
2019-08-23 19:44:21 +01:00
Rodrigo Vivi 829e8def7b Merge drm/drm-next into drm-intel-next-queued
We need the rename of reservation_object to dma_resv.

The solution on this merge came from linux-next:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Aug 2019 12:48:39 +1000
Subject: [PATCH] drm: fix up fallout from "dma-buf: rename reservation_object to dma_resv"

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 drivers/gpu/drm/i915/gt/intel_engine_pool.c | 8 ++++----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
index 03d90b49584a..4cd54c569911 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pool.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
@@ -43,12 +43,12 @@ static int pool_active(struct i915_active *ref)
 {
        struct intel_engine_pool_node *node =
                container_of(ref, typeof(*node), active);
-       struct reservation_object *resv = node->obj->base.resv;
+       struct dma_resv *resv = node->obj->base.resv;
        int err;

-       if (reservation_object_trylock(resv)) {
-               reservation_object_add_excl_fence(resv, NULL);
-               reservation_object_unlock(resv);
+       if (dma_resv_trylock(resv)) {
+               dma_resv_add_excl_fence(resv, NULL);
+               dma_resv_unlock(resv);
        }

        err = i915_gem_object_pin_pages(node->obj);

which is a simplified version from a previous one which had:
Reviewed-by: Christian König <christian.koenig@amd.com>

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-08-22 00:10:36 -07:00
Dave Airlie 5f680625d9 drm-misc-next for 5.4:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
   - dma-buf: add reservation_object_fences helper, relax
              reservation_object_add_shared_fence, remove
              reservation_object seq number (and then
              restored)
   - dma-fence: Shrinkage of the dma_fence structure,
                Merge dma_fence_signal and dma_fence_signal_locked,
                Store the timestamp in struct dma_fence in a union with
                cb_list
 
 Driver Changes:
   - More dt-bindings YAML conversions
   - More removal of drmP.h includes
   - dw-hdmi: Support get_eld and various i2s improvements
   - gm12u320: Few fixes
   - meson: Global cleanup
   - panfrost: Few refactors, Support for GPU heap allocations
   - sun4i: Support for DDC enable GPIO
   - New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
                 Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
                 Toppoly TD043MTEA1
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXVqvpwAKCRDj7w1vZxhR
 xa3RAQDzAnt5zeesAxX4XhRJzHoCEwj2PJj9Re6xMJ9PlcfcvwD+OS+bcB6jfiXV
 Ug9IBd/DqjlmD9G9MxFxfSV946rksAw=
 =8uv4
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2019-08-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.4:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
  - dma-buf: add reservation_object_fences helper, relax
             reservation_object_add_shared_fence, remove
             reservation_object seq number (and then
             restored)
  - dma-fence: Shrinkage of the dma_fence structure,
               Merge dma_fence_signal and dma_fence_signal_locked,
               Store the timestamp in struct dma_fence in a union with
               cb_list

Driver Changes:
  - More dt-bindings YAML conversions
  - More removal of drmP.h includes
  - dw-hdmi: Support get_eld and various i2s improvements
  - gm12u320: Few fixes
  - meson: Global cleanup
  - panfrost: Few refactors, Support for GPU heap allocations
  - sun4i: Support for DDC enable GPIO
  - New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
                Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
                Toppoly TD043MTEA1

Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fixup dma_resv rename fallout]

From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819141923.7l2adietcr2pioct@flea
2019-08-21 16:44:41 +10:00
Chris Wilson 44c22f3f1a drm/i915: Serialize insertion into the file->mm.request_list
Currently, we remove the from per-file request list for throttling and
retirement under a dedicated spinlock, but insertion is governed by
struct_mutex. This needs to be the same lock so that the
retirement/insertion of neighbouring requests (at the tail) doesn't
break the list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820080907.4665-1-chris@chris-wilson.co.uk
2019-08-20 14:23:45 +01:00
Chris Wilson cc3375607d drm/i915: Use 0 for the unordered context
Since commit 078dec3326 ("dma-buf: add dma_fence_get_stub") the 0
fence context became an impossible match as it is used for an always
signaled fence. We can simplify our timeline tracking by knowing that 0
always means no match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819184404.24200-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20190819175109.5241-1-chris@chris-wilson.co.uk
2019-08-19 20:07:03 +01:00
Chris Wilson ef46884975 drm/i915: Propagate fence errors
Errors spread like wildfire, and must eventually be returned to the
user. They need to be captured and passed along the flow of fences,
infecting each in turn with the existing error, until finally they fall
out of a user visible result.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817232511.11391-1-chris@chris-wilson.co.uk
2019-08-18 12:38:09 +01:00
Chris Wilson 6c69a45445 drm/i915/gt: Mark context->active_count as protected by timeline->mutex
We use timeline->mutex to protect modifications to
context->active_count, and the associated enable/disable callbacks.
Due to complications with engine-pm barrier there is a path where we used
a "superlock" to provide serialised protect and so could not
unconditionally assert with lockdep that it was always held. However,
we can mark the mutex as taken (noting that we may be nested underneath
ourselves) which means we can be reassured the right timeline->mutex is
always treated as held and let lockdep roam free.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-1-chris@chris-wilson.co.uk
2019-08-16 18:02:06 +01:00
Chris Wilson e5dadff4b0 drm/i915: Protect request retirement with timeline->mutex
Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk
2019-08-15 23:21:13 +01:00
Chris Wilson 62520e3361 drm/i915: Move tasklet kicking to __i915_request_queue caller
Since __i915_request_queue() may be called from hardirq (timer) context,
we cannot use local_bh_disable/enable at the lower level. As we do want
to kick the tasklet to speed up initial submission or preemption for
normal client submission, lift it to the normal process context
callpath.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815042031.27750-1-chris@chris-wilson.co.uk
2019-08-15 13:27:44 +01:00
Chris Wilson a79ca656b6 drm/i915: Push the wakeref->count deferral to the backend
If the backend wishes to defer the wakeref parking, make it responsible
for unlocking the wakeref (i.e. bumping the counter). This allows it to
time the unlock much more carefully in case it happens to needs the
wakeref to be active during its deferral.

For instance, during engine parking we may choose to emit an idle
barrier (a request). To do so, we borrow the engine->kernel_context
timeline and to ensure exclusive access we keep the
engine->wakeref.count as 0. However, to submit that request to HW may
require a intel_engine_pm_get() (e.g. to keep the submission tasklet
alive) and before we allow that we have to rewake our wakeref to avoid a
recursive deadlock.

<4> [257.742916] IRQs not enabled as expected
<4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100
<4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid
<4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W         5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1
<4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
<4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100
<4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3
<4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086
<4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302
<4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001
<4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008
<4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430
<4> [257.743074] FS:  0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000
<4> [257.743082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0
<4> [257.743096] Call Trace:
<4> [257.743104]  <IRQ>
<4> [257.743265]  __i915_request_commit+0x240/0x5d0 [i915]
<4> [257.743427]  ? __i915_request_create+0x228/0x4c0 [i915]
<4> [257.743584]  __engine_park+0x64/0x250 [i915]
<4> [257.743730]  ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4> [257.743878]  i915_sample+0x2ee/0x310 [i915]
<4> [257.744030]  ? i915_pmu_cpu_offline+0xb0/0xb0 [i915]
<4> [257.744040]  __hrtimer_run_queues+0x11e/0x4b0
<4> [257.744068]  hrtimer_interrupt+0xea/0x250
<4> [257.744079]  ? lockdep_hardirqs_off+0x79/0xd0
<4> [257.744101]  smp_apic_timer_interrupt+0x96/0x280
<4> [257.744114]  apic_timer_interrupt+0xf/0x20
<4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae

v2: Keep the priority_hint assert
v3: That assert was desperately trying to point out my bug. Sorry, little
assert.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
2019-08-13 21:09:49 +01:00
Christian König 52791eeec1 dma-buf: rename reservation_object to dma_resv
Be more consistent with the naming of the other DMA-buf objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/323401/
2019-08-13 09:09:30 +02:00
Chris Wilson 75d0a7f31e drm/i915: Lift timeline into intel_context
Move the timeline from being inside the intel_ring to intel_context
itself. This saves much pointer dancing and makes the relations of the
context to its timeline much clearer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk
2019-08-09 20:18:30 +01:00
Jani Nikula a09d9a8002 drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h
Disentangle i915_drv.h from intel_drv.h, which gets included via
i915_trace.h. This necessitates including i915_trace.h wherever it's
needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com
2019-08-07 12:43:14 +03:00
Chris Wilson c29579d2fa drm/i915/gem: Make caps.scheduler static
We do not notify userspace when the scheduler capabilities are changed
(due to wedging the driver) and as such userspace will expect the caps
to be static and unchanging. Make it so, and so we only need to compute
our caps once during driver registration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-1-chris@chris-wilson.co.uk
2019-08-06 15:00:14 +01:00
Chris Wilson cb823ed991 drm/i915/gt: Use intel_gt as the primary object for handling resets
Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
2019-07-12 21:06:56 +01:00
Lionel Landwerlin 2a98f4e65b drm/i915: add infrastructure to hold off preemption on a request
We want to set this flag in the next commit on requests containing
perf queries so that the result of the perf query can just be a delta
of global counters, rather than doing post processing of the OA
buffer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: add basic selftest for nopreempt]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk
2019-07-09 21:26:40 +01:00
Tvrtko Ursulin f0c02c1b91 drm/i915: Rename i915_timeline to intel_timeline and move under gt
Move all timeline code under gt and rename to intel_gt prefix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com
2019-06-21 13:48:53 +01:00
Chris Wilson 22b7a426bb drm/i915/execlists: Preempt-to-busy
When using a global seqno, we required a precise stop-the-workd event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.

v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick in a timely fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
2019-06-20 16:52:36 +01:00
Chris Wilson b87b6c0dfc drm/i915: Flush the execution-callbacks on retiring
In the unlikely case the request completes while we regard it as not even
executing on the GPU (see the next patch!), we have to flush any pending
execution callbacks at retirement and ensure that we do not add any
more.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-4-chris@chris-wilson.co.uk
2019-06-19 17:09:25 +01:00
Chris Wilson ce94bef935 drm/i915: Signal fence completion from i915_request_wait
With the upcoming change to automanaged i915_active, the intent is that
whenever we wait on the set of active fences, they are signaled and
collected.  The requirement is that all successful returns from
i915_request_wait() signal the fence, so fixup the one remaining path
where we may return before the interrupt has been run.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619112341.9082-3-chris@chris-wilson.co.uk
2019-06-19 16:45:04 +01:00
Chris Wilson 2f5309452d drm/i915: Stop passing I915_WAIT_LOCKED to i915_request_wait()
Since commit eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on
struct_mutex"), the I915_WAIT_LOCKED flags passed to i915_request_wait()
has been defunct. Now go ahead and remove it from all callers.

References: eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-3-chris@chris-wilson.co.uk
2019-06-19 12:58:38 +01:00
Chris Wilson 44d89409a1 drm/i915: Make the semaphore saturation mask global
The idea behind keeping the saturation mask local to a context backfired
spectacularly. The premise with the local mask was that we would be more
proactive in attempting to use semaphores after each time the context
idled, and that all new contexts would attempt to use semaphores
ignoring the current state of the system. This turns out to be horribly
optimistic. If the system state is still oversaturated and the existing
workloads have all stopped using semaphores, the new workloads would
attempt to use semaphores and be deprioritised behind real work. The
new contexts would not switch off using semaphores until their initial
batch of low priority work had completed. Given sufficient backload load
of equal user priority, this would completely starve the new work of any
GPU time.

To compensate, remove the local tracking in favour of keeping it as
global state on the engine -- once the system is saturated and
semaphores are disabled, everyone stops attempting to use semaphores
until the system is idle again. One of the reason for preferring local
context tracking was that it worked with virtual engines, so for
switching to global state we could either do a complete check of all the
virtual siblings or simply disable semaphores for those requests. This
takes the simpler approach of disabling semaphores on virtual engines.

The downside is that the decision that the engine is saturated is a
local measure -- we are only checking whether or not this context was
scheduled in a timely fashion, it may be legitimately delayed due to user
priorities. We still have the same dilemma though, that we do not want
to employ the semaphore poll unless it will be used.

v2: Explain why we need to assume the worst wrt virtual engines.

Fixes: ca6e56f654 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
2019-06-19 12:10:45 +01:00
Chris Wilson ef78f7b187 drm/i915: Use drm_gem_object.resv
Since commit 1ba627148e ("drm: Add reservation_object to
drm_gem_object"), struct drm_gem_object grew its own builtin
reservation_object rendering our own private one bloat. Remove our
redundant reservation_object and point into obj->base.resv instead.

References: 1ba627148e ("drm: Add reservation_object to drm_gem_object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618125858.7295-1-chris@chris-wilson.co.uk
2019-06-18 15:30:32 +01:00
Chris Wilson 422d7df4f0 drm/i915: Replace engine->timeline with a plain list
To continue the onslaught of removing the assumption of a global
execution ordering, another casualty is the engine->timeline. Without an
actual timeline to track, it is overkill and we can replace it with a
much less grand plain list. We still need a list of requests inflight,
for the simple purpose of finding inflight requests (for retiring,
resetting, preemption etc).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
2019-06-14 19:03:40 +01:00
Chris Wilson 9db0c5caa7 drm/i915: Stop retiring along engine
We no longer track the execution order along the engine and so no longer
need to enforce ordering of retire along the engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-2-chris@chris-wilson.co.uk
2019-06-14 19:03:39 +01:00
Chris Wilson ce476c80b8 drm/i915: Keep contexts pinned until after the next kernel context switch
We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

 - On context activation, we preallocate a node for each physical engine
   the context may operate on. This is to avoid allocations during
   unpinning, which may be from inside FS_RECLAIM context (aka the
   shrinker)

 - On context deactivation on retirement of the last active request (which
   is before we know the context has been saved), we add the
   preallocated node onto a barrier list on each engine

 - On engine idling, we emit a switch to kernel context. When this
   switch completes, we know that all previous contexts must have been
   saved, and so on retiring this request we can finally unpin all the
   contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
2019-06-14 19:03:32 +01:00
Chris Wilson 84383d2e8d drm/i915: Refine i915_reset.lock_map
We already use a mutex to serialise i915_reset() and wedging, so all we
need it to link that into i915_request_wait() and we have our lock cycle
detection.

v2.5: Take error mutex for selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614071023.17929-3-chris@chris-wilson.co.uk
2019-06-14 15:17:54 +01:00
Chris Wilson 6e4e970861 drm/i915: Execute signal callbacks from no-op i915_request_wait
If we enter i915_request_wait() with an already completed request, but
unsignaled dma-fence, signal the fence before returning. This allows us
to execute any of the signal callbacks at the earliest opportunity.

v2: Also signal after busyspin success

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614111053.25615-2-chris@chris-wilson.co.uk
2019-06-14 12:16:32 +01:00
Chris Wilson 33df8a7697 drm/i915: Prevent lock-cycles between GPU waits and GPU resets
We cannot allow ourselves to wait on the GPU while holding any lock as we
may need to reset the GPU. While there is not an explicit lock between
the two operations, lockdep cannot detect the dependency. So let's tell
lockdep about the wait/reset dependency with an explicit lockmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190612085246.16374-1-chris@chris-wilson.co.uk
2019-06-12 12:06:11 +01:00
Chris Wilson f4d57d838c drm/i915: Allow interrupts when taking the timeline->mutex
Before we commit ourselves to writing commands into the
ringbuffer and submitting the request, allow signals to interrupt
acquisition of the timeline mutex. We allow ourselves to be interrupted
at any time later if we need to block for space in the ring, anyway.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610103610.19883-1-chris@chris-wilson.co.uk
2019-06-10 17:31:47 +01:00
Chris Wilson 10be98a77c drm/i915: Move more GEM objects under gem/
Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
2019-05-28 12:45:29 +01:00
Chris Wilson f71e01a78b drm/i915: Extend execution fence to support a callback
In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk
2019-05-22 08:40:45 +01:00
Chris Wilson 78e41ddd21 drm/i915: Apply an execution_mask to the virtual_engine
Allow the user to direct which physical engines of the virtual engine
they wish to execute one, as sometimes it is necessary to override the
load balancing algorithm.

v2: Only kick the virtual engines on context-out if required

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
2019-05-22 08:40:43 +01:00
Chris Wilson 68fc728b01 drm/i915: Downgrade NEWCLIENT to non-preemptive
Commit 1413b2bc07 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

	rcsA x 30, (rcsB, vcs) x 30

where as before it would have been

	(rcsA, rcsB, vcs) x 30

That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
acruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.

References: b16c765122 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc07 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
2019-05-17 16:04:56 +01:00
Chris Wilson 6e7eb7a807 drm/i915: Bump signaler priority on adding a waiter
The handling of the no-preemption priority level imposes the restriction
that we need to maintain the implied ordering even though preemption is
disabled. Otherwise we may end up with an AB-BA deadlock across multiple
engine due to a real preemption event reordering the no-preemption
WAITs. To resolve this issue we currently promote all requests to WAIT
on unsubmission, however this interferes with the timeslicing
requirement that we do not apply any implicit promotion that will defeat
the round-robin timeslice list. (If we automatically promote the active
request it will go back to the head of the queue and not the tail!)

So we need implicit promotion to prevent reordering around semaphores
where we are not allowed to preempt, and we must avoid implicit
promotion on unsubmission. So instead of at unsubmit, if we apply that
implicit promotion on adding the dependency, we avoid the semaphore
deadlock and we also reduce the gains made by the promotion for user
space waiting. Furthermore, by keeping the earlier dependencies at a
higher level, we reduce the search space for timeslicing without
altering runtime scheduling too badly (no dependencies at all will be
assigned a higher priority for rrul).

v2: Limit the bump to external edges (as originally intended) i.e.
between contexts and out to the user.

Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
2019-05-17 16:04:46 +01:00
Chris Wilson 17db337f50 drm/i915: Truly bump ready tasks ahead of busywaits
In commit b7404c7ecb ("drm/i915: Bump ready tasks ahead of
busywaits"), I tried cutting a corner in order to not install a signal
for each of our dependencies, and only listened to requests on which we
were intending to busywait. The compromise that was made was that
instead of then being able to promote the request with a full
NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
had cleared the semaphore chain, we settled for only using the NEWCLIENT
boost. With an over saturated system with multiple NEWCLIENTS in flight
at any time, this was found to be an inadequate promotion and left us
with a much poorer scheduling order than prior to using semaphores.

The outcome of this patch, is that all requests have NOSEMAPHORE
priority when they have no dependencies and are ready to run and not
busywait, restoring the pre-semaphore ordering on saturated systems.

We can demonstrate the effect of poor scheduling order by oversaturating
the system using gem_wsim on a system with multiple vcs engines
(i.e running the same workloads across more clients than required for
peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):

x v5.1 (normalized)
+ tip
* fix
+------------------------------------------------------------------------+
|                                                                    x   |
|                                                                    x   |
|                                                                    x   |
|                                                                    x   |
|                                                                   %x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|         +                                                        %#xx  |
|         +                                                        %#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%##x  |
|         +++                                                     %%##x  |
|         +++                                                     %%##x  |
|         +++                                                     %%##x  |
|        ++++                                                     %%##x  |
|        ++++                                                     %%##x  |
|        ++++                                                     %%##xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++ +                                                   %#O#xx |
|        ++++ +                                                   %#O#xx |
|        ++++++ +                                                 %#O#xx |
|       ++++++++++                                                %OOOxxx|
|       ++++++++++       +                                       %#OOO#xx|
|     + ++++++++++++ ++ +++++    +                        ++    @@OOOO#xx|
|                                                                   |A_| |
||__________M_______A____________________|                               |
|                                                                 |A_|   |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       0.99456       1.00628      0.999985     1.0001545  0.0024387139
+ 120      0.873021       1.00037      0.884134    0.90148752   0.039190862
Difference at 99.5% confidence
	-0.098667 +/- 0.0110762
	-9.86517% +/- 1.10745%
	(Student's t, pooled s = 0.0277657)
% 120      0.990207       1.00165     0.9970265    0.99699748     0.0021024
Difference at 99.5% confidence
	-0.003157 +/- 0.000908245
	-0.315651% +/- 0.0908105%
	(Student's t, pooled s = 0.00227678)

Fixes: b7404c7ecb ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
2019-05-17 13:56:13 +01:00