Commit Graph

84 Commits

Author SHA1 Message Date
Dave Airlie 41d1d0c51f Merge tag 'drm-intel-gt-next-2021-04-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

- Prepare for local/device memory support on DG1 by starting
  to use it for kernel internal allocations: context, ring
  and engine scratch (Matt A, CQ, Abdiel, Imre)
- Sandybridge fix to avoid hard hang on ring resume (Chris)
- Limit imported dma-buf size to int32 (Matt A)
- Double check heartbeat timeout before resetting (Chris)

- Use new tasklet API for execution list (Emil)
- Fix SPDX checkpats warnings (Chris)
- Fixes for various checkpatch warnings (Chris)
- Selftest improvements (Chris)
- Move the defer_request waiter active assertion to correct spot (Chris)
- Make local-memory probing a GT operation (Matt, Tvrtko)
- Protect against request freeing during cancellation on wedging (Chris)
- Retire unexpected starting state error dumping (Chris)
- Distinction of memory regions in debugging (Zbigniew)
- Always flush the submission queue on checking for idle (Chris)

- Consolidate 2big error check to helper (Matt)
- Decrease number of subplatform bits (Tvrtko)
- Remove unused internal request priority levels (Chris)
- Document the unused internal header bits in buddy allocator (Matt)
- Cleanup the region class/instance encoding (Matt)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com
2021-04-08 12:46:12 +10:00
Tvrtko Ursulin 9b4d0598ee drm/i915: Request watchdog infrastructure
Prepares the plumbing for setting request/fence expiration time. All code
is put in place but is never activated due yet missing ability to actually
configure the timer.

Outline of the basic operation:

A timer is started when request is ready for execution. If the request
completes (retires) before the timer fires, timer is cancelled and nothing
further happens.

If the timer fires request is added to a lockless list and worker queued.
Purpose of this is twofold: a) It allows request cancellation from a more
friendly context and b) coalesces multiple expirations into a single event
of consuming the list.

Worker locklessly consumes the list of expired requests and cancels them
all using previous added i915_request_cancel().

Associated timeout value is stored in rq->context.watchdog.timeout_us.

v2:
 * Log expiration.

v3:
 * Include more information about user timeline in the log message.

v4:
 * Remove obsolete comment and fix formatting. (Matt)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-6-tvrtko.ursulin@linux.intel.com
2021-03-26 00:58:52 +01:00
Chris Wilson 38b237eab2 drm/i915: Individual request cancellation
Currently, we cancel outstanding requests within a context when the
context is closed. We may also want to cancel individual requests using
the same graceful preemption mechanism.

v2 (Tvrtko):
 * Cancel waiters carefully considering no timeline lock and RCU.
 * Fixed selftests.

v3 (Tvrtko):
 * Remove error propagation to waiters for now.

v4 (Tvrtko):
 * Rebase for extracted i915_request_active_engine. (Matt)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Resolve conflict because intel_engine_flush_scheduler is
still called intel_engine_flush_submission]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com
2021-03-26 00:55:30 +01:00
Tvrtko Ursulin 7dbc19da5d drm/i915: Extract active lookup engine to a helper
Move active engine lookup to exported i915_request_active_engine.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Slight rebase, engine->sched.lock is still called
engine->active.lock.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com
2021-03-26 00:48:08 +01:00
Chris Wilson c10e4a7960 drm/i915: Protect against request freeing during cancellation on wedging
As soon as we mark a request as completed, it may be retired. So when
cancelling a request and marking it complete, make sure we first keep a
reference to the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24 19:30:36 +01:00
Maarten Lankhorst 12ca695d2c drm/i915: Do not share hwsp across contexts any more, v8.
Instead of sharing pages with breadcrumbs, give each timeline a
single page. This allows unrelated timelines not to share locks
any more during command submission.

As an additional benefit, seqno wraparound no longer requires
i915_vma_pin, which means we no longer need to worry about a
potential -EDEADLK at a point where we are ready to submit.

Changes since v1:
- Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
- Extra check for completion in intel_read_hwsp().
Changes since v2:
- Fix inconsistent indent in hwsp_alloc() (kbuild)
- memset entire cacheline to 0.
Changes since v3:
- Do same in intel_timeline_reset_seqno(), and clflush for good measure.
Changes since v4:
- Use refcounting on timeline, instead of relying on i915_active.
- Fix waiting on kernel requests.
Changes since v5:
- Bump amount of slots to maximum (256), for best wraparounds.
- Add hwsp_offset to i915_request to fix potential wraparound hang.
- Ensure timeline wrap test works with the changes.
- Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
  fix a hang.
Changes since v6:
- Rename i915_request_active_offset to i915_request_active_seqno(),
  and elaborate the function. (tvrtko)
Changes since v7:
- Move hunk to where it belongs. (jekstrand)
- Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
2021-03-24 11:38:56 +01:00
Chris Wilson baa7c2cd99 drm/i915: Refactor marking a request as EIO
When wedging the device, we cancel all outstanding requests and mark
them as EIO. Rather than duplicate the small function to do so between
each submission backend, export one.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-3-chris@chris-wilson.co.uk
2021-01-09 18:29:07 +00:00
Chris Wilson 16f2941ad3 drm/i915/gt: Replace direct submit with direct call to tasklet
Rather than having special case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and most importantly greatly simplifying the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.

The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock

v2: Use bh kicking, see commit 3c53776e29 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
2020-12-24 15:02:35 +00:00
Chris Wilson 9bb36cf660 drm/i915: Check for rq->hwsp validity after acquiring RCU lock
Since we allow removing the timeline map at runtime, there is a risk
that rq->hwsp points into a stale page. To control that risk, we hold
the RCU read lock while reading *rq->hwsp, but we missed a couple of
important barriers. First, the unpinning / removal of the timeline map
must be after all RCU readers into that map are complete, i.e. after an
rcu barrier (in this case courtesy of call_rcu()). Secondly, we must
make sure that the rq->hwsp we are about to dereference under the RCU
lock is valid. In this case, we make the rq->hwsp pointer safe during
i915_request_retire() and so we know that rq->hwsp may become invalid
only after the request has been signaled. Therefore is the request is
not yet signaled when we acquire rq->hwsp under the RCU, we know that
rq->hwsp will remain valid for the duration of the RCU read lock.

This is a very small window that may lead to either considering the
request not completed (causing a delay until the request is checked
again, any wait for the request is not affected) or dereferencing an
invalid pointer.

Fixes: 3adac4689f ("drm/i915: Introduce concept of per-timeline (context) HWSP")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.1+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk
2020-12-18 18:49:40 +00:00
Chris Wilson c744d50363 drm/i915/gt: Split the breadcrumb spinlock between global and contexts
As we funnel more and more contexts into the breadcrumbs on an engine,
the hold time of b->irq_lock grows. As we may then contend with the
b->irq_lock during request submission, this increases the burden upon
the engine->active.lock and so directly impacts both our execution
latency and client latency. If we split the b->irq_lock by introducing a
per-context spinlock to manage the signalers within a context, we then
only need the b->irq_lock for enabling/disabling the interrupt and can
avoid taking the lock for walking the list of contexts within the signal
worker. Even with the current setup, this greatly reduces the number of
times we have to take and fight for b->irq_lock.

Furthermore, this closes the race between enabling the signaling context
while it is in the process of being signaled and removed:

<4>[  416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870).
<4>[  416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4>[  416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[  416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G     U            5.8.0-CI-CI_DRM_8852+ #1
<4>[  416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
<4>[  416.208627] RIP: 0010:__list_add_valid+0x4d/0x70
<4>[  416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8
<4>[  416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086
<4>[  416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105
<4>[  416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff
<4>[  416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001
<4>[  416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8
<4>[  416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910
<4>[  416.208669] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
<4>[  416.208671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
<4>[  416.208675] PKRU: 55555554
<4>[  416.208677] Call Trace:
<4>[  416.208679]  <IRQ>
<4>[  416.208751]  i915_request_enable_breadcrumb+0x278/0x400 [i915]
<4>[  416.208839]  __i915_request_submit+0xca/0x2a0 [i915]
<4>[  416.208892]  __execlists_submission_tasklet+0x480/0x1830 [i915]
<4>[  416.208942]  execlists_submission_tasklet+0xc4/0x130 [i915]
<4>[  416.208947]  tasklet_action_common.isra.17+0x6c/0x1c0
<4>[  416.208954]  __do_softirq+0xdf/0x498
<4>[  416.208960]  ? handle_fasteoi_irq+0x150/0x150
<4>[  416.208964]  asm_call_on_stack+0xf/0x20
<4>[  416.208966]  </IRQ>
<4>[  416.208969]  do_softirq_own_stack+0xa1/0xc0
<4>[  416.208972]  irq_exit_rcu+0xb5/0xc0
<4>[  416.208976]  common_interrupt+0xf7/0x260
<4>[  416.208980]  asm_common_interrupt+0x1e/0x40
<4>[  416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410
<4>[  416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48
<4>[  416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206
<4>[  416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000
<4>[  416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f
<4>[  416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000
<4>[  416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f
<4>[  416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003
<4>[  416.209012]  cpuidle_enter+0x24/0x40
<4>[  416.209016]  do_idle+0x22f/0x2d0
<4>[  416.209022]  cpu_startup_entry+0x14/0x20
<4>[  416.209025]  start_secondary+0x158/0x1a0
<4>[  416.209030]  secondary_startup_64+0xa4/0xb0
<4>[  416.209039] irq event stamp: 10186977
<4>[  416.209042] hardirqs last  enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0
<4>[  416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50
<4>[  416.209047] softirqs last  enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70
<4>[  416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20

<4>[  416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100)
<4>[  416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
<4>[  416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[  416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G     U  W         5.8.0-CI-CI_DRM_8852+ #1
<4>[  416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
<4>[  416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90
<4>[  416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b
<4>[  416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086
<4>[  416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104
<4>[  416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff
<4>[  416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001
<4>[  416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18
<4>[  416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880
<4>[  416.209317] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
<4>[  416.209317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
<4>[  416.209317] PKRU: 55555554
<4>[  416.209317] Call Trace:
<4>[  416.209317]  <IRQ>
<4>[  416.209317]  remove_signaling_context.isra.13+0xd/0x70 [i915]
<4>[  416.209513]  signal_irq_work+0x1f7/0x4b0 [i915]

This is caused by virtual engines where although we take the breadcrumb
lock on each of the active engines, they may be different engines on
different requests, It turns out that the b->irq_lock was not a
sufficient proxy for the engine->active.lock in the case of more than
one request, so introduce an explicit lock around ce->signals.

v2: ce->signal_lock is acquired with only RCU protection and so must be
treated carefully and not cleared during reallocation. We also then need
to confirm that the ce we lock is the same as we found in the breadcrumb
list.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276
Fixes: c18636f763 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs")
Fixes: 2854d86632 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-4-chris@chris-wilson.co.uk
2020-11-26 18:35:03 +00:00
Chris Wilson 6cfe66eb71 drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
Make b->signaled_requests a lockless-list so that we can manipulate it
outside of the b->irq_lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-2-chris@chris-wilson.co.uk
2020-11-23 17:26:38 +00:00
Chris Wilson 562675d09a drm/i915/gt: Update request status flags for debug pretty-printer
We plan to expand upon the number of available statuses for when we
pretty-print the requests along the timelines, and so need a new set of
flags. We have settled upon:

	Unready [U]
	  - initial status after being submitted, the request is not
	    ready for execution as it is waiting for external fences

	Ready [R]
	  - all fences the request was waiting on have been signaled,
            and the request is now ready for execution and will be
	    in a backend queue

	  - a ready request may still need to wait on semaphores
	    [internal fences]

	Ready/virtual [V]
	  - same as ready, but queued over multiple backends

	Executing [E]
	  - the request has been transferred from the backend queue and
	    submitted for execution on HW

	  - a completed request may still be regarded as executing, its
	    status may not be updated until it is retired and removed
	    from the lists

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk
2020-11-19 20:34:14 +00:00
Chris Wilson 1f0e785a9c drm/i915: Lift i915_request_show()
Extract i915_request_show for reuse in other request chain pretty
printers.

For a bonus point, quietly change the seqno format from %llx to %lld to
match everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk
2020-11-19 20:31:04 +00:00
Chris Wilson b3786b2937 drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling, active requests are passed to a
physical engine and associated breadcrumb interrupt handler. This causes
issues for us as we need to ensure that we do not actually try and
enable interrupts and the powermanagement required for them on the
virtual engine, as they will never be disabled. Instead, let's
specify the physical engine used for interrupt handler on a particular
breadcrumb.

v2: Drop b->irq_armed = true mocking for no interrupt HW

Fixes: 4fe6abb8f5 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:23:55 +03:00
Chris Wilson 27a5dcfe73 drm/i915/gem: Remove disordered per-file request list for throttling
I915_GEM_THROTTLE dates back to the time before contexts where there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.

That is no more. We now have many timelines with a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those run independently and so make the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each to satisfy the criteria that the CPU is no more than 20ms
ahead of its oldest request.

It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amount of
contexts, so the loops over contexts/engines should be concise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:50 +03:00
Chris Wilson e971fe9128 drm/i915: Mark up inline getters as taking a const i915_request
Since these inline routines only return the desired pointer from the
i915_request(after checking the preconditions for acquiring said
pointer), they can be const.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200616183139.4061-1-chris@chris-wilson.co.uk
2020-06-16 21:13:58 +01:00
Chris Wilson 5a83399536 drm/i915: Drop i915_request.i915 backpointer
We infrequently use the direct i915 backpointer from the i915_request,
so do we really need to waste the space in the struct for it? 8 bytes
from the most frequently allocated struct vs an 3 bytes and pointer
chasing in using rq->engine->i915?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
2020-06-03 13:53:39 +01:00
Chris Wilson fc0e127022 drm/i915: Improve execute_cb struct packing
Reduce the irq_work llist for attaching the callbacks to the signal for
both smaller structs (two fewer pointers!) and simpler [debug] code:

Function                                     old     new   delta
irq_execute_cb                                35      34      -1
__igt_breadcrumbs_smoketest                 1684    1682      -2
i915_request_retire                         2003    1996      -7
__i915_request_create                       1047    1040      -7
__notify_execute_cb                          135     126      -9
__i915_request_ctor                          188     178     -10
__await_execution.part.constprop             451     440     -11
igt_wait_request                             924     714    -210

One minor artifact is that the order of cb exection is reversed. No
current use cases are affected by that change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk
2020-05-26 12:24:28 +01:00
Chris Wilson 18e4af04d2 drm/i915: Drop no-semaphore boosting
Now that we have fast timeslicing on semaphores, we no longer need to
prioritise none-semaphore work as we will yield any work blocked on a
semaphore to the next in the queue. Previously with no timeslicing,
blocking on the semaphore caused extremely bad scheduling with multiple
clients utilising multiple rings. Now, there is no impact and we can
remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
2020-05-14 06:14:33 +01:00
Chris Wilson 795d4d7fa3 drm/i915: Mark the addition of the initial-breadcrumb in the request
The initial-breadcrumb is used to mark the end of the awaiting and the
beginning of the user payload. We verify that we do not start the user
payload before all signaler are completed, checking our semaphore setup
by looking for the initial breadcrumb being written too early. We also
want to ensure that we do not add semaphore waits after we have already
closed the semaphore section, an issue for later deferred waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513165937.9508-2-chris@chris-wilson.co.uk
2020-05-13 21:09:54 +01:00
Chris Wilson 43acd6516c drm/i915: Keep a per-engine request pool
Add a tiny per-engine request mempool so that we should always have a
request available for powermanagement allocations from tricky
contexts. This reserve is expected to be only used for kernel
contexts when barriers must be emitted [almost] without fail.

The main consumer for this reserved request is expected to be engine-pm,
for which we know that there will always be at least the previous pm
request that we can reuse under mempressure (so there should always be
a spare request for engine_park()).

This is an alternative to using a comparatively bulky mempool, which
requires custom handling for both our reserved allocation requirement
and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool
would be that it would allow us to keep a larger per-engine request
pool. However, converting over to mempool is straightforward should the
need arise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-and-tested-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200402184037.21630-1-chris@chris-wilson.co.uk
2020-04-03 15:20:03 +01:00
Chris Wilson 209df10bb4 drm/i915: Defer semaphore priority bumping to a workqueue
Since the semaphore fence may be signaled from inside an interrupt
handler from inside a request holding its request->lock, we cannot then
enter into the engine->active.lock for processing the semaphore priority
bump as we may traverse our call tree and end up on another held
request.

CPU 0:
[ 2243.218864]  _raw_spin_lock_irqsave+0x9a/0xb0
[ 2243.218867]  i915_schedule_bump_priority+0x49/0x80 [i915]
[ 2243.218869]  semaphore_notify+0x6d/0x98 [i915]
[ 2243.218871]  __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2243.218874]  ? kmem_cache_free+0x211/0x290
[ 2243.218876]  i915_sw_fence_complete+0x58/0x80 [i915]
[ 2243.218879]  dma_i915_sw_fence_wake+0x3e/0x80 [i915]
[ 2243.218881]  signal_irq_work+0x571/0x690 [i915]
[ 2243.218883]  irq_work_run_list+0xd7/0x120
[ 2243.218885]  irq_work_run+0x1d/0x50
[ 2243.218887]  smp_irq_work_interrupt+0x21/0x30
[ 2243.218889]  irq_work_interrupt+0xf/0x20

CPU 1:
[ 2242.173107]  _raw_spin_lock+0x8f/0xa0
[ 2242.173110]  __i915_request_submit+0x64/0x4a0 [i915]
[ 2242.173112]  __execlists_submission_tasklet+0x8ee/0x2120 [i915]
[ 2242.173114]  ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915]
[ 2242.173117]  execlists_submit_request+0x2e8/0x2f0 [i915]
[ 2242.173119]  submit_notify+0x8f/0xc0 [i915]
[ 2242.173121]  __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2242.173124]  ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 2242.173137]  i915_sw_fence_complete+0x58/0x80 [i915]
[ 2242.173140]  i915_sw_fence_commit+0x16/0x20 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318
Fixes: b7404c7ecb ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk
2020-03-10 23:12:38 +00:00
Chris Wilson 89f077ab90 drm/i915: Mark up unlocked update of i915_request.hwsp_seqno
During i915_request_retire() we decouple the i915_request.hwsp_seqno
from the intel_timeline so that it may be freed before the request is
released. However, we need to warn the compiler that the pointer may
update under its nose.

[  171.438899] BUG: KCSAN: data-race in i915_request_await_dma_fence [i915] / i915_request_retire [i915]
[  171.438920]
[  171.438932] write to 0xffff8881e7e28ce0 of 8 bytes by task 148 on cpu 2:
[  171.439174]  i915_request_retire+0x1ea/0x660 [i915]
[  171.439408]  retire_requests+0x7a/0xd0 [i915]
[  171.439640]  engine_retire+0xa1/0xe0 [i915]
[  171.439657]  process_one_work+0x3b1/0x690
[  171.439671]  worker_thread+0x80/0x670
[  171.439685]  kthread+0x19a/0x1e0
[  171.439701]  ret_from_fork+0x1f/0x30
[  171.439721]
[  171.439739] read to 0xffff8881e7e28ce0 of 8 bytes by task 696 on cpu 1:
[  171.439990]  i915_request_await_dma_fence+0x162/0x520 [i915]
[  171.440230]  i915_request_await_object+0x2fe/0x470 [i915]
[  171.440467]  i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[  171.440704]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[  171.440722]  drm_ioctl_kernel+0xe4/0x120
[  171.440736]  drm_ioctl+0x297/0x4c7
[  171.440750]  ksys_ioctl+0x89/0xb0
[  171.440766]  __x64_sys_ioctl+0x42/0x60
[  171.440788]  do_syscall_64+0x6e/0x2c0
[  171.440802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-1-chris@chris-wilson.co.uk
2020-03-09 18:23:59 +00:00
Chris Wilson 36e191f064 drm/i915: Apply i915_request_skip() on submission
Trying to use i915_request_skip() prior to i915_request_add() causes us
to try and fill the ring upto request->postfix, which has not yet been
set, and so may cause us to memset() past the end of the ring.

Instead of skipping the request immediately, just flag the error on the
request (only accepting the first fatal error we see) and then clear the
request upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
2020-03-04 14:29:50 +00:00
Chris Wilson f1766e3a78 drm/i915: Fix typo in kerneldoc function name
A forgetful copy'n'paste left the name of the old function intact, and
did not introduce the new function 'i915_request_is_ready'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117101639.2908469-1-chris@chris-wilson.co.uk
2020-01-17 13:20:18 +00:00
Chris Wilson 32ff621fd7 drm/i915/gt: Allow temporary suspension of inflight requests
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
2020-01-16 19:56:16 +00:00
Chris Wilson 672c368f93 drm/i915: Keep track of request among the scheduling lists
If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
2020-01-16 19:56:15 +00:00
Chris Wilson e1c31fb5dd drm/i915: Merge i915_request.flags with i915_request.fence.flags
As we already have a flags field buried within i915_request, reuse it!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk
2020-01-06 14:38:55 +00:00
Chris Wilson 41d329e287 drm/i915: Add spaces before compound GEM_TRACE
Add a space between the prefixed format and the users format so that the
join are not mistakenly combined into one long word.

Fixes: 639f2f2489 ("drm/i915: Introduce new macros for tracing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191223204411.2355304-1-chris@chris-wilson.co.uk
2019-12-25 12:14:34 +00:00
Chris Wilson 6a8679c048 drm/i915: Mark the GEM context link as RCU protected
The only protection for intel_context.gem_cotext is granted by RCU, so
annotate it as a rcu protected pointer and carefully dereference it in
the few occasions we need to use it.

Fixes: 9f3ccd40ac ("drm/i915: Drop GEM context as a direct link from i915_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222233558.2201901-1-chris@chris-wilson.co.uk
2019-12-23 13:08:47 +00:00
Chris Wilson 9f3ccd40ac drm/i915: Drop GEM context as a direct link from i915_request
Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
2019-12-20 10:52:21 +00:00
Chris Wilson b81e4d9b59 drm/i915/gt: Track engine round-trip times
Knowing the round trip time of an engine is useful for tracking the
health of the system as well as providing a metric for the baseline
responsiveness of the engine. We can use the latter metric for
automatically tuning our waits in selftests and when idling so we don't
confuse a slower system with a dead one.

Upon idling the engine, we send one last pulse to switch the context
away from precious user state to the volatile kernel context. We know
the engine is idle at this point, and the pulse is non-preemptible, so
this provides us with a good measurement of the round trip time. It also
provides us with faster engine parking for ringbuffer submission, which
is a welcome bonus (e.g. softer-rc6).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191219105043.4169050-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20191219124353.8607-2-chris@chris-wilson.co.uk
2019-12-19 17:03:57 +00:00
Chris Wilson 85bedbf191 drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp
As we stash a pointer to the HWSP cacheline on the request, when reading
it we only need confirm that the cacheline is still valid by checking
that the request and timeline are still intact.

v2: Protect hwsp_cachline with RCU

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217011659.3092130-1-chris@chris-wilson.co.uk
2019-12-17 16:59:48 +00:00
Venkata Sandeep Dhanalakota 639f2f2489 drm/i915: Introduce new macros for tracing
New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and
GT_TRACE() are introduce to tag device name and engine
name with contexts and requests tracing in i915.

Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com
2019-12-13 20:16:23 +00:00
Chris Wilson c3eb54aad9 drm/i915: Mark up "sentinel" requests
Sometimes we want to emit a terminator request, a request that flushes
the pipeline and allows no request to come after it. This can be used
for a "preempt-to-idle" to ensure that upon processing the
context-switch to that request, all other active contexts have been
flushed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk
2019-10-12 08:51:17 +01:00
Chris Wilson 6610197542 drm/i915: Move request runtime management onto gt
Requests are run from the gt and are tided into the gt runtime power
management, so pull the runtime request management under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk
2019-10-04 15:39:26 +01:00
Chris Wilson f33a8a5160 drm/i915: Merge wait_for_timelines with retire_request
wait_for_timelines is essentially the same loop as retiring requests
(with an extra timeout), so merge the two into one routine.

v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
2019-10-04 15:39:23 +01:00
Chris Wilson 7e80576266 drm/i915: Drop struct_mutex from around i915_retire_requests()
We don't need to hold struct_mutex now for retiring requests, so drop it
from i915_retire_requests() and i915_gem_wait_for_idle(), finally
removing I915_WAIT_LOCKED for good.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-8-chris@chris-wilson.co.uk
2019-10-04 15:39:17 +01:00
Chris Wilson b1e3177bd1 drm/i915: Coordinate i915_active with its own mutex
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.

v2: Add some comments that barely elucidate anything :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
2019-10-04 15:39:12 +01:00
Chris Wilson c0bb487dc1 drm/i915: Only enqueue already completed requests
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
2019-09-23 16:21:37 +01:00
Chris Wilson d19d71fc2b drm/i915: Mark i915_request.timeline as a volatile, rcu pointer
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.

One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.

v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
2019-09-20 10:24:09 +01:00
Chris Wilson e5dadff4b0 drm/i915: Protect request retirement with timeline->mutex
Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk
2019-08-15 23:21:13 +01:00
Chris Wilson a79ca656b6 drm/i915: Push the wakeref->count deferral to the backend
If the backend wishes to defer the wakeref parking, make it responsible
for unlocking the wakeref (i.e. bumping the counter). This allows it to
time the unlock much more carefully in case it happens to needs the
wakeref to be active during its deferral.

For instance, during engine parking we may choose to emit an idle
barrier (a request). To do so, we borrow the engine->kernel_context
timeline and to ensure exclusive access we keep the
engine->wakeref.count as 0. However, to submit that request to HW may
require a intel_engine_pm_get() (e.g. to keep the submission tasklet
alive) and before we allow that we have to rewake our wakeref to avoid a
recursive deadlock.

<4> [257.742916] IRQs not enabled as expected
<4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100
<4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid
<4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W         5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1
<4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
<4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100
<4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3
<4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086
<4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302
<4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001
<4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008
<4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430
<4> [257.743074] FS:  0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000
<4> [257.743082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0
<4> [257.743096] Call Trace:
<4> [257.743104]  <IRQ>
<4> [257.743265]  __i915_request_commit+0x240/0x5d0 [i915]
<4> [257.743427]  ? __i915_request_create+0x228/0x4c0 [i915]
<4> [257.743584]  __engine_park+0x64/0x250 [i915]
<4> [257.743730]  ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4> [257.743878]  i915_sample+0x2ee/0x310 [i915]
<4> [257.744030]  ? i915_pmu_cpu_offline+0xb0/0xb0 [i915]
<4> [257.744040]  __hrtimer_run_queues+0x11e/0x4b0
<4> [257.744068]  hrtimer_interrupt+0xea/0x250
<4> [257.744079]  ? lockdep_hardirqs_off+0x79/0xd0
<4> [257.744101]  smp_apic_timer_interrupt+0x96/0x280
<4> [257.744114]  apic_timer_interrupt+0xf/0x20
<4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae

v2: Keep the priority_hint assert
v3: That assert was desperately trying to point out my bug. Sorry, little
assert.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
2019-08-13 21:09:49 +01:00
Lionel Landwerlin 2a98f4e65b drm/i915: add infrastructure to hold off preemption on a request
We want to set this flag in the next commit on requests containing
perf queries so that the result of the perf query can just be a delta
of global counters, rather than doing post processing of the OA
buffer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: add basic selftest for nopreempt]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk
2019-07-09 21:26:40 +01:00
Tvrtko Ursulin f0c02c1b91 drm/i915: Rename i915_timeline to intel_timeline and move under gt
Move all timeline code under gt and rename to intel_gt prefix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com
2019-06-21 13:48:53 +01:00
Chris Wilson 22b7a426bb drm/i915/execlists: Preempt-to-busy
When using a global seqno, we required a precise stop-the-workd event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.

v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick in a timely fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
2019-06-20 16:52:36 +01:00
Chris Wilson 422d7df4f0 drm/i915: Replace engine->timeline with a plain list
To continue the onslaught of removing the assumption of a global
execution ordering, another casualty is the engine->timeline. Without an
actual timeline to track, it is overkill and we can replace it with a
much less grand plain list. We still need a list of requests inflight,
for the simple purpose of finding inflight requests (for retiring,
resetting, preemption etc).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
2019-06-14 19:03:40 +01:00
Chris Wilson f71e01a78b drm/i915: Extend execution fence to support a callback
In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk
2019-05-22 08:40:45 +01:00
Chris Wilson 78e41ddd21 drm/i915: Apply an execution_mask to the virtual_engine
Allow the user to direct which physical engines of the virtual engine
they wish to execute one, as sometimes it is necessary to override the
load balancing algorithm.

v2: Only kick the virtual engines on context-out if required

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
2019-05-22 08:40:43 +01:00
Chris Wilson 46472b3efb drm/i915: Move i915_request_alloc into selftests/
Having transitioned GEM over to using intel_context as its primary means
of tracking the GEM context and engine combined and using
i915_request_create(), we can move the older i915_request_alloc()
helper function into selftests/ where the remaining users are confined.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-9-chris@chris-wilson.co.uk
2019-04-26 18:32:20 +01:00