Commit Graph

272 Commits

Author SHA1 Message Date
Chris Wilson b68be5c623 drm/i915/execlists: Record the active CCID from before reset
If we cannot trust the reset will flush out the CS event queue such that
process_csb() reports an accurate view of HW, we will need to search the
active and pending contexts to determine which was actually running at
the time we issued the reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200505084629.31365-1-chris@chris-wilson.co.uk
2020-05-05 12:05:40 +01:00
Chris Wilson 25fd6de315 drm/i915/gt: Small tidy of gen8+ breadcrumb emission
Use a local to shrink a line under 80 columns, and refactor the common
emit_xcs_breadcrumb() wrapper of ggtt-write.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504180507.6017-1-chris@chris-wilson.co.uk
2020-05-05 09:16:59 +01:00
Chris Wilson a211da9c77 drm/i915/gt: Make timeslicing an explicit engine property
In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having the
kernel support and HW support.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501122249.12417-1-chris@chris-wilson.co.uk
2020-05-01 15:17:33 +01:00
Chris Wilson 426d0073fb drm/i915/gt: Always enable busy-stats for execlists
In the near future, we will utilize the busy-stats on each engine to
approximate the C0 cycles of each, and use that as an input to a manual
RPS mechanism. That entails having busy-stats always enabled and so we
can remove the enable/disable routines and simplify the pmu setup. As a
consequence of always having the stats enabled, we can also show the
current active time via sysfs/engine/xcs/active_time_ns.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-1-chris@chris-wilson.co.uk
2020-04-30 00:57:34 +01:00
Chris Wilson be1cb55a07 drm/i915/gt: Keep a no-frills swappable copy of the default context state
We need to keep the default context state around to instantiate new
contexts (aka golden rendercontext), and we also keep it pinned while
the engine is active so that we can quickly reset a hanging context.
However, the default contexts are large enough to merit keeping in
swappable memory as opposed to kernel memory, so we store them inside
shmemfs. Currently, we use the normal GEM objects to create the default
context image, but we can throw away all but the shmemfs file.

This greatly simplifies the tricky power management code which wants to
run underneath the normal GT locking, and we definitely do not want to
use any high level objects that may appear to recurse back into the GT.
Though perhaps the primary advantage of the complex GEM object is that
we aggressively cache the mapping, but here we are recreating the
vm_area everytime time we unpark. At the worst, we add a lightweight
cache, but first find a microbenchmark that is impacted.

Having started to create some utility functions to make working with
shmemfs objects easier, we can start putting them to wider use, where
GEM objects are overkill, such as storing persistent error state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429172429.6054-1-chris@chris-wilson.co.uk
2020-04-29 19:02:37 +01:00
Chris Wilson f6a7c21c99 drm/i915/execlists: Verify we don't submit two identical CCIDs
Check that we do not submit two contexts into ELSP with the same CCID
[upper portion of the descriptor].

References: https://gitlab.freedesktop.org/drm/intel/-/issues/1793
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-3-chris@chris-wilson.co.uk
2020-04-28 22:17:36 +01:00
Chris Wilson 5c4a53e3b1 drm/i915/execlists: Track inflight CCID
The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.

However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.

Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk
2020-04-28 22:17:36 +01:00
Chris Wilson 2632f174a2 drm/i915/execlists: Avoid reusing the same logical CCID
The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.

To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
2020-04-28 22:17:36 +01:00
Chris Wilson 4243cd5388 drm/i915/gt: Sanitize GT first
We see that if the HW doesn't actually sleep, the HW may eat the poison
we set in its write-only HWSP during sanitize:

  intel_gt_resume.part.8: 0000:00:02.0
  __gt_unpark: 0000:00:02.0
  gt_sanitize: 0000:00:02.0 force:yes
  process_csb: 0000:00:02.0 vcs0: cs-irq head=5, tail=90
  process_csb: 0000:00:02.0 vcs0: csb[0]: status=0x5a5a5a5a:0x5a5a5a5a
  assert_pending_valid: Nothing pending for promotion!

The CS TAIL pointer should have been reset by reset_csb_pointers(), so
in this case it is likely that we have read back from the CPU cache and
so we must clflush our control over that page. In doing so, push the
sanitisation to the start of the GT sequence so that our poisoning is
assuredly before we start talking to the HW.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/1794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200427084000.10999-1-chris@chris-wilson.co.uk
2020-04-27 11:39:23 +01:00
Chris Wilson 68ace460c5 drm/i915/execlists: Check preempt-timeout target before submit_ports
We evaluate *active, which is a pointer into execlists->inflight[]
during dequeue to decide how long a preempt-timeout we need to apply.
However, as soon as we do the submit_ports, the HW may send its ACK
interrupt causing us to promote execlists->pending[] tp
execlists->inflight[], overwriting the value of *active. We know *active
is only stable until we submit (as we only submit when there is no
pending promotion).

[   16.102328] BUG: KCSAN: data-race in execlists_dequeue+0x1449/0x1600 [i915]
[   16.102356]
[   16.102375] race at unknown origin, with read to 0xffff8881e9500488 of 8 bytes by task 429 on cpu 1:
[   16.102780]  execlists_dequeue+0x1449/0x1600 [i915]
[   16.103160]  __execlists_submission_tasklet+0x48/0x60 [i915]
[   16.103540]  execlists_submit_request+0x38e/0x3c0 [i915]
[   16.103940]  submit_notify+0x8f/0xc0 [i915]
[   16.104308]  __i915_sw_fence_complete+0x61/0x420 [i915]
[   16.104683]  i915_sw_fence_complete+0x58/0x80 [i915]
[   16.105054]  i915_sw_fence_commit+0x16/0x20 [i915]
[   16.105457]  __i915_request_queue+0x60/0x70 [i915]
[   16.105843]  i915_gem_do_execbuffer+0x2d6b/0x4230 [i915]
[   16.106227]  i915_gem_execbuffer2_ioctl+0x2b0/0x580 [i915]
[   16.106257]  drm_ioctl_kernel+0xe9/0x130
[   16.106279]  drm_ioctl+0x27d/0x45e
[   16.106311]  ksys_ioctl+0x89/0xb0
[   16.106336]  __x64_sys_ioctl+0x42/0x60
[   16.106370]  do_syscall_64+0x6e/0x2c0
[   16.106397]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200426094231.21995-1-chris@chris-wilson.co.uk
2020-04-27 11:36:59 +01:00
Mika Kuoppala b8a1181122 drm/i915: Use indirect ctx bb to mend CMD_BUF_CCTL
Use indirect ctx bb to load cmd buffer control value
from context image to avoid corruption.

v2: add to lrc layout (Chris)
v3: end to a cacheline (Chris)
v4: add to lrc fixed (Chris)
v5: value in offset+1

Testcase: igt/i915_selftest/gt_lrc
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230632.30333-1-mika.kuoppala@linux.intel.com
2020-04-25 19:08:56 +01:00
Mika Kuoppala 685d21096f drm/i915: Add per ctx batchbuffer wa for timestamp
Restoration of a previous timestamp can collide
with updating the timestamp, causing a value corruption.

Combat this issue by using indirect ctx bb to
modify the context image during restoring process.

We can preload value into scratch register. From which
we then do the actual write with LRR. LRR is faster and
thus less error prone as probability of race drops.

v2: tidying (Chris)
v3: lrr for all engines
v4: grp
v5: reg bit
v6: wa_bb_offset, virtual engines (Chris)

References: HSDES#16010904313
Testcase: igt/i915_selftest/gt_lrc
Suggested-by: Joseph Koston <joseph.koston@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230546.30271-1-mika.kuoppala@linux.intel.com
2020-04-25 18:39:32 +01:00
Mika Kuoppala 168c6d231b drm/i915: Add engine scratch register to live_lrc_fixed
General purpose registers are per engine and
in a fixed location. Add to live_lrc_fixed.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424214841.28076-1-mika.kuoppala@linux.intel.com
2020-04-25 17:58:33 +01:00
Mika Kuoppala b4892e4404 drm/i915: Make define for lrc state offset
More often than not, we need a byte offset into lrc
register state from the start of the hw state. Make it so.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423182355.21837-3-mika.kuoppala@linux.intel.com
2020-04-24 00:52:14 +01:00
Mika Kuoppala f1cc6acf22 drm/i915/selftests: Add context batchbuffers registers to live_lrc_fixed
Add per ctx bb and indirect ctx bb register locations to live_lrc_fixed
for verification.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423224159.22078-1-chris@chris-wilson.co.uk
2020-04-24 00:36:13 +01:00
Chris Wilson 36fe164d8d drm/i915/gt: Carefully order virtual_submission_tasklet
During the virtual engine's submission tasklet, we take the request and
insert into the submission queue on each of our siblings. This seems
quite simply, and so no problems with ordering. However, the sibling
execlists' submission tasklets may run concurrently with the virtual
engine's tasklet, submitting the request to HW before the virtual
finishes its task of telling all the siblings. If this happens, the
sibling tasklet may *reorder* the ve->sibling[] array that the virtual
engine tasklet is processing. This can *only* reorder within the
elements already processed by the virtual engine, nevertheless the
race is detected by KCSAN:

[  185.580014] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915]
[  185.580054]
[  185.580076] write to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 2:
[  185.580553]  execlists_dequeue+0x6ad/0x1600 [i915]
[  185.581044]  __execlists_submission_tasklet+0x48/0x60 [i915]
[  185.581517]  execlists_submission_tasklet+0xd3/0x170 [i915]
[  185.581554]  tasklet_action_common.isra.0+0x42/0x90
[  185.581585]  __do_softirq+0xc8/0x206
[  185.581613]  run_ksoftirqd+0x15/0x20
[  185.581641]  smpboot_thread_fn+0x15a/0x270
[  185.581669]  kthread+0x19a/0x1e0
[  185.581695]  ret_from_fork+0x1f/0x30
[  185.581717]
[  185.581736] read to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 0:
[  185.582231]  virtual_submission_tasklet+0x10e/0x5c0 [i915]
[  185.582265]  tasklet_action_common.isra.0+0x42/0x90
[  185.582291]  __do_softirq+0xc8/0x206
[  185.582315]  run_ksoftirqd+0x15/0x20
[  185.582340]  smpboot_thread_fn+0x15a/0x270
[  185.582368]  kthread+0x19a/0x1e0
[  185.582395]  ret_from_fork+0x1f/0x30
[  185.582417]

We can prevent this race by checking for the ve->request after looking
up the sibling array.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423115315.26825-1-chris@chris-wilson.co.uk
2020-04-23 16:14:27 +01:00
Chris Wilson 15501287b1 drm/i915/execlists: Drop request-before-CS assertion
When we migrated to execlists, one of the conditions we wanted to test
for was whether the breadcrumb seqno was being written before the
breadcumb interrupt was delivered. This was following on from issues
observed on previous generations which were not so strongly ordered. With
the removal of the missed interrupt detection, we have not reliable
means of detecting the out-of-order seqno/interrupt but instead tried to
assert that the relationship between the CS event interrupt and the
breadwrite should be strongly ordered. However, Icelake proves it is
possible for the HW implementation to forget about minor little details
such as write ordering and so the order between *processing* the CS
event and the breadcrumb is unreliable.

Remove the unreliable assertion, but leave a debug telltale in case we
have reason to suspect.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk
2020-04-22 17:17:50 +01:00
Chris Wilson bd3ec9e758 drm/i915/gt: Poison residual state [HWSP] across resume.
Since we may lose the content of any buffer when we relinquish control
of the system (e.g. suspend/resume), we have to be careful not to rely
on regaining control. A good method to detect when we might be using
garbage is by always injecting that garbage prior to first use on
load/resume/etc.

v2: Drop sanitize callback on cleanup
v3: Move seqno reset to timeline enter, so we reset all timelines.
However, this is done on every activation during runtime and not reset.
The similar level of paranoia we apply to correcting context state after
a period of inactivity.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200421092504.7416-1-chris@chris-wilson.co.uk
2020-04-21 16:27:39 +01:00
Chris Wilson 23122a4d99 drm/i915/gt: Scrub execlists state on resume
Before we resume, we reset the HW so we restart from a known good state.
However, as a part of the reset process, we drain our pending CS event
queue -- and if we are resuming that does not correspond to internal
state. On setup, we are scrubbing the CS pointers, but alas only on
setup.

Apply the sanitization not just to setup, but to all resumes.

Reported-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200416114117.3460-1-chris@chris-wilson.co.uk
2020-04-17 13:56:00 +01:00
Jani Nikula dc483ba501 drm/i915/gt: prefer struct drm_device based logging
Prefer struct drm_device based logging over struct device based logging.

No functional changes.

Cc: Wambui Karuga <wambui.karugax@gmail.com>
Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-16-jani.nikula@intel.com
2020-04-08 13:49:35 +03:00
Chris Wilson c4e8ba7390 drm/i915/gt: Yield the timeslice if caught waiting on a user semaphore
If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the
user batch or in our own preamble, the engine raises a
GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so
respond to a semaphore wait by yielding the timeslice, if we have
another context to yield to!

The only real complication is that the interrupt is only generated for
the start of the semaphore wait, and is asynchronous to our
process_csb() -- that is, we may not have registered the timeslice before
we see the interrupt. To ensure we don't miss a potential semaphore
blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark
the interrupt and apply it to the next timeslice regardless of whether it
was active at the time.

v2: We use semaphores in preempt-to-busy, within the timeslicing
implementation itself! Ergo, when we do insert a preemption due to an
expired timeslice, the new context may start with the missed semaphore
flagged by the retired context and be yielded, ad infinitum. To avoid
this, read the context id at the time of the semaphore interrupt and
only yield if that context is still active.

Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk
2020-04-07 14:43:58 +01:00
Chris Wilson 848862e672 drm/i915/gt: Free request pool from virtual engines
While extremely unlikely to be populated, we could capture a request on
the virtual engine which we should free along with the virtual engine.

Fixes: 43acd6516c ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200403203303.10903-1-chris@chris-wilson.co.uk
2020-04-03 21:50:24 +01:00
Chris Wilson 4c977837ba drm/i915/execlists: Peek at the next submission for error interrupts
If we receive the error interrupt before the CS interrupt, we may find
ourselves without an active request to reset, skipping the GPU reset.
All because the attempt to reset was too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401110435.30389-1-chris@chris-wilson.co.uk
2020-04-02 21:30:30 +01:00
Chris Wilson 9171555572 drm/i915/execlists: Pause CS flow before reset
Since we may be attempting to reset an active engine, we try to freeze
it in place before resetting -- to be on the safe side. We can go one
step further if we are using the CS flow semaphore to prevent the
context switching into the next.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200331091459.29179-2-chris@chris-wilson.co.uk
2020-03-31 21:42:12 +01:00
Chris Wilson f53ae29c0e drm/i915/gt: Include a few tracek for timeslicing
Add a few telltales to see when timeslicing is being enabled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200331120502.14713-1-chris@chris-wilson.co.uk
2020-03-31 21:42:12 +01:00
Chris Wilson e2ccf0d009 drm/i915/execlists: Double check breadcrumb before crying foul
process_csb: 0000:00:02.0 bcs0: cs-irq head=4, tail=5
  process_csb: 0000:00:02.0 bcs0: csb[5]: status=0x00008002:0x60000020
  trace_ports: 0000:00:02.0 bcs0: preempted { ff84:45154! prio 2 }
  trace_ports: 0000:00:02.0 bcs0: promote { ff84:45155* prio 2 }
  trace_ports: 0000:00:02.0 bcs0: submit { ff84:45156 prio 2 }

  process_csb: 0000:00:02.0 bcs0: cs-irq head=5, tail=6
  process_csb: 0000:00:02.0 bcs0: csb[6]: status=0x00000018:0x60000020
  trace_ports: 0000:00:02.0 bcs0: completed { ff84:45155* prio 2 }
  process_csb: 0000:00:02.0 bcs0: ring:{start:0x00178000, head:0928, tail:0928, ctl:00000000, mode:00000200}
  process_csb: 0000:00:02.0 bcs0: rq:{start:00178000, head:08b0, tail:08f0, seqno:ff84:45155, hwsp:45156},
  process_csb: 0000:00:02.0 bcs0: ctx:{start:00178000, head:e000928, tail:0928},
  process_csb: GEM_BUG_ON("context completed before request")

In this sequence, we can see that although we have submitted the next
request [ff84:45156] to HW (via ELSP[]) it has not yet reported the
lite-restore. Instead, we see the completion event of the currently
active request [ff84:45155] but at the time of processing that event,
the breadcrumb has not yet been written. Though by the time we do print
out the debug info, the seqno write of ff84:45156 has landed!

Therefore there is a serialisation problem between the seqno writes and
CS events, not just between the CS buffer and its head/tail pointers as
previously observed on Icelake.

This is not a huge problem, as we don't strictly rely on the breadcrumb
to determine HW activity, but it may indicate that interrupt delivery is
before the seqno write, aka bringing back the plague of missed
interrupts from yesteryear. However, there is no indication of this
wider problem, so let's just flush the seqno read before reporting an
error. If it persists after the fresh read we can worry again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330234318.30638-1-chris@chris-wilson.co.uk
2020-03-31 10:02:04 +01:00
Chris Wilson b28b34ac85 drm/i915/execlists: Explicitly reset both reg and context runtime
Upon a GPU reset, we copy the default context image over top of the
guilty image. This will rollback the CTX_TIMESTAMP register to before
our value of ce->runtime.last. Reset both back to 0 so that we do not
encounter an underflow on the next schedule out after resume.

This should not be a huge issue in practice, as hangs should be rare in
correct code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330125827.5804-1-chris@chris-wilson.co.uk
2020-03-30 21:13:50 +01:00
Chris Wilson 8b6d457f95 drm/i915/execlists: Include priority info in trace_ports
Add some extra information into trace_ports to help with reviewing
correctness.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330113137.24425-1-chris@chris-wilson.co.uk
2020-03-30 17:56:00 +01:00
Chris Wilson 35f3fd8182 drm/i915/execlists: Workaround switching back to a completed context
In what seems remarkably similar to the w/a required to not reload an
idle context with HEAD==TAIL, it appears we must prevent the HW from
switching to an idle context in ELSP[1], while simultaneously trying to
preempt the HW to run another context and a continuation of the idle
context (which is no longer idle).

We can achieve this by preventing the context from completing while we
reload a new ELSP (by applying ring_set_paused(1) across the whole of
dequeue), except this eventually fails due to a lite-restore into a
waiting semaphore does not generate an ACK. Instead, we try to avoid
making the GPU do anything too challenging and not submit a new ELSP
while the interrupts + CSB events appear to have fallen behind the
completed contexts. We expect it to catch up shortly so we queue another
tasklet execution and hope for the best.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200327201433.21864-1-chris@chris-wilson.co.uk
2020-03-27 20:53:26 +00:00
Chris Wilson 6c81e21a47 drm/i915/gt: Stage the transfer of the virtual breadcrumb
We move the virtual breadcrumb from one physical engine to the next, if
the next virtual request is scheduled on a new physical engine. Since
the virtual context can only be in one signal queue, we need it to track
the current physical engine for the new breadcrumbs. However, to move
the list we need both breadcrumb locks -- and since we cannot take both
at the same time (unless we are careful and always ensure consistent
ordering) stage the movement of the signaler via the current virtual
request.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1510
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325130059.30600-1-chris@chris-wilson.co.uk
2020-03-25 13:59:41 +00:00
Chris Wilson 6670b413f8 drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

v2: Document the read of pending[0] from outside the tasklet with
READ_ONCE.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325120227.8044-1-chris@chris-wilson.co.uk
2020-03-25 13:05:04 +00:00
Chris Wilson 9bf7c31386 drm/i915/execlists: Drop setting sibling priority hint on virtual engines
We set the priority hint on execlists to avoid executing the tasklet for
when we know that there will be no change in execution order. However,
as we set it from the virtual engine for all siblings, but only one
physical engine may respond, we leave the hint set on the others
stopping direct submission that could take place.

If we do not set the hint, we may attempt direct submission even if we
don't expect to submit. If we set the hint, we may not do any submission
until the tasklet is run (and sometimes we may park the engine before
that has had a chance). Ergo there's only a minor ill-effect on mixed
virtual/physical engine workloads where we may try and fail to do direct
submission more often than required. (Pure virtual / engine workloads
will have redundant tasklet execution suppressed as normal.)

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1522
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325101358.12231-1-chris@chris-wilson.co.uk
2020-03-25 10:53:46 +00:00
Chris Wilson 41e4065a6b drm/i915: Rely on direct submission to the queue
Drop the pretense of kicking the tasklet (used only for the defunct guc
submission backend, it should just take ownership of the submit!) and so
remove the bh-kicking from around submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-5-chris@chris-wilson.co.uk
2020-03-23 11:51:39 +00:00
Wambui Karuga 91682e45ba drm/i915/lrc: convert to struct drm_device based logging macros.
Convert various instances of the printk based drm logging macros to the
struct drm_device based logging macros.

Note that this converts DRM_DEBUG_DRIVER() to drm_dbg() but does not
convert DRM_DEBUG() due to the lack of an analogous drm_device based
macro.

References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200314183344.17603-3-wambui.karugax@gmail.com
2020-03-19 11:34:11 +02:00
Caz Yokoyama 175c4d9b3b Revert "drm/i915/tgl: Add extra hdc flush workaround"
This reverts commit 36a6b5d964.

The commit takes care Wa_1604544889 which was fixed on a0 stepping based on
a0 replan. So no SW workaround is required on any stepping now.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Fixes: 36a6b5d964 ("drm/i915/tgl: Add extra hdc flush workaround")
Link: https://patchwork.freedesktop.org/patch/msgid/1c751032ce79c80c5485cae315f1a9904ce07cac.1583359940.git.caz.yokoyama@intel.com
2020-03-12 15:19:00 -07:00
Chris Wilson 60ef5b7ac6 drm/i915/execlists: Track active elements during dequeue
Record the initial active element we use when building the next ELSP
submission, so that we can compare against it latter to see if there's
no change.

Fixes: 44d0a9c05b ("drm/i915/execlists: Skip redundant resubmission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311092624.10012-2-chris@chris-wilson.co.uk
2020-03-11 11:59:59 +00:00
Chris Wilson 3a55dc895e drm/i915/execlists: Mark up data-races in virtual engines
The virtual engine passes tokens back and forth to its backing physical
engines.

[   57.372993] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915]
[   57.373012]
[   57.373023] write to 0xffff8881f47324c0 of 4 bytes by interrupt on cpu 2:
[   57.373241]  execlists_dequeue+0x6fa/0x2150 [i915]
[   57.373458]  __execlists_submission_tasklet+0x48/0x60 [i915]
[   57.373677]  execlists_submission_tasklet+0xd3/0x170 [i915]
[   57.373694]  tasklet_action_common.isra.0+0x42/0xa0
[   57.373709]  __do_softirq+0xd7/0x2cd
[   57.373723]  irq_exit+0xbe/0xe0
[   57.373735]  do_IRQ+0x51/0x100
[   57.373748]  ret_from_intr+0x0/0x1c
[   57.373963]  engine_retire+0x89/0xe0 [i915]
[   57.373977]  process_one_work+0x3b1/0x690
[   57.373990]  worker_thread+0x80/0x670
[   57.374004]  kthread+0x19a/0x1e0
[   57.374017]  ret_from_fork+0x1f/0x30
[   57.374027]
[   57.374038] read to 0xffff8881f47324c0 of 4 bytes by interrupt on cpu 3:
[   57.374256]  virtual_submission_tasklet+0x27/0x5a0 [i915]
[   57.374273]  tasklet_action_common.isra.0+0x42/0xa0
[   57.374288]  __do_softirq+0xd7/0x2cd
[   57.374302]  run_ksoftirqd+0x15/0x20
[   57.374315]  smpboot_thread_fn+0x1ab/0x300
[   57.374329]  kthread+0x19a/0x1e0
[   57.374342]  ret_from_fork+0x1f/0x30

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310141320.24149-3-chris@chris-wilson.co.uk
2020-03-10 23:12:39 +00:00
Chris Wilson f494960d5e drm/i915/gt: Defend against concurrent updates to execlists->active
[  206.875637] BUG: KCSAN: data-race in __i915_schedule+0x7fc/0x930 [i915]
[  206.875654]
[  206.875666] race at unknown origin, with read to 0xffff8881f7644480 of 8 bytes by task 703 on cpu 3:
[  206.875901]  __i915_schedule+0x7fc/0x930 [i915]
[  206.876130]  __bump_priority+0x63/0x80 [i915]
[  206.876361]  __i915_sched_node_add_dependency+0x258/0x300 [i915]
[  206.876593]  i915_sched_node_add_dependency+0x50/0xa0 [i915]
[  206.876824]  i915_request_await_dma_fence+0x1da/0x530 [i915]
[  206.877057]  i915_request_await_object+0x2fe/0x470 [i915]
[  206.877287]  i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[  206.877517]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[  206.877535]  drm_ioctl_kernel+0xe4/0x120
[  206.877549]  drm_ioctl+0x297/0x4c7
[  206.877563]  ksys_ioctl+0x89/0xb0
[  206.877577]  __x64_sys_ioctl+0x42/0x60
[  206.877591]  do_syscall_64+0x6e/0x2c0
[  206.877606]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: Be safe and include mb

References: https://gitlab.freedesktop.org/drm/intel/issues/1318
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309170540.10332-1-chris@chris-wilson.co.uk
2020-03-09 20:38:57 +00:00
Chris Wilson a4e648a0b3 drm/i915/execlsts: Mark up racy inspection of current i915_request priority
[  120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915]
[  120.176566]
[  120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3:
[  120.176792]  __i915_schedule+0x63e/0x920 [i915]
[  120.177007]  __bump_priority+0x63/0x80 [i915]
[  120.177220]  __i915_sched_node_add_dependency+0x258/0x300 [i915]
[  120.177438]  i915_sched_node_add_dependency+0x50/0xa0 [i915]
[  120.177654]  i915_request_await_dma_fence+0x1da/0x530 [i915]
[  120.177867]  i915_request_await_object+0x2fe/0x470 [i915]
[  120.178081]  i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[  120.178292]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[  120.178309]  drm_ioctl_kernel+0xe4/0x120
[  120.178322]  drm_ioctl+0x297/0x4c7
[  120.178335]  ksys_ioctl+0x89/0xb0
[  120.178348]  __x64_sys_ioctl+0x42/0x60
[  120.178361]  do_syscall_64+0x6e/0x2c0
[  120.178375]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  120.178387]
[  120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2:
[  120.178606]  effective_prio+0x25/0xc0 [i915]
[  120.178812]  process_csb+0xe8b/0x10a0 [i915]
[  120.179021]  execlists_submission_tasklet+0x30/0x170 [i915]
[  120.179038]  tasklet_action_common.isra.0+0x42/0xa0
[  120.179053]  __do_softirq+0xd7/0x2cd
[  120.179066]  irq_exit+0xbe/0xe0
[  120.179078]  do_IRQ+0x51/0x100
[  120.179090]  ret_from_intr+0x0/0x1c
[  120.179104]  cpuidle_enter_state+0x1b8/0x5d0
[  120.179117]  cpuidle_enter+0x50/0x90
[  120.179131]  do_idle+0x1a1/0x1f0
[  120.179145]  cpu_startup_entry+0x14/0x16
[  120.179158]  start_secondary+0x120/0x180
[  120.179172]  secondary_startup_64+0xa4/0xb0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk
2020-03-09 18:24:13 +00:00
Chris Wilson fa192d90cf drm/i915/execlists: Mark up read of i915_request.fence.flags
[  145.927961] BUG: KCSAN: data-race in can_merge_rq [i915] / signal_irq_work [i915]
[  145.927980]
[  145.927992] write (marked) to 0xffff8881e513fab0 of 8 bytes by interrupt on cpu 2:
[  145.928250]  signal_irq_work+0x134/0x640 [i915]
[  145.928268]  irq_work_run_list+0xd7/0x120
[  145.928283]  irq_work_run+0x1d/0x50
[  145.928300]  smp_irq_work_interrupt+0x21/0x30
[  145.928328]  irq_work_interrupt+0xf/0x20
[  145.928356]  _raw_spin_unlock_irqrestore+0x34/0x40
[  145.928596]  execlists_submission_tasklet+0xde/0x170 [i915]
[  145.928616]  tasklet_action_common.isra.0+0x42/0xa0
[  145.928632]  __do_softirq+0xd7/0x2cd
[  145.928646]  irq_exit+0xbe/0xe0
[  145.928665]  do_IRQ+0x51/0x100
[  145.928684]  ret_from_intr+0x0/0x1c
[  145.928699]  schedule+0x0/0xb0
[  145.928719]  worker_thread+0x194/0x670
[  145.928743]  kthread+0x19a/0x1e0
[  145.928765]  ret_from_fork+0x1f/0x30
[  145.928784]
[  145.928796] read to 0xffff8881e513fab0 of 8 bytes by task 738 on cpu 1:
[  145.929046]  can_merge_rq+0xb1/0x100 [i915]
[  145.929282]  __execlists_submission_tasklet+0x866/0x25a0 [i915]
[  145.929518]  execlists_submit_request+0x2a4/0x2b0 [i915]
[  145.929758]  submit_notify+0x8f/0xc0 [i915]
[  145.929989]  __i915_sw_fence_complete+0x5d/0x3e0 [i915]
[  145.930221]  i915_sw_fence_complete+0x58/0x80 [i915]
[  145.930453]  i915_sw_fence_commit+0x16/0x20 [i915]
[  145.930698]  __i915_request_queue+0x60/0x70 [i915]
[  145.930935]  i915_gem_do_execbuffer+0x3997/0x4c20 [i915]
[  145.931175]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[  145.931194]  drm_ioctl_kernel+0xe4/0x120
[  145.931208]  drm_ioctl+0x297/0x4c7
[  145.931222]  ksys_ioctl+0x89/0xb0
[  145.931238]  __x64_sys_ioctl+0x42/0x60
[  145.931260]  do_syscall_64+0x6e/0x2c0
[  145.931275]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-4-chris@chris-wilson.co.uk
2020-03-09 18:24:13 +00:00
Chris Wilson 875c3b4b5c drm/i915/gt: Mark up racy check of last list element
[   25.025543] BUG: KCSAN: data-race in __i915_request_create [i915] / process_csb [i915]
[   25.025561]
[   25.025573] write (marked) to 0xffff8881e85c1620 of 8 bytes by task 696 on cpu 1:
[   25.025789]  __i915_request_create+0x54b/0x5d0 [i915]
[   25.026001]  i915_request_create+0xcc/0x150 [i915]
[   25.026218]  i915_gem_do_execbuffer+0x2f70/0x4c20 [i915]
[   25.026428]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[   25.026445]  drm_ioctl_kernel+0xe4/0x120
[   25.026459]  drm_ioctl+0x297/0x4c7
[   25.026472]  ksys_ioctl+0x89/0xb0
[   25.026484]  __x64_sys_ioctl+0x42/0x60
[   25.026497]  do_syscall_64+0x6e/0x2c0
[   25.026510]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   25.026522]
[   25.026532] read to 0xffff8881e85c1620 of 8 bytes by interrupt on cpu 2:
[   25.026742]  process_csb+0x8d6/0x1070 [i915]
[   25.026949]  execlists_submission_tasklet+0x30/0x170 [i915]
[   25.026969]  tasklet_action_common.isra.0+0x42/0xa0
[   25.026984]  __do_softirq+0xd7/0x2cd
[   25.026997]  irq_exit+0xbe/0xe0
[   25.027009]  do_IRQ+0x51/0x100
[   25.027021]  ret_from_intr+0x0/0x1c
[   25.027033]  poll_idle+0x3e/0x13b
[   25.027047]  cpuidle_enter_state+0x189/0x5d0
[   25.027060]  cpuidle_enter+0x50/0x90
[   25.027074]  do_idle+0x1a1/0x1f0
[   25.027086]  cpu_startup_entry+0x14/0x16
[   25.027100]  start_secondary+0x120/0x180
[   25.027116]  secondary_startup_64+0xa4/0xb0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-2-chris@chris-wilson.co.uk
2020-03-09 18:23:59 +00:00
Chris Wilson 23a44ae9e8 drm/i915/execlists: Mark up the racy access to switch_priority_hint
[ 7534.150687] BUG: KCSAN: data-race in __execlists_submission_tasklet [i915] / process_csb [i915]
[ 7534.150706]
[ 7534.150717] write to 0xffff8881f1bc24b4 of 4 bytes by task 24404 on cpu 3:
[ 7534.150925]  __execlists_submission_tasklet+0x1158/0x2780 [i915]
[ 7534.151133]  execlists_submit_request+0x2e8/0x2f0 [i915]
[ 7534.151348]  submit_notify+0x8f/0xc0 [i915]
[ 7534.151549]  __i915_sw_fence_complete+0x5d/0x3e0 [i915]
[ 7534.151753]  i915_sw_fence_complete+0x58/0x80 [i915]
[ 7534.151963]  i915_sw_fence_commit+0x16/0x20 [i915]
[ 7534.152179]  __i915_request_queue+0x60/0x70 [i915]
[ 7534.152388]  i915_gem_do_execbuffer+0x3997/0x4c20 [i915]
[ 7534.152598]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 7534.152615]  drm_ioctl_kernel+0xe4/0x120
[ 7534.152629]  drm_ioctl+0x297/0x4c7
[ 7534.152642]  ksys_ioctl+0x89/0xb0
[ 7534.152654]  __x64_sys_ioctl+0x42/0x60
[ 7534.152667]  do_syscall_64+0x6e/0x2c0
[ 7534.152681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7534.152693]
[ 7534.152703] read to 0xffff8881f1bc24b4 of 4 bytes by interrupt on cpu 2:
[ 7534.152914]  process_csb+0xe7c/0x10a0 [i915]
[ 7534.153120]  execlists_submission_tasklet+0x30/0x170 [i915]
[ 7534.153138]  tasklet_action_common.isra.0+0x42/0xa0
[ 7534.153153]  __do_softirq+0xd7/0x2cd
[ 7534.153166]  run_ksoftirqd+0x15/0x20
[ 7534.153180]  smpboot_thread_fn+0x1ab/0x300
[ 7534.153194]  kthread+0x19a/0x1e0
[ 7534.153207]  ret_from_fork+0x1f/0x30

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309144249.10309-1-chris@chris-wilson.co.uk
2020-03-09 17:08:58 +00:00
Chris Wilson 3df2deed41 drm/i915/execlists: Enable timeslice on partial virtual engine dequeue
If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.

This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!

Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-chris@chris-wilson.co.uk
2020-03-07 00:05:54 +00:00
Chris Wilson 1eaa251b66 drm/i915: Assert requests within a context are submitted in order
Check the flow of requests into the hardware to verify that are
submitted in order along their timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-1-chris@chris-wilson.co.uk
2020-03-06 10:53:54 +00:00
Chris Wilson 81dcef4cee drm/i915/execlists: Show the "switch priority hint" in dumps
Show the timeslicing priority hint in engine dumps to aide debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305135843.2760512-1-chris@chris-wilson.co.uk
2020-03-05 15:46:59 +00:00
Chris Wilson 8e9f84cf5c drm/i915/gt: Propagate change in error status to children on unhold
As we release the head requests back into the queue, propagate any
change in error status that may have occurred while the requests were
temporarily suspended.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1277
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-2-chris@chris-wilson.co.uk
2020-03-04 14:29:50 +00:00
Chris Wilson 36e191f064 drm/i915: Apply i915_request_skip() on submission
Trying to use i915_request_skip() prior to i915_request_add() causes us
to try and fill the ring upto request->postfix, which has not yet been
set, and so may cause us to memset() past the end of the ring.

Instead of skipping the request immediately, just flag the error on the
request (only accepting the first fatal error we see) and then clear the
request upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
2020-03-04 14:29:50 +00:00
Chris Wilson 15db5fcce9 drm/i915/execlists: Check the sentinel is alone in the ELSP
We only use sentinel requests for "preempt-to-idle" passes, so assert
that they are the only request in a new submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-12-chris@chris-wilson.co.uk
2020-03-02 21:28:17 +00:00
Chris Wilson 3fc28d3e0e drm/i915/gt: Reset queue_priority_hint after wedging
An odd and highly unlikely path caught us out. On delayed submission
(due to an asynchronous reset handler), we poked the priority_hint and
kicked the tasklet. However, we had already marked the device as wedged
and swapped out the tasklet for a no-op. The result was that we never
cleared the priority hint and became upset when we later checked.

<0> [574.303565] i915_sel-6278    2.... 481822445us : __i915_subtests: Running intel_execlists_live_selftests/live_error_interrupt
<0> [574.303565] i915_sel-6278    2.... 481822472us : __engine_unpark: 0000:00:02.0 rcs0:
<0> [574.303565] i915_sel-6278    2.... 481822491us : __gt_unpark: 0000:00:02.0
<0> [574.303565] i915_sel-6278    2.... 481823220us : execlists_context_reset: 0000:00:02.0 rcs0: context:f4ee reset
<0> [574.303565] i915_sel-6278    2.... 481824830us : __intel_context_active: 0000:00:02.0 rcs0: context:f51b active
<0> [574.303565] i915_sel-6278    2.... 481825258us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51b pin ring:{start:00006000, head:0000, tail:0000}
<0> [574.303565] i915_sel-6278    2.... 481825311us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51b:2, current 0
<0> [574.303565] i915_sel-6278    2d..1 481825347us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51b:2, current 0
<0> [574.303565] i915_sel-6278    2d..1 481825363us : trace_ports: 0000:00:02.0 rcs0: submit { f51b:2, 0:0 }
<0> [574.303565] i915_sel-6278    2.... 481826809us : __intel_context_active: 0000:00:02.0 rcs0: context:f51c active
<0> [574.303565]   <idle>-0       7d.h2 481827326us : cs_irq_handler: 0000:00:02.0 rcs0: CS error: 1
<0> [574.303565]   <idle>-0       7..s1 481827377us : process_csb: 0000:00:02.0 rcs0: cs-irq head=3, tail=4
<0> [574.303565]   <idle>-0       7..s1 481827379us : process_csb: 0000:00:02.0 rcs0: csb[4]: status=0x10000001:0x00000000
<0> [574.305593]   <idle>-0       7..s1 481827385us : trace_ports: 0000:00:02.0 rcs0: promote { f51b:2*, 0:0 }
<0> [574.305611]   <idle>-0       7..s1 481828179us : execlists_reset: 0000:00:02.0 rcs0: reset for CS error
<0> [574.305611] i915_sel-6278    2.... 481828284us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51c pin ring:{start:00007000, head:0000, tail:0000}
<0> [574.305611] i915_sel-6278    2.... 481828345us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51c:2, current 0
<0> [574.305611]   <idle>-0       7dNs2 481847823us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence f51b:2, current 1
<0> [574.305611]   <idle>-0       7dNs2 481847857us : execlists_hold: 0000:00:02.0 rcs0: fence f51b:2, current 1 on hold
<0> [574.305611]   <idle>-0       7.Ns1 481847863us : intel_engine_reset: 0000:00:02.0 rcs0: flags=4
<0> [574.305611]   <idle>-0       7.Ns1 481847945us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-1
<0> [574.305611]   <idle>-0       7.Ns1 481847946us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
<0> [574.305611]   <idle>-0       7.Ns1 538584284us : intel_engine_stop_cs: 0000:00:02.0 rcs0: timed out on STOP_RING -> IDLE
<0> [574.305611]   <idle>-0       7.Ns1 538584347us : __intel_gt_reset: 0000:00:02.0 engine_mask=1
<0> [574.305611]   <idle>-0       7.Ns1 538584406us : execlists_reset_rewind: 0000:00:02.0 rcs0:
<0> [574.305611]   <idle>-0       7dNs2 538585050us : __i915_request_reset: 0000:00:02.0 rcs0: fence f51b:2, current 1 guilty? yes
<0> [574.305611]   <idle>-0       7dNs2 538585063us : __execlists_reset: 0000:00:02.0 rcs0: replay {head:0000, tail:0068}
<0> [574.306565]   <idle>-0       7.Ns1 538588457us : intel_engine_cancel_stop_cs: 0000:00:02.0 rcs0:
<0> [574.306565]   <idle>-0       7dNs2 538588462us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51c:2, current 0
<0> [574.306565]   <idle>-0       7dNs2 538588471us : trace_ports: 0000:00:02.0 rcs0: submit { f51c:2, 0:0 }
<0> [574.306565]   <idle>-0       7.Ns1 538588474us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->1
<0> [574.306565] kworker/-202     2.... 538588755us : i915_request_retire: 0000:00:02.0 rcs0: fence f51c:2, current 2
<0> [574.306565] ksoftirq-46      7..s. 538588773us : process_csb: 0000:00:02.0 rcs0: cs-irq head=11, tail=1
<0> [574.306565] ksoftirq-46      7..s. 538588774us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x10000001:0x00000000
<0> [574.306565] ksoftirq-46      7..s. 538588776us : trace_ports: 0000:00:02.0 rcs0: promote { f51c:2!, 0:0 }
<0> [574.306565] ksoftirq-46      7..s. 538588778us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x10000018:0x00000020
<0> [574.306565] ksoftirq-46      7..s. 538588779us : trace_ports: 0000:00:02.0 rcs0: completed { f51c:2!, 0:0 }
<0> [574.306565] kworker/-202     2.... 538588826us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51c unpin
<0> [574.306565] i915_sel-6278    6.... 538589663us : __intel_gt_set_wedged.part.32: 0000:00:02.0 start
<0> [574.306565] i915_sel-6278    6.... 538589667us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-0
<0> [574.306565] i915_sel-6278    6.... 538589710us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
<0> [574.306565] i915_sel-6278    6.... 538589732us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-0
<0> [574.307591] i915_sel-6278    6.... 538589733us : intel_engine_stop_cs: 0000:00:02.0 bcs0:
<0> [574.307591] i915_sel-6278    6.... 538589757us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-0
<0> [574.307591] i915_sel-6278    6.... 538589758us : intel_engine_stop_cs: 0000:00:02.0 vcs0:
<0> [574.307591] i915_sel-6278    6.... 538589771us : execlists_reset_prepare: 0000:00:02.0 vcs1: depth<-0
<0> [574.307591] i915_sel-6278    6.... 538589772us : intel_engine_stop_cs: 0000:00:02.0 vcs1:
<0> [574.307591] i915_sel-6278    6.... 538589778us : execlists_reset_prepare: 0000:00:02.0 vecs0: depth<-0
<0> [574.307591] i915_sel-6278    6.... 538589780us : intel_engine_stop_cs: 0000:00:02.0 vecs0:
<0> [574.307591] i915_sel-6278    6.... 538589786us : __intel_gt_reset: 0000:00:02.0 engine_mask=ff
<0> [574.307591] i915_sel-6278    6.... 538591175us : execlists_reset_cancel: 0000:00:02.0 rcs0:
<0> [574.307591] i915_sel-6278    6.... 538591970us : execlists_reset_cancel: 0000:00:02.0 bcs0:
<0> [574.307591] i915_sel-6278    6.... 538591982us : execlists_reset_cancel: 0000:00:02.0 vcs0:
<0> [574.307591] i915_sel-6278    6.... 538591996us : execlists_reset_cancel: 0000:00:02.0 vcs1:
<0> [574.307591] i915_sel-6278    6.... 538592759us : execlists_reset_cancel: 0000:00:02.0 vecs0:
<0> [574.307591] i915_sel-6278    6.... 538592977us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->0
<0> [574.307591] i915_sel-6278    6.N.. 538592996us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->0
<0> [574.307591] i915_sel-6278    6.N.. 538593023us : execlists_reset_finish: 0000:00:02.0 vcs0: depth->0
<0> [574.307591] i915_sel-6278    6.N.. 538593037us : execlists_reset_finish: 0000:00:02.0 vcs1: depth->0
<0> [574.307591] i915_sel-6278    6.N.. 538593051us : execlists_reset_finish: 0000:00:02.0 vecs0: depth->0
<0> [574.307591] i915_sel-6278    6.... 538593407us : __intel_gt_set_wedged.part.32: 0000:00:02.0 end
<0> [574.307591] kworker/-210     7d..1 551958381us : execlists_unhold: 0000:00:02.0 rcs0: fence f51b:2, current 2 hold release
<0> [574.307591] i915_sel-6278    0.... 559490788us : i915_request_retire: 0000:00:02.0 rcs0: fence f51b:2, current 2
<0> [574.307591] i915_sel-6278    0.... 559490793us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51b unpin
<0> [574.307591] i915_sel-6278    0.... 559490798us : __engine_park: 0000:00:02.0 rcs0: parked
<0> [574.307591] i915_sel-6278    0.... 559490982us : __intel_context_retire: 0000:00:02.0 rcs0: context:f51c retire runtime: { total:30004ns, avg:30004ns }
<0> [574.307591] i915_sel-6278    0.... 559491372us : __engine_park: __engine_park:261 GEM_BUG_ON(engine->execlists.queue_priority_hint != (-((int)(~0U >> 1)) - 1))

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-9-chris@chris-wilson.co.uk
2020-02-28 15:48:10 +00:00
Chris Wilson d3b03d8bf4 drm/i915/gt: Check engine-is-awake on reset later
As we drop the engine-pm on retiring, that may happen while there are
still CS events in the buffer. As such we cannot assert the engine is
still active on reset, until we know that the current request is still
in flight.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1338
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227204727.2009346-1-chris@chris-wilson.co.uk
2020-02-28 09:30:14 +00:00