Commit Graph

13 Commits

Author SHA1 Message Date
Chris Wilson 87dc03ad26 drm/i915/selftests: Try to recover from a wedged GPU during reset tests
If we see the seqno stop progressing, we abandon the test for fear that
the GPU died following the reset. However, during test teardown we still
wait for the GPU to idle before continuing, but we have already
confirmed that the GPU is dead. Furthermore, since we are inside a reset
test, we have disabled the hangchecker, and so there is no safety net and
we wait indefinitely. Detect the stuck GPU and declare it wedged as a
state of emergency so we can escape.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jari Tahvanainen <jari.tahvanainen@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915130929.18892-1-chris@chris-wilson.co.uk
Tested-by: Jari Tahvanainen <jari.tahvanainen@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-09-26 14:19:25 +01:00
Chris Wilson f2f5c0610f drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcs
MI_STORE_DWORD_IMM just doesn't work on the video decode engine under
Sandybridge, so refrain from using it. Then switch the selftests over to
using the now common test prior to using MI_STORE_DWORD_IMM.

Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2017-08-18 11:55:02 +01:00
Chris Wilson 41533940a9 drm/i915/selftests: Retarget igt_render_engine_reset_fallback()
The purpose of the test was to check per-engine resets would fallback to
the global reset when required, but first we actually need a test for a
basic i915_handle_error()!

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170728112110.6464-1-chris@chris-wilson.co.uk
2017-08-04 19:08:30 +01:00
Chris Wilson cb7ffbad18 drm/i915/selftests: Fix kbuild error
After applying af2788925ae0 ("drm/i915: Squelch reset messages during
selftests") out of sequence, I missed fixing up a call to i915_reset().

Reported-by: kbuild test robot <kbuild-all@01.org>
Fixes: af2788925ae0 ("drm/i915: Squelch reset messages during selftests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725125336.11969-1-chris@chris-wilson.co.uk
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:58 +02:00
Chris Wilson 535275d323 drm/i915: Squelch reset messages during selftests
During our selftests, we try reseting the GPU tens of thousands of
times, flooding the dmesg with our reset spam drowning out any potential
warnings. Add an option to i915_reset()/i915_reset_engine() to specify a
quiet reset for selftesting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-19-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Chris Wilson 3744d49c6b drm/i915/selftest: Refactor reset locking
Extract the common barrier against rogue hangchecks from disrupting our
direct testing of resets, and in the process expand the lock to include
the per-engine reset shortcuts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-17-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Chris Wilson 79f0f4724d drm/i915/selftests: Exercise independence of per-engine resets
If all goes well, resetting one engine should not affect the operation of
any others. So to test this, we setup a continuous stream of requests
onto to each of the "innocent" engines whilst constantly resetting our
target engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-16-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:48 +02:00
Chris Wilson 774eed4a40 drm/i915/selftests: Fix mutex imbalance for igt_render_engine_reset_fallback
Smatch spots:

drivers/gpu/drm/i915/selftests/intel_hangcheck.c:669 igt_render_engine_reset_fallback() error: double unlock 'mutex:&i915->drm.struct_mutex'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170623131907.24236-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2017-06-27 14:23:09 +01:00
Michel Thierry abeb4def31 drm/i915/selftests: reset engine self tests
Check that we can reset specific engines, also check the fallback to
full reset if something didn't work.

v2: rebase.
v3: use RESET_ENGINE_IN_PROGRESS flag.
v4: use I915_RESET_ENGINE flag.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-12-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-9-chris@chris-wilson.co.uk
2017-06-20 21:00:30 +01:00
Chris Wilson d5367307d4 drm/i915: Wait for concurrent global resets to complete
If we enter i915_handle_error() a second time and a global reset is
already in progress, we can simply wait for completion of the first
reset. Currently we exit early prior to the actual reset being
performed -- the worst of both worlds!

v2: Plug into the existing reset_queue, and remember that kselftests is
playing games with I915_RESET_BACKOFF to prevent hangcheck from screwing
up.
v3: Rename to i915_reset_device to fit in better with i915_reset_engine

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-2-chris@chris-wilson.co.uk
2017-06-20 20:59:58 +01:00
Chris Wilson 72022a705e drm/i915: Move retire-requests into i915_gem_wait_for_idle()
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:03:46 +01:00
Chris Wilson 8c185ecaf4 drm/i915: Split I915_RESET_IN_PROGRESS into two flags
I915_RESET_IN_PROGRESS is being used for both signaling the requirement
to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
to instruct a waiter (already holding the struct_mutex) to perform the
reset. To allow for a little more coordination, split these two meaning
into a couple of distinct flags. I915_RESET_BACKOFF tells
i915_mutex_lock_interruptible() not to acquire the mutex and
I915_RESET_HANDOFF tells the waiter to call i915_reset().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk
2017-03-16 17:17:10 +00:00
Chris Wilson 496b575e3c drm/i915: Add initial selftests for hang detection and resets
Check that we can reset the GPU and continue executing from the next
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-47-chris@chris-wilson.co.uk
2017-02-13 20:46:53 +00:00