Commit Graph

823 Commits

Author SHA1 Message Date
Chris Wilson 11caf5517c drm/i915: Empty the ring before disabling
An interesting snippet from Sandybridge's prm:

"Although a Ring Buffer can be enabled in the non-empty state, it must
not be disabled unless it is empty. Attempting to disable a Ring Buffer
in the non-empty state is UNDEFINED."

Let's avoid the undefined behaviour as we disable the rings prior to
reset and resume.

v2: Tell HEAD to catch up to TAIL (empty ring) first, then reset both to
0 (supposedly while stopped).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171027094311.30380-1-chris@chris-wilson.co.uk
2017-10-27 12:09:29 +01:00
Chris Wilson aba5e27858 drm/i915: Add a hook for making the engines idle (parking) and unparking
In the next patch, we will want to install a callback when the engines
(GT as a whole) become idle and similarly when they first become busy.
To enable that callback, first rename intel_engines_mark_idle() to
intel_engines_park() and provide the companion intel_engines_unpark().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
2017-10-25 18:47:30 +01:00
Chris Wilson 3d574a6bbb drm/i915: Remove walk over obj->vma_list for the shrinker
In the next patch, we want to reduce the lock coverage within the
shrinker, and one of the dangerous walks we have is over obj->vma_list.
We are only walking the obj->vma_list in order to check whether it has
been permanently pinned by HW access, typically via use on the scanout.
But we have a couple of other long term pins, the context objects for
which we currently have to check the individual vma pin_count. If we
instead mark these using obj->pin_display, we can forgo the dangerous
and sometimes slow list iteration.

v2: Rearrange code to try and avoid confusion from false associations
due to arrangement of whitespace along with rebasing on obj->pin_global.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson 7836cd02f2 drm/i915: Keep the rings stopped until they have been re-initialized
Before modifying the ring register (RING_START, HEAD, TAIL, CTL) we
first make sure it is stopped (or else the hw may not resample the
registers). However, we do not need to let the hw restart until after we
have reprogrammed all the rings. This should help prevent situations
where pending operations on the ring may resume (because we are trying
to re-initialize following an unsuccessful GPU hang, i.e. from
i915_gem_unset_wedged).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103260
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013131218.18013-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-10-13 20:57:29 +01:00
Chris Wilson 67e6456485 drm/i915: Provide an assert for when we expect forcewake to be held
Add assert_forcewakes_active() (the complementary function to
assert_forcewakes_inactive) that documents the requirement of a
function for its callers to be holding the forcewake ref (i.e. the
function is part of a sequence over which RC6 must be prevented).

One such example is during ringbuffer reset, where RC6 must be held
across the whole reinitialisation sequence.

v2: Include debug information in the WARN so we know which fw domain is
missing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-5-chris@chris-wilson.co.uk
2017-10-09 17:07:29 +01:00
Michal Wajdeczko 4f044a88a8 drm/i915: Rename global i915 to i915_modparams
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-22 14:50:36 +03:00
Chris Wilson 27a5f61b37 drm/i915: Cancel all ready but queued requests when wedging
When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 10:59:55 +01:00
Ville Syrjälä c549808946 drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode
The execlist code already masks everything in the ring HWSTAM, but
the ringbuffer code doesn't. Let's go ahead and do that. Pre-gen6
platforms setup HWSTAM during irq setup already since there's just
the one register, and it also contains bits for non-ring interrupts.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-13-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-15 14:42:55 +03:00
Daniele Ceraolo Spurio 486e93f72a drm/i915/lrc: allocate separate page for HWSP
On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.

v2: Add a use-case for the register arena of the HWSP.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk
2017-09-13 15:02:39 +01:00
Michel Thierry a2d3d2655e drm/i915: Add a default case in gen7 hwsp switch-case
Gen7 won't get any new engines, and we already added VCS2 there to just
silence gcc's not handled in switch warnings.

Use a default case instead, otherwise we will need to keep adding extra
cases if changes happen in the future.

v2: Since reaching the default case is impossible, use GEM_BUG_ON (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170830180115.907-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-08 20:07:06 +01:00
Chris Wilson 6492ca79c8 drm/i915: Enforce that CS packets are qword aligned
We require the caller to ensure that the packets they wish to emit into
the CS ring are qword aligned (i.e. have an even number of dwords).
Double check this.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721161101.1618-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Tvrtko Ursulin c58949f418 drm/i915: Do not re-calculate num_rings locally
Since bb8f0f5abd ("drm/i915: Split intel_engine allocation
and initialisation") intel_info->num_rings is set early in the
load sequence and so available to be used direclty in the 2nd
load phase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616130339.23015-1-tvrtko.ursulin@linux.intel.com
2017-06-20 12:42:13 +01:00
Chris Wilson 5e5655c32d drm/i915: Micro-optimise hotpath through intel_ring_begin()
Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.

v2: Fix the new assert that packets are not larger than the actual ring.
v3: Make the parameters unsigned as well to make usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-3-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson 95aebcb2da drm/i915: Report the ring->space from intel_ring_update_space()
Some callers immediately want to know the current ring->space after
calling intel_ring_update_space(), which we can freely provide via the
return parameter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-2-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson 605d5b3297 drm/i915: Avoid the branch in computing intel_ring_space()
Exploit the power-of-two ring size to compute the space across the
wraparound using a mask rather than a if. Convert to unsigned integers
so the operation is well defined.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-1-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson 266a240bf0 drm/i915: Use engine->context_pin() to report the intel_ring
Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-05-04 11:54:43 +01:00
Joonas Lahtinen 63ffbcdadc drm/i915: Sanitize engine context sizes
Pre-calculate engine context size based on engine class and device
generation and store it in the engine instance.

v2:
- Squash and get rid of hw_context_size (Chris)

v3:
- Move after MMIO init for probing on Gen7 and 8 (Chris)
- Retained rounding (Tvrtko)
v4:
- Rebase for deferred legacy context allocation

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28 12:11:59 +03:00
Chris Wilson 3204c343bb drm/i915: Defer context state allocation for legacy ring submission
Almost from the outset for execlists, we used deferred allocation of the
logical context and rings. Then we ported the infrastructure for pinning
contexts back to legacy, and so now we are able to also implement
deferred allocation for context objects prior to first use on the legacy
submission.

v2: We still need to differentiate between legacy engines, Joonas is
fixing that but I want this first ;) (Joonas)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170427104651.22394-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-27 12:22:13 +01:00
Chris Wilson 0100186386 drm/i915: Poison the request before emitting commands
If we poison the request before we emit commands, it should be easier to
spot when we execute an uninitialised request.

References: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170423170619.7156-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25 15:34:24 +01:00
Chris Wilson e6ba9992de drm/i915: Differentiate between sw write location into ring and last hw read
We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).

v2: Refactor intel_ring_reset() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25 15:33:22 +01:00
Chris Wilson 2d6c4c8423 drm/i915: Use discardable buffers for rings
The contents of a ring are only valid between HEAD and TAIL, when the
ring is idle (HEAD == TAIL) we can simply let the pages go under memory
pressure if they are not pinned by an active context. Any new content
will be written after HEAD and so the ring will again be valid between
HEAD and TAIL, everything outside can be discarded.

Note that we take care of ensuring that we do not reset the HEAD
backwards following a GPU hang on an idle ring.

The same precautions are what enable us to use stolen memory for rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170420101709.27250-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-20 12:00:32 +01:00
Oscar Mateo 5ff36d36a3 drm/i915: Use the same vfunc for BSD2 ring init
If we needed to do something different for the init functions, we could
always look at the engine instance to make the distinction. But, in any
case, the two functions are virtually identical already (please notice
that BSD2_RING is only used from gen8 onwards).

With this, the init functions depends excusively on the engine class
(a fact that we will use soon).

v2: Commit message

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-3-git-send-email-oscar.mateo@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 12:57:51 +01:00
Chris Wilson f42bb651d1 drm/i915: Use safer intel_uncore_wait_for_register in ring-init
While we do hold the forcewake for legacy ringbuffer initialisation, we
don't guard our access with the uncore.lock spinlock. In theory, we only
initialise when no others should be accessing the same mmio cachelines,
but in practice be safe as this is an infrequently used path and not
worth risky micro-optimisations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411101340.31994-5-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-11 12:50:44 +01:00
Chris Wilson 02b312d05d drm/i915: Stop sleeping from inside gen6_bsd_submit_request()
submit_request() is called from an atomic context, it's not allowed to
sleep. We have to be careful in our parameters to
intel_uncore_wait_for_register() to limit ourselves to the atomic wait
loop and not incur the wrath of our warnings.

Fixes: 6976e74b5f ("drm/i915: Don't allow overuse of __intel_wait_for_register_fw()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170410143807.22725-1-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170411101340.31994-2-chris@chris-wilson.co.uk
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-11 12:44:02 +01:00
Chris Wilson 1a5788bf27 drm/i915: Onion unwind for intel_init_ring_common()
Rather than call intel_engine_cleanup() with a partially constructed
engine, unwind the error during intel_init_ring_common().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-03 13:53:14 +01:00
Chris Wilson d822bb18ce drm/i915: intel_ring.engine is unused
Or rather it is used only by intel_ring_pin() to extract the
drm_i915_private which we can easily pass in. As this is a relatively
rare operation, save the space in the struct, and as such it is even
break even in the extra code for passing around the parameter:

add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0)
function                                     old     new   delta
intel_init_ring_buffer                       906     918     +12
execlists_context_pin                       1308    1311      +3
mock_engine                                  407     403      -4
intel_engine_create_ring                     367     363      -4
intel_ring_pin                               326     319      -7
Total: Before=1261794, After=1261794, chg +0.00%

v2: Reorder intel_init_ring_buffer to keep the ring setup together:

add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6)
function                                     old     new   delta
intel_init_ring_buffer                       906     912      +6
execlists_context_pin                       1308    1311      +3
mock_engine                                  407     403      -4
intel_engine_create_ring                     367     363      -4
intel_ring_pin                               326     319      -7
Total: Before=1261794, After=1261788, chg -0.00%

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
2017-04-03 13:52:58 +01:00
Chris Wilson ed1501d451 drm/i915: Refactor tests for validity of RING_TAIL
Whilst I like having the assertions clearly visible in the code, they
are quite repetitious! As we find new limits we want to incorporate into
the set of assertions, it make sense to refactor them to a common
routine.

v2: Add a guc holdout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27 15:03:53 +01:00
Chris Wilson a91fdf1293 drm/i915: Assert that the request->tail fits within the ring
In addition to being qword-aligned, the RING_TAIL offset must be within
the ring!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27 15:03:21 +01:00
Chris Wilson 6f9b850bec drm/i915: Fix semaphore emission for BDW+ RCS ringbuffer emission
The required number of dwords for semaphore emission on BDW RCS is 8,
not 6 - leading to ring buffer corruption and immediate GPU hangs when
using ringbuffer submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170324151724.32640-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-24 17:10:27 +00:00
Chris Wilson 5d4bac5503 drm/i915: Restore marking context objects as dirty on pinning
Commit e8a9c58fcd ("drm/i915: Unify active context tracking between
legacy/execlists/guc") converted the legacy intel_ringbuffer submission
to the same context pinning mechanism as execlists - that is to pin the
context until the subsequent request is retired. Previously it used the
vma retirement of the context object to keep itself pinned until the
next request (after i915_vma_move_to_active()). In the conversion, I
missed that the vma retirement was also responsible for marking the
object as dirty. Mark the context object as dirty when pinning
(equivalent to execlists) which ensures that if the context is swapped
out due to mempressure or suspend/hibernation, when it is loaded back in
it does so with the previous state (and not all zero).

Fixes: e8a9c58fcd ("drm/i915: Unify active context tracking between legacy/execlists/guc")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Reported-by: Mathieu Marquer <mathieu.marquer@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.11-rc1
Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-23 09:55:57 +00:00
Chris Wilson fe085f13c7 drm/i915: Remove intel_ring.last_retired_head
Storing the position of the breadcrumb of the last retired request as
a separate last_retired_head is superfluous as we always copy that into
head prior to recalculation of the intel_ring.space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
2017-03-21 14:21:50 +00:00
Chris Wilson a533b4ba77 drm/i915: Assert that the context pin_counts do not overflow
This should be impossible, but let's assert that we do not pin a context
4 billion times before retiring!

v2: Fix the assertion -- the patch had just one job to do!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-16 20:48:58 +00:00
Chris Wilson ff44ad51eb drm/i915: Move engine->submit_request selection to a vfunc
It turns out that we may want to restore the original
engine->submit_request (and engine->schedule) callbacks from more than
just the guc <-> execlists transition. Move this to a vfunc so we can
have a common interface.

v2: Move initial selection to intel_engines_init_common(), repaint vfunc
with engine->set_default_submission (and a similar colour for the
helper).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
2017-03-16 17:17:12 +00:00
Chris Wilson afeddf5081 drm/i915: Reduce context alignment
No hardware was ever shipped that needed more than 4096 byte alignment
and future hardware will not use this legacy path. So reduce the
alignment to make it easier and quicker to launch workloads.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227135913.8056-3-chris@chris-wilson.co.uk
2017-02-27 16:01:47 +00:00
Chris Wilson 944a36d472 drm/i915: Assert that the request->tail is always qword aligned
The hardware requires that the tail pointer only advance in qword units,
so assert that the value we write is aligned to qwords, and similarly
enforce this restriction onto the request->tail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-02-20 14:32:25 +00:00
Tvrtko Ursulin 9f235dfa49 drm/i915: Consolidate gen8_emit_pipe_control
We have a few open coded instances in the execlists code and an
almost suitable helper in intel_ringbuf.c

We can consolidate to a single helper if we change the existing
helper to emit directly to ring buffer memory and move the space
reservation outside it.

v2: Drop memcpy for memset. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com
2017-02-17 11:39:59 +00:00
Tvrtko Ursulin 133b4bd74d drm/i915: Move common workaround code to intel_engine_cs
It is used by all submission backends.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-17 11:39:59 +00:00
Tvrtko Ursulin 2f35afe94a drm/i915: Make int __intel_ring_space static
It is only used within intel_ringbuffer.c

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.oc.uk>
2017-02-17 11:39:59 +00:00
Tvrtko Ursulin 4ac9659ef9 drm/i915: Remove duplicate intel_logical_ring_workarounds_emit
intel_ring_workarounds_emit is exactly the same code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com
2017-02-15 08:16:41 +00:00
Robert Bragg 9cc19733fd drm/i915: fix for WaDisableDopClockGating:bdw
This workaround for BDW was incomplete as it also requires EUTC clock
gating to be disabled via UCGCTL1.

v2: read modify write UCGTL1 in broadwell_init_clock_gating (Ville)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212133252.20990-1-robert@sixbynine.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-02-14 22:29:28 +02:00
Tvrtko Ursulin 73dec95e6b drm/i915: Emit to ringbuffer directly
This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.

intel_ring_emit was preventing the compiler for optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.

It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.

Useless instruction removal amounts to approximately
two and half kilobytes of saved text on my build.

Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.

v2:
 * Change return from intel_ring_begin to error pointer by
   popular demand.
 * Move tail increment to intel_ring_advance to enable some
   error checking.

v3:
 * Move tail advance back into intel_ring_begin.
 * Rebase and tidy.

v4:
 * Complete rebase after a few months since v3.

v5:
 * Remove unecessary cast and fix !debug compile. (Chris Wilson)

v6:
 * Make intel_ring_offset take request as well.
 * Fix recording of request postfix plus a sprinkle of asserts.
   (Chris Wilson)

v7:
 * Use intel_ring_offset to get the postfix. (Chris Wilson)
 * Convert GVT code as well.

v8:
 * Rename *out++ to *cs++.

v9:
 * Fix GVT out to cs conversion in GVT.

v10:
 * Rebase for new intel_ring_begin in selftests.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
2017-02-14 14:30:46 +00:00
Chris Wilson ae9a043b0c drm/i915: Rename conditional GEM execution macros
After a brief discussion, we settled on a naming convention for the
conditional GEM debugging data that should be clearer to the casual
user: GEM_DEBUG

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-10 21:43:43 +00:00
Chris Wilson 72b72ae473 drm/i915: Always pin contexts into the high GGTT
Now that we have fast top-down insertion into the drm_mm, we can use it
for frequent runtime operations like insertion of the context object,
whereas before we limited it to the one-off insertion of the pinned
kernel context. Keeping the active context objects out of the mappable
region of the global GTT (except under memory pressure) improves our
ability to allocate mappable aperture region without triggering a GPU
stall.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-10 13:58:40 +00:00
Chris Wilson c0dcb203fb drm/i915: Restore context and pd for ringbuffer submission after reset
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-02-07 21:34:58 +00:00
Chris Wilson eca56a3511 drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
It is required that the caller declare the exact number of dwords they
wish to write into the ring. This is required for two reasons, we need
to allocate sufficient space for the entire command packet and we need
to be sure that the contents are completely written to avoid executing
stale data. The current interface requires for any bug to be caught in
review, the reader has to carefully count the number of
intel_ring_emit() between intel_ring_begin() and intel_ring_advance().
If we record the end of the packet of each intel_ring_begin() we can
also have CI check for us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-1-chris@chris-wilson.co.uk
2017-02-06 20:41:00 +00:00
Ander Conselvan de Oliveira 9fb5026f85 drm/i915/glk: Turn on workarounds that apply to Geminilake too
Apply workarounds to Geminilake, and annotate those that are applied
unconditionally when they apply to GLK based on the workaround database.

v2: Fix commit message typos. (David)
v3: Rebase.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com
2017-01-30 09:49:07 +02:00
Chris Wilson a01cb37aff drm/i915: Remove i915_vma_create from VMA API
With the introduce of i915_vma_instance() for obtaining the VMA
singleton for a (obj, vm, view) tuple, we can remove the
i915_vma_create() in favour of a single entry point. We do incur a
lookup onto an empty tree, but the i915_vma_create() were being called
infrequently and during initialisation, so the small overhead is
negligible.

v2: Drop the i915_ prefix from the now static vma_create() function

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk
2017-01-19 10:17:39 +00:00
Francisco Jerez 8726f2faa3 drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: 738fa1b312 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[Removed double Fixes tag]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
2017-01-12 15:48:08 +02:00
Chris Wilson f51455d442 drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 20:54:32 +00:00
Chris Wilson 984ff29f74 drm/i915: Simplify testing for am-I-the-kernel-context?
The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
2017-01-06 16:02:12 +00:00
Daniele Ceraolo Spurio d3ef1af6fd drm/i915: request ring to be pinned above GUC_WOPCM_TOP
GuC will validate the ring offset and fail if it is in the
[0, GUC_WOPCM_TOP) range. The bias is conditionally applied only
if GuC loading is enabled (we can't check for guc submission enabled as
in other cases because HuC loading requires this fix).

Note that the default context is processed before enable_guc_loading is
sanitized, so we might still apply the bias to its ring even if it is
not needed.

v2: compute the value during ctx init and pass it to
    intel_ring_pin (Chris), updated commit message

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-24 10:06:59 +00:00
Chris Wilson f73e73999d drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc
A fairly trivial move of a matching pair of routines (for preparing a
request for construction) onto an engine vfunc. The ulterior motive is
to be able to create a mock request implementation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
2016-12-18 16:18:56 +00:00
Chris Wilson e8a9c58fcd drm/i915: Unify active context tracking between legacy/execlists/guc
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-12-18 16:18:50 +00:00
Jani Nikula 2a307c2e91 drm/i915: add some more "i" in platform names for consistency
Consistency FTW.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/9ab811dc06570bd3fc05a917ade1bdc9bb805a75.1480520526.git.jani.nikula@intel.com
2016-12-07 15:19:31 +02:00
Tvrtko Ursulin 12d79d7828 drm/i915: Make GEM object create and create from data take dev_priv
Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
2016-12-01 18:01:08 +00:00
Tvrtko Ursulin 187685cb90 drm/i915: Make GEM object alloc/free and stolen created take dev_priv
Where it is more appropriate and also to be consistent with
the direction of the driver.

v2: Leave out object alloc/free inlining. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-12-01 18:00:15 +00:00
Chris Wilson d55ac5bf97 drm/i915: Defer transfer onto execution timeline to actual hw submission
Defer the transfer from the client's timeline onto the execution
timeline from the point of readiness to the point of actual submission.
For example, in execlists, a request is finally submitted to hardware
when the hardware is ready, and only put onto the hardware queue when
the request is ready. By deferring the transfer, we ensure that the
timeline is maintained in retirement order if we decide to queue the
requests onto the hardware in a different order than fifo.

v2: Rebased onto distinct global/user timeline lock classes.
v3: Play with the position of the spin_lock().
v4: Nesting finally resolved with distinct sw_fence lock classes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
2016-11-14 21:00:23 +00:00
Chris Wilson 07c9a21a0d drm/i915: Export a function to flush the context upon pinning
For legacy contexts we employ an optimisation to only flush the context
when binding into the global GTT. This avoids stalling on the GPU when
reloading an active context. Wrap this detail up into a helper and
export it for a potential third user. (Longer term, context pinning
needs to be reworked as the current handling of switch context pins too
late and so risks eviction and corrupting the request. Plans, plans,
plans.)

v2: Expand the comment explaining the optimisation for avoiding the
stall on active contexts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161030132820.32163-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-11-01 14:26:33 +00:00
Chris Wilson 85e17f5974 drm/i915: Move the global sync optimisation to the timeline
Currently we try to reduce the number of synchronisations (now the
number of requests we need to wait upon) by noting that if we have
earlier waited upon a request, all subsequent requests in the timeline
will be after the wait. This only applies to requests in this timeline,
as other timelines will not be ordered by that waiter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
2016-10-28 20:53:54 +01:00
Chris Wilson caddfe7192 drm/i915: Defer breadcrumb emission
Move the actual emission of the breadcrumb for closing the request from
i915_add_request() to the submit callback. (It can be moved later when
required.) This allows us to defer the allocation of the global_seqno
from request construction to actual submission, allowing us to emit the
requests out of order (wrt to the order of their construction, they
still will only be executed one all of their dependencies are resolved
including that all earlier requests on their timeline have been
submitted.) We have to specialise how we then emit the request in order
to write into the preallocated space, rather than at the tail of the
ringbuffer (which will have been advanced by the addition of new
requests).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
2016-10-28 20:53:54 +01:00
Chris Wilson 98f29e8d90 drm/i915: Record space required for breadcrumb emission
In the next patch, we will use deferred breadcrumb emission. That requires
reserving sufficient space in the ringbuffer to emit the breadcrumb, which
first requires us to know how large the breadcrumb is.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
2016-10-28 20:53:53 +01:00
Chris Wilson 9b81d556b1 drm/i915: Rename ->emit_request to ->emit_breadcrumb
Now that the emission of the request tail and its submission to hardware
are two separate steps, engine->emit_request() is confusing.
engine->emit_request() is called to emit the breadcrumb commands for the
request into the ring, name it such (engine->emit_breadcrumb).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
2016-10-28 20:53:53 +01:00
Chris Wilson 65e4760e39 drm/i915: Introduce a global_seqno for each request
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28 20:53:53 +01:00
Chris Wilson 4e50f082ac drm/i915: Reuse the active golden render state batch
The golden render state is constant, but we recreate the batch setting
it up for every new context. If we keep that batch in a volatile cache
we can safely reuse it whenever we need to initialise a new context. We
mark the pages as purgeable and use the shrinker to recover pages from
the batch whenever we face memory pressues, recreating that batch afresh
on the next new context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-8-chris@chris-wilson.co.uk
2016-10-28 20:53:44 +01:00
Chris Wilson 920cf41949 drm/i915: Introduce an internal allocator for disposable private objects
Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use an
drm_i915_gem_object wrapper around a sg_table of plain struct page. They
are not swap backed and not automatically pinned. If they are reaped
by the shrinker, the pages are released and the contents discarded. For
the internal use case, this is fine as for example, ringbuffers are
pinned from being written by a request to be read by the hardware. Once
they are idle, they can be discarded entirely. As such they are a good
match for execlist ringbuffers and a small variety of other internal
objects.

In the first iteration, this is limited to the scratch batch buffers we
use (for command parsing and state initialisation).

v2: Allocate physically contiguous pages, where possible.
v3: Reduce maximum order on subsequent requests following an allocation
failure.
v4: Fix up mismatch between swiotlb segment size and page count (it
counts in 2k units, not 4k pages)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
2016-10-28 20:53:44 +01:00
Chris Wilson f8a7fde456 drm/i915: Defer active reference until required
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Chris Wilson e95433c73a drm/i915: Rearrange i915_wait_request() accounting with callers
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Akash Goel f4e9af4f5a drm/i915: Add low level set of routines for programming PM IER/IIR/IMR register set
So far PM IER/IIR/IMR registers were being used only for Turbo related
interrupts. But interrupts coming from GuC also use the same set.
As a precursor to supporting GuC interrupts, added new low level routines
so as to allow sharing the programming of PM IER/IIR/IMR registers between
Turbo & GuC.
Also similar to PM IMR, maintaining a bitmask for PM IER register, to allow
easy sharing of it between Turbo & GuC without involving a rmw operation.

v2:
- For appropriateness & avoid any ambiguity, rename old functions
  enable/disable pm_irq to mask/unmask pm_irq and rename new functions
  enable/disable pm_interrupts to enable/disable pm_irq. (Tvrtko)
- Use u32 in place of uint32_t. (Tvrtko)

v3:
- Rename the fields pm_irq_mask & pm_ier_mask and do some cleanup. (Chris)
- Rebase.

v4: Fix the inadvertent disabling of User interrupt for VECS ring causing
    failure for certain IGTs.

v5: Use dev_priv with HAS_VEBOX macro. (Tvrtko)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-10-25 09:34:06 +01:00
Arkadiusz Hiler 465418c606 drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7
Dropping WA because it was for early steppings.

It is fixed in newer preproduction and all production revisions.

v2: add references, updated commit message

References: HSD#2126385, HSD#2131381, BSID#0764
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476977460-28088-1-git-send-email-arkadiusz.hiler@intel.com
2016-10-21 14:22:50 +03:00
Akash Goel 3b3f1650b1 drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
	struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
	struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.

There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).

v2:
- Remove the engine iterator field added in drm_i915_private structure,
  instead pass a local iterator variable to the for_each_engine**
  macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
  NULL pointer check on engine pointer. (Chris)

v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
  can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
  engine specific init is done later in Driver load sequence.

v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
  allocation of intel_engine_cs structure. (Chris)

v6:
- Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.

v9: Rebase.

v10:
- For index calculation use engine ID instead of pointer based arithmetic in
  intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
  check for NULL engine pointer in cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 09:58:43 +01:00
Chris Wilson ad07dfcddf drm/i915: Reset the breadcrumbs IRQ more carefully
Along with the interrupt, we want to restore the fake-irq and
wait-timeout detection. If we use the breadcrumbs interface to setup the
interrupt as it wants, the auxiliary timers will also be restored.

v2: Cancel both timers as well, sanitize the IMR.

Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk
2016-10-07 08:27:23 +01:00
Chris Wilson 1b36595ffb drm/i915: Show RING registers through debugfs
Knowing where the RINGs are pointing is extremely useful in diagnosing
if the engines are executing the ringbuffers you expect - and igt may be
suppressing the usual method of looking in the GPU error state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-7-chris@chris-wilson.co.uk
2016-10-05 08:40:06 +01:00
Chris Wilson 62ae14b1ed drm/i915: Share the computation of ring size for RING_CTL register
Since both legacy and execlists want to populate the RING_CTL register,
share the computation of the right bits for the ring->size. We can then
stop masking errors and explicitly forbid them during creation!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-1-chris@chris-wilson.co.uk
2016-10-05 08:40:05 +01:00
Jani Nikula 3ec92362cd drm/i915/skl: drop workarounds for F0 revision
Pre-production hardware is not supported.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/e5433e6430dcfd941209c4d8103035ddb13d17b4.1474034059.git.jani.nikula@intel.com
2016-09-26 12:14:05 +03:00
Jani Nikula 3be192e92d drm/i915/skl: drop workarounds for E0 revision
Pre-production hardware is not supported.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/0633a02177195703502ef2396aab03efc0314334.1474034059.git.jani.nikula@intel.com
2016-09-26 12:12:56 +03:00
Jani Nikula 9fc736e833 drm/i915/skl: drop workarounds for D0 revision
Pre-production hardware is not supported.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/d28d21ceddeec226b5d1a20a7382bee9a72709a4.1474034059.git.jani.nikula@intel.com
2016-09-26 12:12:19 +03:00
Jani Nikula 0d0b8dcf94 drm/i915/skl: drop workarounds for C0 revision
Pre-production hardware is not supported.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/ed7b784306b35fa5215b9c04de79a2bc48585503.1474034059.git.jani.nikula@intel.com
2016-09-26 12:11:39 +03:00
Jani Nikula a117f378f4 drm/i915/skl: drop workarounds for A0 and B0 revisions
Pre-production hardware is not supported.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/7929af62a68504c84038a8db1625bd96ebaa9e6f.1474034059.git.jani.nikula@intel.com
2016-09-26 12:08:22 +03:00
Chris Wilson 821ed7df6e drm/i915: Update reset path to fix incomplete requests
Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.

The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.

ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS

We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty, and not
mark the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.

ARB_robustness gives this guide on how we expect the user of this
interface to behave:

 * Provide a mechanism for an OpenGL application to learn about
   graphics resets that affect the context.  When a graphics reset
   occurs, the OpenGL context becomes unusable and the application
   must create a new context to continue operation. Detecting a
   graphics reset happens through an inexpensive query.

And with regards to the actual meaning of the reset values:

   Certain events can result in a reset of the GL context. Such a reset
   causes all context state to be lost. Recovery from such events
   requires recreation of all objects in the affected context. The
   current status of the graphics reset state is returned by

	enum GetGraphicsResetStatusARB();

   The symbolic constant returned indicates if the GL context has been
   in a reset state at any point since the last call to
   GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
   has not been in a reset state since the last call.
   GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
   that is attributable to the current GL context.
   INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
   is not attributable to the current GL context.
   UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
   cause is unknown.

The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.

In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.

v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restore stale values to the RING_HEAD and walked over
stolen garbage.

v3: GuC requires replaying the requests after a reset.

v4: Restore engine IRQ after reset (so waiters will be woken!)
    Rearm hangcheck if resetting with a waiter.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
2016-09-09 14:23:05 +01:00
Chris Wilson 221fe79945 drm/i915: Perform a direct reset of the GPU from the waiter
If a waiter is holding the struct_mutex, then the reset worker cannot
reset the GPU until the waiter returns. We do not want to return -EAGAIN
form i915_wait_request as that breaks delicate operations like
i915_vma_unbind() which often cannot be restarted easily, and returning
-EIO is just as useless (and has in the past proven dangerous). The
remaining WARN_ON(i915_wait_request) serve as a valuable reminder that
handling errors from an indefinite wait are tricky.

We can keep the current semantic that knowing after a reset is complete,
so is the request, by performing the reset ourselves if we hold the
mutex.

uevent emission is still handled by the reset worker, so it may appear
slightly out of order with respect to the actual reset (and concurrent
use of the device).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk
2016-09-09 14:23:04 +01:00
Chris Wilson 22dd3bb919 drm/i915: Mark up all locked waiters
In the next patch we want to handle reset directly by a locked waiter in
order to avoid issues with returning before the reset is handled. To
handle the reset, we must first know whether we hold the struct_mutex.
If we do not hold the struct_mtuex we can not perform the reset, but we do
not block the reset worker either (and so we can just continue to wait for
request completion) - otherwise we must relinquish the mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson ea746f3659 drm/i915: Expand bool interruptible to pass flags to i915_wait_request()
We need finer control over wakeup behaviour during i915_wait_request(),
so expand the current bool interruptible to a bitmask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Carlos Santa 3177659a41 drm/i915: Make HWS_NEEDS_PHYSICAL the exception
Make the .hws_needs_physical the exception by switching the flag
on earlier platforms since they are fewer to support. Remove the flag on
later GPUs hardware since they all use GTT hws by default.

Switch the logic as well in the driver to reflect this change

Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2016-09-07 16:07:09 -07:00
Imre Deak 43b6799814 drm/i915: sseu: Use sseu_dev_info in device info
Move all slice/subslice/eu related properties to the sseu_dev_info
struct.

No functional change.

v2:
- s/info/sseu/ based on the new struct name. (Ben)

Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
2016-09-02 18:16:31 +03:00
Chris Wilson c58b735fc7 drm/i915: Allocate rings from stolen
If we have stolen available, make use of it for ringbuffer allocation.
Previously this was restricted to !llc platforms, as writing to stolen
requires a GGTT mapping - but now that we have partial mappable support,
the mappable aperture isn't quite so precious so we can use it more
freely and ringbuffers are a good user for the otherwise wasted stolen.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk
2016-08-18 22:36:49 +01:00
Chris Wilson 9d80841ea4 drm/i915: Allow ringbuffers to be bound anywhere
Now that we have WC vmapping available, we can bind our rings anywhere
in the GGTT and do not need to restrict them to the mappable region.
Except for stolen objects, for which direct access is verbatim and we
must use the mappable aperture.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-17-chris@chris-wilson.co.uk
2016-08-18 22:36:48 +01:00
Tvrtko Ursulin 318f89ca20 drm/i915: Initialize legacy semaphores from engine hw id indexed array
Build the legacy semaphore initialisation array using the engine
hardware ids instead of driver internal ones. This makes the
static array size dependent only on the number of gen6 semaphore
engines.

Also makes the per-engine semaphore wait and signal tables
hardware id indexed saving some more space.

v2: Refactor I915_GEN6_NUM_ENGINES to GEN6_SEMAPHORE_LAST. (Chris Wilson)
v3: More polish. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471363461-9973-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-08-17 11:29:56 +01:00
Chris Wilson 21a2c58a9c drm/i915: Record the RING_MODE register for post-mortem debugging
Just another useful register to inspect following a GPU hang.

v2: Remove partial decoding of RING_MODE to userspace, be consistent and
use GEN > 2 guards around RING_MODE everywhere.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-32-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:19 +01:00
Chris Wilson bde13ebdab drm/i915: Introduce i915_ggtt_offset()
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:14 +01:00
Chris Wilson 19880c4a3f drm/i915: Consolidate i915_vma_unpin_and_release()
In a few places, we repeat a call to clear a pointer to a vma whilst
unpinning and releasing a reference to its owner. Refactor those into a
common function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-26-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:12 +01:00
Chris Wilson 51d545d026 drm/i915: Use VMA as the primary tracker for semaphore page
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-23-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:10 +01:00
Chris Wilson 57f275a22b drm/i915: Move common seqno reset to intel_engine_cs.c
Since the intel_engine_init_seqno() is shared by all engine submission
backends, move it out of the legacy intel_ringbuffer.c and
into the new home for common routines, intel_engine_cs.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-21-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:08 +01:00
Chris Wilson adc320c4b7 drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c
Since the scratch allocation and cleanup is shared by all engine
submission backends, move it out of the legacy intel_ringbuffer.c and
into the new home for common routines, intel_engine_cs.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-20-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:08 +01:00
Chris Wilson 56c0f1a7c1 drm/i915: Use VMA for scratch page tracking
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-19-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:07 +01:00
Chris Wilson 57e8853181 drm/i915: Use VMA for ringbuffer tracking
Use the GGTT VMA as the primary cookie for handing ring objects as
the most common action upon the ring is mapping and unmapping which act
upon the VMA itself. By restructuring the code to work with the ring
VMA, we can shrink the code and remove a few cycles from context pinning.

v2: Move the flush of the object back to before the first pin. We use
the am-I-bound? query to only have to check the flush on the first
bind and so avoid stalling on active rings.
Lots of little renames and small hoops.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-18-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:05 +01:00
Chris Wilson e5cdb22b27 drm/i915: Move assertion for iomap access to i915_vma_pin_iomap
Access through the GTT requires the device to be awake. Ideally
i915_vma_pin_iomap() is short-lived and the pinning demarcates the
access through the iomap. This is not entirely true, we have a mixture
of long lived pins that exceed the wakelock (such as legacy ringbuffers)
and short lived pin that do live within the wakelock (such as execlist
ringbuffers).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-17-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:04 +01:00
Chris Wilson 7abc98fadf drm/i915: Only change the context object's domain when binding
We know that the only access to the context object is via the GPU, and
the only time when it can be out of the GPU domain is when it is swapped
out and unbound. Therefore we only need to clflush the object when
binding, thus avoiding any potential stall on touching the domain on an
active context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-16-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:03 +01:00
Chris Wilson bf3783e52a drm/i915: Use VMA as the primary object for context state
When working with contexts, we most frequently want the GGTT VMA for the
context state, first and foremost. Since the object is available via the
VMA, we need only then store the VMA.

v2: Formatting tweaks to debugfs output, restored some comments removed
in the next patch

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:02 +01:00
Chris Wilson d31d7cb146 drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.

We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT

v2: Remove the redundant braces around pin count check and fix the marker
     in documentation (Chris)

v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
   i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
   superfluous BUG_ON.(Tvrtko)

v4:
- Rename the enums and clean up the pin_map function. (Chris)

v5: Drop the VM_NO_GUARD, minor cosmetics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 13:06:36 +01:00
Tvrtko Ursulin c1bb11451e drm/i915: Store number of active engines in device info
Until now code was calling hweight32 to figure out the
number from device_info->ring_mask at runtime. Instead
we can cache it at engine init time and use directly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-08-11 11:33:10 +01:00