I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.
One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to Spec this is a reserved bit for Gen9+ and should not be set.
Change-Id: I0215fb7057b94139b7a2f90ecc7a0201c0c93ad4
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The legacy and LRC code paths have an almost identical procedure for waiting for
space in the ring buffer. They both search for a request in the free list that
will advance the tail to a point where sufficient space is available. They then
wait for that request, retire it and recalculate the free space value.
Unfortunately, a bug in the LRC side meant that the resulting free space might
not be as large as expected and indeed, might not be sufficient. This is because
it was testing against the value of request->tail not request->postfix. Whereas,
when a request is retired, ringbuf->tail is updated to req->postfix not
req->tail.
Another significant difference between the two is that the LRC one did not trust
the wait for request to work! It redid the is there enough space available test
and would fail the call if insufficient. Whereas, the legacy version just said
'return 0' - it assumed the preceeding code works. This difference meant that
the LRC version still worked even with the bug - it just fell back to the
polling wait path.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.
This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be
made static. To avoid a forward declaration at the top of the file, it and bunch
of other functions have been shuffled upwards.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
=Xkue
-----END PGP SIGNATURE-----
Merge tag 'v4.0-rc3' into drm-next
Linux 4.0-rc3 backmerge to fix two i915 conflicts, and get
some mainline bug fixes needed for my testing box
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/intel_display.c
- Y tiling support for scanout from Tvrtko&Damien
- Remove more UMS support
- some small prep patches for OLR removal from John Harrison
- first few patches for dynamic pagetable allocation from Ben Widawsky, rebased
by tons of other people
- DRRS support patches (Sonika&Vandana)
- fbc patches from Paulo
- make sure our vblank callbacks aren't called when the pipes are off
- various patches all over
* tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel: (61 commits)
drm/i915: Update DRIVER_DATE to 20150227
drm/i915: Clarify obj->map_and_fenceable
drm/i915/skl: Allow Y (and Yf) frame buffer creation
drm/i915/skl: Update watermarks for Y tiling
drm/i915/skl: Updated watermark programming
drm/i915/skl: Adjust get_plane_config() to support Yb/Yf tiling
drm/i915/skl: Teach pin_and_fence_fb_obj() about Y tiling constraints
drm/i915/skl: Adjust intel_fb_align_height() for Yb/Yf tiling
drm/i915/skl: Allow scanning out Y and Yf fbs
drm/i915/skl: Add new displayable tiling formats
drm/i915: Remove DRIVER_MODESET checks from modeset code
drm/i915: Remove regfile code&data for UMS suspend/resume
drm/i915: Remove DRIVER_MODESET checks from gem code
drm/i915: Remove DRIVER_MODESET checks in the gpu reset code
drm/i915: Remove DRIVER_MODESET checks from suspend/resume code
drm/i915: Remove DRIVER_MODESET checks in load/unload/close code
drm/i915: fix a printk format
drm/i915: Add media rc6 residency file to sysfs
drm/i915: Add missing description to parameter in alloc_pt_range
drm/i915: Removed the read of RP_STATE_CAP from sysfs/debugfs functions
...
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over
* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
drm/i915: Update DRIVER_DATE to 20150214
drm/i915: Remove references to previously removed UMS config option
drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
drm/i915/skl: Fix always true comparison in a revision id check
drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
drm/i915: Add process identifier to requests
drm/i915/skl: Implement WaBarrierPerformanceFixDisable
drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
drm/i915/skl: Implement WaDisableHDCInvalidation
drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
drm/i915/skl: Implement WaDisablePartialResolveInVc
drm/i915/skl: Introduce a SKL specific init_workarounds()
drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
drm/i915/skl: Make the init clock gating function skylake specific
drm/i915/skl: Provide a gen9 specific init_render_ring()
drm/i915/skl: Document the WM read latency W/A with its name
drm/i915/skl: Also detect eDRAM on SKL
...
In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
execlist mode or not. This is messy.
Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.
Caching the pointer means the calculation is done once at request creation time
and all further code and simply read it directly from the request structure.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a trace point in the legacy execbuffer execution path that is missing
from the execlist path. Trace points are extremely useful for debugging and are
used by various automated validation tests. Hence, this patch adds the missing
trace point back in.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.
This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.
OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the remaining members over to the new page table structures.
This can be squashed with the previous commit if desire. The reasoning
is the same as that patch. I simply felt it is easier to review if split.
v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
v4: Rebased after s/page_tables/page_table/.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.
drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_reference will reduce the context reference count when
the request is freed.
Problem introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.
v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Work was getting left behind in LRC contexts during reset. This causes a hang
if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and
tail don't get reset and retiring a request doesn't alter them, so the ring
still appears full.
Added a function intel_lr_context_reset() to reset head and tail on a LRC and
its ringbuffer.
Call intel_lr_context_reset() for each context in i915_gem_context_reset() when
in execlists mode.
Testcase: igt/pm_rps --run-subtest reset #bdw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Flatten control flow in the lrc reset code a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Gen9 the render power gating can leave slice/subslice/EU in
a partially enabled state. We must make an explicit request for
full SSEU enablement through the Render Power Clock State
register when resuming render work. This register is save/
restored in the logical ring context image for execlist
submission mode. Initialize its value in each LRC image to
request full enablement according to the device SSEU config.
Thanks to Sharma Ankitprasad and Akash Goel for highlighting the
issue and proposing the initial fix on which this patch is based.
v2: Adjusted the names of the power gating support flags to fit
update of an earlier patch.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
WaDisableAsyncFlipPerfMode isn't listed for SKL and
INSTPM_FORCE_ORDERING is MBZ so let's make a gen9 specific render init
function.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch introduces 2 bit definitions of context save/restore
control register.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Suggested-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This looked like an odd regression from
commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
Also fix the invalidate_domains mask in gen8_emit_flush() for ring !=
VCS.
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Remove request from list before unreferencing it, in case it's actually
the only reference. (Found by Tvrtko Ursulin)
This issue has been most likely introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We increase it when we pin, so for the casual reader
rename it to cause less confusion.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We pin when we submit to execlist queue. Balance
the pinning when the submitted queue is cleaned on reset.
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have multiple forcewake domains now on recent gens. Change the
function naming to reflect this.
v2: More verbose names (Chris)
v3: Rebase
v4: Rebase
v5: Add documentation for forcewake_get/put
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On user forcewake access, assert that runtime pm reference is held.
Fix and cleanup the callsites accordingly.
v2: Remove intel_runtime_pm_get() rebasehap (Deepak)
v3: use drivers own runtime state tracking as pm_runtime_active()
will return wrong results when we are in resume callchain (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move all remaining elements that were unique to execlists queue items
in to the associated request.
Issue: VIZ-4274
v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.
v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.
Issue: VIZ-4274
v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a reference and pointer from the execlist queue item to the associated
gem request. For execlist requests that don't have a request, create one
as a placeholder.
Issue: VIZ-4274
v1: Rebase after upstream of "Replace seqno values with request structures" patchset.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A previous commit enabled execlists by default:
commit 27401d126b ("drm/i915/bdw: Enable execlists by default where supported")
This allowed routine testing of execlists which exposed a regression when
resuming from suspend. The cause was tracked down the to recent changes to the
ring init sequence:
commit 35a57ffbb1 ("drm/i915: Only init engines once")
During a suspend/resume cycle the hardware Context Status Buffer write pointer
is reset. However since the recent changes to the init sequence the software CSB
read pointer is no longer reset. This means that context status events are not
handled correctly and new contexts are not written to the ELSP, resulting in an
apparent GPU hang.
Pending further changes to the ring init code, just move the
ring->next_context_status_buffer initialization into gen8_init_common_ring to
fix this regression.
v2: Moved init into gen8_init_common_ring rather than context_enable after
feedback from Daniel Vetter. Updated commit msg to reflect this and also cite
commits related to the regression. Fixed bz link to correct bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise, new platforms without workarounds will hit this warning for
every new context created.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Execlist support in the i915 driver is now considered good enough for the
feature to be enabled by default on Gen8 and later and routinely tested.
Adjusted i915 parameters structure initialization to reflect this and updated
the comment in intel_sanitize_enable_execlists().
There's still work to do before we can let the wider massive onto it,
but there's still time left before the 3.20 cutoff.
v2: Update the MODULE_PARM_DESC too.
Issue: VIZ-2020
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Add note that there's still some work left to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We consistently use the _irq_handler postfix for functions called in
hardirq context. Especially when it's a non-static function hardirq is
a crazy enough calling context to warrant this level of ocd. So rename
it.
Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
For debugging purposes, it is useful to be able to uniquely identify a given
request structure as it works its way through the system. This becomes
especially tricky once the seqno value is lazily allocated as then the request
has nothing but its pointer to identify it for much of its life.
Change-Id: Ie76b2268b940467f4cdf5a4ba6f5a54cbb96445d
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a general theory that kzmalloc is better/safer than kmalloc, especially
for interesting data structures. This change updates the request structure
allocation to be zero filled.
This also fixes crashes in the reset code. Quoting Mika's patch:
"Clean the request structure on alloc. Otherwise we might end up
referencing uninitialized fields. This is apparent when we try to
cleanup the preallocated request on ring reset, before any request has
been submitted to the ring. The request->ctx is foobar and we end up
freeing the foobarness."
Note that this fixes a regression introduced in
commit 9eba5d4a1d
Author: John Harrison <John.C.Harrison@Intel.com>
Date: Mon Nov 24 18:49:23 2014 +0000
drm/i915: Ensure OLS & PLR are always in sync
References: https://bugs.freedesktop.org/show_bug.cgi?id=86959
References: https://bugs.freedesktop.org/show_bug.cgi?id=86962
References: https://bugs.freedesktop.org/show_bug.cgi?id=86992
Change-Id: I68715ef758025fab8db763941ef63bf60d7031e2
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI_STORE_DWORD_IMM length has been the same ever since gen4. Rename
the define to avoid potential confusion if someone tries to use this
on pre-gen8.
Also correct the comment on MI_MEM_VIRTUAL bit. It's present on 945,g33
and 965 only.
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add USE_GGTT define for g4x+ too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A previous commit introduced engine init changes:
commit 372ee59699d9 ("drm/i915: Only init engines once")
This broke execlists as intel_lr_context_render_state_init was trying to emit
commands to the RCS for the default context before the ring->init_hw was called.
Made a new gen8_init_rcs_context function and assign in to render ring
init_context. Moved call to intel_logical_ring_workarounds_emit into
gen8_init_rcs_context to maintain previous functionality.
Moved call to render_state_init from lr_context_deferred_create into
gen8_init_rcs_context, and modified deferred_create to call ring->init_context
for non-default contexts.
Modified i915_gem_context_enable to call ring->init_context for the default
context.
So init_context will now always be called when the hw is ready - in
i915_gem_context_enable for the default context and in lr_context_deferred_create
for other contexts.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.
Execlist code could benefit quite a bit more still, but that's for
another patch.
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
We can do this.
And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With this all the ->init_hw hooks really only set up hw state needed
to start the ring, all the software state setup and memory/buffer
allocations happen beforehand.
v2: We need to call intel_init_pipe_control after the ring init since
otherwise engine->dev is NULL and it falls over. Currently that's
now after the hw ring is enabled but a) we'll be fine as long as no
one submits a batch b) this will change soon.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.
Make this clear by giving it a better name.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).
This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The OLS value is now obsolete. Exactly the same value is guarateed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to use request structures everywhere that seqno values were
previously used. This means saving pointers to structures in places that used to
be simple integers. In turn, that means that the target structure now needs much
more stringent lifetime tracking. That is, it must not be freed while some other
random object still holds a pointer to it.
To achieve this tracking, a reference count needs to be added. Whenever a
pointer to the structure is saved away, the count must be incremented and the
free must only occur when all references have been released.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The aim is to replace seqno values with request structures. A step along the way
is to switch to using the PLR in preference to the OLS. That requires the PLR to
only be valid when and only when the OLS is also valid. I.e., the two must be
kept in lock step. Then, code which was using the OLS can be safely switched
over to using the PLR instead.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The logical ring code was updating the software ring 'head' value
by reading the hardware 'HEAD' register. In LRC mode, this is not
valid as the hardware is not necessarily executing the same context
that is being processed by the software. Thus reading the h/w HEAD
could put an unrelated (undefined, effectively random) value into
the s/w 'head' -- A Bad Thing for the free space calculations.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The request queue is per-engine, and may therefore contain requests
from several different contexts/ringbuffers. In determining which
request to wait for, this function should only consider requests
from the ringbuffer that it's checking for space, and ignore any
that it finds that belong to other contexts.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).
Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.
v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffers objects regardless
(makes error capture a lot easier).
v3: Rebased. Take a pin reference of the ringbuffer for each item
in the execlist request queue because the hardware may still be using
the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
is executed. The ringbuffer must remain pinned until the context save
is complete. No longer pin and unpin ringbuffer in
populate_lr_context() - this transient address is meaningless and the
pinning can cause a sleep while atomic.
v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
Downgraded pinning check BUG_ONs to WARN_ONs.
v5: Reinstated WARN_ONs for unexpected execlist states. Removed unused
variable.
Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).
This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(
v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.
v4: Break out pin and unpin into functions. Fix style problems reported
by checkpatch
v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked. Add WARN_ONs to make sure this is the case in future.
Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.
v2: Fix legacy ring path broken during overzealous cleanup
v3: Update idle detection to take execlists queue into account
v4: Grab execlist lock when checking queue state
v5: Fix leaking requests by freeing in execlists_retire_requests.
Issue: VIZ-4274
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
kmap never fails.
Spotted-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Running the driver without execlists and hence PPGTT (either aliasing or
full) isn't a supported configuration on gen9+.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Write and reads following the block changed use engine specific use counters
and unless that is matched here force wake use counting goes bad. Same
force wake is attempted to be taken twice which leads to at least time outs.
NOTE: Depending on feedback from hardware designers it may not be necessary
to grab force wakes on Gen9 here. But for Gen8 it is needed due to a race
between RC6 and ELSP writes.
v2: Added blitter force wake engine and made more future proof.
Added commit note.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The LRC increased in size on gen9. Make sure we return the right
size in get_lr_context_size()
v2. Corrected the size, should be 22 pages. I unintentionally mailed out
a test patch w/ size equaling 23 pages.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Michael H. Nguyen <michael.h.nguyen@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.
v2: update to use the new workaround macros and cleanup unused code.
This takes care of both bdw and chv workarounds.
v2.1: Add missing call to init_context() during deferred context creation.
v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf
from the context (Mika/Daniel).
v4: Merge init_context interfaces back, the legacy mode only needs the ring,
but the lrc mode needs the ring and context (Mika).
Issue: VIZ-4092
Issue: GMIN-3475
Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c
Cc: Deepak S <deepak.s@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Align function paramater lists properly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge drm-next so that I can keep merging patches. Specifically I
want:
- atomic stuff, yay!
- eld parsing patch from Jani.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Write HWS_PGA address even in execlists mode as the global hardware status
page is still required. This address was previously uninitialized and
HWSP writes would clobber whatever buffer happened to reside at GGTT
address 0.
v2: Break out hardware status page setup into a separate function.
Issue: VIZ-2020
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just various stuff all over from a bunch of people. Shortlog gives a beter
overview, it's really all misc drm patches.
* tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel:
drm/edid: add #defines and helpers for ELD
drm/dp: Add counters in the drm_dp_aux struct for I2C NACKs and DEFERs
drm: Remove compiler BUG_ON() test
drm: Fix DRM_FORCE_ON_DIGITAL use
drm/gma500: Don't destroy DRM properties in the driver
drm/i915: Don't destroy DRM properties in the driver
drm: Add a note to drm_property_create() about property lifetime
gpu: drm: Fix warning caused by a parameter description in drm_crtc.c
drm/dp-helper: Move the legacy helpers to gma500
drm/crtc: Remove duplicated ioctl code
drm/crtc: Fix two typos
gpu:drm: Fix typo in Documentation/DocBook/drm.xml
gpu: drm: drm_dp_mst_topology.c: Fix improper use of strncat
drm: drm_err: Remove unnecessary __func__ argument
drm: Implement O_NONBLOCK support on /dev/dri/cardN
execlists_submit_context() always returns 0, which is redundant.
And its name is inaccurate, since it actually submits (up to)
TWO contextS. So we rename it, change it to "void", and remove
the WARN_ON() testing its return value.
Change-Id: Ie225b0eca7754c6093c8b8bd15550b251b6feb82
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If a ring failed to initialise for any reason then the error path would try to
clean up all rings including those that had not yet been allocated. The ring
clean up code did a check that the ring was valid before starting its work.
Unfortunately, that was after it had already dereferenced the ring to obtain a
dev_private pointer.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch fix spelling typos found in drm.xml.
It is because the file is generated from comments in
source codes, I have to fix the typos within source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Yet another place that wasn't properly transformed when implementing
SOix. While at it convert the checks to WARN_ON on gen5+ (since we
don't have UMS potentially doing stupid things on those platforms).
And also add the corresponding checks to the put functions (again with
a WARN_ON) for gen5+.
v2: Drop the WARNINGS in the irq_put functions (including the existing
one for vebox), Chris convinced me that they're not that terribly
useful.
v3: Don't forget about execlist code.
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In chv, we have two power wells Render & Media. We need to use
corresponsing forcewake count. If we dont follow this we are getting
error "*ERROR*: Timed out waiting for forcewake old ack to clear" due to
multiple entry into __vlv_force_wake_get.
Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Requested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The batchbuffer that sets the render context state is submitted
in a different way, and from different places.
We needed to make both the render state preparation and free functions
outside accesible, and namespace accordingly. This mess is so that all
LR, LRC and Execlists functionality can go together in intel_lrc.c: we
can fix all of this later on, once the interfaces are clear.
v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
v3: Setup ring status page in lr_context_deferred_create when the
default context is being created. This means that the render state
init for the default context is no longer a special case. Execute
deferred creation of the default context at the end of
logical_ring_init to allow the render state commands to be submitted.
Fix style errors reported by checkpatch. Rebased.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A previous commit broke aliasing PPGTT for lrc, resulting in a kernel oops
on boot. Add a check so that is full PPGTT is not in use the context is
populated with the aliasing PPGTT.
Issue: VIZ-4278
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.
v2: Add notes on logical ring context creation.
v3: Use kerneldoc.
v4: Integrate it in the DocBook template.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop hunk about render ring init function since that's not
yet merged.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Warn and return if LRCs are not enabled.
v3: Grab the Execlists spinlock (noticed by Daniel Vetter).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
v4: Lock the struct mutex for atomic state capture
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements in interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (this is: the GPU
simply samples a new tail pointer for the context currently in
execution).
But we have identified an scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU start executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.
The race is avoided by keeping track of how many times a context has been
submitted to the hardware and by better discriminating the received context
switch interrupts: in the example, when we have submitted B twice, we won´t
submit C and D as soon as we receive the notification that B is completed
because we were expecting to get a LITE_RESTORE and we didn´t, so we know a
second completion will be received shortly.
Without this explicit checking, somehow, the batch buffer execution order
gets messed with. This can be verified with the IGT test I sent together with
the series. I don´t know the exact mechanism by which the pre-emption messes
with the execution order but, since other people is working on the Scheduler
+ Preemption on Execlists, I didn´t try to fix it. In these series, only Lite
Restores are supported (other kind of preemptions WARN).
v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.
v3: Clarify how the race is avoided, as requested by Daniel.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Align function parameters ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).
We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.
v3:
- Ack the interrupt inmediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).
v4: New namespace and multiple rebase changes.
v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.
To assure this, we place context switch requests in a queue and those
request are later consumed when the right context switch interrupt is
received (still TODO).
v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
v3: Several rebases and code changes. Use unique ID.
v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.
v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Each logical ring context has the tail pointer in the context object,
so update it before submission.
v2: New namespace.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two, since the ELSP
has two ports).
The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.
v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP porting read (it was dropped somewhere during the
rebases).
v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
ELSP writes are in progress") as noted by Thomas Daniel
(thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
sure that it was properly aligned. Spotted by Alistair Mcaulay
<alistair.mcaulay@intel.com>.
v5:
- Improved source code comments as suggested by Chris Wilson.
- No need to abstract submit_ctx away, as pointed by Brad Volkin.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Sigh.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and asbtracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:
"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."
The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:
- i915_add_request is used by intel_ring_idle, which in turn is
used by i915_gpu_idle, which in turn is used in several places
inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
over i915_gem.c
- ...
If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'll end up creating a separate driver.
To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
Thomas Daniel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.
The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There's a bit a confusion since we track the global gtt,
the aliasing and real ppgtt in the ctx->vm pointer. And not
all callers really bother to check for the different cases and just
presume that it points to a real ppgtt.
Now looking closely we don't actually need ->vm to always point at an
address space - the only place that cares actually has fixup code
already to decide whether to look at the per-proces or the global
address space.
So switch to just tracking the ppgtt directly and ditch all the
extraneous code.
v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
Also drop the early exit - without aliasing ppgtt we want to dump all
the ppgtts of the contexts if we have full ppgtt.
v3: Actually git add the compile fix.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
OTC-Jira: VIZ-3724
[danvet: Resolve conflicts with execlist patches while applying.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The normal flip function places things in the ring in the legacy
way, so we either fix that or force MMIO flips always as we do in
this patch.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Fucking again.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is what i915_gem_do_execbuffer calls when it wants to execute some
worload in an Execlists world.
v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the chipset flush, that's pre-gen6. And appease
checkpatch a bit .... again!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to attend context switch interrupts from all rings. Also, fixed writing
IMR/IER and added HWSTAM at ring init time.
Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are due to no users waiting
on a sequence number. Therefore, this commit adds a bitmask of interrupts to
be kept unmasked at all times.
v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).
v3: Add new get/put_irq functions.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is a hard one, since there is no direct hardware ring to
control when in Execlists.
We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the ring thing.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Same as the legacy-style ring->flush.
v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.
v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Very similar to the legacy add_request, only modified to account for
logical ringbuffer.
v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.
v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).
And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.
Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.
v2: Pass the ringbuffer only (whenever possible).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No mistery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context. For the moment,
I see no reason to worry about other context's HWS page).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.
v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.
Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).
v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Execlists are indeed a brave new world with respect to workload
submission to the GPU.
In previous version of these series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.
This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.
For our basic execlist implementation we'll only need our PPGTT PDs,
and ringbuffer addresses in order to set up the context. With previous
patches, we have both, so start prepping the context to be load.
Before running a context for the first time you must populate some
fields in the context object. These fields begin 1 PAGE + LRCA, ie. the
first page (in 0 based counting) of the context image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.
Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.
v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.
v3: Several rebases and general changes to the code.
v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.
v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.
v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.
v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.
In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.
Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.
v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).
v3:
- Go back to one single global default context, this time with multiple
backing objects inside.
- Use different context sizes for non-render engines, as suggested by
Damien (still hardcoded, since the information about the context size
registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
that we don't waste memory in rings that might not get initialized).
v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ preffix.
v5: Several rebases to account for the changes in the previous patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.
Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.
Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).
v2: Daniel says "explain, do not showcase".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Depending upon one module option to be sanitized (through USES_PPGTT)
for the other is a bit too fragile for my taste. At least WARN about
this.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".
The macro is defined to off until we have things in place to hope to
work.
v2: Rename "advanced contexts" to the more correct "logical ring
contexts".
v3: Add a module parameter to enable execlists. Execlist are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.
v4: Add an intel_enable_execlists function.
v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.
Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.
And some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.
v2: Change to intel_lrc.c
v3: Squash together with the header file addition
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>