The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.
The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The current error state harks back to the era of just a single VM. For
full-ppgtt, we capture every bo on every VM. It behoves us to then print
every bo for every VM, which we currently fail to do and so miss vital
information in the error state.
v2: Use the vma address rather than -1!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On VLV, post S0i3 during i915_drm_thaw following issue is observed during ring
initialization.
[ 335.604039] [drm:stop_ring] ERROR render ring :timed out trying to stop ring
[ 336.607340] [drm:stop_ring] ERROR render ring :timed out trying to stop ring
[ 336.607345] [drm:init_ring_common] ERROR failed to set render ring head to zero ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 337.610645] [drm:stop_ring] ERROR bsd ring :timed out trying to stop ring
[ 338.613952] [drm:stop_ring] ERROR bsd ring :timed out trying to stop ring
[ 338.613956] [drm:init_ring_common] ERROR failed to set bsd ring head to zero ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 339.617256] [drm:stop_ring] ERROR render ring :timed out trying to stop ring
[ 339.617258] -----------[ cut here ]-----------
[ 339.617267] WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/i915/intel_ringbuffer.c:1666 intel_cleanup_ring+0xe6/0xf0()
[ 339.617396] --[ end trace 5ef5ed1a3c92e2a6 ]--
[ 339.617428] [drm:__i915_drm_thaw] ERROR failed to re-initialize GPU, declaring wedged!
This is happening since wake is not enabled and Gunit registers are not restored.
For this system suspend/resume paths need to follow save/restore and additional
platform specific setup in suspend_complete and resume_prepare.
suspend_complete is shared unconditionaly for VLV, HSW, BDW. resume_prepare for
HSW and BDW has pc8 disabling which is needed during thaw_early so sharing
uncondtionally. For VLV and SNB runtime resume specific sequence exists.
Cc: Imre Deak <imre.deak@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Goel, Akash <akash.goel@intel.com>
Signed-off-by: Sagar Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With this change, intel_runtime_suspend and intel_runtime_resume functions
become completely platform agnostic. Platform specific suspend/resume
changes are moved to intel_suspend_complete and intel_resume_prepare.
Cc: Imre Deak <imre.deak@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Goel, Akash <akash.goel@intel.com>
Signed-off-by: Sagar Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than take and release the console_lock() around a non-existent
DRM_I915_FBDEV, move the lock acquisation into the callee where it will
be compiled out by the config option entirely. This includes moving the
deferred fb_set_suspend() dance and encapsulating it entirely within
intel_fbdev.c.
v2: Use an integral work item so that we can explicitly flush the work
upon suspend/unload.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Add the flush_work in fbdev_fini per the mailing list
discussion. And s/BUG_ON/WARN_ON/ because.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville pointed out the GCCism __builtin_types_compatible_p() that we
could use to replace our heavily casted presumption __I915__ macro that
was based on comparing struct sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
845/865 support different cursor sizes as well, albeit a bit differently
than later platforms. Add the necessary code to make them work.
Untested due to lack of hardware.
v2: Warn but accept invalid stride (Chris)
Rewrite the cursor size checks for other platforms (Chris)
v3: More polish and magic to the cursor size checks (Chris)
v4: Moar polish and a comment (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ever since
commit 5efb3e2838
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Wed Apr 9 13:28:53 2014 +0300
drm/i915/chv: Add cursor pipe offsets
the only difference between i9xx_update_cursor() and ivb_update_cursor()
was the hsw+ pipe csc handling. Let's unify them and we can rid
outselves of some duplicated code.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
CURSIZE register exists on 845/865 only, so move it to
i845_update_cursor(). Changes to cursor size must be done only when the
cursor is disabled, so do the write just before enabling the cursor.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make sure the cursor gets fully clipped when enabling it on a disabled
crtc via setplane. This will prevent the lower level code from
attempting to enable the cursor in hardware.
Cc: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also remove related WARN_ONs which seem to have been hit since a rather
long time. But apperently no one noticed since our module reload is
already WARNING-infested :(
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We want to move the aliasing ppgtt cleanup back into the global
gtt cleanup code for symmetry, but first we need to create such
a place.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Address space cleanup isn't really a job for the low-level cleanup
callbacks. Without this change we can't reuse the low-level cleanup
callback for the aliasing ppgtt cleanup.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that all the flow is streamlined the rule is simple: We create
a new ppgtt for a new context when we have full ppgtt enabled.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There's a bit a confusion since we track the global gtt,
the aliasing and real ppgtt in the ctx->vm pointer. And not
all callers really bother to check for the different cases and just
presume that it points to a real ppgtt.
Now looking closely we don't actually need ->vm to always point at an
address space - the only place that cares actually has fixup code
already to decide whether to look at the per-proces or the global
address space.
So switch to just tracking the ppgtt directly and ditch all the
extraneous code.
v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
Also drop the early exit - without aliasing ppgtt we want to dump all
the ppgtts of the contexts if we have full ppgtt.
v3: Actually git add the compile fix.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
OTC-Jira: VIZ-3724
[danvet: Resolve conflicts with execlist patches while applying.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Stuffing this into the context setup code doesn't make a lot of sense.
Also reusing the real ppgtt setup code makes even less sense since the
aliasing ppgtt isn't a real address space. Leaving all that stuff
unitialized will make sure that we catch any abusers promptly.
This is also a prep work to clean up the context->ppgtt link.
v2: Fix up the logic fail, I've fumbled it so badly to completely
disable ppgtt on gen6. Spotted by Ville and Michel. Also move around
the pde write into the gen6 init function, since otherwise it won't
work at all.
v3: Only initialize the aliasing ppgtt when we actually enable it.
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Squash in fixup from Fengguang Wu.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we abuse the aliasing ppgtt to set up the ppgtt support in
general. Which is a bit backwards since with full ppgtt we don't ever
need the aliasing ppgtt.
So untangle this and separate the ppgtt init from the aliasing
ppgtt. While at it drag it out of the context enabling (which just
does a switch to the default context).
Note that we still have the differentiation between synchronous and
asynchronous ppgtt setup, but that will soon vanish. So also correctly
wire up the return value handling to be prepared for when ->switch_mm
drops the synchronous parameter and could start to fail.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A subsequent patch will no longer initialize the aliasing ppgtt if we
have full ppgtt enabled, since we simply don't need that any more.
Unfortunately a few places check for the aliasing ppgtt instead of
checking for ppgtt in general. Fix them up.
One special case are the gtt offset and size macros, which have some
code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
is _not_ a logical address space, so passing that in as the vm is
plain and simple a bug. So just WARN about it and carry on - we have a
gracefully fall-through anyway if we can't find the vma.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already needs this just as a safety check in case the preallocation
reservation dance fails. But we definitely need this to be able to
move tha aliasing ppgtt setup back out of the context code to this
place, where it belongs.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Stuff in headers really aught to have this.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This essentially unbreaks non-ppgtt operation where we'd scribble over
random memory.
While at it give the vm_to_ppgtt function a proper prefix and make it
a bit more paranoid.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Hardware contexts reference a ppgtt, not the other way round. And the
only user of this (in debugfs) actually only cares about which file
the ppgtt is associated with. So give it what it wants.
While at it give the ppgtt create function a proper name&place.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So when reviewing Michel's patch I've noticed a few things and cleaned
them up:
- The early checks in ppgtt_release are now redundant: The inactive
list should always be empty now, so we can ditch these checks. Even
for the aliasing ppgtt (though that's a different confusion) since
we tear that down after all the objects are gone.
- The ppgtt handling functions are splattered all over. Consolidate
them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for
get/put.
- There was a bit a confusion in ppgtt_release about whether it cares
about the active or inactive list. It should care about them both,
so augment the WARNINGs to check for both.
There's still create_vm_for_ctx left to do, put that is blocked on the
removal of ppgtt->ctx. Once that's done we can rename it to
i915_ppgtt_create and move it to its siblings for handling ppgtts.
v2: Move the ppgtt checks into the inline get/put functions as
suggested by Chris.
v3: Inline the now redundant ppgtt local variable.
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
VMAs should take a reference of the address space they use.
Now, when the fd is closed, it will release the ref that the context was
holding, but it will still be referenced by any vmas that are still
active.
ppgtt_release() should then only be called when the last thing referencing
it releases the ref, and it can just call the base cleanup and free the
ppgtt.
Note that with this we will extend the lifetime of ppgtts which
contain shared objects. But all the non-shared objects will get
removed as soon as they drop of the active list and for the shared
ones the shrinker can eventually reap them. Since we currently can't
evict ppgtt pagetables either I don't think that temporary leak is
important.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Add note about potential ppgtt leak with this approach.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The normal flip function places things in the ring in the legacy
way, so we either fix that or force MMIO flips always as we do in
this patch.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Fucking again.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is what i915_gem_do_execbuffer calls when it wants to execute some
worload in an Execlists world.
v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the chipset flush, that's pre-gen6. And appease
checkpatch a bit .... again!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to attend context switch interrupts from all rings. Also, fixed writing
IMR/IER and added HWSTAM at ring init time.
Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are due to no users waiting
on a sequence number. Therefore, this commit adds a bitmask of interrupts to
be kept unmasked at all times.
v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).
v3: Add new get/put_irq functions.
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is a hard one, since there is no direct hardware ring to
control when in Execlists.
We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the ring thing.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Same as the legacy-style ring->flush.
v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.
v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Very similar to the legacy add_request, only modified to account for
logical ringbuffer.
v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.
v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).
And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.
Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.
v2: Pass the ringbuffer only (whenever possible).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Don't print raw numbers, use port_name() and tell the user whether it's
long or short without having to figure out what the other magic number
means.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No mistery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context. For the moment,
I see no reason to worry about other context's HWS page).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.
v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.
Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).
v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Execlists are indeed a brave new world with respect to workload
submission to the GPU.
In previous version of these series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.
This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.
v2: Splitted into two and reordered to illustrate our intentions, instead
of showing it off. Also, remove the add_request vfunc and added the
stop_ring one.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The backing objects and ringbuffers for contexts created via open
fd are actually empty until the user starts sending execbuffers to
them. At that point, we allocate & populate them. We do this because,
at create time, we really don't know which engine is going to be used
with the context later on (and we don't want to waste memory on
objects that we might never use).
v2: As contexts created via ioctl can only be used with the render
ring, we have enough information to allocate & populate them right
away.
v3: Defer the creation always, even with ioctl-created contexts, as
requested by Daniel Vetter.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.
For our basic execlist implementation we'll only need our PPGTT PDs,
and ringbuffer addresses in order to set up the context. With previous
patches, we have both, so start prepping the context to be load.
Before running a context for the first time you must populate some
fields in the context object. These fields begin 1 PAGE + LRCA, ie. the
first page (in 0 based counting) of the context image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.
Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.
v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.
v3: Several rebases and general changes to the code.
v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.
v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.
v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.
v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.
In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.
Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.
v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).
v3:
- Go back to one single global default context, this time with multiple
backing objects inside.
- Use different context sizes for non-render engines, as suggested by
Damien (still hardcoded, since the information about the context size
registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
that we don't waste memory in rings that might not get initialized).
v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ preffix.
v5: Several rebases to account for the changes in the previous patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).
In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we nee.
With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.
Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.
v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).
v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.
Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.
Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).
v2: Daniel says "explain, do not showcase".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Depending upon one module option to be sanitized (through USES_PPGTT)
for the other is a bit too fragile for my taste. At least WARN about
this.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".
The macro is defined to off until we have things in place to hope to
work.
v2: Rename "advanced contexts" to the more correct "logical ring
contexts".
v3: Add a module parameter to enable execlists. Execlist are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.
v4: Add an intel_enable_execlists function.
v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.
Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.
And some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.
v2: Change to intel_lrc.c
v3: Squash together with the header file addition
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Even though we should not try to use 4+GiB GTTs on 32-bit systems, by
using a local variable we can future proof the code whilst making it
easier to read.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Appease checkpatch a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Part of the pre-validation for an execbuffer call is that there is at
least one object in the execlist. As we bail if we fail to lookup any
object, we can be sure that after the eb_lookup_vma() there is at least
one object in the vma list and so we do not need to assert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>