OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Michel Thierry	acdd884a2e	drm/i915/bdw: Two-stage execlist submit process Context switch (and execlist submission) should happen only when other contexts are not active, otherwise pre-emption occurs. To ensure this, we place context switch requests in a queue and those requests are later consumed when the right context switch interrupt is received (still TODO). v2: Use a spinlock, do not remove the requests on unqueue (wait for context switch completion). Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> v3: Several rebases and code changes. Use unique ID. v4: - Move the queue/lock init to the late ring initialization. - Damien's kmalloc review comments: check return, use sizeof(*req), do not cast. v5: - Do not reuse drm_i915_gem_request. Instead, create our own. - New namespace. Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:10:59 +02:00
Oscar Mateo	ae1250b9da	drm/i915/bdw: Write the tail pointer, LRC style Each logical ring context has the tail pointer in the context object, so update it before submission. v2: New namespace. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:03:09 +02:00
Ben Widawsky	84b790f80e	drm/i915/bdw: Implement context switching (somewhat) A context switch occurs by submitting a context descriptor to the ExecList Submission Port. Given that we can now initialize a context, it's possible to begin implementing the context switch by creating the descriptor and submitting it to ELSP (actually two, since the ELSP has two ports). The context object must be mapped in the GGTT, which means it must exist in the 0-4GB graphics VA range. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> v2: This code has changed quite a lot in various rebases. Of particular importance is that now we use the globally unique Submission ID to send to the hardware. Also, context pages are now pinned unconditionally to GGTT, so there is no need to bind them. v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context ID we submit to the ELSP is globally unique and != 0 (Bspec requirements of the software use-only bits of the Context ID in the Context Descriptor Format) without the hassle of the previous submission Id construction. Also, re-add the ELSP posting read (it was dropped somewhere during the rebases). v4: - Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC says: "SW must set Force Wakeup bit to prevent GT from entering C6 while ELSP writes are in progress") as noted by Thomas Daniel (thomas.daniel@intel.com). - Rename functions and use an execlists/intel_execlists_ namespace. - The BUG_ON only checked that the LRCA was <32 bits, but it didn't make sure that it was properly aligned. Spotted by Alistair Mcaulay <alistair.mcaulay@intel.com>. v5: - Improved source code comments as suggested by Chris Wilson. - No need to abstract submit_ctx away, as pointed out by Brad Volkin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch. Sigh.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:03:03 +02:00
Oscar Mateo	48e29f5535	drm/i915/bdw: Emission of requests with logical rings On a previous iteration of this patch, I created an Execlists version of __i915_add_request and abstracted it away as a vfunc. Daniel Vetter wondered then why that was needed: "with the clean split in command submission I expect every function to know whether it'll submit to an lrc (everything in intel_lrc.c) or whether it'll submit to a legacy ring (existing code), so I don't see a need for an add_request vfunc." The honest, hairy truth is that this patch is the glue keeping the whole logical ring puzzle together: - i915_add_request is used by intel_ring_idle, which in turn is used by i915_gpu_idle, which in turn is used in several places inside the eviction and gtt codes. - Also, it is used by i915_gem_check_olr, which is littered all over i915_gem.c - ... If I were to duplicate all the code that directly or indirectly uses __i915_add_request, I would end up creating a separate driver. To show the differences between the existing legacy version and the new Execlists one, this time I have special-cased __i915_add_request instead of adding an add_request vfunc. I hope this helps to untangle this Gordian knot. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with Thomas Daniel.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:02:55 +02:00
Oscar Mateo	582d67f0b1	drm/i915: Add temporary ring->ctx backpointer The execlist patches have a somewhat convoluted and long history and, due to that, still have the actual submission misplaced, deeply buried in the low-level ringbuffer handling code. This design goes back to the legacy ringbuffer code with its tricky lazy request and simple work submission using ring tail writes. For that reason they need a ring->ctx backpointer. The goal is to unbury that code and move it up into a level where the full execlist context is available so that we can ditch this backpointer. Until that's done make it really obvious that there's work still to be done. Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Thomas Daniel <thomas.daniel@intel.com> Acked-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 18:42:59 +02:00
Daniel Vetter	ae6c480692	drm/i915: Only track real ppgtt for a context There's a bit of confusion since we track the global gtt, the aliasing and real ppgtt in the ctx->vm pointer. And not all callers really bother to check for the different cases and just presume that it points to a real ppgtt. Now looking closely we don't actually need ->vm to always point at an address space - the only place that cares actually has fixup code already to decide whether to look at the per-process or the global address space. So switch to just tracking the ppgtt directly and ditch all the extraneous code. v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt. Also drop the early exit - without aliasing ppgtt we want to dump all the ppgtts of the contexts if we have full ppgtt. v3: Actually git add the compile fix. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Cc: "Thierry, Michel" <michel.thierry@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> OTC-Jira: VIZ-3724 [danvet: Resolve conflicts with execlist patches while applying.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:33 +02:00
Oscar Mateo	14bf993e83	drm/i915/bdw: Always use MMIO flips with Execlists The normal flip function places things in the ring in the legacy way, so we either fix that or force MMIO flips always as we do in this patch. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch. Fucking again.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 23:25:49 +02:00
Oscar Mateo	ba8b7ccb19	drm/i915/bdw: Workload submission mechanism for Execlists This is what i915_gem_do_execbuffer calls when it wants to execute some workload in an Execlists world. v2: Check arguments before doing stuff in intel_execlists_submission. Also, get rel_constants parsing right. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Drop the chipset flush, that's pre-gen6. And appease checkpatch a bit .... again!] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 23:18:38 +02:00
Oscar Mateo	1564858526	drm/i915/bdw: GEN-specific logical ring emit batchbuffer start Dispatch_execbuffer's evil twin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Ditch the check for aliasing ppgtt. It'll break soon and execlists requires full ppgtt anyway.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 23:12:34 +02:00
Oscar Mateo	73d477f6bb	drm/i915/bdw: Interrupts with logical rings We need to attend context switch interrupts from all rings. Also, fixed writing IMR/IER and added HWSTAM at ring init time. Notice that, if added to irq_enable_mask, the context switch interrupts would be incorrectly masked out when the user interrupts are due to no users waiting on a sequence number. Therefore, this commit adds a bitmask of interrupts to be kept unmasked at all times. v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts, anyway). v3: Add new get/put_irq functions. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Drop the GEN8_ prefix from the context switch interrupt define and move it to its brethren.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 23:06:58 +02:00
Oscar Mateo	9832b9dae8	drm/i915/bdw: Ring idle and stop with logical rings This is a hard one, since there is no direct hardware ring to control when in Execlists. We reuse intel_ring_idle here, but it should be fine as long as i915_add_request does the ring thing. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 22:57:38 +02:00
Oscar Mateo	4712274c36	drm/i915/bdw: GEN-specific logical ring emit flush Same as the legacy-style ring->flush. v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS rings (but still consolidate the blt and bsd ring flushes into one). This was noticed by Brad Volkin. v3: The command for BSD and for other rings is slightly different: get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 22:44:37 +02:00
Oscar Mateo	4da46e1e5b	drm/i915/bdw: GEN-specific logical ring emit request Very similar to the legacy add_request, only modified to account for logical ringbuffer. v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin. v3: Unify render and non-render in the same function, as noticed by Brad Volkin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 22:42:49 +02:00
Oscar Mateo	82e104cc26	drm/i915/bdw: New logical ring submission mechanism Well, new-ish: if all this code looks familiar, that's because it's a clone of the existing submission mechanism (with some modifications here and there to adapt it to LRCs and Execlists). And why did we do this instead of reusing code, one might wonder? Well, there are some fears that the differences are big enough that they will end up breaking all platforms. Also, Execlists offer several advantages, like control over when the GPU is done with a given workload, that can help simplify the submission mechanism, no doubt. I am interested in getting Execlists to work first and foremost, but in the future this parallel submission mechanism will help us to fine tune the mechanism without affecting old gens. v2: Pass the ringbuffer only (whenever possible). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Appease checkpatch. Again. And drop the legacy sarea gunk that somehow crept in.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 22:42:36 +02:00
Oscar Mateo	e94e37ad19	drm/i915/bdw: GEN-specific logical ring set/get seqno No mystery here: the seqno is still retrieved from the engine's HW status page (the one in the default context. For the moment, I see no reason to worry about other context's HWS page). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 17:05:17 +02:00
Oscar Mateo	9b1136d505	drm/i915/bdw: GEN-specific logical ring init Logical rings do not need most of the initialization their legacy ringbuffer counterparts do: we just need the pipe control object for the render ring, enable Execlists on the hardware and a few workarounds. v2: Squash with: "drm/i915: Extract pipe control fini & make init outside accesible". Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Make checkpatch happy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 17:03:28 +02:00
Oscar Mateo	48d823878d	drm/i915/bdw: Generic logical ring init and cleanup Allocate and populate the default LRC for every ring, call gen-specific init/cleanup, init/fini the command parser and set the status page (now inside the LRC object). These are things all engines/rings have in common. Stopping the ring before cleanup and initializing the seqnos is left as a TODO task (we need more infrastructure in place before we can achieve this). v2: Check the ringbuffer backing obj for ring_is_initialized, instead of the context backing obj (similar, but not exactly the same). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:55:17 +02:00
Oscar Mateo	454afebde8	drm/i915/bdw: Skeleton for the new logical rings submission path Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous versions of this series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with good reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:57 +02:00
Oscar Mateo	8670d6f97d	drm/i915/bdw: Populate LR contexts (somewhat) For the most part, logical ring context objects are similar to hardware contexts in that the backing object is meant to be opaque. There are some exceptions where we need to poke certain offsets of the object for initialization, updating the tail pointer or updating the PDPs. For our basic execlist implementation we'll only need our PPGTT PDs, and ringbuffer addresses in order to set up the context. With previous patches, we have both, so start prepping the context to be load. Before running a context for the first time you must populate some fields in the context object. These fields begin 1 PAGE + LRCA, ie. the first page (in 0 based counting) of the context image. These same fields will be read and written to as contexts are saved and restored once the system is up and running. Many of these fields are completely reused from previous global registers: ringbuffer head/tail/control, context control matches some previous MI_SET_CONTEXT flags, and page directories. There are other fields which we don't touch which we may want in the future. v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11) for other engines. v3: Several rebases and general changes to the code. v4: Squash with "Extract LR context object populating" Also, Damien's review comments: - Set the Force Posted bit on the LRI header, as the BSpec suggest we do. - Prevent warning when compiling a 32-bits kernel without HIGHMEM64. - Add a clarifying comment to the context population code. v5: Damien's review comments: - The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted. - Remove dead code. v6: Add a note about the (presumed) differences between BDW and CHV state contexts. Also, Brad's review comments: - Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros. - Be less magical about how we set the ring size in the context. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:21:53 +02:00
Daniel Vetter	0c7dd53b84	drm/i915/bdw: Add a context and an engine pointers to the ringbuffer Any given ringbuffer is unequivocally tied to one context and one engine. By setting the appropriate pointers to them, the ringbuffer struct holds all the information you might need to submit a workload for processing, Execlists style. v2: Drop ring->ctx since that looks terribly ill-defined for legacy ringbuffer submission. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1) Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:18:17 +02:00
Oscar Mateo	84c2377fce	drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts As we have said a couple of times by now, logical ring contexts have their own ringbuffers: not only the backing pages, but the whole management struct. In a previous version of the series, this was achieved with two separate patches: drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC drm/i915/bdw: Allocate ringbuffer for user-created LRCs Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:10:58 +02:00
Oscar Mateo	8c8579176a	drm/i915/bdw: A bit more advanced LR context alloc/free Now that we have the ability to allocate our own context backing objects and we have multiplexed one of them per engine inside the context structs, we can finally allocate and free them correctly. Regarding the context size, reading the register to calculate the sizes can work, I think, however the docs are very clear about the actual context sizes on GEN8, so just hardcode that and use it. v2: Rebased on top of the Full PPGTT series. It is important to notice that at this point we have one global default context per engine, all of them using the aliasing PPGTT (as opposed to the single global default context we have with legacy HW contexts). v3: - Go back to one single global default context, this time with multiple backing objects inside. - Use different context sizes for non-render engines, as suggested by Damien (still hardcoded, since the information about the context size registers in the BSpec is, well, lacking). - Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien). - Move default context backing object creation to intel_init_ring (so that we don't waste memory in rings that might not get initialized). v4: - Reuse the HW legacy context init/fini. - Create a separate free function. - Rename the functions with an intel_ prefix. v5: Several rebases to account for the changes in the previous patches. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:08:18 +02:00
Oscar Mateo	ede7d42bae	drm/i915/bdw: Initialization for Logical Ring Contexts For the moment this is just a placeholder, but it shows one of the main differences between the good ol' HW contexts and the shiny new Logical Ring Contexts: LR contexts allocate and free their own backing objects. Another difference is that the allocation is deferred (as the create function name suggests), but that does not happen in this patch yet, because for the moment we are only dealing with the default context. Early in the series we had our own gen8_gem_context_init/fini functions, but the truth is they now look almost the same as the legacy hw context init/fini functions. We can always split them later if this ceases to be the case. Also, we do not fall back to legacy ringbuffers when logical ring context initialization fails (not very likely to happen and, even if it does, hw contexts would probably fail as well). v2: Daniel says "explain, do not showcase". Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: s/BUG_ON/WARN_ON/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:04:11 +02:00
Daniel Vetter	bd84b1e995	drm/i915: WARN if module opt sanitization goes out of order Depending upon one module option to be sanitized (through USES_PPGTT) for the other is a bit too fragile for my taste. At least WARN about this. Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:34 +02:00
Oscar Mateo	127f100369	drm/i915/bdw: Macro for LRCs and module option for Execlists GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have things in place to hope to work. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlist are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:27 +02:00
Oscar Mateo	b20385f1f8	drm/i915/bdw: New source and header file for LRs, LRCs and Execlists Some legacy HW context code assumptions don't make sense for this new submission method, so we will place this stuff in a separate file. Note for reviewers: I've carefully considered the best name for this file and this was my best option (other possibilities were intel_lr_context.c or intel_execlist.c). I am open to a certain bikeshedding on this matter, anyway. And some point in time, it would be a good idea to split intel_lrc.c/.h even further, but for the moment just shove everything together. v2: Change to intel_lrc.c v3: Squash together with the header file addition Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:07 +02:00

26 Commits