OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Tvrtko Ursulin	2f35afe94a	drm/i915: Make int __intel_ring_space static It is only used within intel_ringbuffer.c Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.oc.uk>	2017-02-17 11:39:59 +00:00
Tvrtko Ursulin	4ac9659ef9	drm/i915: Remove duplicate intel_logical_ring_workarounds_emit intel_ring_workarounds_emit is exactly the same code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com	2017-02-15 08:16:41 +00:00
Robert Bragg	9cc19733fd	drm/i915: fix for WaDisableDopClockGating:bdw This workaround for BDW was incomplete as it also requires EUTC clock gating to be disabled via UCGCTL1. v2: read modify write UCGTL1 in broadwell_init_clock_gating (Ville) Signed-off-by: Robert Bragg <robert@sixbynine.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170212133252.20990-1-robert@sixbynine.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>	2017-02-14 22:29:28 +02:00
Tvrtko Ursulin	73dec95e6b	drm/i915: Emit to ringbuffer directly This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename out++ to cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com	2017-02-14 14:30:46 +00:00
Chris Wilson	ae9a043b0c	drm/i915: Rename conditional GEM execution macros After a brief discussion, we settled on a naming convention for the conditional GEM debugging data that should be clearer to the casual user: GEM_DEBUG Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-02-10 21:43:43 +00:00
Chris Wilson	72b72ae473	drm/i915: Always pin contexts into the high GGTT Now that we have fast top-down insertion into the drm_mm, we can use it for frequent runtime operations like insertion of the context object, whereas before we limited it to the one-off insertion of the pinned kernel context. Keeping the active context objects out of the mappable region of the global GTT (except under memory pressure) improves our ability to allocate mappable aperture region without triggering a GPU stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170210101422.1598-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-10 13:58:40 +00:00
Chris Wilson	c0dcb203fb	drm/i915: Restore context and pd for ringbuffer submission after reset Following a reset, the context and page directory registers are lost. However, the queue of requests that we resubmit after the reset may depend upon them - the registers are restored from a context image, but that restore may be inhibited and may simply be absent from the request if it was in the middle of a sequence using the same context. If we prime the CCID/PD registers with the first request in the queue (even for the hung request), we prevent invalid memory access for the following requests (and continually hung engines). v2: Magic BIT(8), reserved for future use but still appears unused. v3: Some commentary on handling innocent vs guilty requests v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my Ivybridge, but this bit probably exists for a reason. Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-02-07 21:34:58 +00:00
Chris Wilson	eca56a3511	drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() It is required that the caller declare the exact number of dwords they wish to write into the ring. This is required for two reasons, we need to allocate sufficient space for the entire command packet and we need to be sure that the contents are completely written to avoid executing stale data. The current interface requires for any bug to be caught in review, the reader has to carefully count the number of intel_ring_emit() between intel_ring_begin() and intel_ring_advance(). If we record the end of the packet of each intel_ring_begin() we can also have CI check for us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-1-chris@chris-wilson.co.uk	2017-02-06 20:41:00 +00:00
Ander Conselvan de Oliveira	9fb5026f85	drm/i915/glk: Turn on workarounds that apply to Geminilake too Apply workarounds to Geminilake, and annotate those that are applied unconditionally when they apply to GLK based on the workaround database. v2: Fix commit message typos. (David) v3: Rebase. Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com	2017-01-30 09:49:07 +02:00
Chris Wilson	a01cb37aff	drm/i915: Remove i915_vma_create from VMA API With the introduce of i915_vma_instance() for obtaining the VMA singleton for a (obj, vm, view) tuple, we can remove the i915_vma_create() in favour of a single entry point. We do incur a lookup onto an empty tree, but the i915_vma_create() were being called infrequently and during initialisation, so the small overhead is negligible. v2: Drop the i915_ prefix from the now static vma_create() function Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk	2017-01-19 10:17:39 +00:00
Francisco Jerez	8726f2faa3	drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. The WaDisableLSQCROPERFforOCL workaround has the side effect of disabling an L3SQ optimization that has huge performance implications and is unlikely to be necessary for the correct functioning of usual graphic workloads. Userspace is free to re-enable the workaround on demand, and is generally in a better position to determine whether the workaround is necessary than the DRM is (e.g. only during the execution of compute kernels that rely on both L3 fences and HDC R/W requests). The same workaround seems to apply to BDW (at least to production stepping G1) and SKL as well (the internal workaround database claims that it does for all steppings, while the BSpec workaround table only mentions pre-production steppings), but the DRM doesn't do anything beyond whitelisting the L3SQCREG4 register so userspace can enable it when it sees fit. Do the same on KBL platforms. Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%, and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master -- This is followed by a regression of 35% and 10% respectively for the same benchmarks and platform caused by my recent patch series switching userspace to use the dataport constant cache instead of the sampler to implement uniform pull constant loads, which caused us to hit more heavily the L3 cache (and on platforms other than KBL had the opposite effect of improving performance of the same two benchmarks). The overall effect on KBL of this change combined with the recent userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf was affected by the constant cache changes (though it improved as it did on other platforms rather than regressing), but is not significantly affected by this patch (with statistical significance of 5% and sample size 20). v2: Drop some more code to avoid unused variable warning. Fixes: `738fa1b312` ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: beignet@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.7+ Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [Removed double Fixes tag] Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com	2017-01-12 15:48:08 +02:00
Chris Wilson	f51455d442	drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-10 20:54:32 +00:00
Chris Wilson	984ff29f74	drm/i915: Simplify testing for am-I-the-kernel-context? The kernel context (dev_priv->kernel_context) is unique in that it is not associated with any user filp - it is the only one with ctx->file_priv == NULL. This is a simpler test than comparing it against dev_priv->kernel_context which involves some pointer dancing. In checking that this is true, we notice that the gvt context is allocating itself a i915_hw_ppgtt it doesn't use and not flagging that its file_priv should be invalid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk	2017-01-06 16:02:12 +00:00
Daniele Ceraolo Spurio	d3ef1af6fd	drm/i915: request ring to be pinned above GUC_WOPCM_TOP GuC will validate the ring offset and fail if it is in the [0, GUC_WOPCM_TOP) range. The bias is conditionally applied only if GuC loading is enabled (we can't check for guc submission enabled as in other cases because HuC loading requires this fix). Note that the default context is processed before enable_guc_loading is sanitized, so we might still apply the bias to its ring even if it is not needed. v2: compute the value during ctx init and pass it to intel_ring_pin (Chris), updated commit message Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-12-24 10:06:59 +00:00
Chris Wilson	f73e73999d	drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc A fairly trivial move of a matching pair of routines (for preparing a request for construction) onto an engine vfunc. The ulterior motive is to be able to create a mock request implementation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk	2016-12-18 16:18:56 +00:00
Chris Wilson	e8a9c58fcd	drm/i915: Unify active context tracking between legacy/execlists/guc The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk	2016-12-18 16:18:50 +00:00
Jani Nikula	2a307c2e91	drm/i915: add some more "i" in platform names for consistency Consistency FTW. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/9ab811dc06570bd3fc05a917ade1bdc9bb805a75.1480520526.git.jani.nikula@intel.com	2016-12-07 15:19:31 +02:00
Tvrtko Ursulin	12d79d7828	drm/i915: Make GEM object create and create from data take dev_priv Makes all GEM object constructors consistent. v2: Fix compilation in GVT code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)	2016-12-01 18:01:08 +00:00
Tvrtko Ursulin	187685cb90	drm/i915: Make GEM object alloc/free and stolen created take dev_priv Where it is more appropriate and also to be consistent with the direction of the driver. v2: Leave out object alloc/free inlining. (Joonas Lahtinen) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-12-01 18:00:15 +00:00
Chris Wilson	d55ac5bf97	drm/i915: Defer transfer onto execution timeline to actual hw submission Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo. v2: Rebased onto distinct global/user timeline lock classes. v3: Play with the position of the spin_lock(). v4: Nesting finally resolved with distinct sw_fence lock classes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk	2016-11-14 21:00:23 +00:00
Chris Wilson	07c9a21a0d	drm/i915: Export a function to flush the context upon pinning For legacy contexts we employ an optimisation to only flush the context when binding into the global GTT. This avoids stalling on the GPU when reloading an active context. Wrap this detail up into a helper and export it for a potential third user. (Longer term, context pinning needs to be reworked as the current handling of switch context pins too late and so risks eviction and corrupting the request. Plans, plans, plans.) v2: Expand the comment explaining the optimisation for avoiding the stall on active contexts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161030132820.32163-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-11-01 14:26:33 +00:00
Chris Wilson	85e17f5974	drm/i915: Move the global sync optimisation to the timeline Currently we try to reduce the number of synchronisations (now the number of requests we need to wait upon) by noting that if we have earlier waited upon a request, all subsequent requests in the timeline will be after the wait. This only applies to requests in this timeline, as other timelines will not be ordered by that waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	caddfe7192	drm/i915: Defer breadcrumb emission Move the actual emission of the breadcrumb for closing the request from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	98f29e8d90	drm/i915: Record space required for breadcrumb emission In the next patch, we will use deferred breadcrumb emission. That requires reserving sufficient space in the ringbuffer to emit the breadcrumb, which first requires us to know how large the breadcrumb is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	9b81d556b1	drm/i915: Rename ->emit_request to ->emit_breadcrumb Now that the emission of the request tail and its submission to hardware are two separate steps, engine->emit_request() is confusing. engine->emit_request() is called to emit the breadcrumb commands for the request into the ring, name it such (engine->emit_breadcrumb). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	65e4760e39	drm/i915: Introduce a global_seqno for each request Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	4e50f082ac	drm/i915: Reuse the active golden render state batch The golden render state is constant, but we recreate the batch setting it up for every new context. If we keep that batch in a volatile cache we can safely reuse it whenever we need to initialise a new context. We mark the pages as purgeable and use the shrinker to recover pages from the batch whenever we face memory pressues, recreating that batch afresh on the next new context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-8-chris@chris-wilson.co.uk	2016-10-28 20:53:44 +01:00
Chris Wilson	920cf41949	drm/i915: Introduce an internal allocator for disposable private objects Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about providing a filp. For these we can use an drm_i915_gem_object wrapper around a sg_table of plain struct page. They are not swap backed and not automatically pinned. If they are reaped by the shrinker, the pages are released and the contents discarded. For the internal use case, this is fine as for example, ringbuffers are pinned from being written by a request to be read by the hardware. Once they are idle, they can be discarded entirely. As such they are a good match for execlist ringbuffers and a small variety of other internal objects. In the first iteration, this is limited to the scratch batch buffers we use (for command parsing and state initialisation). v2: Allocate physically contiguous pages, where possible. v3: Reduce maximum order on subsequent requests following an allocation failure. v4: Fix up mismatch between swiotlb segment size and page count (it counts in 2k units, not 4k pages) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk	2016-10-28 20:53:44 +01:00
Chris Wilson	f8a7fde456	drm/i915: Defer active reference until required We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why then pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	e95433c73a	drm/i915: Rearrange i915_wait_request() accounting with callers Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS boosting, waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low level wait and prepare for extending the GEM interface for use of reservation_objects. v2: Rewrite i915_wait_request() kerneldocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Akash Goel	f4e9af4f5a	drm/i915: Add low level set of routines for programming PM IER/IIR/IMR register set So far PM IER/IIR/IMR registers were being used only for Turbo related interrupts. But interrupts coming from GuC also use the same set. As a precursor to supporting GuC interrupts, added new low level routines so as to allow sharing the programming of PM IER/IIR/IMR registers between Turbo & GuC. Also similar to PM IMR, maintaining a bitmask for PM IER register, to allow easy sharing of it between Turbo & GuC without involving a rmw operation. v2: - For appropriateness & avoid any ambiguity, rename old functions enable/disable pm_irq to mask/unmask pm_irq and rename new functions enable/disable pm_interrupts to enable/disable pm_irq. (Tvrtko) - Use u32 in place of uint32_t. (Tvrtko) v3: - Rename the fields pm_irq_mask & pm_ier_mask and do some cleanup. (Chris) - Rebase. v4: Fix the inadvertent disabling of User interrupt for VECS ring causing failure for certain IGTs. v5: Use dev_priv with HAS_VEBOX macro. (Tvrtko) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-10-25 09:34:06 +01:00
Arkadiusz Hiler	465418c606	drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7 Dropping WA because it was for early steppings. It is fixed in newer preproduction and all production revisions. v2: add references, updated commit message References: HSD#2126385, HSD#2131381, BSID#0764 Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476977460-28088-1-git-send-email-arkadiusz.hiler@intel.com	2016-10-21 14:22:50 +03:00
Akash Goel	3b3f1650b1	drm/i915: Allocate intel_engine_cs structure only for the enabled engines With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine* macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com	2016-10-14 09:58:43 +01:00
Chris Wilson	ad07dfcddf	drm/i915: Reset the breadcrumbs IRQ more carefully Along with the interrupt, we want to restore the fake-irq and wait-timeout detection. If we use the breadcrumbs interface to setup the interrupt as it wants, the auxiliary timers will also be restored. v2: Cancel both timers as well, sanitize the IMR. Fixes: `821ed7df6e` ("drm/i915: Update reset path to fix incomplete requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk	2016-10-07 08:27:23 +01:00
Chris Wilson	1b36595ffb	drm/i915: Show RING registers through debugfs Knowing where the RINGs are pointing is extremely useful in diagnosing if the engines are executing the ringbuffers you expect - and igt may be suppressing the usual method of looking in the GPU error state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-7-chris@chris-wilson.co.uk	2016-10-05 08:40:06 +01:00
Chris Wilson	62ae14b1ed	drm/i915: Share the computation of ring size for RING_CTL register Since both legacy and execlists want to populate the RING_CTL register, share the computation of the right bits for the ring->size. We can then stop masking errors and explicitly forbid them during creation! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-1-chris@chris-wilson.co.uk	2016-10-05 08:40:05 +01:00
Jani Nikula	3ec92362cd	drm/i915/skl: drop workarounds for F0 revision Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/e5433e6430dcfd941209c4d8103035ddb13d17b4.1474034059.git.jani.nikula@intel.com	2016-09-26 12:14:05 +03:00
Jani Nikula	3be192e92d	drm/i915/skl: drop workarounds for E0 revision Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/0633a02177195703502ef2396aab03efc0314334.1474034059.git.jani.nikula@intel.com	2016-09-26 12:12:56 +03:00
Jani Nikula	9fc736e833	drm/i915/skl: drop workarounds for D0 revision Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/d28d21ceddeec226b5d1a20a7382bee9a72709a4.1474034059.git.jani.nikula@intel.com	2016-09-26 12:12:19 +03:00
Jani Nikula	0d0b8dcf94	drm/i915/skl: drop workarounds for C0 revision Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/ed7b784306b35fa5215b9c04de79a2bc48585503.1474034059.git.jani.nikula@intel.com	2016-09-26 12:11:39 +03:00
Jani Nikula	a117f378f4	drm/i915/skl: drop workarounds for A0 and B0 revisions Pre-production hardware is not supported. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/7929af62a68504c84038a8db1625bd96ebaa9e6f.1474034059.git.jani.nikula@intel.com	2016-09-26 12:08:22 +03:00
Chris Wilson	821ed7df6e	drm/i915: Update reset path to fix incomplete requests Update reset path in preparation for engine reset which requires identification of incomplete requests and associated context and fixing their state so that engine can resume correctly after reset. The request that caused the hang will be skipped and head is reset to the start of breadcrumb. This allows us to resume from where we left-off. Since this request didn't complete normally we also need to cleanup elsp queue manually. This is vital if we employ nonblocking request submission where we may have a web of dependencies upon the hung request and so advancing the seqno manually is no longer trivial. ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS We change the way we count pending batches. Only the active context involved in the reset is marked as either innocent or guilty, and not mark the entire world as pending. By inspection this only affects igt/gem_reset_stats (which assumes implementation details) and not piglit. ARB_robustness gives this guide on how we expect the user of this interface to behave: * Provide a mechanism for an OpenGL application to learn about graphics resets that affect the context. When a graphics reset occurs, the OpenGL context becomes unusable and the application must create a new context to continue operation. Detecting a graphics reset happens through an inexpensive query. And with regards to the actual meaning of the reset values: Certain events can result in a reset of the GL context. Such a reset causes all context state to be lost. Recovery from such events requires recreation of all objects in the affected context. The current status of the graphics reset state is returned by enum GetGraphicsResetStatusARB(); The symbolic constant returned indicates if the GL context has been in a reset state at any point since the last call to GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context has not been in a reset state since the last call. GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected that is attributable to the current GL context. INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that is not attributable to the current GL context. UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose cause is unknown. The language here is explicit in that we must mark up the guilty batch, but is loose enough for us to relax the innocent (i.e. pending) accounting as only the active batches are involved with the reset. In the future, we are looking towards single engine resetting (with minimal locking), where it seems inappropriate to mark the entire world as innocent since the reset occurred on a different engine. Reducing the information available means we only have to encounter the pain once, and also reduces the information leaking from one context to another. v2: Legacy ringbuffer submission required a reset following hibernation, or else we restore stale values to the RING_HEAD and walked over stolen garbage. v3: GuC requires replaying the requests after a reset. v4: Restore engine IRQ after reset (so waiters will be woken!) Rearm hangcheck if resetting with a waiter. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk	2016-09-09 14:23:05 +01:00
Chris Wilson	221fe79945	drm/i915: Perform a direct reset of the GPU from the waiter If a waiter is holding the struct_mutex, then the reset worker cannot reset the GPU until the waiter returns. We do not want to return -EAGAIN form i915_wait_request as that breaks delicate operations like i915_vma_unbind() which often cannot be restarted easily, and returning -EIO is just as useless (and has in the past proven dangerous). The remaining WARN_ON(i915_wait_request) serve as a valuable reminder that handling errors from an indefinite wait are tricky. We can keep the current semantic that knowing after a reset is complete, so is the request, by performing the reset ourselves if we hold the mutex. uevent emission is still handled by the reset worker, so it may appear slightly out of order with respect to the actual reset (and concurrent use of the device). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk	2016-09-09 14:23:04 +01:00
Chris Wilson	22dd3bb919	drm/i915: Mark up all locked waiters In the next patch we want to handle reset directly by a locked waiter in order to avoid issues with returning before the reset is handled. To handle the reset, we must first know whether we hold the struct_mutex. If we do not hold the struct_mtuex we can not perform the reset, but we do not block the reset worker either (and so we can just continue to wait for request completion) - otherwise we must relinquish the mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk	2016-09-09 14:23:03 +01:00
Chris Wilson	ea746f3659	drm/i915: Expand bool interruptible to pass flags to i915_wait_request() We need finer control over wakeup behaviour during i915_wait_request(), so expand the current bool interruptible to a bitmask. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk	2016-09-09 14:23:03 +01:00
Carlos Santa	3177659a41	drm/i915: Make HWS_NEEDS_PHYSICAL the exception Make the .hws_needs_physical the exception by switching the flag on earlier platforms since they are fewer to support. Remove the flag on later GPUs hardware since they all use GTT hws by default. Switch the logic as well in the driver to reflect this change Signed-off-by: Carlos Santa <carlos.santa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2016-09-07 16:07:09 -07:00
Imre Deak	43b6799814	drm/i915: sseu: Use sseu_dev_info in device info Move all slice/subslice/eu related properties to the sseu_dev_info struct. No functional change. v2: - s/info/sseu/ based on the new struct name. (Ben) Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1) Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1) Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1) Signed-off-by: Imre Deak <imre.deak@intel.com>	2016-09-02 18:16:31 +03:00
Chris Wilson	c58b735fc7	drm/i915: Allocate rings from stolen If we have stolen available, make use of it for ringbuffer allocation. Previously this was restricted to !llc platforms, as writing to stolen requires a GGTT mapping - but now that we have partial mappable support, the mappable aperture isn't quite so precious so we can use it more freely and ringbuffers are a good user for the otherwise wasted stolen. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk	2016-08-18 22:36:49 +01:00
Chris Wilson	9d80841ea4	drm/i915: Allow ringbuffers to be bound anywhere Now that we have WC vmapping available, we can bind our rings anywhere in the GGTT and do not need to restrict them to the mappable region. Except for stolen objects, for which direct access is verbatim and we must use the mappable aperture. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-17-chris@chris-wilson.co.uk	2016-08-18 22:36:48 +01:00
Tvrtko Ursulin	318f89ca20	drm/i915: Initialize legacy semaphores from engine hw id indexed array Build the legacy semaphore initialisation array using the engine hardware ids instead of driver internal ones. This makes the static array size dependent only on the number of gen6 semaphore engines. Also makes the per-engine semaphore wait and signal tables hardware id indexed saving some more space. v2: Refactor I915_GEN6_NUM_ENGINES to GEN6_SEMAPHORE_LAST. (Chris Wilson) v3: More polish. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471363461-9973-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-08-17 11:29:56 +01:00
Chris Wilson	21a2c58a9c	drm/i915: Record the RING_MODE register for post-mortem debugging Just another useful register to inspect following a GPU hang. v2: Remove partial decoding of RING_MODE to userspace, be consistent and use GEN > 2 guards around RING_MODE everywhere. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-32-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:19 +01:00
Chris Wilson	bde13ebdab	drm/i915: Introduce i915_ggtt_offset() This little helper only exists to safely discard the upper unused 32bits of the general 64-bit VMA address - as we know that all Global GTT currently are less than 4GiB in size and so that the upper bits must be zero. In many places, we use a u32 for the global GTT offset and we want to document where we are discarding the full VMA offset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:14 +01:00
Chris Wilson	19880c4a3f	drm/i915: Consolidate i915_vma_unpin_and_release() In a few places, we repeat a call to clear a pointer to a vma whilst unpinning and releasing a reference to its owner. Refactor those into a common function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-26-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:12 +01:00
Chris Wilson	51d545d026	drm/i915: Use VMA as the primary tracker for semaphore page Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-23-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:10 +01:00
Chris Wilson	57f275a22b	drm/i915: Move common seqno reset to intel_engine_cs.c Since the intel_engine_init_seqno() is shared by all engine submission backends, move it out of the legacy intel_ringbuffer.c and into the new home for common routines, intel_engine_cs.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-21-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:08 +01:00
Chris Wilson	adc320c4b7	drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Since the scratch allocation and cleanup is shared by all engine submission backends, move it out of the legacy intel_ringbuffer.c and into the new home for common routines, intel_engine_cs.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-20-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:08 +01:00
Chris Wilson	56c0f1a7c1	drm/i915: Use VMA for scratch page tracking Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-19-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:07 +01:00
Chris Wilson	57e8853181	drm/i915: Use VMA for ringbuffer tracking Use the GGTT VMA as the primary cookie for handing ring objects as the most common action upon the ring is mapping and unmapping which act upon the VMA itself. By restructuring the code to work with the ring VMA, we can shrink the code and remove a few cycles from context pinning. v2: Move the flush of the object back to before the first pin. We use the am-I-bound? query to only have to check the flush on the first bind and so avoid stalling on active rings. Lots of little renames and small hoops. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-18-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:05 +01:00
Chris Wilson	e5cdb22b27	drm/i915: Move assertion for iomap access to i915_vma_pin_iomap Access through the GTT requires the device to be awake. Ideally i915_vma_pin_iomap() is short-lived and the pinning demarcates the access through the iomap. This is not entirely true, we have a mixture of long lived pins that exceed the wakelock (such as legacy ringbuffers) and short lived pin that do live within the wakelock (such as execlist ringbuffers). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-17-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:04 +01:00
Chris Wilson	7abc98fadf	drm/i915: Only change the context object's domain when binding We know that the only access to the context object is via the GPU, and the only time when it can be out of the GPU domain is when it is swapped out and unbound. Therefore we only need to clflush the object when binding, thus avoiding any potential stall on touching the domain on an active context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-16-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:03 +01:00
Chris Wilson	bf3783e52a	drm/i915: Use VMA as the primary object for context state When working with contexts, we most frequently want the GGTT VMA for the context state, first and foremost. Since the object is available via the VMA, we need only then store the VMA. v2: Formatting tweaks to debugfs output, restored some comments removed in the next patch Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:02 +01:00
Chris Wilson	d31d7cb146	drm/i915: Support for creating write combined type vmaps vmaps has a provision for controlling the page protection bits, with which we can use to control the mapping type, e.g. WB, WC, UC or even WT. To allow the caller to choose their mapping type, we add a parameter to i915_gem_object_pin_map - but we still only allow one vmap to be cached per object. If the object is currently not pinned, then we recreate the previous vmap with the new access type, but if it was pinned we report an error. This effectively limits the access via i915_gem_object_pin_map to a single mapping type for the lifetime of the object. Not usually a problem, but something to be aware of when setting up the object's vmap. We will want to vary the access type to enable WC mappings of ringbuffer and context objects on !llc platforms, as well as other objects where we need coherent access to the GPU's pages without going through the GTT v2: Remove the redundant braces around pin count check and fix the marker in documentation (Chris) v3: - Add a new enum for the vmalloc mapping type & pass that as an argument to i915_object_pin_map. (Tvrtko) - Use PAGE_MASK to extract or filter the mapping type info and remove a superfluous BUG_ON.(Tvrtko) v4: - Rename the enums and clean up the pin_map function. (Chris) v5: Drop the VM_NO_GUARD, minor cosmetics. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk	2016-08-12 13:06:36 +01:00
Tvrtko Ursulin	c1bb11451e	drm/i915: Store number of active engines in device info Until now code was calling hweight32 to figure out the number from device_info->ring_mask at runtime. Instead we can cache it at engine init time and use directly. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-08-11 11:33:10 +01:00
Chris Wilson	737aac2465	drm/i915: Mark unmappable GGTT entries as PIN_HIGH We allocate a few objects into the GGTT that we never need to access via the mappable aperture (such as contexts, status pages). We can request that these are bound high in the VM to increase the amount of mappable aperture available. However, anything that may be frequently pinned (such as logical contexts) we want to use the fast search & insert. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-1-git-send-email-chris@chris-wilson.co.uk	2016-08-10 16:07:51 +01:00
Chris Wilson	dbd6ef29a7	drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh The bottom-half we use for processing the breadcrumb interrupt is a task, which is an RCU protected struct. When accessing this struct, we need to be holding the RCU read lock to prevent it disappearing beneath us. We can use the RCU annotation to mark our irq_seqno_bh pointer as being under RCU guard and then use the RCU accessors to both provide correct ordering of access through the pointer. Most notably, this fixes the access from hard irq context to use the RCU read lock, which both Daniel and Tvrtko complained about. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk	2016-08-10 10:37:49 +01:00
Matthew Auld	575e3ccbce	drm/i915: fix WaInsertDummyPushConstPs As pointed out by Chris Harris, we are using the wrong WA name, it should in fact be WaToEnableHwFixForPushConstHWBug, also it should be applied from C0 onwards for both BXT and KBL. Fixes: `7b9005cd45` ("drm/i915: Add WaInsertDummyPushConstP for bxt and kbl") Cc: Chris Harris <chris.harris@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: Chris Harris <chris.harris@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470127013-29653-1-git-send-email-matthew.auld@intel.com	2016-08-05 13:16:55 +03:00
Chris Wilson	dcff85c844	drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex current for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine, allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:37 +01:00
Chris Wilson	90f4fcd56b	drm/i915: Remove forced stop ring on suspend/unload Before suspending (or unloading), we would first wait upon all rendering to be completed and then disable the rings. This later step is a remanent from DRI1 days when we did not use request tracking for all operations upon the ring. Now that we are sure we are waiting upon the very last operation by the engine, we can forgo clobbering the ring registers, though we do keep the assert that the engine is indeed idle before sleeping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:36 +01:00
Chris Wilson	de895082f7	drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin() Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. v2: Add a redundant GEM_BUG_ON(!view) to i915_gem_obj_lookup_or_create_ggtt_vma() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:00 +01:00
Chris Wilson	776f32364d	drm/i915: s/__i915_wait_request/i915_wait_request/ There is only one wait on request function now, so drop the "expert" indication of leading __. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:29 +01:00
Chris Wilson	37db14700e	drm/i915: Disable waitboosting for a saturated engine If the user floods the GPU with so many requests that the engine stalls waiting for free space, don't automatically promote the GPU to maximum frequencies. If the GPU really is saturated with work, it will migrate to high clocks by itself, otherwise it is merely a user flooding us with busy-work. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-20-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:28 +01:00
Chris Wilson	7da844c5c6	drm/i915: Move the special case wait-request handling to its one caller Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-19-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:27 +01:00
Chris Wilson	675d9ad71b	drm/i915: Track requests inside each intel_ring By tracking each request occupying space inside an individual intel_ring, we can greatly simplify the logic of tracking available space and not worry about other timelines. (Each ring is an ordered timeline of committed requests.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-17-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:26 +01:00
Chris Wilson	efdf7c0605	drm/i915: Rename request->list to link for consistency We use "list" to denote the list and "link" to denote an element on that list. Rename request->list to match this idiom. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:23 +01:00
Chris Wilson	96a945aa42	drm/i915: Move the common engine cleanup to intel_engine_cs.c Now that we initialize the state to both legacy and execlists inside intel_engine_cs, we should also clean up that state from the common functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470226756-24401-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-08-03 14:01:12 +01:00
Chris Wilson	ad7bdb2b99	drm/i915: Rename engine->semaphore.sync_to, engine->sempahore.signal locals In order to be more consistent with the rest of the request construction and ring emission, use the common names for the ring and request. Rather than using signaler_req, waiter_req, and intel_ring *wait, we use plain req and ring. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-32-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-23-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:33 +01:00
Chris Wilson	ddf07be7a2	drm/i915: Simplify calling engine->sync_to Since requests can no longer be generated as a side-effect of intel_ring_begin(), we know that the seqno will be unchanged during ring-emission. This predicatablity then means we do not have to check for the seqno wrapping around whilst emitting the semaphore for engine->sync_to(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:32 +01:00
Chris Wilson	618e4ca7b1	drm/i915/ringbuffer: Specialise SNB+ request emission for semaphores As gen6_emit_request() only differs from i9xx_emit_request() when semaphores are enabled, only use the specialised vfunc in that scenario. v2: Reorder semaphore init so as to keep engine->emit_request default vfunc selection compact. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-27-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-18-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:29 +01:00
Chris Wilson	b0411e7d45	drm/i915: Reuse legacy breadcrumbs + tail emission As GEN6+ is now a simple variant on the basic breadcrumbs + tail write, reuse the common code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-26-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-17-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:29 +01:00
Chris Wilson	9242f974dc	drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal() Rather than pass in the num_dwords that the caller wishes to use after the signal command packet, split the breadcrumb emission into two phases and have both the signal and breadcrumb individiually acquire space on the ring. This makes the interface simpler for the reader, and will simplify for patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-25-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-16-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:28 +01:00
Chris Wilson	ddd66c5154	drm/i915: Unify request submission Move request submission from emit_request into its own common vfunc from i915_add_request(). v2: Convert I915_DISPATCH_flags to BIT(x) whilst passing v3: Rename a few functions to match. v4: Reenable execlists submission after disabling guc. v5: Be aware that everyone calls i915_guc_submission_disable()! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-23-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-14-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:26 +01:00
Chris Wilson	8f9420184a	drm/i915: Move the modulus for ring emission to the register write Space reservation is already safe with respect to the ring->size modulus, but hardware only expects to see values in the range 0...ring->size-1 (inclusive) and so requires the modulus to prevent us writing the value ring->size instead of 0. As this is only required for the register itself, we can defer the modulus to the register update and not perform it after every command packet. We keep the intel_ring_advance() around in the code to provide demarcation for the end-of-packet (which then can be compared against intel_ring_begin() as the number of dwords emitted must match the reserved space). v2: Assert that the ring size is a power-of-two to match assumptions in the code. Simplify the comment before writing the tail value to explain why the modulus is necessary. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-13-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:25 +01:00
Chris Wilson	c5efa1ad09	drm/i915: Convert engine->write_tail to operate on a request If we rewrite the I915_WRITE_TAIL specialisation for the legacy ringbuffer as submitting the request onto the ringbuffer, we can unify the vfunc with both execlists and GuC in the next patch. v2: Drop the modulus from the I915_WRITE_TAIL as it is currently being applied in intel_ring_advance() after every command packet, and add a comment explaining why we need the modulus at all. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-22-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-12-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:24 +01:00
Chris Wilson	803688babd	drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer - we need only one vfunc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-20-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-10-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:21 +01:00
Chris Wilson	7c9cf4e33a	drm/i915: Reduce engine->emit_flush() to a single mode parameter Rather than passing a complete set of GPU cache domains for either invalidation or for flushing, or even both, just pass a single parameter to the engine->emit_flush to determine the required operations. engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE) engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH) engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH \| EMIT_INVALIDATE) This allows us to extend the behaviour easily in future, for example if we want just a command barrier without the overhead of flushing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	c7fe7d25ed	drm/i915: Remove obsolete engine->gpu_caches_dirty Space for flushing the GPU cache prior to completing the request is preallocated and so cannot fail - the GPU caches will always be flushed along with the completed request. This means we no longer have to track whether the GPU cache is dirty between batches like we had to with the outstanding_lazy_seqno. With the removal of the duplication in the per-backend entry points for emitting the obsolete lazy flush, we can then further unify the engine->emit_flush. v2: Expand a bit on the legacy of gpu_caches_dirty Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	aad29fbbb8	drm/i915: Rename intel_pin_and_map_ring() For more consistent oop-naming, we would use intel_ring_verb, so pick intel_ring_pin() and intel_ring_unpin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-17-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-6-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:18 +01:00
Chris Wilson	32c04f16f0	drm/i915: Rename residual ringbuf parameters Now that we have a clear ring/engine split and a struct intel_ring, we no longer need the stopgap ringbuf names. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-16-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-5-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:17 +01:00
Chris Wilson	7e37f889b5	drm/i915: Rename struct intel_ringbuffer to struct intel_ring The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:16 +01:00
Chris Wilson	1dae2dfb0b	drm/i915: Rename request->ringbuf to request->ring Now that we have disambuigated ring and engine, we can use the clearer and more consistent name for the intel_ringbuffer pointer in the request. @@ struct drm_i915_gem_request *r; @@ - r->ringbuf + r->ring Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-12-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-2-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:15 +01:00
Chris Wilson	b5321f309b	drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Both perform the same actions with more or less indirection, so just unify the code. v2: Add back a few intel_engine_cs locals Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-11-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-1-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:13 +01:00
Chris Wilson	33a051a5fc	drm/i915/cmdparser: Remove stray intel_engine_cs *ring When we refer to intel_engine_cs, we want to use engine so as not to confuse ourselves about ringbuffers. v2: Rename all the functions as well, as well as a few more stray comments. v3: Split the really long error message strings Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-6-git-send-email-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-1-git-send-email-chris@chris-wilson.co.uk	2016-07-27 16:23:05 +01:00
Dave Gordon	38a0f2db5b	drm/i915: rename 'ring' where it refers to an engine or engine_id 'ring' is an old deprecated term for a GPU engine. Chris Wilson wants to use the name for what is currently known as an intel_ringbuffer, but it will be dreadfully confusing if some rings are ringbuffers but other rings are still engines. So this patch changes the names of a bunch of parameters called 'ring' to either 'engine' or 'engine_id' according to what they actually are. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-3-git-send-email-david.s.gordon@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-21 09:59:41 +01:00
Mika Kuoppala	4ba9c1f7c7	drm/i915/gen9: Add WaInPlaceDecompressionHang Add this workaround to prevent hang when in place compression is used. References: HSD#2135774 Cc: stable@vger.kernel.org Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-07-20 16:10:20 +03:00
Chris Wilson	39df91905d	drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Rather than recomputing whether semaphores are enabled, we can do that computation once during early initialisation as the i915.semaphores module parameter is now read-only. s/i915_semaphores_is_enabled/i915.semaphores/ v2: Add the state to the debug dmesg as well Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:43 +01:00
Chris Wilson	f2f0ed718b	drm/i915: Rename ring->virtual_start as ring->vaddr Just a different colour to better match virtual addresses elsewhere. s/ring->virtual_start/ring->vaddr/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-9-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-8-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:39 +01:00
Chris Wilson	406ea8d22f	drm/i915: Treat ringbuffer writes as write to normal memory Ringbuffers are now being written to either through LLC or WC paths, so treating them as simply iomem is no longer adequate. However, for the older !llc hardware, the hardware is documentated as treating the TAIL register update as serialising, so we can relax the barriers when filling the rings (but even if it were not, it is still an uncached register write and so serialising anyway.). For simplicity, let's ignore the iomem annotation. v2: Remove iomem from ringbuffer->virtual_address v3: And for good measure add iomem elsewhere to keep sparse happy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2 Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-8-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-7-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:14 +01:00
Chris Wilson	f8c417cdb1	drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Ultimately wraps kref_put(), so adopt its nomenclature for consistency with other subsystems. s/drm_gem_object_unreference/i915_gem_object_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:12 +01:00
Chris Wilson	9a6feaf0d7	drm/i915: Rename i915_gem_context_reference/unreference() As these are wrappers around kref_get/kref_put() it is preferable to follow the naming convention and use the same verb get/put in our wrapper names for manipulating a reference to the context. s/i915_gem_context_reference/i915_gem_context_get/ s/i915_gem_context_unreference/i915_gem_context_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-3-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-2-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:10 +01:00
Chris Wilson	04769652c8	drm/i915: Derive GEM requests from dma-fence dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers. v2: The fence_context needs to be globally unique, not just unique to this device. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk	2016-07-20 09:29:53 +01:00
Tvrtko Ursulin	019bf27763	drm/i915: Pull out some more common engine init code Created two common helpers for engine setup and engine init phases respectively to help with code sharing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468422221-12132-1-git-send-email-tvrtko.ursulin@linux.intel.com Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:24 +01:00
Tvrtko Ursulin	acd2784562	drm/i915: Simplify intel_init_ring_buffer prototype Engine contains dev_priv so need to pass it in. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:17 +01:00
Tvrtko Ursulin	c78d606134	drm/i915: Make more use of the shared engine irq setup Use more of the shared engine setup data for legacy engine initialization. This time to simplify the irq initialization code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:12 +01:00
Tvrtko Ursulin	8b3e2d3639	drm/i915: Unify engine init loop With the unified common engine setup done, and the execlist engine initialization loop clearly split into two phases, we can eliminate the separate legacy engine initialization code. v2: Fix cleanup path for legacy. v3: Rename constructors. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:06 +01:00
Dave Gordon	c2c7f24008	drm/i915: unify first-stage engine struct setup intel_lrc.c has a table of "logical rings" (meaning engines), while intel_ringbuffer.c has separately open-coded initialisation for each engine. We can deduplicate this somewhat by using the same first-stage engine-setup function for both modes. So here we expose the function that transfers information from the static table of (all) known engines to the dev_priv->engine array of engines available on this device (adjusting the names along the way) and then embed calls to it in both the LRC and the legacy-mode setup. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-14 11:16:18 +01:00
Ville Syrjälä	035ea405c9	drm/i915: Unbreak interrupts on pre-gen6 Prior to gen6 we didn't have per-ring IMR registers, which means that since commit `61ff75ac20` ("drm/i915: Simplify enabling user-interrupts with L3-remapping") we're now masking off all interrupts when init_render_ring() gets called. That's rather rude. Let's limit the ring IMR frobbing to machines that actually have the per-ring IMR registers. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `61ff75ac20` ("drm/i915: Simplify enabling user-interrupts with L3-remapping") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468340687-3596-1-git-send-email-ville.syrjala@linux.intel.com Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-13 17:04:00 +03:00
Chris Wilson	91c8a326a1	drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm Since drm_i915_private is now a subclass of drm_device we do not need to chase the drm_i915_private->dev backpointer and can instead simply access drm_i915_private->drm directly. text data bss dec hex filename `1068757` 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko 1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ struct drm_i915_private *d; identifier i; @@ ( - d->dev->i + d->drm.i \| - d->dev + &d->drm ) and for good measure the dev_priv->dev backpointer was removed entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk	2016-07-05 11:58:45 +01:00
Chris Wilson	fac5e23e3c	drm/i915: Mass convert dev->dev_private to to_i915(dev) Since we now subclass struct drm_device, we can save pointer dances by noting the equivalence of struct drm_device and struct drm_i915_private, i.e. by using to_i915(). text data bss dec hex filename 1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko 1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ expression E; identifier p; @@ - struct drm_i915_private p = E->dev_private; + struct drm_i915_private p = to_i915(E); Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk	2016-07-04 12:54:07 +01:00
Chris Wilson	7b4d3a16dd	drm/i915: Remove stop-rings debugfs interface Now that we have (near) universal GPU recovery code, we can inject a real hang from userspace and not need any fakery. Not only does this mean that the testing is far more realistic, but we can simplify the kernel in the process. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:23 +01:00
Chris Wilson	61ff75ac20	drm/i915: Simplify enabling user-interrupts with L3-remapping Borrow the idea from intel_lrc.c to precompute the mask of interrupts we wish to always enable to avoid having lots of conditionals inside the interrupt enabling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:17 +01:00
Chris Wilson	31bb59cc01	drm/i915: Move the get/put irq locking into the caller With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on irq, rename the get/put vfunctions to enable/disable respectively and are able to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:16 +01:00
Chris Wilson	f8973c217f	drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) On Ironlake, there is no command nor register to ensure that the write from a MI_STORE command is completed (and coherent on the CPU) before the command parser continues. This means that the ordering between the seqno write and the subsequent user interrupt is undefined (like gen6+). So to ensure that the seqno write is completed after the final user interrupt we need to delay the read sufficiently to allow the write to complete. This delay is undefined by the bspec, and empirically requires 75us even though a register read combined with a clflush is less than 500ns. Hence, the delay is due to an on-chip buffer rather than the latency of the write to memory. Note that the render ring controls this by filling the PIPE_CONTROL fifo with stalling commands that force the earliest pipe-control with the seqno to be completed before the command parser continues. Given that we need a barrier operation for BSD, we may as well forgo the extra per-batch latency by using a common per-interrupt barrier. Studying the impact of adding the usleep shows that in both sequences of and individual synchronous no-op batches is negligible for the media engine (where the write now is unordered with the interrupt). Converting the render engine over from the current glutton of pie-controls over to the per-interrupt delays speeds up both the sequential and individual synchronous no-ops by 20% and 60%, respectively. This speed up holds even when looking at the throughput of small copies (4KiB->4MiB), both serial and synchronous, by about 20%. This is because despite adding a significant delay to the interrupt, in all likelihood we will see the seqno write without having to apply the barrier (only in the rare corner cases where the write is delayed on the last required is the delay necessary). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307 Testcase: igt/gem_sync #ilk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:53 +01:00
Chris Wilson	7d5ea80720	drm/i915: Refactor scratch object allocation for gen2 w/a buffer The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch buffer. If we pass in the size we want to allocate for the scratch buffer, both callers can use the same routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:50 +01:00
Chris Wilson	de8fe1663a	drm/i915: Allocate scratch page from stolen With the last direct CPU access to the scratch page removed, we can now allocate it from our small amount of reserved system pages (stolen memory). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-10-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:59:45 +01:00
Chris Wilson	f8291952bd	drm/i915: Stop mapping the scratch page into CPU space After the elimination of using the scratch page for Ironlake's breadcrumb, we no longer need to kmap the object. We therefore can move it into the high unmappable space and do not need to force the object to be coherent (i.e. snooped on !llc platforms). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	1b7744e7ba	drm/i915: Use HWS for seqno tracking everywhere By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	688e6c7258	drm/i915: Slaughter the thundering i915_wait_request herd One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:43 +01:00
Chris Wilson	ed00307893	drm/i915/ringbuffer: Move all default irq vfuncs init to a separate func Just plonk all the default irq vfuncs together in one function to keep the initialisers of reasonable size. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-01 09:48:24 +01:00
Chris Wilson	6f7bef75d1	drm/i915/ringbuffer: Move all generic engine->dispatch_batchbuffer together Consolidate the block of default vfuncs for dispatching the batchbuffer. Just a minor tweak on top of Tvrtko's great job of tidying up the vfunc initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467361093-20209-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-01 09:48:00 +01:00
Tvrtko Ursulin	8d228911ff	drm/i915: Trim some if-else braces Just a bit of cleanup after the previous refactoring. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467212972-861-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	4b8e38a9ef	drm/i915: Consolidate legacy semaphore initialization Replace per-engine initialization with a common half-programatic, half-data driven code for ease of maintenance and compactness. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	c38c651b39	drm/i915: Compact gen8_ring_sync Store the semaphore offset in a temporary variable to avoid having to get the VMA offset twice. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	1b9e665064	drm/i915: Compact Gen8 semaphore initialization Replace the macro initializer with a programatic loop which results in smaller code and hopefully just as clear. v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	db3d4019ab	drm/i915: Move semaphore object creation into intel_ring_init_semaphores The object needs to be created before semaphores can be initialized on any ring and it makes sense to pull it out to this semaphore dedicated helper. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Tvrtko Ursulin	d9a64610b2	drm/i915: Consolidate semaphore vfuncs init Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	960ecaad75	drm/i915: Consolidate dispatch_execbuffer vfunc v2: Put dispatch_execbuffer before add_request. (Chris Wilson) v3: Fix add_request and irq_seqno_barrier for gen8+. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	1d8a1337f3	drm/i915: Consolidate init_hw vfunc Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	604096d778	drm/i915: Consolidate get/set_seqno Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	b9700325cb	drm/i915: Consolidate get and put irq vfuncs v2: Consistent INTEL_GEN vs IS_GEN usage. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:44 +01:00
Tvrtko Ursulin	cc54a82830	drm/i915: Consolidate seqno_barrier vfunc Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Tvrtko Ursulin	7445a2a413	drm/i915: Consolidate add_request vfunc All engines apart from render select this based on Gen. Move it to the common helper as well. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Tvrtko Ursulin	06a2fe2279	drm/i915: Consolidate write_tail vfunc initializer Introduce a function which initializes vfuncs mostly common across engines and move write_tail initialization in it since only one engine overrides the default. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:43 +01:00
Chris Wilson	76f8421f2a	drm/i915: Perform Sandybridge BSD tail write under the forcewake Since we have a sequence of register reads and writes, we can reduce the latency of starting the BSD ring by performing all the mmio operations under the same forcewake wakeref. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-62-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:34 +01:00
Chris Wilson	d54fe4aad7	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-48-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:27 +01:00
Chris Wilson	3d808eb171	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-47-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:26 +01:00
Chris Wilson	25ab57f42f	drm/i915: Convert wait_for(I915_READ(reg)) to intel_wait_for_register() By using the out-of-line intel_wait_for_register() not only do we can efficiency from using the hybrid wait_for() contained within, but we avoid code bloat from the numerous inlined loops, in total (all patches): text data bss dec hex filename 1078551 4557 416 1083524 108884 drivers/gpu/drm/i915/i915.ko 1070775 4557 416 1075748 106a24 drivers/gpu/drm/i915/i915.ko Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-46-git-send-email-chris@chris-wilson.co.uk	2016-06-30 15:42:26 +01:00
Chris Wilson	c7c3c07d16	drm/i915: Treat kernel context as initialised The kernel context exists simply as a placeholder and should never be executed with a render context. It does not need the golden render state, as that will always be applied to a user context. By skipping the initialisation we can avoid issues in attempting to program the golden render context when trying to make the hardware idle. v2: Rebase Testcase: igt/drm_module_reload_basic #byt Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95634 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-3-git-send-email-chris@chris-wilson.co.uk	2016-06-24 15:02:44 +01:00
Chris Wilson	0cb26a8ed1	drm/i915: Move legacy kernel context pinning to intel_ringbuffer.c This is so that we have symmetry with intel_lrc.c and avoid a source of if (i915.enable_execlists) layering violation within i915_gem_context.c - that is we move the specific handling of the dev_priv->kernel_context for legacy submission into the legacy submission code. This depends upon the init/fini ordering between contexts and engines already defined by intel_lrc.c, and also exporting the context alignment required for pinning the legacy context. v2: Separate out pin/unpin context funcs for greater symmetry with intel_lrc. One more step towards unifying behaviour between the two classes of engines and towards fixing another bug in i915_switch_context vs requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk	2016-06-24 15:02:34 +01:00
arun.siluvery@linux.intel.com	780f0aebda	drm/i915/bxt: Add WaDisablePooledEuLoadBalancingFix This is a WA affecting pooled eu which is a bxt specific feature. Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Winiarski, Michal <michal.winiarski@intel.com> Cc: Zou, Nanhai <nanhai.zou@intel.com> Cc: Yang, Rong R <rong.r.yang@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-06-14 10:44:07 +01:00
Tim Gore	a8ab5ed5e1	drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate This patch enables a workaround for a mid thread preemption issue where a hardware timing problem can prevent the context restore from happening, leading to a hang. v2: move to gen9_init_workarounds (Arun) v3: move to start of gen9_init_workarounds (Arun) Signed-off-by: Tim Gore <tim.gore@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465816501-25557-1-git-send-email-tim.gore@intel.com	2016-06-13 14:48:02 +01:00
Mika Kuoppala	71dce58c8e	drm/i915/skl: Extend WaDisableChickenBitTSGBarrierAckForFFSliceCS There is ambiguity in the documentation between D0 and E0. Extend this workaround to E0. References: BSID#779 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-23-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:26:05 +03:00
Mika Kuoppala	954337aa96	drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing This is needed for all kbl revision. v2: Don't add revid checks to generic gen9 init (Arun) References: HSD#2135593 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-21-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:25:49 +03:00
Mika Kuoppala	4de5d7ccbc	drm/i915/kbl: Add WaDisableGafsUnitClkGating We need to disable clock gating in this unit to work around hardware issue causing possible corruption/hang. v2: name the bit (Ville) v3: leave the fix enabled for 2227050 and set correct bit (Matthew) v4: Split out the skl part in separate commit for easier backport References: HSD#2227156, HSD#2227050 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-20-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:25:41 +03:00
Mika Kuoppala	ad2bdb44b1	drm/i915: Add WaInsertDummyPushConstP for bxt and kbl Add this workaround for both bxt and kbl up to until rev B0. References: HSD#2136703 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-16-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:54 +03:00
Mika Kuoppala	c0b730d572	drm/i915/kbl: Add WaDisableDynamicCreditSharing Bspec states that we need to turn off dynamic credit sharing on kbl revid a0 and b0. This happens by writing bit 28 on 0x4ab8. References: HSD#2225601, HSD#2226938, HSD#2225763 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-15-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:34 +03:00
Mika Kuoppala	fe90581987	drm/i915/kbl: Add WaDisableLSQCROPERFforOCL Extend the scope of this workaround, already used in skl, to also take effect in kbl. v2: Fix KBL_REVID_E0 (Matthew) References: HSD#2132677 Cc: Matthew Auld <matthew.william.auld@gmail.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-12-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:24:03 +03:00
Mika Kuoppala	8401d42fd5	drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0 Add this workaround for kbl revid A0 only. v2: rebase v3: carve out a non related workaround (Chris) References: HSD#1911714 Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-9-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:22:44 +03:00
Mika Kuoppala	e587f6cb0a	drm/i915/kbl: Add WaEnableGapsTsvCreditFix We need this crucial workaround from skl also to all kbl revisions. Lack of it was causing system hangs on skl enabling so this is a must have. v2: Don't add revid checks to gen9 init workarounds (Arun) References: HSD#2126660 Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-8-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:21:39 +03:00
Mika Kuoppala	bbaefe72a0	drm/i915: Mimic skl with WaForceEnableNonCoherent Past evidence with system hangs and hsds tie WaForceEnableNonCoherent and WaDisableHDCInvalidation to WaForceContextSaveRestoreNonCoherent. Documentation states that WaForceContextSaveRestoreNonCoherent would not be needed on skl past E0 but evidence proved otherwise. See commit <510650e8b2ab> ("drm/i915/skl: Fix spurious gpu hang with gt3/gt4 revs"). In this scope consider kbl to be skl with a bigger revision than E0 so play it safe and bind these two workarounds to the WaForceContextSaveRestoreNonCoherent, and apply to all gen9. v2: fix comment (Matthew) References: HSD#2134449, HSD#2131413 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-7-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:21:22 +03:00
Mika Kuoppala	5b0e365929	drm/i915/gen9: Always apply WaForceContextSaveRestoreNonCoherent The revision id range for this workaround has changed. So apply it to all revids on all gen9. References: HSD#2134449 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1465309159-30531-6-git-send-email-mika.kuoppala@intel.com	2016-06-08 16:20:33 +03:00

1 2 3 4 5 ...

836 Commits