OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Chris Wilson	c8659efac5	drm/i915: Drop spinlocks around adding to the client request list Adding to the tail of the client request list as the only other user is in the throttle ioctl that iterates forwards over the list. It only needs protection against deletion of a request as it reads it, it simply won't see a new request added to the end of the list, or it would be too early and rejected. We can further reduce the number of spinlocks required when throttling by removing stale requests from the client_list as we throttle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk	2017-03-02 22:33:41 +00:00
Kenneth Graunke	ef0f411f51	drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters. This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0 (indicating the optional feature is not supported), and makes execbuf always return -EINVAL if the flags are used. Apparently, no userspace ever shipped which used this optional feature: I checked the git history of Mesa, xf86-video-intel, libva, and Beignet, and there were zero commits showing a use of these flags. Kernel commit `72bfa19c8d` apparently introduced the feature prematurely. According to Chris, the intention was to use this in cairo-drm, but "the use was broken for gen6", so I don't think it ever happened. 'relative_constants_mode' has always been tracked per-device, but this has actually been wrong ever since hardware contexts were introduced, as the INSTPM register is saved (and automatically restored) as part of the render ring context. The software per-device value could therefore get out of sync with the hardware per-context value. This meant that using them is actually unsafe: a client which tried to use them could damage the state of other clients, causing the GPU to interpret their BO offsets as absolute pointers, leading to bogus memory reads. These flags were also never ported to execlist mode, making them no-ops on Gen9+ (which requires execlists), and Gen8 in the default mode. On Gen8+, userspace can write these registers directly, achieving the same effect. On Gen6-7.5, it likely makes sense to extend the command parser to support them. I don't think anyone wants this on Gen4-5. Based on a patch by Dave Gordon. v3: Return -ENODEV for the getparam, as this is what we do for other obsolete features. Suggested by Chris Wilson. Cc: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-24 17:51:05 +00:00
Chris Wilson	57822dc6b9	drm/i915: Perform object clflushing asynchronously Flushing the cachelines for an object is slow, can be as much as 100ms for a large framebuffer. We currently do this under the struct_mutex BKL on execution or on pageflip. But now with the ability to add fences to obj->resv for both flips and execbuf (and we naturally wait on the fence before CPU access), we can move the clflush operation to a workqueue and signal a fence for completion, thereby doing the work asynchronously and not blocking the driver or its clients. v2: Introduce i915_gem_clflush.h and use a new name, split out some extras into separate patches. Suggested-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk	2017-02-22 12:12:15 +00:00
Chris Wilson	208b84a375	drm/i915: Remove change_domain tracepoint The change_domain tracepoint has been inaccurate for a few years - it doesn't fully capture the domains, especially with userspace bypassing them. It is defunct, misleading and time to be removed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-1-chris@chris-wilson.co.uk	2017-02-22 12:12:11 +00:00
Tvrtko Ursulin	1cce8922df	drm/i915/tracepoints: Adjust i915_gem_ring_dispatch Rename it to i915_gem_request_queue and fix the logged info equivalent to the i915_gem_request even class. Also moved it a bit further apart from the i915_gem_request_add tracepoint since they otherwise provide similar information too close in time. v2: Remove sw fence singalling. We will rely on the soon to come GuC scheduling backend to enable that. (Chris Wilson) v3: Log hex with 0x prefix for clarity. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-21 13:17:57 +00:00
Chris Wilson	e2989f140e	drm/i915: Use reservation_object_lock() Replace the calls to ww_mutex_lock(&resv->lock) with the helper reservation_object_lock(resv) and similarly for unlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170221091723.6219-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-02-21 12:48:51 +00:00
Tvrtko Ursulin	73dec95e6b	drm/i915: Emit to ringbuffer directly This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename out++ to cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com	2017-02-14 14:30:46 +00:00
Daniel Vetter	51a831a772	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Chris Wilson needs the new drm_driver->release callback to make sure the shiny new dma-buf testcases don't oops the driver on unload. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2017-02-10 16:27:24 +01:00
Michał Winiarski	038c95a313	drm/i915: Always convert incoming exec offsets to non-canonical We're using non-canonical addresses in drm_mm, and we're making sure that userspace is using canonical addressing - both in case of softpin (verifying incoming offset) and when relocating (converting to canonical when updating offset returned to userspace). Unfortunately when considering the need for relocations, we're comparing offset from userspace (in canonical form) with drm_mm node (in non-canonical form), and as a result, we end up always relocating if our offsets are in the "problematic" range. Let's always convert the offsets to avoid the performance impact of relocations. Fixes: `a5f0edf63b` ("drm/i915: Avoid writing relocs with addresses in non-canonical form") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Reported-by: Michał Pyrzowski <michal.pyrzowski@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170207195559.18798-1-michal.winiarski@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-08 09:22:31 +00:00
Daniele Ceraolo Spurio	4a04e37122	drm/i915: fix pm refcounting on fence error in execbuf Fences are creted/checked before the pm ref is taken, so if we jump to pre_mutex_err we will uncorrectly call intel_runtime_pm_put. v2: Massage unwind error paths Fixes: `fec0445caa` (drm/i915: Support explicit fencing for execbuf) Testcase: igt/gem_exec_params Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1486161930-11764-1-git-send-email-daniele.ceraolospurio@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-02-04 09:42:07 +00:00
Chris Wilson	4e64e5539d	drm: Improve drm_mm search (and fix topdown allocation) with rbtrees The drm_mm range manager claimed to support top-down insertion, but it was neither searching for the top-most hole that could fit the allocation request nor fitting the request to the hole correctly. In order to search the range efficiently, we create a secondary index for the holes using either their size or their address. This index allows us to find the smallest hole or the hole at the bottom or top of the range efficiently, whilst keeping the hole stack to rapidly service evictions. v2: Search for holes both high and low. Rename flags to mode. v3: Discover rb_entry_safe() and use it! v4: Kerneldoc for enum drm_mm_insert_mode. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Alexandre Courbot <gnurou@gmail.com> Cc: Eric Anholt <eric@anholt.net> Cc: Sinclair Yeh <syeh@vmware.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk	2017-02-03 11:10:32 +01:00
Chris Wilson	fec0445caa	drm/i915: Support explicit fencing for execbuf Now that the user can opt-out of implicit fencing, we need to give them back control over the fencing. We employ sync_file to wrap our drm_i915_gem_request and provide an fd that userspace can merge with other sync_file fds and pass back to the kernel to wait upon before future execution. Testcase: igt/gem_exec_fence Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-2-chris@chris-wilson.co.uk	2017-01-27 19:55:48 +00:00
Chris Wilson	77ae995789	drm/i915: Enable userspace to opt-out of implicit fencing Userspace is faced with a dilemma. The kernel requires implicit fencing to manage resource usage (we always must wait for the GPU to finish before releasing its PTE) and for third parties. However, userspace may wish to avoid this serialisation if it is either using explicit fencing between parties and wants more fine-grained access to buffers (e.g. it may partition the buffer between uses and track fences on ranges rather than the implicit fences tracking the whole object). It follows that userspace needs a mechanism to avoid the kernel's serialisation on its implicit fences before execbuf execution. The next question is whether this is an object, execbuf or context flag. Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on shared winsys buffers, but implicit fencing on internal surfaces) require a per-object level flag. Given that this flag need to be only set once for the lifetime of the object, this reduces the convenience of having an execbuf or context level flag (and avoids having multiple pieces of uABI controlling the same feature). Incorrect use of this flag will result in rendering corruption and GPU hangs - but will not result in use-after-free or similar resource tracking issues. Serious caveat: write ordering is not strictly correct after setting this flag on a render target on multiple engines. This affects all subsequent GEM operations (execbuf, set-domain, pread) and shared dma-buf operations. A fix is possible - but costly (both in terms of further ABI changes and runtime overhead). Testcase: igt/gem_exec_async Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Chad Versace <chadversary@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-1-chris@chris-wilson.co.uk	2017-01-27 19:55:35 +00:00
Chris Wilson	718659a630	drm/i915: Rename some warts in the VMA API Whilst writing testcases to exercise the VMA API, some oddities came to light, such as i915_gem_obj_lookup_or_create(). Joonas suggested i915_vma_instance() as a neat replacement, so rename them, move them to i915_vma.c and add some kerneldoc as a sugary bonus. s/i915_gem_obj_to_vma/i915_vma_lookup/ s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/ Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk	2017-01-19 10:15:26 +00:00
Chris Wilson	f51455d442	drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-01-10 20:54:32 +00:00
Chris Wilson	6095868a27	drm/i915: Complete kerneldoc for struct i915_gem_context The existing kerneldoc was outdated, so time for a refresh. v2: Use single line kdoc, mention functions for manipulation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk	2016-12-31 11:41:47 +00:00
Chris Wilson	81147b07f2	drm/i915: Add a reminder that i915_vma_move_to_active() requires struct_mutex i915_vma_move_to_active() requires the struct_mutex for serialisation with retirement, so mark it up with lockdep_assert_held(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-1-chris@chris-wilson.co.uk	2016-12-18 16:18:48 +00:00
Chris Wilson	172ae5b4c8	drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Soft-pinning depends upon being able to check for availabilty of an interval and evict overlapping object from a drm_mm range manager very quickly. Currently it uses a linear list, and so performance is dire and not suitable as a general replacement. Worse, the current code will oops if it tries to evict an active buffer. It also helps if the routine reports the correct error codes as expected by its callers and emits a tracepoint upon use. For posterity since the wrong patch was pushed (i.e. that missed these key points and had known bugs), this is the changelog that should have been on commit `506a8e87d8` ("drm/i915: Add soft-pinning API for execbuffer"): Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following: * if the user supplies a virtual address via the execobject->offset and sets the EXEC_OBJECT_PINNED flag in execobject->flags, then that object is placed at that offset in the address space selected by the context specifier in execbuffer. * the location must be aligned to the GTT page size, 4096 bytes * as the object is placed exactly as specified, it may be used by this execbuffer call without relocations pointing to it It may fail to do so if: * EINVAL is returned if the object does not have a 4096 byte aligned address * the object conflicts with another pinned object (either pinned by hardware in that address space, e.g. scanouts in the aliasing ppgtt) or within the same batch. EBUSY is returned if the location is pinned by hardware EINVAL is returned if the location is already in use by the batch * EINVAL is returned if the object conflicts with its own alignment (as meets the hardware requirements) or if the placement of the object does not fit within the address space All other execbuffer errors apply. Presence of this execbuf extension may be queried by passing I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for a reported value of 1 (or greater). v2: Combine the hole/adjusted-hole ENOSPC checks v3: More color, more splitting, more blurb. Fixes: `506a8e87d8` ("drm/i915: Add soft-pinning API for execbuffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk	2016-12-05 20:49:17 +00:00
Chris Wilson	85fd4f58d7	drm/i915: Mark all non-vma being inserted into the address spaces We need to distinguish between full i915_vma structs and simple drm_mm_nodes when considering eviction (i.e. we must be careful not to treat a mere drm_mm_node as a much larger i915_vma causing memory corruption, if we are lucky). To do this, color these not-a-vma with -1 (I915_COLOR_UNEVICTABLE). v2...v200: New name for -1. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk	2016-12-05 20:49:17 +00:00
Chris Wilson	41736a8e33	drm/i915: Use the precomputed value for whether to enable command parsing As i915.enable_cmd_parser is an unsafe option, make it read-only at runtime. Now that it is constant, we can use the value determined during initialisation as to whether we need the cmdparser at execbuffer time. v2: Remove the inline for its single user, it is clear enough (and shorter) without! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161124125851.6615-1-chris@chris-wilson.co.uk	2016-11-24 13:52:34 +00:00
Mika Kuoppala	bc1d53c647	drm/i915: Wipe hang stats as an embedded struct Bannable property, banned status, guilty and active counts are properties of i915_gem_context. Make them so. v2: rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com	2016-11-21 14:36:56 +02:00
Chris Wilson	5b8c8aec8e	drm/i915: Move frontbuffer CS write tracking from ggtt vma to object I tried to avoid having to track the write for every VMA by only tracking writes to the ggtt. However, for the purposes of frontbuffer tracking this is insufficient as we need to invalidate around writes not just to the the ggtt but all aliased ppgtt views of the framebuffer. By moving the critical section to the object and only doing so for framebuffer writes we can reduce the tracking even further by only watching framebuffers and not vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-11-18 11:15:59 +00:00
Tvrtko Ursulin	f0836b726f	drm/i915: Use dev_priv in INTEL_INFO in i915_gem_execbuffer.c Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-11-17 13:56:02 +00:00
Tvrtko Ursulin	4805fe82c0	drm/i915: Further assorted dev_priv cleanups A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>	2016-11-11 14:58:26 +00:00
Tvrtko Ursulin	0031fb9685	drm/i915: Assorted dev_priv cleanups A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>	2016-11-11 14:58:26 +00:00
Chris Wilson	7aa6ca61ee	drm/i915: Mark CPU cache as dirty when used for rendering On LLC, or even snooped, machines rendering via the GPU ends up in the CPU cache. This cacheline dirt also needs to be flushed to main memory when moving to an incoherent domain, such as the display's scanout engine. Mostly, this happens because either the object is marked as dirty from its first use or is avoided by setting the object into the display domain from the start. v2: Treat WT as not requiring a clflush prior to use on the display engine as well. Fixes: `0f71979ab7` ("drm/i915: Performed deferred clflush inside set-cache-level") References: https://bugs.freedesktop.org/show_bug.cgi?id=95414 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.0+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161107165204.7008-1-chris@chris-wilson.co.uk	2016-11-07 20:54:39 +00:00
Joonas Lahtinen	dfc5148fb3	drm/i915: Introduce HAS_64BIT_RELOC Move has_64bit_reloc into dev_priv->info. This will make it visible in the feature listing debug output. v2: - Keep the struct member to keep GCC fragile but happy (Chris) v3: - More detailed commit message (Chris) - Include forgotten CHV and BXT (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478162386-5018-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-11-03 12:45:57 +02:00
Chris Wilson	d07f0e59b2	drm/i915: Move GEM activity tracking into a common struct reservation_object In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which i915 engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	a4f5ea64f0	drm/i915: Refactor object page API The plan is to make obtaining the backing storage for the object avoid struct_mutex (i.e. use its own locking). The first step is to update the API so that normal users only call pin/unpin whilst working on the backing storage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk	2016-10-28 20:53:46 +01:00
Chris Wilson	f8a7fde456	drm/i915: Defer active reference until required We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why then pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	b52992c06c	drm/i915: Support asynchronous waits on struct fence from i915_gem_request We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk	2016-10-28 20:53:41 +01:00
Chris Wilson	fc0990903c	drm/i915: Remove insert-page shortcut from execbuf relocate_iomap() We are not allowed to touch the GTT entries underneath an atomic section, as they take a rpm wakelock (which is illegal from atomic context) and in the near future acquiring the DMA address for a page within an object may sleep for an allocation. This makes the current shortcircuit in relocation_iomap() for performing a second relocation on an adjacent page illegal, and we need to release the atomic iomapping, lookup the DMA, insert it into the GTT before reentering the atomic iomap section. As it happens, this is precisely what we do on if we are using an iomapping over the full object and not just a single page and by removing the shortcut, we do the right thing. Fixes: `9c870d0367` ("drm/i915: Use RPM as the barrier for controlling...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-10-28 20:53:31 +01:00
Dave Airlie	5481e27f6f	Merge tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel into drm-next - first slice of the gvt device model (Zhenyu et al) - compression support for gpu error states (Chris) - sunset clause on gpu errors resulting in dmesg noise telling users how to report them - .rodata diet from Tvrtko - switch over lots of macros to only take dev_priv (Tvrtko) - underrun suppression for dp link training (Ville) - lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish from Jani - gen9 wm fixes from Paulo&Lyude - updated ddi programming for kbl (Rodrigo) - respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville) * tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits) drm/i915: Update DRIVER_DATE to 20161024 drm/i915: Stop setting SNB min-freq-table 0 on powersave setup drm/i915/dp: add lane_count check in intel_dp_check_link_status drm/i915: Fix whitespace issues drm/i915: Clean up DDI DDC/AUX CH sanitation drm/i915: Respect alternate_ddc_pin for all DDI ports drm/i915: Respect alternate_aux_channel for all DDI ports drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7 drm/i915: KBL - Recommended buffer translation programming for DisplayPort drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup drm/i915: Add a sunset clause to GPU hang logging drm/i915: Stop reporting error details in dmesg as well as the error-state drm/i915/gvt: do not ignore return value of create_scratch_page drm/i915/gvt: fix spare warnings on odd constant _Bool cast drm/i915/gvt: mark symbols static where possible drm/i915/gvt: fix sparse warnings on different address spaces drm/i915/gvt: properly access enabled intel_engine_cs drm/i915/gvt: Remove defunct vmap_batch() drm/i915/gvt: Use common mapping routines for shadow_bb object drm/i915/gvt: Use common mapping routines for indirect_ctx object ...	2016-10-25 16:39:43 +10:00
Chris Wilson	ebc0808fa2	drm/i915: Restrict pagefault disabling to just around copy_from_user() When handling execbuf relocations, we play a delicate dance with pagefault. We first try to access the user pages underneath our struct_mutex. However, if those pages were inside a GEM object, we may trigger a pagefault and deadlock as i915_gem_fault() tries to recursively acquire struct_mutex. Instead, we choose to disable pagefaulting around the copy_from_user whilst inside the struct_mutex and handle the EFAULT by falling back to a copy outside the struct_mutex. We however presumed that disabling pagefaults would be expensive. It is just an operation on the local current task. Cheap enough that we can restrict the disable/enable to the critical section around the copy, and so avoid having to handle the atomic sections within the relocation handling itself. v2: Just illustrate the broken error handling rather than argue why it is safer to ignore it, for now. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-4-chris@chris-wilson.co.uk	2016-10-18 14:22:27 +01:00
Michał Winiarski	4fb84d991e	drm/i915: Remove unused "valid" parameter from pte_encode We never used any invalid ptes, those were put in place for a possibility of doing gpu faults. However our batchbuffers are not restricted in length, so everything needs to be pointing to something and thus out-of-bounds is pointing to scratch. Remove the valid flag as it is always true. v2: Expand commit msg, patch reorder (Mika) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com	2016-10-14 12:40:32 +01:00
Tvrtko Ursulin	5db9401983	drm/i915: Make IS_GEN macros only take dev_priv Saves 1416 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-10-14 12:23:22 +01:00
Akash Goel	3b3f1650b1	drm/i915: Allocate intel_engine_cs structure only for the enabled engines With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine* macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com	2016-10-14 09:58:43 +01:00
Linus Torvalds	6b25e21fa6	Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Core: - Fence destaging work - DRIVER_LEGACY to split off legacy drm drivers - drm_mm refactoring - Splitting drm_crtc.c into chunks and documenting better - Display info fixes - rbtree support for prime buffer lookup - Simple VGA DAC driver Panel: - Add Nexus 7 panel - More simple panels i915: - Refactoring GEM naming - Refactored vma/active tracking - Lockless request lookups - Better stolen memory support - FBC fixes - SKL watermark fixes - VGPU improvements - dma-buf fencing support - Better DP dongle support amdgpu: - Powerplay for Iceland asics - Improved GPU reset support - UVD/VEC powergating support for CZ/ST - Preinitialised VRAM buffer support - Virtual display support - Initial SI support - GTT rework - PCI shutdown callback support - HPD IRQ storm fixes amdkfd: - bugfixes tilcdc: - Atomic modesetting support mediatek: - AAL + GAMMA engine support - Hook up gamma LUT - Temporal dithering support imx: - Pixel clock from devicetree - drm bridge support for LVDS bridges - active plane reconfiguration - VDIC deinterlacer support - Frame synchronisation unit support - Color space conversion support analogix: - PSR support - Better panel on/off support rockchip: - rk3399 vop/crtc support - PSR support vc4: - Interlaced vblank timing - 3D rendering CPU overhead reduction - HDMI output fixes tda998x: - HDMI audio ASoC support sunxi: - Allwinner A33 support - better TCON support msm: - DT binding cleanups - Explicit fence-fd support sti: - remove sti415/416 support etnaviv: - MMUv2 refactoring - GC3000 support exynos: - Refactoring HDMI DCC/PHY - G2D pm regression fix - Page fault issues with wait for vblank There is no nouveau work in this tree, as Ben didn't get a pull request in, and he was fighting moving to atomic and adding mst support, so maybe best it waits for a cycle" * tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits) drm/crtc: constify drm_crtc_index parameter drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next drm/i915/guc: Unwind GuC workqueue reservation if request construction fails drm/i915: Reset the breadcrumbs IRQ more carefully drm/i915: Force relocations via cpu if we run out of idle aperture drm/i915: Distinguish last emitted request from last submitted request drm/i915: Allow DP to work w/o EDID drm/i915: Move long hpd handling into the hotplug work drm/i915/execlists: Reinitialise context image after GPU hang drm/i915: Use correct index for backtracking HUNG semaphores drm/i915: Unalias obj->phys_handle and obj->userptr drm/i915: Just clear the mmiodebug before a register access drm/i915/gen9: only add the planes actually affected by ddb changes drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED drm/i915/bxt: Fix HDMI DPLL configuration drm/i915/gen9: fix the watermark res_blocks value drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations drm/i915/gen9: minimum scanlines for Y tile is not always 4 drm/i915/gen9: fix the WaWmMemoryReadLatency implementation drm/i915/kbl: KBL also needs to run the SAGV code ...	2016-10-11 18:12:22 -07:00
Chris Wilson	c92fa4fe90	drm/i915: Force relocations via cpu if we run out of idle aperture If we run out of enough aperture space to fit the entire object, we fallback to trying to insert a single page. However, if that also fails, we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means to userspace that their batch could not be fitted within the GTT.) Prior to commit `e8cb909ac3` ("drm/i915: Fallback to single page GTT mmappings for relocations") the approach is to fallback to using the slow CPU relocation path in case of iomapping failure, and that is the behaviour we need to restore. Fixes: `e8cb909ac3` ("drm/i915: Fallback to single page GTT mmappings...") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk (cherry picked from commit `d7f7633557`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-10-10 16:06:44 +03:00
Chris Wilson	d7f7633557	drm/i915: Force relocations via cpu if we run out of idle aperture If we run out of enough aperture space to fit the entire object, we fallback to trying to insert a single page. However, if that also fails, we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means to userspace that their batch could not be fitted within the GTT.) Prior to commit `e8cb909ac3` ("drm/i915: Fallback to single page GTT mmappings for relocations") the approach is to fallback to using the slow CPU relocation path in case of iomapping failure, and that is the behaviour we need to restore. Fixes: `e8cb909ac3` ("drm/i915: Fallback to single page GTT mmappings...") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk	2016-10-07 08:27:22 +01:00
Jani Nikula	615e500083	drm/i915: silence io mapping/unmapping sparse warnings on different address spaces drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: warning: incorrect type in argument 1 (different address spaces) drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: expected void [noderef] <asn:2>vaddr drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: got void drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: warning: incorrect type in assignment (different address spaces) drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: expected void vaddr drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: got void [noderef] <asn:2> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1475574853-4178-2-git-send-email-jani.nikula@intel.com	2016-10-04 17:07:07 +03:00
Al Viro	4bce9f6ee8	get rid of separate multipage fault-in primitives * the only remaining callers of "short" fault-ins are just as happy with generic variants (both in lib/iov_iter.c); switch them to multipage variants, kill the "short" ones * rename the multipage variants to now available plain ones. * get rid of compat macro defining iov_iter_fault_in_multipage_readable by expanding it in its only user. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-27 18:12:24 -04:00
Chris Wilson	851ba2d697	drm/i915: Serialise execbuf operation after a dma-buf reservation object Now that we can wait upon fences before emitting the request, it becomes trivial to wait upon any implicit fence provided by the dma-buf reservation object. To protect against failure, we force any asynchronous waits on a foreign fence to timeout after 10s - so that a stall in another driver does not permanently cripple ourselves. Still unpleasant though! Testcase: igt/prime_vgem/fence-wait Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-21-chris@chris-wilson.co.uk	2016-09-09 14:23:08 +01:00
Chris Wilson	a2bc4695bb	drm/i915: Prepare object synchronisation for asynchronicity We are about to specialize object synchronisation to enable nonblocking execbuf submission. First we make a copy of the current object synchronisation for execbuffer. The general i915_gem_object_sync() will be removed following the removal of CS flips in the near future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk	2016-09-09 14:23:06 +01:00
Joonas Lahtinen	6f63340284	drm/i915: Use atomic for dev_priv->mm.bsd_engine_dispatch_index Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index to avoid one struct_mutex locking scenario. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-09-01 15:39:25 +03:00
Chris Wilson	f7978a0c58	drm/i915: Allow the user to pass a context to any ring With full-ppgtt, we want the user to have full control over their memory layout, with a separate instance per context. Forcing them to use a shared memory layout for !RCS not only duplicates the amount of work we have to do, but also defeats the memory segregation on offer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160822080350.4964-4-chris@chris-wilson.co.uk Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-08-26 20:26:53 +01:00
Chris Wilson	dcd79934b0	drm/i915: Unconditionally flush any chipset buffers before execbuf If userspace is asynchronously streaming into the batch or other execobjects, we may not flush those writes along with a change in cache domain (as there is no change). Therefore those writes may end up in internal chipset buffers and not visible to the GPU upon execution. We must issue a flush command or otherwise we encounter incoherency in the batchbuffers and the GPU executing invalid commands (i.e. hanging) quite regularly. v2: Throw a paranoid wmb() into the general flush so that we remain consistent with before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841 Fixes: `1816f92363` ("drm/i915: Support creation of unbound wc user...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Tested-by: Matti Hämäläinen <ccr@tnsp.org> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk (cherry picked from commit `600f436801`) Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2016-08-22 16:04:50 +03:00
Chris Wilson	f7bbe7883c	drm/i915: Embed the io-mapping struct inside drm_i915_private As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk	2016-08-19 17:13:35 +01:00
Chris Wilson	0b5372727b	drm/i915/cmdparser: Use cached vmappings The single largest factor in the overhead of parsing the commands is the setup of the virtual mapping to provide a continuous block for the batch buffer. If we keep those vmappings around (against the better judgement of mm/vmalloc.c, which we offset by handwaving and looking suggestively at the shrinker) we can dramatically improve the performance of the parser for small batches (such as media workloads). Furthermore, we can use the prepare shmem read/write functions to determine how best we need to clflush the range (rather than every page of the object). The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt for the throughput on small batches. (Caching just the dst vmap and iterating over the src, doing a page by page copy is roughly 5% slower on both platforms. That may be an acceptable trade-off to eliminate one cached vmapping, and we may be able to reduce the per-page copying overhead further.) For this simple test case, the cmdparser is now within a factor of 2 of ideal performance. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk	2016-08-18 22:36:59 +01:00
Chris Wilson	49ef5294cd	drm/i915: Move fence tracking from object to vma In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk	2016-08-18 22:36:50 +01:00

1 2 3 4 5 ...

396 Commits