OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Joonas Lahtinen	6f63340284	drm/i915: Use atomic for dev_priv->mm.bsd_engine_dispatch_index Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index to avoid one struct_mutex locking scenario. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com	2016-09-01 15:39:25 +03:00
Chris Wilson	4cc6907501	drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps Now that we have working partial VMA and faulting support for all objects, including fence support, advertise to userspace that it can take advantage of unlimited GGTT mmaps. v2: Make room in the kerneldoc for a more detailed explanation of the limitations of the GTT mmap interface. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk	2016-08-26 08:42:26 +01:00
Chris Wilson	c58305af18	drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Very old numbers indicate this is a 66% improvement when remapping the entire object for fence contention - due to the elimination of track_pfn_insert and its strcmp. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk	2016-08-19 17:13:36 +01:00
Chris Wilson	f7bbe7883c	drm/i915: Embed the io-mapping struct inside drm_i915_private As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk	2016-08-19 17:13:35 +01:00
Chris Wilson	cd3127d684	drm/i915: Stop discarding GTT cache-domain on unbind vma Since commit `43566dedde` ("drm/i915: Broaden application of set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound. Therefore removing the GTT cache domain when removing the GGTT vma is no longer semantically correct. An unfortunate side-effect is we lose the wondrously named i915_gem_object_finish_gtt(), not to be confused with i915_gem_gtt_finish_object()! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-30-chris@chris-wilson.co.uk	2016-08-18 22:36:57 +01:00
Chris Wilson	383d5823e9	drm/i915: Bump the inactive tracking for all VMA accessed We track the LRU access for eviction and bump the last access for the user GGTT on set-to-gtt. When we do so we need to not only bump the primary GGTT VMA but all partials as well. Similarly we want to bump the last access tracking for when unpinning an object from the scanout so that they do not get promptly evicted and hopefully remain available for reuse on the next frame. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk	2016-08-18 22:36:57 +01:00
Chris Wilson	d8923dcfa5	drm/i915: Track display alignment on VMA When using the aliasing ppgtt and pageflipping with the shrinker/eviction active, we note that we often have to rebind the backbuffer before flipping onto the scanout because it has an invalid alignment. If we store the worst-case alignment required for a VMA, we can avoid having to rebind at critical junctures. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-28-chris@chris-wilson.co.uk	2016-08-18 22:36:56 +01:00
Chris Wilson	2efb813d53	drm/i915: Fallback to using unmappable memory for scanout The existing ABI says that scanouts are pinned into the mappable region so that legacy clients (e.g. old Xorg or plymouthd) can write directly into the scanout through a GTT mapping. However if the surface does not fit into the mappable region, we are better off just trying to fit it anywhere and hoping for the best. (Any userspace that is capable of using ginormous scanouts is also likely not to rely on pure GTT updates.) With the partial vma fault support, we are no longer restricted to only using scanouts that we can pin (though it is still preferred for performance reasons and for powersaving features like FBC). v2: Skip fence pinning when not mappable. v3: Add a comment to explain the possible ramifications of not being able to use fences for unmappable scanouts. v4: Rebase to skip over some local patches v5: Rebase to defer until after we have unmappable GTT fault support Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk	2016-08-18 22:36:56 +01:00
Chris Wilson	821188778b	drm/i915: Choose not to evict faultable objects from the GGTT Often times we do not want to evict mapped objects from the GGTT as these are quite expensive to teardown and frequently reused (causing an equally, if not more so, expensive setup). In particular, when faulting in a new object we want to avoid evicting an active object, or else we may trigger a page-fault-of-doom as we ping-pong between evicting two objects. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk	2016-08-18 22:36:55 +01:00
Chris Wilson	50349247ea	drm/i915: Drop ORIGIN_GTT for untracked GTT writes If FBC is set on a framebuffer that is unmapped, all GTT faults will be from a partial mapping. Writes by the user through the partial VMA are then untracked by the FBC and so we must use the ORIGIN_CPU when flushing the I915_GEM_DOMAIN_GTT. v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk	2016-08-18 22:36:55 +01:00
Chris Wilson	aa136d9d72	drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object If we want to create a partial vma from a chunk that is the same size as the object, create a normal ggtt vma instead. The benefit is that it will match future requests for the normal ggtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-24-chris@chris-wilson.co.uk	2016-08-18 22:36:51 +01:00
Chris Wilson	a61007a83a	drm/i915: Fix partial GGTT faulting We want to always use the partial VMA as a fallback for a failure to bind the object into the GGTT. This extends the support partial objects in the GGTT to cover everything, not just objects too large. v2: Call the partial view, view not partial. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-23-chris@chris-wilson.co.uk	2016-08-18 22:36:51 +01:00
Chris Wilson	03af84fe7f	drm/i915: Choose partial chunksize based on tile row size In order to support setting up fences for partial mappings of an object, we have to align those mappings with the fence. The minimum chunksize we choose is at least the size of a single tile row. v2: Make minimum chunk size a define for later use Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-22-chris@chris-wilson.co.uk	2016-08-18 22:36:50 +01:00
Chris Wilson	49ef5294cd	drm/i915: Move fence tracking from object to vma In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk	2016-08-18 22:36:50 +01:00
Chris Wilson	a1e5afbe4d	drm/i915: Rename fence.lru_list to link Our current practice is to only name the actual list (here dev_priv->fence_list) using "list", and elements upon that list are referred to as "link". Further, the lru nature is of the list and not of the node and including in the name does not disambiguate the link from anything else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk	2016-08-18 22:36:49 +01:00
Chris Wilson	05a20d098d	drm/i915: Move map-and-fenceable tracking to the VMA By moving map-and-fenceable tracking from the object to the VMA, we gain fine-grained tracking and the ability to track individual fences on the VMA (subsequent patch). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk	2016-08-18 22:36:48 +01:00
Chris Wilson	b0dc465f95	drm/i915: Tidy up flush cpu/gtt write domains Since we know the write domain, we can drop the local variable and make the code look a tiny bit simpler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-12-chris@chris-wilson.co.uk	2016-08-18 22:36:46 +01:00
Chris Wilson	9764951e7f	drm/i915: Pin the pages first in shmem prepare read/write There is an improbable, but not impossible, case that if we leave the pages unpin as we operate on the object, then somebody via the shrinker may steal the lock (which lock? right now, it is struct_mutex, THE lock) and change the cache domains after we have already inspected them. (Whilst here, avail ourselves of the opportunity to take a couple of steps to make the two functions look more similar.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-11-chris@chris-wilson.co.uk	2016-08-18 22:36:45 +01:00
Chris Wilson	3b5724d702	drm/i915: Wait for writes through the GTT to land before reading back If we quickly switch from writing through the GTT to a read of the physical page directly with the CPU (e.g. performing relocations through the GTT and then running the command parser), we can observe that the writes are not visible to the CPU. It is not a coherency problem, as extensive investigations with clflush have demonstrated, but a mere timing issue - we have to wait for the GTT to complete it's write before we start our read from the CPU. The issue can be illustrated in userspace with: gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ \| PROT_WRITE); cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ \| PROT_WRITE); gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT); for (i = 0; i < OBJECT_SIZE / 64; i++) { int x = 16*i + (i%16); gtt[x] = i; clflush(&cpu[x], sizeof(cpu[x])); assert(cpu[x] == i); } Experimenting with that shows that this behaviour is indeed limited to recent Atom-class hardware. Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk	2016-08-18 22:36:45 +01:00
Chris Wilson	a314d5cb4a	drm/i915: Before accessing an object via the cpu, flush GTT writes If we want to read the pages directly via the CPU, we have to be sure that we have to flush the writes via the GTT (as the CPU can not see the address aliasing). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-9-chris@chris-wilson.co.uk	2016-08-18 22:36:44 +01:00
Chris Wilson	43394c7d0d	drm/i915: Extract i915_gem_obj_prepare_shmem_write() This is a companion to i915_gem_obj_prepare_shmem_read() that prepares the backing storage for direct writes. It first serialises with the GPU, pins the backing storage and then indicates what clfushes are required in order for the writes to be coherent. Whilst here, fix support for ancient CPUs without clflush for which we cannot do the GTT+clflush tricks. v2: Add i915_gem_obj_finish_shmem_access() for symmetry Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk	2016-08-18 22:36:44 +01:00
Chris Wilson	1803458478	drm/i915: Fallback to single page pwrite/pread if unable to release fence If we cannot release the fence (for example if someone is inexplicably trying to write into a tiled framebuffer that is currently pinned to the display! cough kms_frontbuffer_tracking cough) fallback to using the page-by-page pwrite/pread interface, rather than fail the syscall entirely. Since this is triggerable by the user (along pwrite) we have to remove the WARN_ON(fence->pin_count). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk	2016-08-18 22:36:43 +01:00
Chris Wilson	d243ad8202	drm/i915: Mark up the GTT flush following WC writes as ORIGIN_CPU Similarly to invalidating beforehand, if the object is mmapped via I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At the conclusion of the write, i915_gem_object_flush_gtt_writes() we also need to treat the origin carefully in case it may have been untracked. See also commit `aeecc9696a` ("drm/i915: use ORIGIN_CPU for frontbuffer invalidation on WC mmaps"). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-5-chris@chris-wilson.co.uk	2016-08-18 22:36:43 +01:00
Chris Wilson	b19482d7ce	drm/i915: Use ORIGIN_CPU for fb invalidation from pwrite As pwrite does not use the fence for its GTT access, and may even go through a secondary interface avoiding the main VMA, we cannot treat the write as automatically invalidated by the hardware and so we require ORIGIN_CPU frontbufer invalidate/flushes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-4-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-08-18 22:36:24 +01:00
Chris Wilson	4b30cb2334	drm/i915: vfree() no longer ignores the low bits of the address Since vfree() now likes to WARN when passed a non-page-aligned pointer, we need to discard the low bits to comply with it. Fixes: `d31d7cb146` ("drm/i915: Support for creating write combined type vmaps") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk	2016-08-18 22:36:23 +01:00
Chris Wilson	1255501d86	drm/i915: Embrace the race in busy-ioctl Daniel Vetter proposed a new challenge to the serialisation inside the busy-ioctl that exposed a flaw that could result in us reporting the wrong engine as being busy. If the request is reallocated as we test its busyness and then reassigned to this object by another thread, we would not notice that the test itself was incorrect. We are faced with a choice of using __i915_gem_active_get_request_rcu() to first acquire a reference to the request preventing the race, or to acknowledge the race and accept the limitations upon the accuracy of the busy flags. Note that we guarantee that we never falsely report the object as idle (providing userspace itself doesn't race), and so the most important use of the busy-ioctl and its guarantees are fulfilled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk	2016-08-16 10:35:02 +01:00
Chris Wilson	bde13ebdab	drm/i915: Introduce i915_ggtt_offset() This little helper only exists to safely discard the upper unused 32bits of the general 64-bit VMA address - as we know that all Global GTT currently are less than 4GiB in size and so that the upper bits must be zero. In many places, we use a u32 for the global GTT offset and we want to document where we are discarding the full VMA offset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:14 +01:00
Chris Wilson	058d88c433	drm/i915: Track pinned VMA Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searching for the relevant pin later. v2: Joonas' stylistic nitpicks, a fun rebase. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:01:13 +01:00
Chris Wilson	247177ddd5	drm/i915: Always set the vma->pages Previously, we would only set the vma->pages pointer for GGTT entries. However, if we always set it, we can use it to prettify some code that may want to access the backing store associated with the VMA (as assigned to the VMA). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk	2016-08-15 11:00:54 +01:00
Daniel Vetter	cc9263874b	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued Backmerge because too many conflicts, and also we need to get at the latest struct fence patches from Gustavo. Requested by Chris Wilson. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2016-08-15 10:41:47 +02:00
Dave Airlie	fc93ff608b	Merge tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel into drm-next - refactor ddi buffer programming a bit (Ville) - large-scale renaming to untangle naming in the gem code (Chris) - rework vma/active tracking for accurately reaping idle mappings of shared objects (Chris) - misc dp sst/mst probing corner case fixes (Ville) - tons of cleanup&tunings all around in gem - lockless (rcu-protected) request lookup, plus use it everywhere for non(b)locking waits (Chris) - pipe crc debugfs fixes (Rodrigo) - random fixes all over * tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel: (222 commits) drm/i915: Update DRIVER_DATE to 20160808 drm/i915: fix aliasing_ppgtt leak drm/i915: Update comment before i915_spin_request drm/i915: Use drm official vblank_no_hw_counter callback. drm/i915: Fix copy_to_user usage for pipe_crc Revert "drm/i915: Track active streams also for DP SST" drm/i915: fix WaInsertDummyPushConstPs drm/i915: Assert that the request hasn't been retired drm/i915: Repack fence tiling mode and stride into a single integer drm/i915: Document and reject invalid tiling modes drm/i915: Remove locking for get_tiling drm/i915: Remove pinned check from madvise ioctl drm/i915: Reduce locking inside swfinish ioctl drm/i915: Remove (struct_mutex) locking for busy-ioctl drm/i915: Remove (struct_mutex) locking for wait-ioctl drm/i915: Do a nonblocking wait first in pread/pwrite drm/i915: Remove unused no-shrinker-steal drm/i915: Tidy generation of the GTT mmap offset drm/i915/shrinker: Wait before acquiring struct_mutex under oom drm/i915: Simplify do_idling() (Ironlake vt-d w/a) ...	2016-08-15 16:53:57 +10:00
Chris Wilson	02bef8f98d	drm/i915: Unbind closed vma for i915_gem_object_unbind() Closed vma are removed from the obj->vma_list so that they cannot be found by userspace. However, this means that when forcibly unbinding an object, we have to wait upon all rendering to that object first in order for the closed, but active, vma to be reaped and their bindings removed. Reported-by: Matthew Auld <matthew.auld@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97343 Fixes: `aa653a685d` ("drm/i915: Be more careful when unbinding vma") Fixes: `8a3b3d576c` (" drm/i915: Convert non-blocking userptr waits...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com>	2016-08-14 19:40:09 +01:00
Chris Wilson	35a9611ca0	drm/i915: Initialize return value for empty i915_gem_object_unbind() If the obj->vma_list is empty, we immediately return ret. However, we are doing so having never set it to any value, it should be zero! Reported-by: Matthew Auld <matthew.auld@intel.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=97343 Fixes: `aa653a685d` ("drm/i915: Be more careful when unbinding vma") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2016-08-14 19:38:26 +01:00
Chris Wilson	d31d7cb146	drm/i915: Support for creating write combined type vmaps vmaps has a provision for controlling the page protection bits, with which we can use to control the mapping type, e.g. WB, WC, UC or even WT. To allow the caller to choose their mapping type, we add a parameter to i915_gem_object_pin_map - but we still only allow one vmap to be cached per object. If the object is currently not pinned, then we recreate the previous vmap with the new access type, but if it was pinned we report an error. This effectively limits the access via i915_gem_object_pin_map to a single mapping type for the lifetime of the object. Not usually a problem, but something to be aware of when setting up the object's vmap. We will want to vary the access type to enable WC mappings of ringbuffer and context objects on !llc platforms, as well as other objects where we need coherent access to the GPU's pages without going through the GTT v2: Remove the redundant braces around pin count check and fix the marker in documentation (Chris) v3: - Add a new enum for the vmalloc mapping type & pass that as an argument to i915_object_pin_map. (Tvrtko) - Use PAGE_MASK to extract or filter the mapping type info and remove a superfluous BUG_ON.(Tvrtko) v4: - Rename the enums and clean up the pin_map function. (Chris) v5: Drop the VM_NO_GUARD, minor cosmetics. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk	2016-08-12 13:06:36 +01:00
Chris Wilson	83348ba84e	drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs In commit `2529d57050` ("drm/i915: Drop racy markup of missed-irqs from idle-worker") the racy detection of missed interrupts was removed when we went idle. This however opened up the issue that the stuck waiters were not being reported, causing a test case failure. If we move the stuck waiter detection out of hangcheck and into the breadcrumb mechanims (i.e. the waiter) itself, we can avoid this issue entirely. This leaves hangcheck looking for a stuck GPU (inspecting for request advancement and HEAD motion), and breadcrumbs looking for a stuck waiter - hopefully make both easier to understand by their segregation. v2: Reduce the error message as we now run independently of hangcheck, and the hanging batch used by igt also counts as a stuck waiter causing extra warnings in dmesg. v3: Move the breadcrumb's hangcheck kickstart to the first missed wait. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104 Fixes: `2529d57050` (waiter"drm/i915: Drop racy markup of missed-irqs...") Testcase: igt/drv_missed_irq Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk	2016-08-10 10:37:35 +01:00
Chris Wilson	70cb472c6d	drm/i915: Always mark the writer as also a read for busy ioctl One of the few guarantees we want the busy ioctl to provide is that the reported busy writer is included in the set of busy read engines. This should be provided by the ordering of setting and retiring the active trackers, but we can do better by explicitly setting the busy read engine flag for the last writer. v2: More comments inside __busy_write_id() to explain why both fields are set. Fixes: `3fdc13c7a3` ("drm/i915: Remove (struct_mutex) locking for busy-ioctl") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2016-08-10 10:37:21 +01:00
Chris Wilson	edf6b76f64	drm/i915: Add smp_rmb() to busy ioctl's RCU dance In the debate as to whether the second read of active->request is ordered after the dependent reads of the first read of active->request, just give in and throw a smp_rmb() in there so that ordering of loads is assured. v2: Explain the manual smp_rmb() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk	2016-08-09 10:17:26 +01:00
Chris Wilson	87b723a16d	drm/i915: Don't check for idleness before retiring after a GPU hang When we force the cleanup after a GPU hang, we want to retire all requests, or else we may leak them if truly wedged (and the GPU never advances again). Converting to the active request helpers had the issue of doing the check against busyness before reporting the request, so if we claim the GPU had hung but this engine hadn't we could potential skip the request cleanup - triggering the self-check BUG. Fixes: `dcff85c844` ("drm/i915: Enable i915_gem_wait_for_idle() ...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-3-git-send-email-chris@chris-wilson.co.uk	2016-08-09 09:11:59 +01:00
Chris Wilson	3e510a8e65	drm/i915: Repack fence tiling mode and stride into a single integer In the previous commit, we moved the obj->tiling_mode out of a bitfield and into its own integer so that we could safely use READ_ONCE(). Let us now repair some of that damage by sharing the tiling_mode with its companion, the fence stride. v2: New magic Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:43 +01:00
Chris Wilson	e883d73503	drm/i915: Remove pinned check from madvise ioctl We don't need to incur the overhead of checking whether the object is pinned prior to changing its madvise. If the object is pinned, the madvise will not take effect until it is unpinned and so we cannot free the pages being pointed at by hardware. Marking a pinned object with allocated pages as DONTNEED will not trigger any undue warnings. The check is therefore superfluous, and by removing it we can remove a linear walk over all the vma the object has. Still despite it being an overzealous check, that error code is part of the current ABI and so we must proceed with caution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-15-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:41 +01:00
Chris Wilson	c21724cc4d	drm/i915: Reduce locking inside swfinish ioctl We only need to take the struct_mutex if the object is pinned to the display engine and so requires checking for clflush. (The race with userspace pinning the object to a framebuffer is irrelevant.) v2: Use access once for compiler hints (or not as it is a bitfield) v3: READ_ONCE, obj->pin_display is not a bitfield anymore v4: Don't be creative with goto. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-14-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:41 +01:00
Chris Wilson	3fdc13c7a3	drm/i915: Remove (struct_mutex) locking for busy-ioctl By applying the same logic as for wait-ioctl, we can query whether a request has completed without holding struct_mutex. The biggest impact system-wide is removing the flush_active and the contention that causes. Testcase: igt/gem_busy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-13-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:40 +01:00
Chris Wilson	033d549b81	drm/i915: Remove (struct_mutex) locking for wait-ioctl With a bit of care (and leniency) we can iterate over the object and wait for previous rendering to complete with judicial use of atomic reference counting. The ABI requires us to ensure that an active object is eventually flushed (like the busy-ioctl) which is guaranteed by our management of requests (i.e. everything that is submitted to hardware is flushed in the same request). All we have to do is ensure that we can detect when the requests are complete for reporting when the object is idle (without triggering ETIME), locklessly - this is handled by i915_gem_active_wait_unlocked(). The impact of this is actually quite small - the return to userspace following the wait was already lockless and so we don't see much gain in latency improvement upon completing the wait. What we do achieve here is completing an already finished wait without hitting the struct_mutex, our hold is quite short and so we are typically just a victim of contention rather than a cause - but it is still one less contention point! v2: Break up a long line. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-12-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:40 +01:00
Chris Wilson	258a5edee0	drm/i915: Do a nonblocking wait first in pread/pwrite If we try and read or write to an active request, we first must wait upon the GPU completing that request. Let's do that without holding the mutex (and so allow someone else to access the GPU whilst we wait). Upon completion, we will acquire the mutex and only then start the operation (i.e. we do not rely on state from before the initial wait). v2: Repaint the goto labels v3: Move the tracepoints back to the start of the ioctls Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-11-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:39 +01:00
Chris Wilson	f3f6184c5f	drm/i915: Tidy generation of the GTT mmap offset If we make the observation that mmap-offsets are only released when we free an object, we can then deduce that the shrinker only creates free space in the mmap arena indirectly by flushing the request list and freeing expired objects. If we combine this with the lockless vma-manager and lockless idling, we can avoid taking our big struct_mutex until we need to actually free the requests. One side-effect is that we defer the madvise checking until we need the pages (i.e. the fault handler). This brings us into line with the other delayed checks (and madvise in general). v2: s/ret/err/ and use if (!err) rather than if (ret == 0) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-9-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:38 +01:00
Chris Wilson	dcff85c844	drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex current for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine, allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:37 +01:00
Chris Wilson	90f4fcd56b	drm/i915: Remove forced stop ring on suspend/unload Before suspending (or unloading), we would first wait upon all rendering to be completed and then disable the rings. This later step is a remanent from DRI1 days when we did not use request tracking for all operations upon the ring. Now that we are sure we are waiting upon the very last operation by the engine, we can forgo clobbering the ring registers, though we do keep the assert that the engine is indeed idle before sleeping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:36 +01:00
Chris Wilson	b8f9096d6a	drm/i915: Convert non-blocking waits for requests over to using RCU We can completely avoid taking the struct_mutex around the non-blocking waits by switching over to the RCU request management (trading the mutex for a RCU read lock and some complex atomic operations). The improvement is that we gain further contention reduction, and overall the code become simpler due to the reduced mutex dancing. v2: Move i915_gem_fault tracepoint back to the start of the function, before the unlocked wait. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-2-git-send-email-chris@chris-wilson.co.uk	2016-08-05 10:54:34 +01:00
Chris Wilson	0eafec6d32	drm/i915: Enable lockless lookup of request tracking via RCU If we enable RCU for the requests (providing a grace period where we can inspect a "dead" request before it is freed), we can allow callers to carefully perform lockless lookup of an active request. However, by enabling deferred freeing of requests, we can potentially hog a lot of memory when dealing with tens of thousands of requests per second - with a quick insertion of a synchronize_rcu() inside our shrinker callback, that issue disappears. v2: Currently, it is our responsibility to handle reclaim i.e. to avoid hogging memory with the delayed slab frees. At the moment, we wait for a grace period in the shrinker, and block for all RCU callbacks on oom. Suggested alternatives focus on flushing our RCU callback when we have a certain number of outstanding request frees, and blocking on that flush after a second high watermark. (So rather than wait for the system to run out of memory, we stop issuing requests - both are nondeterministic.) Paul E. McKenney wrote: Another approach is synchronize_rcu() after some largish number of requests. The advantage of this approach is that it throttles the production of callbacks at the source. The corresponding disadvantage is that it slows things up. Another approach is to use call_rcu(), but if the previous call_rcu() is still in flight, block waiting for it. Yet another approach is the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The idea is to do something like this: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); You would of course do an initial get_state_synchronize_rcu() to get things going. This would not block unless there was less than one grace period's worth of time between invocations. But this assumes a busy system, where there is almost always a grace period in flight. But you can make that happen as follows: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); call_rcu(&my_rcu_head, noop_function); Note that you need additional code to make sure that the old callback has completed before doing a new one. Setting and clearing a flag with appropriate memory ordering control suffices (e.g,. smp_load_acquire() and smp_store_release()). v3: More comments on compiler and processor order of operations within the RCU lookup and discover we can use rcu_access_pointer() here instead. v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:05 +01:00
Chris Wilson	00e60f2659	drm/i915: Move i915_gem_object_wait_rendering() Just move it earlier so that we can use the companion nonblocking version in a couple of more callsites without having to add a forward declaration. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-24-git-send-email-chris@chris-wilson.co.uk	2016-08-04 20:20:05 +01:00

1 2 3 4 5 ...

1499 Commits