OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ben Widawsky	27173f1f95	drm/i915: Convert execbuf code to use vmas In order to transition more of our code over to using a VMA instead of an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up until now, we've only had a VMA when actually binding an object. The previous patch helped handle the distinction on bound vs. unbound. This patch will help us catch leaks, and other issues before we actually shuffle a bunch of stuff around. This attempts to convert all the execbuf code to speak in vmas. Since the execbuf code is very self contained it was a nice isolated conversion. The meat of the code is about turning eb_objects into eb_vma, and then wiring up the rest of the code to use vmas instead of obj, vm pairs. Unfortunately, to do this, we must move the exec_list link from the obj structure. This list is reused in the eviction code, so we must also modify the eviction code to make this work. WARNING: This patch makes an already hotly profiled path slower. The cost is unavoidable. In reply to this mail, I will attach the extra data. v2: Release table lock early, and two a 2 phase vma lookup to avoid having to use a GFP_ATOMIC. (Chris) v3: s/obj_exec_list/obj_exec_link/ Updates to address commit `6d2b888569` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Aug 7 18:30:54 2013 +0100 drm/i915: List objects allocated from stolen memory in debugfs v4: Use obj = vma->obj for neatness in some places (Chris) need_reloc_mappable() should return false if ppgtt (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out prep patches. Also remove a FIXME comment which is now taken care of.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:41 +02:00
Daniel Vetter	d4d36014ca	drm/i915: fix up the relocate_entry refactoring Somehow we've lost the error handling in the patch split-up between the internal and external patch. This regression has been introduced in commit `5032d871f7` Author: Rafael Barbalho <rafael.barbalho@intel.com> Date: Wed Aug 21 17:10:51 2013 +0100 drm/i915: Cleaning up the relocate entry function This bug is exercised by igt/gem_reloc_vs_gpu/interruptible. Cc: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-03 19:18:01 +02:00
Rafael Barbalho	5032d871f7	drm/i915: Cleaning up the relocate entry function As the relocate entry function was getting a bit too big I've moved the code that used to use either the cpu or the gtt to for the relocation into two separate functions. Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-23 14:52:31 +02:00
Chris Wilson	000433b67e	drm/i915: Only do a chipset flush after a clflush Now that we skip clflushes more often, return a boolean indicating whether the clflush was actually performed, and only if it was do the chipset flush. (Though on most of the architectures where the clflush will be skipped, the chipset flush is a no-op!) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:34 +02:00
Chris Wilson	2c22569bba	drm/i915: Update rules for writing through the LLC with the cpu As mentioned in the previous commit, reads and writes from both the CPU and GPU go through the LLC. This gives us coherency between the CPU and GPU irrespective of the attribute settings either device sets. We can use to avoid having to clflush even uncached memory. Except for the scanout. The scanout resides within another functional block that does not use the LLC but reads directly from main memory. So in order to maintain coherency with the scanout, writes to uncached memory must be flushed. In order to optimize writes elsewhere, we start tracking whether an framebuffer is attached to an object. v2: Use pin_display tracking rather than fb_count (to ensure we flush cursors as well etc) and only force the clflush along explicit writes to the scanout paths (i.e. pin_to_display_plane and pwrite into scanout). v3: Force the flush after hitting the slowpath in pwrite, as after dropping the lock the object's cache domain may be invalidated. (Ville) Based on a patch by Ville Syrjälä. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-10 11:20:49 +02:00
Ben Widawsky	ca191b1313	drm/i915: mm_list is per VMA formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMx . Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discused with Ben. Also drop a needles BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:06:58 +02:00
Ben Widawsky	9843877d10	drm/i915: turn bound_ggtt checks to bound_any In some places, we want to know if an object is bound in any address space, and not just the global GTT. This often applies when there is a single global resource (object, pages, etc.) function \| reason -------------------------------------------------- i915_gem_object_is_inactive \| global object i915_gem_object_put_pages \| object's pages 915_gem_object_unpin \| global object i915_gem_execbuffer_unreserve_object \| temporary until we plumb vma pread/pwrite \| see the note below Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering call - but that once only worked if the object is bound. We really should replace this with a plain wait_rendering call, which would have the upside that in pread it would be clearer that we actually only wait for oustanding gpu writes. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer Ben to replace those with wait_rendering calls.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:43 +02:00
Ben Widawsky	07fe0b1280	drm/i915: plumb VM into bind/unbind code As alluded to in several patches, and it will be reiterated later... A VMA is an abstraction for a GEM BO bound into an address space. Therefore it stands to reason, that the existing bind, and unbind are the ones which will be the most impacted. This patch implements this, and updates all callers which weren't already updated in the series (because it was too messy). This patch represents the bulk of an earlier, larger patch. I've pulled out a bunch of things by the request of Daniel. The history is preserved for posterity with the email convention of ">" One big change from the original patch aside from a bunch of cropping is I've created an i915_vma_unbind() function. That is because we always have the VMA anyway, and doing an extra lookup is useful. There is a caveat, we retain an i915_gem_object_ggtt_unbind, for the global cases which might not talk in VMAs. > drm/i915: plumb VM into object operations > > This patch was formerly known as: > "drm/i915: Create VMAs (part 3) - plumbing" > > This patch adds a VM argument, bind/unbind, and the object > offset/size/color getters/setters. It preserves the old ggtt helper > functions because things still need, and will continue to need them. > > Some code will still need to be ported over after this. > > v2: Fix purge to pick an object and unbind all vmas > This was doable because of the global bound list change. > > v3: With the commit to actually pin/unpin pages in place, there is no > longer a need to check if unbind succeeded before calling put_pages(). > Make put_pages only BUG() after checking pin count. > > v4: Rebased on top of the new hangcheck work by Mika > plumbed eb_destroy also > Many checkpatch related fixes > > v5: Very large rebase > > v6: > Change BUG_ON to WARN_ON (Daniel) > Rename vm to ggtt in preallocate stolen, since it is always ggtt when > dealing with stolen memory. (Daniel) > list_for_each will short-circuit already (Daniel) > remove superflous space (Daniel) > Use per object list of vmas (Daniel) > Make obj_bound_any() use obj_bound for each vm (Ben) > s/bind_to_gtt/bind_to_vm/ (Ben) > > Fixed up the inactive shrinker. As Daniel noticed the code could > potentially count the same object multiple times. While it's not > possible in the current case, since 1 object can only ever be bound into > 1 address space thus far - we may as well try to get something more > future proof in place now. With a prep patch before this to switch over > to using the bound list + inactive check, we're now able to carry that > forward for every address space an object is bound into. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA in destroy".] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:20 +02:00
Ben Widawsky	28d6a7bfa2	drm/i915: thread address space through execbuf This represents the first half of hooking up VMs to execbuf. Here we basically pass an address space all around to the different internal functions. It should be much more readable, and have less risk than the second half, which begins switching over to using VMAs instead of an obj,vm. The overall series echoes this style of, "add a VM, then make it smart later" Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Switch a BUG_ON to WARN_ON.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:11 +02:00
Ben Widawsky	c37e220461	drm/i915: Add VM to pin To verbalize it, one can say, "pin an object into the given address space." The semantics of pinning remain the same otherwise. Certain objects will always have to be bound into the global GTT. Therefore, global GTT is a special case, and keep a special interface around for it (i915_gem_obj_ggtt_pin). v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:09 +02:00
Chris Wilson	de51f04f06	drm/i915: Replace open-coded offset_in_page() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-26 19:45:11 +02:00
Xiong Zhang	0b74b508f7	drm/i915: add prefault_disable module option prefault is stll enabled by default which prevent most of pwrite/pread/reloc from running slow path, in order to verify these slow pathes, prefault need to be disabled. Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> [danvet: Make checkpatch happy and bikeshed the module option help text a bit.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 09:29:26 +02:00
Chris Wilson	e852096986	drm/i915: Replace open-coding of DEFAULT_CONTEXT_ID The intent of the check is made more clear if we use the proper name for 0 here. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-16 10:40:40 +02:00
Daniel Vetter	db1b76ca6a	drm/i915: don't frob mm.suspended when not using ums In kernel modeset driver mode we're in full control of the chip, always. So there's no need at all to set mm.suspended in i915_gem_idle. Hence move that out into the leavevt ioctl. Since i915_gem_idle doesn't suspend gem any more we can also drop the re-enabling for KMS in the thaw function. Also clean up the handling of mm.suspend at driver load by coalescing all the assignments. Stumbled over while reading through our resume code for unrelated reasons. v2: Shovel mm.suspended into the (newly created) ums dungeon as suggested by Chris Wilson. The plan is that once we've completely stopped relying on the register save/restore code we could shovel even that in there. v3: Improve the locking for the entervt/leavevt ioctls a bit by moving the dev->struct_mutex locking outside of i915_gem_idle. Also don't clear dev_priv->ums.mm_suspended for the kms case, we allocate it with kzalloc. Both suggested by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-10 14:30:25 +02:00
Ben Widawsky	f343c5f647	drm/i915: Getter/setter for object attributes Soon we want to gut a lot of our existing assumptions how many address spaces an object can live in, and in doing so, embed the drm_mm_node in the object (and later the VMA). It's possible in the future we'll want to add more getter/setter methods, but for now this is enough to enable the VMAs. v2: Reworked commit message (Ben) Added comments to the main functions (Ben) sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/.[ch] (Daniel) v3: Rebased on new reserve_node patch Changed DRM_DEBUG_KMS to actually work (will need fixing later) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-08 22:04:34 +02:00
Mika Kuoppala	7d736f4f0b	drm/i915: add batch bo to i915_add_request() In order to track down a batch buffer and context which caused the ring to hang, store reference to bo into the request struct. Request can also cause gpu to hang after the batch in the flush section in the ring. To detect this add start of the flush portion offset into the request. v2: Included comment about request vs batch_obj lifetimes (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-13 17:42:16 +02:00
Mika Kuoppala	0025c0772d	drm/i915: change i915_add_request to macro Only execbuffer needed all the parameters on i915_add_request(). By putting __i915_add_request behind macro, all current callsites become cleaner. Following patch will introduce a new parameter for __i915_add_request. With this patch, only the relevant callsite will reflect the change making commit smaller and easier to understand. v2: _i915_add_request as function name (Chris Wilson) v3: change name __i915_add_request and fix ordering of params (Ben Widawsky) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-13 17:42:15 +02:00
Chris Wilson	c65355bbef	drm/i915: Track when we dirty the scanout with render commands This is required for tracking render damage for use with FBC and will be used in subsequent patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-07 17:56:45 +02:00
Xiang, Haihao	82f91b6e93	drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer() A user can run batchbuffer via VEBOX ring. Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-31 20:54:21 +02:00
Dave Airlie	399403c7ce	Merge tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Highlights: - Imre's for_each_sg_pages rework (now also with the stolen mem backed case fixed with a hack) plus the drm prime sg list coalescing patch from Rahul Sharma. I have some follow-up cleanups pending, already acked by Andrew Morton. - Some prep-work for the crazy no-pch/display-less platform by Ben. - Some vlv patches, by far not all (Jesse et al). - Clean up the HDMI/SDVO #define confusion (Paulo) - gen2-4 vblank fixes from Ville. - Unclaimed register warning fixes for hsw (Paulo). More still to come ... - Complete pageflips which have been stuck in a gpu hang, should prevent stuck gl compositors (Ville). - pm patches for vt-switchless resume (Jesse). Note that the i915 enabling is not (yet) included, that took a bit longer to settle. PM patches are acked by Rafael Wysocki. - Minor fixlets all over from various people. * tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel: (79 commits) drm/i915: Implement WaSwitchSolVfFArbitrationPriority drm/i915: Set the VIC in AVI infoframe for SDVO drm/i915: Kill a strange comment about DPMS functions drm/i915: Correct sandybrige overclocking drm/i915: Introduce GEN7_FEATURES for device info drm/i915: Move num_pipes to intel info drm/i915: fixup pd vs pt confusion in gen6 ppgtt code style nit: Align function parameter continuation properly. drm/i915: VLV doesn't have HDMI on port C drm/i915: DSPFW and BLC regs are in the display offset range drm/i915: set conservative clock gating values on VLV v2 drm/i915: fix WaDisablePSDDualDispatchEnable on VLV v2 drm/i915: add more VLV IDs drm/i915: use VLV DIP routines on VLV v2 drm/i915: add media well to VLV force wake routines v2 drm/i915: don't use plane pipe select on VLV drm: modify pages_to_sg prime helper to create optimized SG table drm/i915: use for_each_sg_page for setting up the gtt ptes drm/i915: create compact dma scatter lists for gem objects drm/i915: handle walking compact dma scatter lists ...	2013-04-05 10:18:13 +10:00
Lauri Kasanen	27b7c63a7c	drm/i915: Fix build failure ERROR: "__build_bug_on_failed" [drivers/gpu/drm/i915/i915.ko] undefined! Originally reported at http://www.gossamer-threads.com/lists/linux/kernel/1631803 FDO bug #62775 This needs to be backported to both 3.7 and 3.8 stable trees. Doesn't apply straight, but it's a quick change. Signed-off-by: Lauri Kasanen <cand@gmx.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62775 Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-27 15:05:42 +01:00
Ben Widawsky	41fda59682	drm/i915: Remove unneeded dev argument Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-18 03:03:19 +01:00
Ben Widawsky	cf144969d5	drm/i915: Remove unused file arg from execbuf Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-18 03:02:53 +01:00
Kees Cook	3118a4f652	drm/i915: bounds check execbuffer relocation count It is possible to wrap the counter used to allocate the buffer for relocation copies. This could lead to heap writing overflows. CVE-2013-0913 v3: collapse test, improve comment v2: move check into validate_exec_list Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Pinkie Pie Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-13 21:31:03 +01:00
Kees Cook	3058753583	drm/i915: clarify reasoning for the access_ok call This clarifies the comment above the access_ok check so a missing VERIFY_READ doesn't alarm anyone. v2: - rewrote comment, thanks to Chris Wilson Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: add patch history log to commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-13 21:17:28 +01:00
Ville Syrjälä	2bb4629add	drm/i915: Add to_user_ptr() to_user_ptr() simply casts a pointer passed as u64 from user space to void __user * correctly. Using this lets us get rid of all the tiresome casts. The idea came from Chris Wilson <chris@chris-wilson.co.uk>. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-03 19:49:11 +01:00
Dave Airlie	6dc1c49da6	Merge branch 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux into drm-next This pulls in most of Linus tree up to -rc6, this fixes the worst lockdep reported issues and re-enables fbcon lockdep. (not the fbcon maintainer) * 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux: (529 commits) Revert "Revert "console: implement lockdep support for console_lock"" fbcon: fix locking harder fb: Yet another band-aid for fixing lockdep mess fb: rework locking to fix lock ordering on takeover	2013-02-08 12:10:18 +10:00
Ben Widawsky	5d4545aef5	drm/i915: Create a gtt structure The purpose of the gtt structure is to help isolate our gtt specific properties from the rest of the code (in doing so it help us finish the isolation from the AGP connection). The following members are pulled out (and renamed): gtt_start gtt_total gtt_mappable_end gtt_mappable gtt_base_addr gsm The gtt structure will serve as a nice place to put gen specific gtt routines in upcoming patches. As far as what else I feel belongs in this structure: it is meant to encapsulate the GTT's physical properties. This is why I've not added fields which track various drm_mm properties, or things like gtt_mtrr (which is itself a pretty transient field). Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [Ben modified commit messages] Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:33:56 +01:00
Chris Wilson	eef90ccb8a	drm/i915: Use the reloc.handle as an index into the execbuffer array Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 623000.0/sec. i3-330m: 748000.0/sec to 789000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:23:47 +01:00
Daniel Vetter	ed5982e6ce	drm/i915: Allow userspace to hint that the relocations were known Userspace is able to hint to the kernel that its command stream and auxiliary state buffers already hold the correct presumed addresses and so the relocation process may be skipped if the kernel does not need to move any buffers in preparation for the execbuffer. Thus for the common case where the allotment of buffers is static between batches, we can avoid the overhead of individually checking the relocation entries. Note that this requires userspace to supply the domain tracking and requests for workarounds itself that would otherwise be computed based upon the relocation entries. Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 632000.0/sec. i3-330m: 748000.0/sec to 830000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Fixup merge conflict in userspace header due to different baseline trees.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:23:36 +01:00
Chris Wilson	bcffc3faa6	drm/i915: Move the execbuffer objects list from the stack into the tracker Instead of passing around the eb-objects hashtable and a separate object list, we can include the object list into the eb-objects structure for convenience. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:08:02 +01:00
Chris Wilson	3b96eff447	drm/i915: Take the handle idr spinlock once for looking up the exec objects Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:08:01 +01:00
Chris Wilson	419fa72a19	drm/i915: Mark a temporary allocation for copy-from-user as such The difference is that the kernel will then know that this memory will be reclaimable in the near future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:08:00 +01:00
Dave Airlie	b5cc6c0387	Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: - seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris Wilson - some leftover kill-agp on gen6+ patches from Ben - hotplug improvements from Damien - clear fb when allocated from stolen, avoids dirt on the fbcon (Chris) - Stolen mem support from Chris Wilson, one of the many steps to get to real fastboot support. - Some DDI code cleanups from Paulo. - Some refactorings around lvds and dp code. - some random little bits&pieces * tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits) drm/i915: Return the real error code from intel_set_mode() drm/i915: Make GSM void drm/i915: Move GSM mapping into dev_priv drm/i915: Move even more gtt code to i915_gem_gtt drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno drm/i915: Introduce i915_gem_set_seqno() drm/i915: Always clear semaphore mboxes on seqno wrap drm/i915: Initialize hardware semaphore state on ring init drm/i915: Introduce ring set_seqno drm/i915: Missed conversion to gtt_pte_t drm/i915: Bug on unsupported swizzled platforms drm/i915: BUG() if fences are used on unsupported platform drm/i915: fixup overlay stolen memory leak drm/i915: clean up PIPECONF bpc #defines drm/i915: add intel_dp_set_signal_levels drm/i915: remove leftover display.update_wm assignment drm/i915: check for the PCH when setting pch_transcoder drm/i915: Clear the stolen fb before enabling drm/i915: Access to snooped system memory through the GTT is incoherent drm/i915: Remove stale comment about intel_dp_detect() ... Conflicts: drivers/gpu/drm/i915/intel_display.c	2013-01-17 20:34:08 +10:00
Chris Wilson	262b6d363f	drm/i915: Invalidate the relocation presumed_offsets along the slow path In the slow path, we are forced to copy the relocations prior to acquiring the struct mutex in order to handle pagefaults. We forgo copying the new offsets back into the relocation entries in order to prevent a recursive locking bug should we trigger a pagefault whilst holding the mutex for the reservations of the execbuffer. Therefore, we need to reset the presumed_offsets just in case the objects are rebound back into their old locations after relocating for this exexbuffer - if that were to happen we would assume the relocations were valid and leave the actual pointers to the kernels dangling, instant hang. Fixes regression from commit `bcf50e2775` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Nov 21 22:07:12 2010 +0000 drm/i915: Handle pagefaults in execbuffer user relocations Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@fwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-16 10:53:38 +01:00
Daniel Vetter	b45305fce5	drm/i915: Implement workaround for broken CS tlb on i830/845 Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-17 17:27:02 +01:00
Chris Wilson	c1f093e09c	drm/i915: Remove check for conflicting relocation write-domains Simply use the last write-domain set for the object in the batch, trusting userspace to have correctly flushed the caches between usage as a write target. This check dates back from the golden age of having only a single operation per batch with the kernel repeating it for each cliprect, and conflicts both with userspace trying to efficiently batch multiple operations and with reducing the kernel overhead of relocation processing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-03 20:13:16 +01:00
Ville Syrjälä	ca9c46c5c7	drm/i915: Kill i915_gem_execbuffer_wait_for_flips() As per Chris Wilson's suggestion make i915_gem_execbuffer_wait_for_flips() go away. This was used to stall the GPU ring while there are pending page flips involving the relevant BO. Ie. while the BO is still being scanned out by the display controller. The recommended alternative is to use the page flip events to wait for the page flips to fully complete before reusing the BO of the old front buffer. Or use more buffers. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> [danvet: don't remove obj->pending_flips, still required due to reorder patches.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:58:15 +01:00
Chris Wilson	9d7730914f	drm/i915: Preallocate next seqno before touching the ring Based on the work by Mika Kuoppala, we realised that we need to handle seqno wraparound prior to committing our changes to the ring. The most obvious point then is to grab the seqno inside intel_ring_begin(), and then to reuse that seqno for all ring operations until the next request. As intel_ring_begin() can fail, the callers must already be prepared to handle such failure and so we can safely add further checks. This patch looks like it should be split up into the interface changes and the tweaks to move seqno wrapping from the execbuffer into the core seqno increment. However, I found no easy way to break it into incremental steps without introducing further broken behaviour. v2: Mika found a silly mistake and a subtle error in the existing code; inside i915_gem_retire_requests() we were resetting the sync_seqno of the target ring based on the seqno from this ring - which are only related by the order of their allocation, not retirement. Hence we were applying the optimisation that the rings were synchronised too early, fortunately the only real casualty there is the handling of seqno wrapping. v3: Do not forget to reset the sync_seqno upon module reinitialisation, ala resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861 Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:52 +01:00
Chris Wilson	be7cb6347e	drm/i915: Remove bogus test for a present execbuffer The intention of checking obj->gtt_offset!=0 is to verify that the target object was listed in the execbuffer and had been bound into the GTT. This is guarranteed by the earlier rearrangement to split the execbuffer operation into reserve and relocation phases and then verified by the check that the target handle had been processed during the reservation phase. However, the actual checking of obj->gtt_offset==0 is bogus as we can indeed reference an object at offset 0. For instance, the framebuffer installed by the BIOS often resides at offset 0 - causing EINVAL as we legimately try to render using the stolen fb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:45:03 +01:00
Ben Widawsky	e76e9aebcd	drm/i915: Stop using AGP layer for GEN6+ As a quick hack we make the old intel_gtt structure mutable so we can fool a bunch of the existing code which depends on elements in that data structure. We can/should try to remove this in a subsequent patch. This should preserve the old gtt init behavior which upon writing these patches seems incorrect. The next patch will fix these things. The one exception is VLV which doesn't have the preserved flush control write behavior. Since we want to do that for all GEN6+ stuff, we'll handle that in a later patch. Mainstream VLV support doesn't actually exist yet anyway. v2: Update the comment to remove the "voodoo" Check that the last pte written matches what we readback v3: actually kill cache_level_to_agp_type since most of the flags will disappear in an upcoming patch v4: v3 was actually not what we wanted (Daniel) Make the ggtt bind assertions better and stricter (Chris) Fix some uncaught errors at gtt init (Chris) Some other random stuff that Chris wanted v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a tad more robust by mapping everything != CACHE_NONE to the cached agp flag - we have a 1:1 uncached mapping, but different modes of cacheable (at least on later generations). Suggested by Chris Wilson.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:42 +01:00
Daniel Vetter	c2fb791692	Linux 3.7-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU= =+861 -----END PGP SIGNATURE----- Merge tag 'v3.7-rc2' into drm-intel-next-queued Linux 3.7-rc2 Backmerge to solve two ugly conflicts: - uapi. We've already added new ioctl definitions for -next. Do I need to say more? - wc support gtt ptes. We've had to revert this for snb+ for 3.7 and also fix a few other things in the code. Now we know how to make it work on snb+, but to avoid losing the other fixes do the backmerge first before re-enabling wc gtt ptes on snb+. And a few other minor things, among them git getting confused in intel_dp.c and seemingly causing a conflict out of nothing ... Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_dp.c drivers/gpu/drm/i915/intel_modes.c include/drm/i915_drm.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-22 14:34:51 +02:00
Chris Wilson	d7d4eeddb8	drm/i915: Allow DRM_ROOT_ONLY\|DRM_MASTER to submit privileged batchbuffers With the introduction of per-process GTT space, the hardware designers thought it wise to also limit the ability to write to MMIO space to only a "secure" batch buffer. The ability to rewrite registers is the only way to program the hardware to perform certain operations like scanline waits (required for tear-free windowed updates). So we either have a choice of adding an interface to perform those synchronized updates inside the kernel, or we permit certain processes the ability to write to the "safe" registers from within its command stream. This patch exposes the ability to submit a SECURE batch buffer to DRM_ROOT_ONLY\|DRM_MASTER processes. v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security bit (bit 13, accidentally not set). Also add a comment explaining why secure batches need a global gtt binding. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) [danvet: added hsw fixup.] Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-17 21:06:59 +02:00
Linus Torvalds	612a9aab56	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm merge (part 1) from Dave Airlie: "So first of all my tree and uapi stuff has a conflict mess, its my fault as the nouveau stuff didn't hit -next as were trying to rebase regressions out of it before we merged. Highlights: - SH mobile modesetting driver and associated helpers - some DRM core documentation - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write combined pte writing, ilk rc6 support, - nouveau: major driver rework into a hw core driver, makes features like SLI a lot saner to implement, - psb: add eDP/DP support for Cedarview - radeon: 2 layer page tables, async VM pte updates, better PLL selection for > 2 screens, better ACPI interactions The rest is general grab bag of fixes. So why part 1? well I have the exynos pull req which came in a bit late but was waiting for me to do something they shouldn't have and it looks fairly safe, and David Howells has some more header cleanups he'd like me to pull, that seem like a good idea, but I'd like to get this merge out of the way so -next dosen't get blocked." Tons of conflicts mostly due to silly include line changes, but mostly mindless. A few other small semantic conflicts too, noted from Dave's pre-merged branch. * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits) drm/nv98/crypt: fix fuc build with latest envyas drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering drm/nv41/vm: fix and enable use of "real" pciegart drm/nv44/vm: fix and enable use of "real" pciegart drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie drm/nouveau: store supported dma mask in vmmgr drm/nvc0/ibus: initial implementation of subdev drm/nouveau/therm: add support for fan-control modes drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules drm/nouveau/therm: calculate the pwm divisor on nv50+ drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster drm/nouveau/therm: move thermal-related functions to the therm subdev drm/nouveau/bios: parse the pwm divisor from the perf table drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices drm/nouveau/therm: rework thermal table parsing drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table drm/nouveau: fix pm initialization order drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it drm/nouveau: log channel debug/error messages from client object rather than drm client drm/nouveau: have drm debugging macros build on top of core macros ...	2012-10-03 23:29:23 -07:00
David Howells	760285e7e7	UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ Convert #include "..." to #include <path/...> in drivers/gpu/. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:07 +01:00
David Howells	4126d5d61f	UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding patch. Without this patch and the patch to make include the UAPI headers from the core headers, after the UAPI split, the DRM C sources cannot find these UAPI headers because the DRM code relies on specific -I flags to make #include "..." work on headers in include/drm/ - but that does not work after the UAPI split without adding more -I flags. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:05 +01:00
Chris Wilson	41783eea1a	drm/i915: Assert that the exec object lookup table is a power-of-two As we make the simplification of using a power-of-two size for the execbuffer handle-to-object TLB, we should validate that this is actually true and so clarify that premise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:23:11 +02:00
Chris Wilson	ba7a64587b	drm/i915: Drop the misleading cast to the wrong user pointer type The exec_list is of type drm_i915_gem_exec_object2 and so casting it to a drm_i915_gem_relocation_entry is very confusing! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:23:05 +02:00
Chris Wilson	9da3da660d	drm/i915: Replace the array of pages with a scatterlist Rather than have multiple data structures for describing our page layout in conjunction with the array of pages, we can migrate all users over to a scatterlist. One major advantage, other than unifying the page tracking structures, this offers is that we replace the vmalloc'ed array (which can be up to a megabyte in size) with a chain of individual pages which helps reduce memory pressure. The disadvantage is that we then do not have a simple array to iterate, or to access randomly. The common case for this is in the relocation processing, which will typically fit within a single scatterlist page and so be almost the same cost as the simple array. For iterating over the array, the extra function call could be optimised away, but in reality is an insignificant cost of either binding the pages, or performing the pwrite/pread. v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the trivial compile error from rebasing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:57 +02:00
Chris Wilson	7788a76520	drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer If we need to stall in order to complete the pin_and_fence operation during execbuffer reservation, there is a high likelihood that the operation will be interrupted by a signal (thanks X!). In order to simplify the cleanup along that error path, the object was unconditionally unbound and the error propagated. However, being interrupted here is far more common than I would like and so we can strive to avoid the extra work by eliminating the forced unbind. v2: In discussion over the indecent colour of the new functions and unwind path, we realised that we can use the new unreserve function to clean up the code even further. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 21:02:51 +02:00
Chris Wilson	504c7267a1	drm/i915: Use cpu relocations if the object is in the GTT but not mappable This prevents the case of unbinding the object in order to process the relocations through the GTT and then rebinding it only to then proceed to use cpu relocations as the object is now in the CPU write domain. By choosing to use cpu relocations up front, we can therefore avoid the rebind penalty. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:29:16 +02:00
Chris Wilson	86a1ee26bb	drm/i915: Only pwrite through the GTT if there is space in the aperture Avoid stalling and waiting for the GPU by checking to see if there is sufficient inactive space in the aperture for us to bind the buffer prior to writing through the GTT. If there is inadequate space we will have to stall waiting for the GPU, and incur overheads moving objects about. Instead, only incur the clflush overhead on the target object by writing through shmem. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:03:33 +02:00
Chris Wilson	6c085a728c	drm/i915: Track unbound pages When dealing with a working set larger than the GATT, or even the mappable aperture when touching through the GTT, we end up with evicting objects only to rebind them at a new offset again later. Moving an object into and out of the GTT requires clflushing the pages, thus causing a double-clflush penalty for rebinding. To avoid having to clflush on rebinding, we can track the pages as they are evicted from the GTT and only relinquish those pages on memory pressure. As usual, if it were not for the handling of out-of-memory condition and having to manually shrink our own bo caches, it would be a net reduction of code. Alas. Note: The patch also contains a few changes to the last-hope evict_everything logic in i916_gem_execbuffer.c - we no longer try to only evict the purgeable stuff in a first try (since that's superflous and only helps in OOM corner-cases, not fragmented-gtt trashing situations). Also, the extraction of the get_pages retry loop from bind_to_gtt (and other callsites) to get_pages should imo have been a separate patch. v2: Ditch the newly added put_pages (for unbound objects only) in i915_gem_reset. A quick irc discussion hasn't revealed any important reason for this, so if we need this, I'd like to have a git blame'able explanation for it. v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Split out code movements and rant a bit in the commit message with a few Notes. Done v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-21 14:34:11 +02:00
Daniel Vetter	a22ddff8be	Linux 3.6-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJQLWtvAAoJEHm+PkMAQRiG/DYH+wd0FqfEuYkYk4KPyAPuhKpX zX7HYfLvyJE/ZYIdrhjq1E6Xm2KNr7gtX7/Rdzi2W38M9sjbYzwG1UGIw51qnxWy yZJH9BGkfyQgQPeuDGohfB6DkDy2JWr2eqMDvakjOwgBsIzji0PQD/f3UvndhtUa c+tTj/kjavHE1Yr2Wy6OnRZz3Uc0hIMn/Q0JqtbCs3LUgEV1KA4OEAe56XNz4Ku4 WE+FFaGFPvtriQsQON+ohPS5IC8jzQGK/0vbrJ4lWjFnZy4gvZXnborTOwD0WSQG fbsNuxp1AaM2/pqfMwXm1w0ADvwOITHNiwwXf9id6DoK81QwTFpUdvKpn6yB6gQ= =rurr -----END PGP SIGNATURE----- Merge tag 'v3.6-rc2' into drm-intel-next Backmerge Linux 3.6-rc2 to resolve a few funny conflicts before we put even more madness on top: - drivers/gpu/drm/i915/i915_irq.c: Just a spurious WARN removed in -fixes, that has been changed in a variable-rename in -next, too. - drivers/gpu/drm/i915/intel_ringbuffer.c: -next remove scratch_addr (since all their users have been extracted in another fucntion), -fixes added another user for a hw workaroudn. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-17 09:01:08 +02:00
Eric Anholt	e844b990b1	drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround. If a buffer that was the target of a PIPE_CONTROL from userland was a reused one that hadn't been evicted which had not previously had this workaround applied, then the early return for a correct presumed_offset in this function meant we would not bind it into the GTT and the write would land somewhere else. Fixes reproducible failures with GL_EXT_timer_query usage in apitrace, and I also expect it to fix the intermittent OQ issues on snb that danvet's been working on. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932 Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Carl Worth <cworth@cworth.org> Tested-by: Carl Worth <cworth@cworth.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-05 21:45:01 +02:00
Chris Wilson	f047e395dd	drm/i915: Avoid concurrent access when marking the device as idle/busy As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:56 +02:00
Chris Wilson	a7b9761d0a	drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Chris Wilson	016fd0c1ae	drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer Otherwise once we use the buffer with a BLT command on gen2/3, we will always regard future command submissions as continuing the fenced access. However, now that we flush/invalidate between every batch we can drop this pessimism. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Daniel Vetter	6ac42f4148	drm/i915: Replace the complex flushing logic with simple invalidate/flush all Now that we unconditionally flush and invalidate between every batch buffer, we no longer need the complex logic to decide which domains require flushing. Remove it and rejoice. v2 (danvet): Keep around the flip waiting logic. It's gross and broken, I know, but we can't just kill that thing ... even if we just keep it around as a reminder that things are broken. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:54 +02:00
Chris Wilson	69c2fc8913	drm/i915: Remove the per-ring write list This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	0201f1ecf4	drm/i915: Replace the pending_gpu_write flag with an explicit seqno As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	3bb73aba1e	drm/i915: Allow late allocation of request for i915_add_request() Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Eric Anholt	0da5cec1de	drm/i915: Set the context before setting up regs for the context. Fixes failures in transform feedback on gen7 because our SOL_RESET flag was setting the transform feedback offsets in the old context (occasionally happened to be ours) instead of the new context. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 10:39:59 +02:00
Chris Wilson	09cf7c9a12	drm/i915: Insert a flush between batches if the breadcrumb was dropped If we drop the breadcrumb request after a batch due to a signal for example we aim to fix it up at the next opportunity. In this case we emit a second batchbuffer with no waits upon the first and so no opportunity to insert the missing request, so we need to emit the missing flush for coherency. (Note that that invalidating the render cache is the same as flushing it, so there should have been no observable corruption.) Note that beside simply adding the missing flush, avoiding potential render corruption, this will also fix at least parts of the problem introduced by some funny interaction of these two commits: commit `de2b998552` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jul 4 22:52:50 2012 +0200 drm/i915: don't return a spurious -EIO from intel_ring_begin which allowed intel_ring_begin to return -ERESTARTSYS and commit `cc889e0f6c` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list which essentially disabled the flushing list. The issue happens when we submit a batch & emit it, but get interrupted (thanks to the first patch) while trying to emit the flush. On the next batch we still assume that the full gpu domain handling is in effect and hence compute the invalidate&flushing domains. But thanks to the 2nd patch we totally ignore these and only invalidate all gpu domains, presuming that any required flushes have been issued already. Which is wrong and eventually results in us updating the new write_domain values with the computed pending_write_domain values, which leaves an object with write_domain == 0 on the gpu_write_list. As soon as we try to unbind that object, things blow up. Fix this by emitting the missing flush according to the new ring->gpu_caches_dirty flag. Note that this does _not_ fix all the current cases where we end up with an object on the flushing_list that can't be flushed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add bug explanation to commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-20 12:21:40 +02:00
Daniel Vetter	cc889e0f6c	drm/i915: disable flushing_list/gpu_write_list This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-20 13:54:28 +02:00
Ben Widawsky	6e0a69dbc8	drm/i915/context: switch contexts with execbuf2 Use the rsvd1 field in execbuf2 to specify the context ID associated with the workload. This will allow the driver to do the proper context switch when/if needed. v2: Add checks for context switches on rings not supporting contexts. Before the code would silently ignore such requests. Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:21 +02:00
Chris Wilson	a15817cf16	drm/i915: Check whether the ring is initialised prior to dispatch Rather than use the magic feature tests HAS_BLT/HAS_BSD just check whether the ring we are about to dispatch the execbuffer on is initialised. v2: Use intel_ring_initialized() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-19 22:39:53 +02:00
Chris Wilson	acb87dfb4b	drm/i915: Limit calling mark-busy only for potential scanouts The principle of intel_mark_busy() is that we want to spot the transition of when the display engine is being used in order to bump powersaving modes and increase display clocks. As such it is only important when the display is changing, i.e. when rendering to the scanout or other sprite/plane, and these are characterised by being pinned. v2: Mark the whole device as busy on execbuffer and pageflips as well and rebase against dinq for the minor bug fix to be immediately applicable. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: fix compile fail.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 15:10:34 +02:00
Daniel Vetter	5e13a0c5ec	Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued Backmerge of drm-next to resolve a few ugly conflicts and to get a few fixes from 3.4-rc6 (which drm-next has already merged). Note that this merge also restricts the stencil cache lra evict policy workaround to snb (as it should) - I had to frob the code anyway because the CM0_MASK_SHIFT define died in the masked bit cleanups. We need the backmerge to get Paulo Zanoni's infoframe regression fix for gm45 - further bugfixes from him touch the same area and would needlessly conflict. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 13:39:59 +02:00
Daniel Vetter	dc257cf154	Linux 3.4-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S 5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp 6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3 KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa 1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU= =HkMd -----END PGP SIGNATURE----- Merge tag 'v3.4-rc6' into drm-intel-next Conflicts: drivers/gpu/drm/i915/intel_display.c Ok, this is a fun story of git totally messing things up. There /shouldn't/ be any conflict in here, because the fixes in -rc6 do only touch functions that have not been changed in -next. The offending commits in drm-next are 14415745b2..1fa611065 which simply move a few functions from intel_display.c to intel_pm.c. The problem seems to be that git diff gets completely confused: $ git diff 14415745b2..1fa611065 is a nice mess in intel_display.c, and the diff leaks into totally unrelated functions, whereas $git diff --minimal 14415745b2..1fa611065 is exactly what we want. Unfortunately there seems to be no way to teach similar smarts to the merge diff and conflict generation code, because with the minimal diff there really shouldn't be any conflicts. For added hilarity, every time something in that area changes the + and - lines in the diff move around like crazy, again resulting in new conflicts. So I fear this mess will stay with us for a little longer (and might result in another backmerge down the road). Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-07 14:02:14 +02:00
Daniel Vetter	6ebebc9206	drm/i915: disallow clip rects on gen5+ Unfortunately there has been dri1 userspace that used gem to manage the gtt and hence also needed cliprects in the execbuf ioctl. So we can't ever remove that code without breaking the ioctl abi. But at least we can disable it on gen5+, because these horrible versions of mesa have not supported these chips. Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:29 +02:00
Ben Widawsky	b2da9fe5d5	drm/i915: remove do_retire from i915_wait_request This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Xi Wang	44afb3a043	drm/i915: fix integer overflow in i915_gem_do_execbuffer() On 32-bit systems, a large args->num_cliprects from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit `432e58ed` ("drm/i915: Avoid allocation for execbuffer object list"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-23 22:32:15 +02:00
Xi Wang	ed8cd3b2cd	drm/i915: fix integer overflow in i915_gem_execbuffer2() On 32-bit systems, a large args->buffer_count from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit `8408c282` ("drm/i915: First try a normal large kmalloc for the temporary exec buffers"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-23 22:32:02 +02:00
Chris Wilson	06d9813157	drm/i915: Remove the pipelined parameter from get_fence() We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removal the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:15:43 +02:00
Chris Wilson	7b09638f45	drm/i915: Always flush tiling changes before accessing through the GTT As we defer updating the fence register from set-tiling to the point of use, we need to declare every access through the GTT as either fenced or unfenced. This patches fixes an old bug in the execbuffer relocation processing which could conceivably be hit by a pathological userspace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 10:48:38 +02:00
Ben Widawsky	2911a35b2e	drm/i915: use semaphores for the display plane In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:05 +02:00
Chris Wilson	9a5a53b392	drm/i915: Reorganise rules for get_fence/put_fence By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:04 +02:00
Dave Airlie	effbc4fd8e	Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next Daniel Vetter wrote First pull request for 3.5-next, slightly large than usual because new things kept coming in since the last pull for 3.4. Highlights: - first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci ids are not yet added, and there's still quite a few patches to merge (mostly modesetting). To make QA easier I've decided to merge this stuff in pieces. - loads of cleanups and prep patches spurred by the above. Especially vlv is a real frankenstein chip, but also hsw is stretching our driver's code design. Expect more to come in this area for 3.5. - more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again, there are more patches needed (and some already queued up), but I wanted to split this a bit for better testing. - pwrite/pread rework and retuning. This series has been in the works for a few months already and a lot of i-g-t tests have been created for it. Now it's finally ready to be merged. Note that one patch in this series touches include/pagemap.h, that patch is acked-by akpm. - reduce mappable pressure and relocation throughput improvements from Chris. - mmap offset exhaustion mitigation by Chris Wilson. - a start at figuring out which codepaths in our messy dri1/ums+gem/kms driver we actually need to support by bailing out of unsupported case. The driver now refuses to load without kms on gen6+ and disallows a few ioctls that userspace never used in certain cases. More of this will definitely come. - More decoupling of global gtt and ppgtt. - Improved dual-link lvds detection by Takashi Iwai. - Shut up the compiler + plus fix the fallout (Ben) - Inverted panel brightness handling (mostly Acer manages to break things in this way). - Small fixlets and adjustements and some minor things to help debugging. Regression-wise QA reported quite a few issues on ivb, but all of them turned out to be hw stability issues which are already fixed in drm-intel-fixes (QA runs the nightly regression tests on -next alone, without -fixes automatically merged in). There's still one issue open on snb, it looks like occlusion query writes are not quite as cache coherent as we've expected. With some of the pwrite adjustements we can now reliably hit this. Kernel workaround for it is in the works." * 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) drm/i915: VCS is not the last ring drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2 drm/i915: make quirks more verbose drm/i915: dump the DMA fetch addr register on pre-gen6 drm/i915/sdvo: Include YRPB as an additional TV output type drm/i915: disallow gem init ioctl on ilk drm/i915: refuse to load on gen6+ without kms drm/i915: extract gt interrupt handler drm/i915: use render gen to switch ring irq functions drm/i915: rip out old HWSTAM missed irq WA for vlv drm/i915: open code gen6+ ring irqs drm/i915: ring irq cleanups drm/i915: add SFUSE_STRAP registers for digital port detection drm/i915: add WM_LINETIME registers drm/i915: add WRPLL clocks drm/i915: add LCPLL control registers drm/i915: add SSC offsets for SBI access drm/i915: add port clock selection support for HSW drm/i915: add S PLL control drm/i915: add PIXCLK_GATE register ... Conflicts: drivers/char/agp/intel-agp.h drivers/char/agp/intel-gtt.c drivers/gpu/drm/i915/i915_debugfs.c	2012-04-12 10:27:01 +01:00
Chris Wilson	7dd4906586	drm/i915: Mark untiled BLT commands as fenced on gen2/3 The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we ten have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:26:05 +02:00
Daniel Vetter	f56f821feb	mm: extend prefault helpers to fault in more than PAGE_SIZE drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStlye and fix spelling. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:36:30 +02:00
Chris Wilson	dabdfe021a	drm/i915: Avoid using mappable space for relocation processing through the CPU We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:16:17 +02:00
Chris Wilson	1d83f4426f	drm/i915: Batch copy_from_user for relocation processing Originally the code tried to allocate a large enough array to perform the copy using vmalloc, performance wasn't great and throughput was improved by processing each individual relocation entry separately. This too is not as efficient as one would desire. A compromise would be to allocate a single page, or to allocate a few entries on the stack, and process the copy in batches. The latter gives simpler code and more consistent performance due to a lack of heuristic. x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) before: 249000 785000 1280000 (80%) page: 264000 896000 1280000 (65%) on-stack: 264000 902000 1280000 (67%) v2: Use 512-bytes of stack for batching rather than allocate a page. v3: Tidy the code slightly with more descriptive variable names Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-26 09:59:05 +02:00
Daniel Vetter	149c84077f	drm/i915: implement SNB workaround for lazy global gtt PIPE_CONTROL on snb needs global gtt mappings in place to workaround a hw gotcha. No other commands need such a workaround. Luckily we can detect a PIPE_CONTROL commands easily because they have a write_domain = I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that). v2: Binding the target of such a reloc into the global gtt actually works instead of binding the source, which is rather pointless ... v3: Kill a superflous has_global_gtt_mapping assignement noticed by Chris Wilson. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:53:55 +01:00
Daniel Vetter	9edd576d89	Merge remote-tracking branch 'airlied/drm-fixes' into drm-intel-next-queued Back-merge from drm-fixes into drm-intel-next to sort out two things: - interlaced support: -fixes contains a bugfix to correctly clear interlaced configuration bits in case the bios sets up an interlaced mode and we want to set up the progressive mode (current kernels don't support interlaced). The actual feature work to support interlaced depends upon (and conflicts with) this bugfix. - forcewake voodoo to workaround missed IRQ issues: -fixes only enabled this for ivybridge, but some recent bug reports indicate that we need this on Sandybridge, too. But in a slightly different flavour and with other fixes and reworks on top. Additionally there are some forcewake cleanup patches heading to -next that would conflict with currrent -fixes. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-10 17:14:49 +01:00
Daniel Vetter	7bddb01fb9	drm/i915: ppgtt binding/unbinding support This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 21:25:23 +01:00
Daniel Vetter	ff240199b6	drm/i915: s/DRM_ERROR/DRM_DEBUG in i915_gem_execbuffer.c These are all user-trigerable, so tune down their loudness a notch. For some of these we have i-g-t tests (because they prevent newly-discovered bugs), without this patches running the test suite leaves behind a dirty dmesg. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 11:10:32 +01:00
Daniel Vetter	4ca4a250ac	drm/i915: reject GTT domain in relocations This confuses our domain tracking and can (for gtt write domains) lead to a subsequent oops. Tested by tests/gem_exec_bad_domains from i-g-t. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:37:10 +01:00
Chris Wilson	1690e1eb7a	drm/i915: Separate fence pin counting from normal bind pin counting In order to correctly account for reserving space in the GTT and fences for a batch buffer, we need to independently track whether the fence is pinned due to a fenced GPU access in the batch or whether the buffer is pinned in the aperture. Currently we count the fenced as pinned if the buffer has already been seen in the execbuffer. This leads to a false accounting of available fence registers, causing frequent mass evictions. Worse, if coupled with the change to make i915_gem_object_get_fence() report EDADLK upon fence starvation, the batchbuffer can fail with only one fence required... Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Paul Neumann <paul104x@yahoo.de> [danvet: Resolve the functional conflict with Jesse Barnes sprite patches, acked by Chris Wilson on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:23:37 +01:00
Daniel Vetter	96154f2fab	drm/i915: switch ring->id to be a real id ... and add a helpr function for the places where we want a flag. This way we can use ring->id to index into arrays. v2: Resurrect the missing beautification-space Chris Wilson noted. I'm moving this space around because I'll reuse ring_str in the next patch. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 17:32:58 +01:00
Ben Widawsky	b93f9cf14e	drm/i915: argument to control retiring behavior Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: Keith Packard <keithp@keithp.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-26 11:19:19 +01:00
Linus Torvalds	1a464cbb3d	Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linux * 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits) drm/nouveau/pm: fix build with HWMON off gma500: silence gcc warnings in mid_get_vbt_data() drm/ttm: fix condition (and vs or) drm/radeon: double lock typo in radeon_vm_bo_rmv() drm/radeon: use after free in radeon_vm_bo_add() drm/sis\|via: don't return stack garbage from free_mem ioctl drm/radeon/kms: remove pointless CS flags priority struct drm/radeon/kms: check if vm is supported in VA ioctl drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2) radeon: Fix disabling PCI bus mastering on big endian hosts. ttm: fix agp since ttm tt rework agp: Fix multi-line warning message whitespace drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages. drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool. drm/radeon/kms: sync across multiple rings when doing bo moves v3 drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2) drm/radeon: GPU virtual memory support v22 drm: make DRM_UNLOCKED ioctls with their own mutex drm: no need to hold global mutex for static data drm/radeon/benchmark: common modes sweep ignores 640x480@32 ... Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c	2012-01-10 11:04:36 -08:00
Eric Anholt	ae662d3126	drm/i915: Add support for resetting the SO write pointers on gen7. These registers are automatically incremented by the hardware during transform feedback to track where the next streamed vertex output should go. Unlike the previous generation, which had a packet for setting the corresponding registers to a defined value, gen7 only has MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads an arbitrary register), so we need to do it from the kernel, and it needs to be settable atomically with the batchbuffer execution so that two clients doing transform feedback don't stomp on each others' state. Instead of building a more complicated interface involcing setting the registers to a specific value, just set them to 0 when asked and userland can tweak its pointers accordingly. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:31:18 -08:00
Ben Widawsky	84f9f938be	drm/i915: Force sync command ordering (Gen6+) The docs say this is required for Gen7, and since the bit was added for Gen6, we are also setting it there pit pf paranoia. Particularly as Chris points out, if PIPE_CONTROL counts as a 3d state packet. This was found through doc inspection by Ken and applies to Gen6+; Reported-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:09:44 -08:00
Ben Widawsky	e2971bdab2	drm/i915: relative_constants_mode race fix dev_priv keeps track of the current addressing mode that gets set at execbuffer time. Unfortunately the existing code was doing this before acquiring struct_mutex which leaves a race with another thread also doing an execbuffer. If that wasn't bad enough, relocate_slow drops struct_mutex which opens a much more likely error where another thread comes in and modifies the state while relocate_slow is being slow. The solution here is to just defer setting this state until we absolutely need it, and we know we'll have struct_mutex for the remainder of our code path. v2: Keith noticed a bug in the original patch. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:09:44 -08:00
Keith Packard	ebbd857e6b	drm/i915: Disable semaphores by default on SNB Semaphores still cause problems on some machines: > From Udo Steinberg: > > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of > text scroll in an xterm, such as when extracting a tar archive. Such as this > one (note the timestamps): > > I can reproduce it fairly easily with something > as simple as: > > while true; do dmesg; done This patch turns them off on SNB while leaving them on for IVB. Reported-by: Udo Steinberg <udo@hypervisor.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Eugeni Dodonov <eugeni@dodonov.net> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-26 21:07:26 -08:00
Eugeni Dodonov	f45b55575c	drm/i915: enable semaphores on per-device defaults This adds a default setting for semaphores parameter, and enables semaphores by default on IVB. For now, as semaphores interaction with VTd causes random issues on SNB, we do not enable them by default. But they can still be enabled via the semaphores=1 kernel parameter. v2: enables semaphores on SNB when IO remapping is disabled, with base on Keith Packard patch. CC: Daniel Vetter <daniel.vetter@ffwll.ch> CC: Ben Widawsky <ben@bwidawsk.net> CC: Keith Packard <keithp@keithp.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-12-16 08:49:59 -08:00
Ben Widawsky	c8c99b0f0d	drm/i915: Dumb down the semaphore logic While I think the previous code is correct, it was hard to follow and hard to debug. Since we already have a ring abstraction, might as well use it to handle the semaphore updates and compares. I don't expect this code to make semaphores better or worse, but you never know... v2: Remove magic per Keith's suggestions. Ran Daniel's gem_ring_sync_loop test on this. v3: Ignored one of Keith's suggestions. v4: Removed some bloat per Daniel's recommendation. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-09-21 14:52:41 -07:00
Eric Anholt	e92d03bff9	Revert "drm/i915: Kill GTT mappings when moving from GTT domain" This reverts commit `4a684a4117`. Userland has always been required to set the object's domain to GTT before using it through a GTT mapping, it's not something that the kernel is supposed to enforce. (The pagefault support is so that we can handle multiple mappings without userland having to pin across them, not so that userland can use GTT after GPU domains without telling the kernel). Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl firefox-talos-gfx on my T420 latop. Signed-off-by: Keith Packard <keithp@keithp.com>	2011-06-21 11:11:02 -07:00
Chris Wilson	d4aeee7760	drm/i915: Disable pagefaults along execbuffer relocation fast path Along the fast path for relocation handling, we attempt to copy directly from the user data structures whilst holding our mutex. This causes lockdep to warn about circular lock dependencies if we need to pagefault the user pages. [Since when handling a page fault on a mmapped bo, we need to acquire the struct mutex whilst already holding the mm semaphore, it is then verboten to acquire the mm semaphore when already holding the struct mutex. The likelihood of the user passing in the relocations contained in a GTT mmaped bo is low, but conceivable for extreme pathology.] In order to force the mm to return EFAULT rather than handle the pagefault, we therefore need to disable pagefaults across the relocation fast path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2011-03-23 09:17:01 +00:00
Chris Wilson	47ae63e0c2	Merge branch 'drm-intel-fixes' into drm-intel-next Apply the trivial conflicting regression fixes, but keep GPU semaphores enabled. Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_execbuffer.c	2011-03-07 12:35:15 +00:00
Chris Wilson	c59a333f73	drm/i915: Only wait on a pending flip if we intend to write to the buffer ... as if we are only reading from it, we can do that concurrently with the queue flip. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-07 11:06:02 +00:00
Chris Wilson	a1656b9090	drm/i915: Disable GPU semaphores by default Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09 down to the use of GPU semaphores, and we already know that they appear broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling GPU semaphores is simply hiding another bug by the latency and side-effects of the additional device interaction it introduces...) However, use of semaphores is a massive performance improvement... Only as long as the system remains stable. Enable at your peril. Reported-by: Andi Kleen <andi-fd@firstfloor.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-07 11:00:59 +00:00
Chris Wilson	e8b2c3c47a	drm/i915: Re-enable GPU semaphores for SandyBridge mobile This seems to be running stably on my test laptop, so hopefully the reported hangs where just symptoms of other bugs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-01 19:22:52 +00:00
Chris Wilson	271d81b841	drm/i915: Allow relocation deltas outside of target bo Userspace has a legitimate requirement to use a delta that points to outside of the target bo, and so we need to enable this. (As this is an abi break, albeit a relaxation of the current restrictions, mark the change with a new flag.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-01 16:01:02 +00:00
Chris Wilson	ce453d81cb	drm/i915: Use a device flag for non-interruptible phases The code paths for modesetting are growing in complexity as we may need to move the buffers around in order to fit the scanout in the aperture. Therefore we face a choice as to whether to thread the interruptible status through the entire pinning and unbinding code paths or to add a flag to the device when we may not be interrupted by a signal. This does the latter and so fixes a few instances of modesetting failures under stress. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:56:25 +00:00
Chris Wilson	8408c282f0	drm/i915: First try a normal large kmalloc for the temporary exec buffers As we just need a temporary array whilst performing the relocations for the execbuffer, first attempt to allocate using kmalloc even if it is not of order page-0. This avoids the overhead of remapping the discontiguous array and so gives a moderate boost to execution throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:56:16 +00:00
Chris Wilson	c872522663	drm/i915: Protect against drm_gem_object not being the first member Dave Airlie spotted that we had a potential bug should we ever rearrange the drm_i915_gem_object so not the base drm_gem_object was not its first member. He noticed that we often convert the return of drm_gem_object_lookup() immediately into drm_i915_gem_object and then check the result for nullity. This is only valid when the base object is the first member and so the superobject has the same address. Play safe instead and use the compiler to convert back to the original return address for sanity testing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-22 15:55:57 +00:00
Chris Wilson	db53a30261	drm/i915: Refine tracepoints A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-07 14:59:18 +00:00
Chris Wilson	bdd92c9ad2	Merge branch 'drm-intel-fixes' into drm-intel-next Merge important suspend and resume regression fixes and resolve the small conflict. Conflicts: drivers/gpu/drm/i915/i915_dma.c	2011-01-24 23:45:32 +00:00
Chris Wilson	076e2c0eb8	drm/i915: Fix use of invalid array size for ring->sync_seqno There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops. Reported-by: Jiri Slaby <jirislaby@gmail.com> Tested-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-23 12:52:11 +00:00
Chris Wilson	311bd68e02	drm/i915: Trivial sparse fixes Move code around and invoke iomem annotation in a few more places in order to silence sparse. Still a few more iomem annotations to go... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-19 12:39:38 +00:00
Chris Wilson	1591192d3a	drm/i915: Disable GPU semaphores on SandyBridge mobile Hopefully, this is a temporary measure whilst the root cause is understood. At the moment, we experience a hard hang whilst looping urbanterror that has been identified as a result of the use of semaphores, but so far only on SNB mobile. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752 Tested-by: mengmeng.meng@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-14 09:51:05 +00:00
Chris Wilson	595dad76a0	drm/i915/execbuffer: Clear domains before beginning reloc processing After reordering the sequence of relocating objects, commit `6fe4f1404`, we can no longer rely on seeing all reloc targets prior to performing the relocation. As a result we were ignoring the need to flush objects from the render cache and invalidate the sampler caches, resulting in rendering glitches. So we need to clear the relocation domains earlier. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-13 16:06:05 +00:00
Chris Wilson	dd6864a4ed	drm/i915/execbuffer: Reorder relocations to match new object order On the fault path, commit `6fe4f140` introduction a regression whereby it changed the sequence of the objects but continued to use the original ordering of relocation entries. The result was that incorrect GTT offsets were being fed into the execbuffer causing lots of misrendering and potential hangs. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-13 16:06:05 +00:00
Chris Wilson	6fe4f14044	drm/i915/execbuffer: Reorder binding of objects to favour restrictions As the mappable portion of the aperture is always a small subset at the start of the GTT, it is allocated preferentially by drm_mm. This is useful in case we ever need to map an object later. However, if you have a large object that can consume the entire mappable region of the GTT this prevents the batchbuffer from fitting and so causing an error. Instead allocate all those that require a mapping up front in order to improve the likelihood of finding sufficient space to bind them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 22:55:48 +00:00
Chris Wilson	36cf174230	drm/i915/execbuffer: Correctly clear the current object list upon EFAULT Before releasing the lock in order to copy the relocation list from user pages, we need to drop all the object references as another thread may usurp and execute another batchbuffer before we reacquire the lock. However, the code was buggy and failed to clear the list... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org	2011-01-11 22:55:29 +00:00
Chris Wilson	882417851a	drm/i915: Propagate error from flushing the ring ... in order to avoid a BUG() and potential unbounded waits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:44:50 +00:00
Chris Wilson	b72f3acb71	drm/i915: Handle ringbuffer stalls when flushing Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:55 +00:00
Chris Wilson	63256ec534	drm/i915: Enforce write ordering through the GTT We need to ensure that writes through the GTT land before any modification to the MMIO registers and so must impose a mandatory write barrier when flushing the GTT domain. This was revealed by relaxing the write ordering by experimentally mapping the registers and the GATT as write-combining. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:42:53 +00:00
Chris Wilson	72bfa19c8d	drm/i915: Allow the application to choose the constant addressing mode The relative-to-general state default is useless as it means having to rewrite the streaming kernels for each batch. Relative-to-surface is more useful, as that stream usually needs to be rewritten for each batch. And absolute addressing mode, vital if you start streaming state, is also only available by adjusting the register... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-20 09:41:36 +00:00
Chris Wilson	b8f7ab1788	drm/i915: Mark the user reloc error paths as unlikely Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-09 19:46:22 +00:00
Chris Wilson	67731b87e9	drm/i915: Eliminate drm_gem_object_lookup during relocation As we provide a list of all objects that will be accessed from the batchbuffer, we can build a lut of the handles associated with those objects for this invocation and use that to avoid the overhead of looking up those objects again for every relocation. The cost of building and searching a small hash table is much less than that of acquiring a spinlock, searching a radix tree and manipulating an atomic refcnt per relocation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-09 19:46:21 +00:00
Chris Wilson	9b3826bf84	drm/i915: Ignore fenced commands for gpu access on gen4 Userspace should not have been declaring that it needed fenced GPU access with gen4+ as those GPUs have no fenced commands, but to be on the safe side it is easier to ignore userspace in case they did. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-05 23:18:32 +00:00
Chris Wilson	1ec14ad313	drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB The bulk of the change is to convert the growing list of rings into an array so that the relationship between the rings and the semaphore sync registers can be easily computed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-05 00:37:38 +00:00
Chris Wilson	d9e86c0ee6	drm/i915: Pipelined fencing [infrastructure] With this change, every batchbuffer can use all available fences (save pinned and scanout, of course) without ever stalling the gpu! In theory. Currently the actual pipelined update of the register is disabled due to some stability issues. However, just the deferred update is a significant win. Based on a series of patches by Daniel Vetter. The premise is that before every access to a buffer through the GTT we have to declare whether we need a register or not. If the access is by the GPU, a pipelined update to the register is made via the ringbuffer, and we track the last seqno of the batches that access it. If by the CPU we wait for the last GPU access and update the register (either to clear or to set it for the current buffer). One advantage of being able to pipeline changes is that we can defer the actual updating of the fence register until we first need to access the object through the GTT, i.e. we can eliminate the stall on set_tiling. This is important as the userspace bo cache does not track the tiling status of active buffers which generate frequent stalls on gen3 when enabling tiling for an already bound buffer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2010-12-02 10:07:05 +00:00
Chris Wilson	87ca9c8a7e	drm/i915: Prevent stalling for a GTT read back from a read-only GPU target Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-02 10:00:15 +00:00
Chris Wilson	c4e7a41467	drm/i915/ringbuffer: Handle cliprects in the caller This makes the various rings more consistent by removing the anomalous handing of the rendering ring execbuffer dispatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-30 14:17:51 +00:00
Chris Wilson	602606a472	drm/i915/execbuffer: On error, starting unwinding from the previous object As the error occurred on the current object, it means that its state was not changed and so it should be excluded from the unwind. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-28 15:31:02 +00:00
Chris Wilson	432e58edc9	drm/i915: Avoid allocation for execbuffer object list Besides the minimal improvement in reducing the execbuffer overhead, the real benefit is clarifying a few routines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-25 21:19:26 +00:00
Chris Wilson	54cf91dc4e	drm/i915: Split i915_gem_execbuffer into its own file. A number of dragons have been seen lurking within the execbuffer code. The first step is then to isolate them from the rest and begin to scrutinise them in depth. Suggested by Daniel Vetter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-25 21:19:25 +00:00

... 4 5 6 7 8

381 Commits