OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nick Hoath	6d3d8274bc	drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request Move all remaining elements that were unique to execlists queue items in to the associated request. Issue: VIZ-4274 v2: Rebase. Fixed issue of overzealous freeing of request. v3: Removed re-addition of cleanup work queue (found by Daniel Vetter) v4: Rebase. v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix pointer in __i915_add_request (found by Thomas Daniel) v6: Removed unrelated changes Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> [danvet: Reformat comment with strange linebreaks.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:53 +01:00
Nick Hoath	21076372af	drm/i915: Remove FIXME_lrc_ctx backpointer The first pass implementation of execlists required a backpointer to the context to be held in the intel_ringbuffer. However the context pointer is available higher in the call stack. Remove the backpointer from the ring buffer structure and instead pass it down through the call stack. v2: Integrate this changeset with the removal of duplicate request/execlist queue item members. v3: Rebase v4: Rebase. Remove passing of context when the request is passed. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:53 +01:00
Nick Hoath	72f95afa5f	drm/i915: Removed duplicate members from submit_request Where there were duplicate variables for the tail, context and ring (engine) in the gem request and the execlist queue item, use the one from the request and remove the duplicate from the execlist queue item. Issue: VIZ-4274 v1: Rebase v2: Fixed build issues. Keep separate postfix & tail pointers as these are used in different ways. Reinserted missing full tail pointer update. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:52 +01:00
Dave Airlie	d3e7a0dabd	Merge tag 'drm-intel-next-2015-01-17' of git://anongit.freedesktop.org/drm-intel into drm-next - refactor i915/snd-hda interaction to use the component framework (Imre) - psr cleanups and small fixes (Rodrigo) - a few perf w/a from Ken Graunke - switch to atomic plane helpers (Matt Roper) - wc mmap support (Chris Wilson & Akash Goel) - smaller things all over * tag 'drm-intel-next-2015-01-17' of git://anongit.freedesktop.org/drm-intel: (40 commits) drm/i915: Update DRIVER_DATE to 20150117 i915: reuse %ph to dump small buffers drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview. drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell. drm/i915: PSR link standby at debugfs drm/i915: group link_standby setup and let this info visible everywhere. drm/i915: Add missing vbt check. drm/i915: PSR HSW/BDW: Fix inverted logic at sink main_link_active bit. drm/i915: PSR VLV/CHV: Remove condition checks that only applies to Haswell. drm/i915: VLV/CHV PSR needs to exit PSR on every flush. drm/i915: Fix kerneldoc for i915 atomic plane code drm/i915: Don't pretend SDVO hotplug works on 915 drm/i915: Don't register HDMI connectors for eDP ports on VLV/CHV drm/i915: Remove I915_HAS_HOTPLUG() check from i915_hpd_irq_setup() drm/i915: Make hpd arrays big enough to avoid out of bounds access Revert "drm/i915/chv: Use timeout mode for RC6 on chv" drm/i915: Improve HiZ throughput on Cherryview. drm/i915: Reset CSB read pointer in ring init drm/i915: Drop unused position fields (v2) drm/i915: Move to atomic plane helpers (v9) ...	2015-01-27 09:01:09 +10:00
Bob Paauwe	af1a7301c7	drm/i915: Only fence tiled region of object. When creating a fence for a tiled object, only fence the area that makes up the actual tiles. The object may be larger than the tiled area and if we allow those extra addresses to be fenced, they'll get converted to addresses beyond where the object is mapped. This opens up the possibility of writes beyond the end of object. To prevent this, we adjust the size of the fence to only encompass the area that makes up the actual tiles. The extra space is considered un-tiled and now behaves as if it was a linear object. Testcase: igt/gem_tiled_fence_overflow Reported-by: Dan Hettena <danh@ghs.com> Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-01-26 11:00:33 +02:00
David Woodhouse	f48a01651b	drm/i915: Init PPGTT before context enable Commit `82460d972` ("drm/i915: Rework ppgtt init to no require an aliasing ppgtt") introduced a regression on Broadwell, triggering the following IOMMU fault at startup: vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem dmar: DRHD: handling fault status reg 2 dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000 DMAR:[fault reason 23] Unknown fbcon: inteldrmfb (fb0) is primary device Further commentary from Daniel: I suggested this change to David after staring at the offending patch for a while. I have no idea and theory whatsoever why this would upset the gpu less than the other way round. But it seems to work. David promised to chase hw people a bit more to get a more meaningful answer. Wrt the comment that this deletes: I've done some digging and afaict loading context before ppgtt enable was once required before our recent restructuring of the context/ppgtt init code: Before that context sw setup (i.e. allocating the default context) and hw setup was smashed together. Also the setup of the default context was the bit that actually allocated the aliasing ppgtt structures. Which is the reason for the context before ppgtt dependency. Or was, since with all the untangling there's no real dependency any more (functional, who knows what the hw is doing), so the comment is just stale. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-01-26 11:00:32 +02:00
Dave Airlie	281d1bbd34	Merge remote-tracking branch 'origin/master' into drm-next Backmerge Linus tree after rc5 + drm-fixes went in. There were a few amdkfd conflicts I wanted to avoid, and Ben requested this for nouveau also. Conflicts: drivers/gpu/drm/amd/amdkfd/Makefile drivers/gpu/drm/amd/amdkfd/kfd_chardev.c drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c drivers/gpu/drm/amd/amdkfd/kfd_priv.h drivers/gpu/drm/amd/include/kgd_kfd_interface.h drivers/gpu/drm/i915/intel_runtime_pm.c drivers/gpu/drm/radeon/radeon_kfd.c	2015-01-22 10:44:41 +10:00
Daniel Vetter	0a87a2db48	Merge tag 'topic/i915-hda-componentized-2015-01-12' into drm-intel-next-queued Conflicts: drivers/gpu/drm/i915/intel_runtime_pm.c Separate branch so that Takashi can also pull just this refactoring into sound-next. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-01-12 23:07:46 +01:00
Chris Wilson	226e5ae9e5	drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared if the mutex debugging is enabled which introduces a race in our mutex_is_locked_by() - i.e. we may inspect the old owner value before it is acquired by the new task. This is the root cause of this error: diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index 5cf6731..3ef3736 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock) DEBUG_LOCKS_WARN_ON(lock->owner != current); DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next); - mutex_clear_owner(lock); } /* * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug * mutexes so that we can do it here after we've verified state. */ + mutex_clear_owner(lock); atomic_set(&lock->count, 1); } Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-01-12 10:53:02 +02:00
Dave Airlie	adc31849b2	Merge tag 'drm-intel-next-2014-12-19' of git://anongit.freedesktop.org/drm-intel into drm-next - plane handling refactoring from Matt Roper and Gustavo Padovan in prep for atomic updates - fixes and more patches for the seqno to request transformation from John - docbook for fbc from Rodrigo - prep work for dual-link dsi from Gaurav Signh - crc fixes from Ville - special ggtt views infrastructure from Tvrtko Ursulin - shadow patch copying for the cmd parser from Brad Volkin - execlist and full ppgtt by default on gen8, for testing for now * tag 'drm-intel-next-2014-12-19' of git://anongit.freedesktop.org/drm-intel: (131 commits) drm/i915: Update DRIVER_DATE to 20141219 drm/i915: Hold runtime PM during plane commit drm/i915: Organize bind_vma funcs drm/i915: Organize INSTDONE report for future. drm/i915: Organize PDP regs report for future. drm/i915: Organize PPGTT init drm/i915: Organize Fence registers for future enablement. drm/i915: tame the chattermouth (v2) drm/i915: Warn about missing context state workarounds only once drm/i915: Use true PPGTT in Gen8+ when execlists are enabled drm/i915: Skip gunit save/restore for cherryview drm/i915/chv: Use timeout mode for RC6 on chv drm/i915: Add GPGPU_THREADS_DISPATCHED to the register whitelist drm/i915: Tidy up execbuffer command parsing code drm/i915: Mark shadow batch buffers as purgeable drm/i915: Use batch length instead of object size in command parser drm/i915: Use batch pools with the command parser drm/i915: Implement a framework for batch buffer pools drm/i915: fix use after free during eDP encoder destroying drm/i915/skl: Skylake also supports DP MST ...	2015-01-10 08:46:24 +10:00
Chris Wilson	676fa5721c	drm/i915: Move the ban period onto the context This will allow us to set per-file, or even per-context, periods in the future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-07 14:20:20 +01:00
Akash Goel	1816f92363	drm/i915: Support creation of unbound wc user mappings for objects This patch provides support to create write-combining virtual mappings of GEM object. It intends to provide the same functionality of 'mmap_gtt' interface without the constraints and contention of a limited aperture space, but requires that clients handle the linear to tile conversion on their own. This is for improving the CPU write operation performance, as with such mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache flush after update from CPU side, when object is passed onto GPU. This type of mapping is especially useful in case of sub-region update, i.e. when only a portion of the object is to be updated. Using a CPU mmap in such cases would normally incur a clflush of the whole object, and using a GTT mmapping would likely require eviction of an active object or fence and thus stall. The write-combining CPU mmap avoids both. To ensure the cache coherency, before using this mapping, the GTT domain has been reused here. This provides the required cache flush if the object is in CPU domain or synchronization against the concurrent rendering. Although the access through an uncached mmap should automatically invalidate the cache lines, this may not be true for non-temporal write instructions and also not all pages of the object may be updated at any given point of time through this mapping. Having a call to get_pages in set_to_gtt_domain function, as added in the earlier patch 'drm/i915: Broaden application of set-domain(GTT)', would guarantee the clflush and so there will be no cachelines holding the data for the object before it is accessed through this map. The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been extended with a new flags field (defaulting to 0 for existing users). In order for userspace to detect the extended ioctl, a new parameter I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface. v2: Fix error handling, invalid flag detection, renaming (ickle) v3: Rebase to latest drm-intel-nightly codebase The new mmapping is exercised by igt/gem_mmap_wc, igt/gem_concurrent_blit and igt/gem_gtt_speed. Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a Signed-off-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-06 09:08:00 +01:00
Chris Wilson	43566dedde	drm/i915: Broaden application of set-domain(GTT) Previously, this was restricted to only operate on bound objects - to make pointer access through the GTT to the object coherent with writes to and from the GPU. A second usecase is drm_intel_bo_wait_rendering() which at present does not function unless the object also happens to be bound into the GGTT (on current systems that is becoming increasingly rare, especially for the typical requests from mesa). A third usecase is a future patch wishing to extend the coverage of the GTT domain to include objects not bound into the GGTT but still in its coherent cache domain. For the latter pair of requests, we need to operate on the object regardless of its bind state. v2: After discussion with Akash, we came to the conclusion that the get-pages was required in order for accurate domain tracking in the corner cases (like the shrinker) and also useful for ensuring memory coherency with earlier cached CPU mmaps in case userspace uses exotic cache bypass (non-temporal) instructions. v3: Fix the inactive object check. v4: Rebase to latest drm-intel-nightly codebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-06 09:08:00 +01:00
Dave Airlie	379a2d31cb	Merge tag 'drm-intel-next-fixes-2014-12-30' of git://anongit.freedesktop.org/drm-intel into linus I've had these since before -rc1, but they missed my last pull request. Real bug fixes and mostly cc: stable material. * tag 'drm-intel-next-fixes-2014-12-30' of git://anongit.freedesktop.org/drm-intel: drm/i915: add missing rpm ref to i915_gem_pwrite_ioctl Revert "drm/i915: Preserve VGACNTR bits from the BIOS" drm/i915: Don't call intel_prepare_page_flip() multiple times on gen2-4 drm/i915: Kill check_power_well() calls	2015-01-04 17:41:00 +10:00
Dave Airlie	da6b51d007	Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2" This reverts commit `355a701838`. This had some bad side effects under normal operation, and should have been dropped earlier. Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-12-24 13:13:22 +10:00
Imre Deak	5d77d9c5e1	drm/i915: add missing rpm ref to i915_gem_pwrite_ioctl Without this RPM ref we can hit the device suspended WARN via: i915_gem_object_pin()->ggtt_bind_vma->gen6_ggtt_insert_entries(). I noticed this on my BYT while keeping the i915 device in runtime suspended state for a while. I chose this place to take the ref to avoid the possible deadlock via the mutex_lock taken both later in this function and in the runtime suspend handler. This can happen if an RPM suspend event is queued and needs to be flushed before taking the RPM ref. Testcase: igt/pm_rpm/gem-evict-pwrite Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87363 Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-12-18 15:46:47 +02:00
Rodrigo Vivi	ce38ab0593	drm/i915: Organize Fence registers for future enablement. Let's be optimistic that for future platforms this will remain the same and reorg a bit. This reorg in if blocks instead of switch make life easier for future platform support addition. v2: Jani pointed out I was missing reg_830 for some gen3 platforms. So let's make this platforms subcases of Gen checks. Cc: Jani Nikula <jani.nikula@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-17 18:17:54 +01:00
Brad Volkin	78a423772d	drm/i915: Use batch pools with the command parser This patch sets up all of the tracking and copying necessary to use batch pools with the command parser and dispatches the copied (shadow) batch to the hardware. After this patch, the parser is in 'enabling' mode. Note that performance takes a hit from the copy in some cases and will likely need some work. At a rough pass, the memcpy appears to be the bottleneck. Without having done a deeper analysis, two ideas that come to mind are: 1) Copy sections of the batch at a time, as they are reached by parsing. Might improve cache locality. 2) Copy only up to the userspace-supplied batch length and memset the rest of the buffer. Reduces the number of reads. v2: - Remove setting the capacity of the pool - One global pool instead of per-ring pools - Replace batch_obj with shadow_batch_obj and hook into eb->vmas - Memset any space in the shadow batch beyond what gets copied - Rebased on execlist prep refactoring v3: - Rebase on chained batch handling - Squash in setting the secure dispatch flag - Add a note about the interaction w/secure dispatch pinning - Check for request->batch_obj == NULL in i915_gem_free_request v4: - Fix read domains for shadow_batch_obj - Remove the set_to_gtt_domain call from i915_parse_cmds - ggtt_pin/unpin in the parser block to simplify error handling - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag - Remove i915_gem_batch_pool_put calls v5: - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after the parser (danvet, from v4 0/7 feedback) Issue: VIZ-4719 Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-16 10:39:09 +01:00
Brad Volkin	493018dcb1	drm/i915: Implement a framework for batch buffer pools This adds a small module for managing a pool of batch buffers. The only current use case is for the command parser, as described in the kerneldoc in the patch. The code is simple, but separating it out makes it easier to change the underlying algorithms and to extend to future use cases should they arise. The interface is simple: init to create an empty pool, fini to clean it up, get to obtain a new buffer. Note that all buffers are expected to be inactive before cleaning up the pool. Locking is currently based on the caller holding the struct_mutex. We already do that in the places where we will use the batch pool for the command parser. v2: - s/BUG_ON/WARN_ON/ for locking assertions - Remove the cap on pool size - Switch from alloc/free to init/fini v3: - Idiomatic looping structure in _fini - Correct handling of purged objects - Don't return a buffer that's too much larger than needed v4: - Rebased to latest -nightly v5: - Remove _put() function and clean up comments to match v6: - Move purged check inside the loop (danvet, from v4 1/7 feedback) v7: - Use single list instead of two. (Chris W) - s/active_list/cache_list - Squashed in debug patches (Chris W) drm/i915: Add a batch pool debugfs file It provides some useful information about the buffers in the global command parser batch pool. v2: rebase on global pool instead of per-ring pools v3: rebase drm/i915: Add batch pool details to i915_gem_objects debugfs To better account for the potentially large memory consumption of the batch pool. v8: - Keep cache in LRU order (danvet, from v6 1/5 feedback) Issue: VIZ-4719 Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-16 10:39:08 +01:00
Tvrtko Ursulin	fe14d5f4e5	drm/i915: Infrastructure for supporting different GGTT views per object Things like reliable GGTT mappings and mirrored 2d-on-3d display will need to map objects into the same address space multiple times. Added a GGTT view concept and linked it with the VMA to distinguish between multiple instances per address space. New objects and GEM functions which do not take this new view as a parameter assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the previous behaviour. This now means that objects can have multiple VMA entries so the code which assumed there will only be one also had to be modified. Alternative GGTT views are supposed to borrow DMA addresses from obj->pages which is DMA mapped on first VMA instantiation and unmapped on the last one going away. v2: * Removed per view special casing in i915_gem_ggtt_prepare / finish_object in favour of creating and destroying DMA mappings on first VMA instantiation and last VMA destruction. (Daniel Vetter) * Simplified i915_vma_unbind which does not need to count the GGTT views. (Daniel Vetter) * Also moved obj->map_and_fenceable reset under the same check. * Checkpatch cleanups. v3: * Only retire objects once the last VMA is unbound. v4: * Keep scatter-gather table for alternative views persistent for the lifetime of the VMA. * Propagate binding errors to callers and handle appropriately. v5: * Explicitly look for normal GGTT view in i915_gem_obj_bound to align usage in i915_gem_object_ggtt_unpin. (Michel Thierry) * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry) * Removed stray semi-colon in i915_gem_object_set_cache_level. For: VIZ-4544 Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Michel Thierry <michel.thierry@intel.com> [danvet: Drop hunk from i915_gem_shrink since it's just prettification but upsets a __must_check warning.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-15 11:25:04 +01:00
Jani Nikula	7bcc3777b1	drm/i915: release struct_mutex on the i915_gem_init_hw fail path Release struct_mutex if init_rings() fails. This is a regression introduced in commit `35a57ffbb1` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Nov 20 00:33:07 2014 +0100 drm/i915: Only init engines once Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-05 15:03:44 +01:00
Daniel Vetter	9cca306880	drm/i915: Handle inaccurate time conversion issues So apparently jiffies<->nsec<->ktime isn't accurate or something. At least if we timeout there's occasionally still a few hundred us left (in a 2 second timeout). Stuff I've tried and thrown out again: - Sampling the before timestamp before jiffies. Doesn't improve test path rate at all. - Using jiffies. Way too inaccurate, which means way too much drift with signals plus automatic ioctl restarting in userspace. In hindsight we should have used an absolute timeout, but hey we need something for v3 of the i915 gem wait interfaces ;-) - Trying to figure out where accuracy gets lost. gl testcase really don't care all that much about this (as long as it's not massively off), it's just that the testcase gets a bit upset if it receives an ETIME with timeout > 0. So as long as we're in the ballpark it's good enough. So patch everything up if we're at most one jiffies off. It gets me a solid test again. This regression is probably introduced in commit `5ed0bdf21a` Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Jul 16 21:05:06 2014 +0000 drm: i915: Use nsec based interfaces Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org> Probably because I'm too lazy to confirm myself and still waiting for QA ;-) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-12-05 15:24:45 +02:00
Daniel Vetter	7bd0e226e3	drm/i915: compute wait_ioctl timeout correctly We've lost the +1 required for correct timeouts in commit `5ed0bdf21a` Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Jul 16 21:05:06 2014 +0000 drm: i915: Use nsec based interfaces Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org> So fix this up by reinstating our handrolled _timeout function. While at it bother with handling MAX_JIFFIES. v2: Convert to usecs (we don't care about the accuracy anyway) first to avoid overflow issues Dave Gordon spotted. v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should take care of that already. It might be a bit too enthusiastic about it though. v4: Chris has a much nicer color, so use his implementation. This requires to export nsec_to_jiffies from time.c. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-12-05 15:20:24 +02:00
Tvrtko Ursulin	f763566992	drm/i915: Stop putting GGTT VMA at the head of the list Multiple GGTT VMAs per object will be introduced in the near future which will make it impossible to guarantee normal GGTT view is at the head of the list. Purpose of this patch is to break this assumption straight away so any potential hidden assumptions in the code base can be bisected to this simple patch. For: VIZ-4544 Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-04 11:51:36 +01:00
Daniel Vetter	d5abdfda91	drm/i915: Move init_unused_rings to gem_init_hw We need to do that every time we resume the rings, not just at load. I've overlooked this in my untangling of the ring init code. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:30 +01:00
Daniel Vetter	35a57ffbb1	drm/i915: Only init engines once We can do this. And now there's finally the clean split between software setup and hardware setup I kinda wanted since multi-ring support was merged aeons ago. It only took almost 5 years. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:28 +01:00
John Harrison	581c26e8a2	drm/i915: Convert 'trace_irq' to use requests rather than seqnos Updated the trace_irq code to use requests instead of seqnos. This includes reference counting the request object to ensure it sticks around when required. Note that getting access to the reference counting functions means moving the inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> [danvet: Resolve conflict due to shuffled merge order.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:24 +01:00
John Harrison	41c5241555	drm/i915: Remove the now redundant 'obj->ring' The ring member of the object structure was always updated with the last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant and potentially misleading (especially as there was no comment to explain its purpose). This checkin removes the redundant field. Many uses were simply testing for non-null to see if the object is active on the GPU. Some of these have been converted to check 'obj->active' instead. Others (where the last_read_req is about to be used anyway) have been changed to check obj->last_read_req. The rest simply pull the ring out from the request structure and proceed as before. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:23 +01:00
John Harrison	1b5a433a4d	drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed' Almost everywhere that called i915_seqno_passed() was really asking 'has the given seqno popped out of the hardware yet?'. Thus it had to query the current hardware seqno and then do a signed delta comparison (which copes with wrapping around zero but not with seqno values more than 2GB apart, although the latter is unlikely!). Now that the majority of seqno instances have been replaced with request structures, it is possible to convert this test to be request based as well. There is now a 'i915_gem_request_completed()' function which takes a request and returns true or false as appropriate. Note that this currently just wraps up the original _passed() test but a later patch in the series will reduce this to simply returning a cached internal value, i.e.: '_completed(req) { return req->completed; }' This checkin converts almost all _seqno_passed() calls. The only one left is in the semaphore code which still requires seqnos not request structures. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> [danvet: Drop hunk touching the trace_irq code since I've dropped the patch which converts that, and resolve resulting conflict.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:22 +01:00
John Harrison	ff79e85702	drm/i915: Connect requests to rings at creation not submission It makes a lot more sense (and makes future seqno -> request conversion patches simpler) to fill in the 'ring' field of the request structure at the point of creation rather than submission. Given that the request structure is assigned by ring specific code and thus is locked to a ring from the start, there really is no reason to defer this assignment. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:22 +01:00
John Harrison	74328ee510	drm/i915: Convert trace functions from seqno to request All the code above is now using requests not seqnos so it is possible to convert the trace functions across. Note that rather than get into problematic reference counting issues, the trace code only saves the seqno and ring values from the request structure not the structure pointer itself. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:21 +01:00
John Harrison	9400ae5c82	drm/i915: Remove obsolete seqno parameter from 'i915_add_request' There is no longer any need to retrieve a seqno value from an i915_add_request() call. The calling code already knows which request structure is being processed (it can only be ring->OLR). And as the request itself is now used in preference to the basic seqno value, the latter is now redundant in this situation. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:19 +01:00
John Harrison	9c65481829	drm/i915: Convert __wait_seqno() to __wait_request() Now that all code above is using request structures instead of seqno values, it is possible to convert __wait_seqno() itself. Internally, it is still calling i915_seqno_passed(), this will be updated later in the series. This step is just changing the parameter list and function name. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:19 +01:00
Daniel Vetter	a4b3a5713d	drm/i915: Convert i915_wait_seqno to i915_wait_request Updated i915_wait_seqno() to take a request structure instead of a seqno value and renamed it accordingly. Internally, it just pulls the seqno out of the request and calls on to __wait_seqno() as before. However, all the code further up the stack is now simplified as it can just pass the request object straight through without having to peek inside. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> [danvet: Squash in hunk from an earlier patch which was rebased wrongly.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:17 +01:00
John Harrison	b6660d59f6	drm/i915: Make 'i915_gem_check_olr' actually check by request not seqno Updated the _check_olr() function to actually take a request object and compare it to the OLR rather than extracting seqnos and comparing those. Note that there is one use case where the request object being processed is no longer available at that point in the call stack. Hence a temporary copy of the original function is still present (but called _check_ols() instead). This will be removed in a subsequent patch. Also, downgraded a BUG_ON to a WARN_ON as apparently the former is frowned upon for shipping code. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:16 +01:00
John Harrison	6259cead57	drm/i915: Remove 'outstanding_lazy_seqno' The OLS value is now obsolete. Exactly the same value is guaranteed to be always available as PLR->seqno. Thus it is safe to remove the OLS completely. And also to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention valid. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:16 +01:00
John Harrison	ff8658850a	drm/i915: Ensure requests stick around during waits Added reference counting of the request structure around __wait_seqno() calls. This is a precursor to updating the wait code itself to take the request rather than a seqno. At that point, it would be a Bad Idea for a request object to be retired and freed while the wait code is still using it. v3: Note that even though the mutex lock is held during a call to i915_wait_seqno(), it is still necessary to explicitly bump the reference count. It appears that the shrinker can asynchronously retire items even though the mutex is locked. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> [danvet: Remove wrongly squashed hunk which breaks the build.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:15 +01:00
John Harrison	54fb2411dd	drm/i915: Convert i915_gem_ring_throttle to use requests Convert the throttle code to use the request structure rather than extracting a ring/seqno pair from it and using those. This is in preparation for __wait_seqno() becoming __wait_request(). For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:15 +01:00
John Harrison	97b2a6a10a	drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req The object structure contains the last read, write and fenced seqno values for use in synchronisation operations. These have now been replaced with their request structure counterparts. Note that to ensure that objects do not end up with dangling pointers, the assignments of last_*_req include reference count updates. Thus a request cannot be freed if an object is still hanging on to it for any reason. v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did not get updated when 'last_rendering_seqno' became 'last_read|write_seqno' several millennia ago. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:14 +01:00
John Harrison	abfe262ae7	drm/i915: Add reference count to request structure The plan is to use request structures everywhere that seqno values were previously used. This means saving pointers to structures in places that used to be simple integers. In turn, that means that the target structure now needs much more stringent lifetime tracking. That is, it must not be freed while some other random object still holds a pointer to it. To achieve this tracking, a reference count needs to be added. Whenever a pointer to the structure is saved away, the count must be incremented and the free must only occur when all references have been released. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:13 +01:00
Chris Wilson	bdcf120bfc	drm/i915: Assert that we successfully downclock the GPU before suspend Before suspending, we wait upon the outstanding GPU requests and flush our pending idle handlers. This should downclock the GPU to its lowest power state. Add a WARN to check that the delayed tasks were run and did their job properly. Suggested-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:12 +01:00
Daniel Vetter	4feb765943	drm/i915: Remove user pinning code Now unused. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-12-03 09:35:11 +01:00
Thomas Daniel	0794aed302	drm/i915: Fix context object leak for legacy contexts Dynamic context pinning for LRCs introduced a leak in legacy mode. Reinstate context unreference in i915_gem_free_request for legacy contexts. Leak reported by i-g-t/drv_module_reload fixed by this patch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86507 Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: John Harrison<John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:29:41 +01:00
Daniel Vetter	d472fcc837	drm/i915: Disallow pin ioctl completely for kms drivers The problem here is that SNA pins batchbuffers to eke out a bit more performance. Iirc it started out as a w/a for i830M (which we've implemented in the kernel since a long time already). The problem is that the pin ioctl wasn't added in commit `d23db88c3a` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri May 23 08:48:08 2014 +0200 drm/i915: Prevent negative relocation deltas from wrapping Fix this by simply disallowing pinning from userspace so that the kernel is in full control of batch placement again. Especially since distros are moving towards running X as non-root, so most users won't even be able to see any benefits. UMS support is dead now, but we need this minimal patch for backporting. Follow-up patch will remove the pin ioctl code completely. Note to backporters: You must have both commit `b45305fce5` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Mon Dec 17 16:21:27 2012 +0100 drm/i915: Implement workaround for broken CS tlb on i830/845 which landed in 3.8 and commit `c4d69da167` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Sep 8 14:25:41 2014 +0100 drm/i915: Evict CS TLBs between batches which is also marked cc: stable. Otherwise this could introduce a regression by disabling the userspace w/a without the kernel w/a being fully functional on i830/45. References: https://bugs.freedesktop.org/show_bug.cgi?id=76554#c116 Cc: stable@vger.kernel.org # requires `c4d69da167` and v3.8 Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-12-03 09:29:35 +01:00
Dave Airlie	26045b53c9	Merge tag 'drm-intel-next-2014-11-21-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2014-11-21: - infoframe tracking (for fastboot) from Jesse - start of the dri1/ums support removal - vlv forcewake timeout fixes (Imre) - bunch of patches to polish the rps code (Imre) and improve it on bdw (Tom O'Rourke) - on-demand pinning for execlist contexts - vlv/chv backlight improvements (Ville) - gen8+ render ctx w/a work from various people - skl edp programming (Satheeshakrishna et al.) - psr docbook (Rodrigo) - piles of little fixes and improvements all over, as usual * tag 'drm-intel-next-2014-11-21-fixed' of git://anongit.freedesktop.org/drm-intel: (117 commits) drm/i915: Don't pin LRC in GGTT when dumping in debugfs drm/i915: Update DRIVER_DATE to 20141121 drm/i915/g4x: fix g4x infoframe readout drm/i915: Only call mod_timer() if not already pending drm/i915: Don't rely upon encoder->type for infoframe hw state readout drm/i915: remove the IRQs enabled WARN from intel_disable_gt_powersave drm/i915: Use ggtt error obj capture helper for gen8 semaphores drm/i915: vlv: increase timeout when setting idle GPU freq drm/i915: vlv: fix cdclk setting during modeset while suspended drm/i915: Dump hdmi pipe_config state drm/i915: Gen9 shadowed registers drm/i915/skl: Gen9 multi-engine forcewake drm/i915: Read power well status before other registers for drpc info drm/i915: Pin tiled objects for L-shaped configs drm/i915: Update ring freq for full gpu freq range drm/i915: change initial rps frequency for gen8 drm/i915: Keep min freq above floor on HSW/BDW drm/i915: Use efficient frequency for HSW/BDW drm/i915: Can i915_gem_init_ioctl drm/i915: Sanitize ->lastclose ...	2014-12-03 08:25:59 +10:00
Thomas Hellstrom	355a701838	drm/gem: Warn on illegal use of the dumb buffer interface v2 It happens on occasion that developers of generic user-space applications abuse the dumb buffer API to get hold of drm buffers that they can both mmap() and use for GPU acceleration, using the assumptions that dumb buffers and buffers available for GPU are a) the same type and can be arbitrarily type-cast. b) fully coherent. This patch makes the most widely used drivers warn nicely when that happens, the next step will be to fail. v2: Move drmP.h changes to drm_gem.h. Fix Radeon dumb mmap breakage. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-11-21 12:12:41 +10:00
Daniel Vetter	656bfa3afc	drm/i915: Pin tiled objects for L-shaped configs Let's just throw in the towel on this one and take the cheap way out. Based on a patch from Chris Wilson, but checking for a different bit. Chris' patch checked for even bank layout, this one here for a magic bit. Given the evidence we've gathered (not much) both work I think, but checking for the magic bit might be more accurate. Anyway, works on my gm45 here. For paranoia, restrict to gen4 (and mobile), since we've only ever seen this on gm45 and i965gm. Also add some debugfs output so that we can skip the tiled swapping tests properly in these cases. v2: Clean up the quirk'ed pin count in free_object to avoid upsetting the WARN_ON. Spotted by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28813 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45092 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-20 13:03:33 +01:00
Daniel Vetter	f548c0e9d4	drm/i915: Can i915_gem_init_ioctl Found one more! With this we can clear up the ggtt init code a bit, yay! Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-20 13:03:31 +01:00
Daniel Vetter	377e91b204	drm/i915: Sanitize ->lastclose With this all the ums nonsense around gem setup/teardown has disappeared, yay! Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-20 13:03:30 +01:00
Daniel Vetter	8725548307	drm/i915: Ditch dev_priv->ums.mm_suspend Again just complicates gem init functions and makes a general mess out of everything. Good riddance! v2: In my enthusiasm to start removing dri1/ums crud I went overboard a bit and killed parts of hangcheck. Resurrect it. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-20 13:02:57 +01:00
Daniel Vetter	71b14ab618	drm/i915: No-Op enter/leave vt gem ioctl We've killed ums support by now, it's time to reap the benefits. This one here is getting in the way of doing some ring init cleanup. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-19 21:34:30 +01:00
Chris Wilson	5c6c600354	drm/i915: Remove DRI1 ring accessors and API With the deprecation of UMS, and by association DRI1, we have a tough choice when updating the ring access routines. We either rewrite the DRI1 routines blindly without testing (so likely to be broken) or take the liberty of declaring them no longer supported and remove them entirely. This takes the latter approach. v2: Also remove the DRI1 sarea updates Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Fix rebase conflicts.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-19 21:17:11 +01:00
Oscar Mateo	dcb4c12a68	drm/i915/bdw: Pin the context backing objects to GGTT on-demand Up until now, we have pinned every logical ring context backing object during creation, and left it pinned until destruction. This made my life easier, but it's a harmful thing to do, because we cause fragmentation of the GGTT (and, eventually, we would run out of space). This patch makes the pinning on-demand: the backing objects of the two contexts that are written to the ELSP are pinned right before submission and unpinned once the hardware is done with them. The only context that is still pinned regardless is the global default one, so that the HWS can still be accessed in the same way (ring->status_page). v2: In the early version of this patch, we were pinning the context as we put it into the ELSP: on the one hand, this is very efficient because only a maximum two contexts are pinned at any given time, but on the other hand, we cannot really pin in interrupt time :( v3: Use a mutex rather than atomic_t to protect pin count to avoid races. Do not unpin default context in free_request. v4: Break out pin and unpin into functions. Fix style problems reported by checkpatch v5: Remove unpin_lock as all pinning and unpinning is done with the struct mutex already locked. Add WARN_ONs to make sure this is the case in future. Issue: VIZ-4277 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-19 19:32:58 +01:00
Thomas Daniel	c86ee3a9f8	drm/i915/bdw: Clean up execlist queue items in retire_work No longer create a work item to clean each execlist queue item. Instead, move retired execlist requests to a queue and clean up the items during retire_requests. v2: Fix legacy ring path broken during overzealous cleanup v3: Update idle detection to take execlists queue into account v4: Grab execlist lock when checking queue state v5: Fix leaking requests by freeing in execlists_retire_requests. Issue: VIZ-4274 Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-19 19:16:45 +01:00
Chris Wilson	6a2c4232ec	drm/i915: Make the physical object coherent with GTT Currently objects for which the hardware needs a contiguous physical address are allocated a shadow backing storage to satisfy the constraint. This shadow buffer is not wired into the normal obj->pages and so the physical object is incoherent with accesses via the GPU, GTT and CPU. By setting up the appropriate scatter-gather table, we can allow userspace to access the physical object via either a GTT mmapping of, or by rendering into, the GEM bo. However, keeping the CPU mmap of the shmemfs backing storage coherent with the contiguous shadow is not yet possible. Fortuitously, CPU mmaps of objects requiring physical addresses are not expected to be coherent anyway. This allows the physical constraint of the GEM object to be transparent to userspace and allow it to efficiently render into or update them via the GTT and GPU. v2: Fix leak of pci handle spotted by Ville v3: Remove the now duplicate call to detach_phys_object during free. v4: Wait for rendering before pwrite. As this patch makes it possible to render into the phys object, we should make it correct as well! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-14 10:29:18 +01:00
Ander Conselvan de Oliveira	16e9a21f33	drm/i915: Make __wait_seqno non-static and rename to __i915_wait_seqno So that it can be used by the flip code to wait for rendering without holding any locks. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:42:20 +01:00
Chris Wilson	c826c44938	drm/i915: Request PIN_GLOBAL when pinning a vma for GTT relocations Always require PIN_GLOBAL when we want a mappable offset (PIN_MAPPABLE). This causes the pin to fixup the global binding in cases where the vma was already bound (and due to the preceding bug, we considered it to be already mappable). References: https://bugs.freedesktop.org/show_bug.cgi?id=85671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add WARN_ON to check that PIN_MAP implies PIN_GLOBAL as discussed on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:42:00 +01:00
Chris Wilson	ef79e17cce	drm/i915: Only mark as map-and-fenceable when bound into the GGTT We use the obj->map_and_fenceable hint for when we already have a valid mapping of this object in the aperture. This hint can only apply to the GGTT and not to the aliasing-ppGTT. One user of the hint is execbuffer relocation, which began to fail when it tried to follow the hint and perform the relocate through the non-existent GGTT mapping. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:42:00 +01:00
John Harrison	8e63954903	drm/i915: Remove redundant parameter to i915_gem_object_wait_rendering__tail() An earlier commit (c8725f3dc0911d4354315a65150aecd8b7d0d74a: Do not call retire_requests from wait_for_rendering) removed the use of the ring parameter within wait_rendering__tail() but did not remove the parameter itself. As the plan is to remove obj->ring which is where this parameter comes from, it is simpler to just remove the parameter completely than to update it with a new source. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> CC: Chris Wilson <chris@chris-wilson.co.uk> CC: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:41:58 +01:00
Tvrtko Ursulin	aff437667b	drm/i915: Move flags describing VMA mappings into the VMA If these flags are on the object level it will be more difficult to allow for multiple VMAs per object. v2: Simplification and cleanup after code review comments (Chris Wilson). Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-04 14:04:51 +01:00
Daniel Vetter	11b5d5112c	drm/i915: Correctly reject invalid flags for wait_ioctl Not having checks for this isn't good. I've checked igt and libdrm and they all already clear flags properly. So we're lucky and should be able to sneak this ABI clarification in. Testcase: igt/gem_wait/invalid-flags Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85280 Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-10-24 16:34:14 +02:00
Daniel Vetter	3436738180	drm/i915: Document that mmap forwarding is discouraged Too many new drm driver writers seem to look at i915 for inspiration. But we have two ways to do mmap, so discourage readers from the old, ugly version. In a new driver we'd just expose two mmap offsets per object, one for the gtt map and the other for the cpu map. v2: Make it clear that i915 does cpu mmaps this way for past cluelessness^W^W historical reasons. Asked for by Jani. Cc: "Cheng, Yao" <yao.cheng@intel.com> Cc: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-10-24 16:34:04 +02:00
Chris Wilson	bb9059d3a0	drm/i915: Suppress no action noise from oom shrinker If we are not able to free anything (the shrinker leaves nothing on the global object lists), do not log anything. This is useful when other subsystems are being stress-tested for their oom behaviour and i915.ko is shouting into the logs about doing nothing. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-10-24 16:34:02 +02:00
Chris Wilson	005445c5fb	drm/i915: Report the current number of bytes freed during oom The shrinker reports the number of pages freed, but we try to log the number of bytes - which leads to some nonsense values being reportedly freed during oom. Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-10-24 16:34:01 +02:00
Chris Wilson	60a5372777	drm/i915: Remove the duplicated logic between the two shrink phases We can use the same logic to walk the different bound/unbound lists during shrinker (as the unbound list is a degenerate case of the bound list), slightly compacting the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-10-03 13:52:15 +02:00
Daniel Vetter	955e36d0b4	Merge branch 'topic/skl-stage1' into drm-intel-next-queued SKL stage 1 patches still need polish so will likely miss the 3.18 merge window. We've decided to postpone to 3.19 so let's pull this in to make patch merging and conflict handling easier. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-09-30 22:36:57 +02:00
Damien Lespiau	01209dd56e	drm/i915/skl: Fence registers on SKL are the same as SNB v2: Rebased on top of the i915_gpu_error.c extraction. Reviewed-by: Thomas Wood <thomas.wood@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-24 14:33:13 +02:00
Daniel Vetter	b680c37a4d	drm/i915: DocBook integration for frontbuffer tracking I shouldn't ask everyone to do this and fail myself ... This extracts all the frontbuffer tracking functions into intel_frontbuffer.c, adds a DOC overview section and also adds the missing kerneldoc for i915_gem_track_fb and also pulls it into the same section for convenience. v2: Don't forget about the header files. v3: Oops, might check compilation next time around. To make my life easier drop the increase_pllclock from set_base_atomic since really, it doesn't matter if you see your Oops or kgdb with a tiny bit of lag. v4: Try to better explain how to actually use this, requested by Paulo on irc. v5: Explain invalidate/flush a bit clearer. v6: s/business/busyness/ Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Vandana Kannan <vandana.kannan@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-09-19 19:46:49 +02:00
Chris Wilson	be2d599b5d	drm/i915: Remove dead code, i915_gem_verify_gtt The data structure it was supposed to be sanity checking has long gone. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-19 14:43:09 +02:00
Chris Wilson	4144f9b5e0	drm/i915: Match GTT space sanity checker with implementation If we believe that the device can cross cache domains in its prefetcher (i.e. we allow neighbouring pages in different domains), we don't supply a color_adjust callback. Use the presence of this callback to better determine when we should be verifying that the GTT space we just used is valid. v2: Remove the superfluous struct drm_device function param as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Also adjust the comment per irc discussion with Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-19 14:41:18 +02:00
Chris Wilson	1d1ef21daf	drm/i915: Drop any active reference before unbinding Before we process the final unbind on an object and move it to the unbound list, it is semantically cleaner if there are no more active references to the object. (An active reference would imply that it was still being accessed by the GPU after it became inaccessible.) The caveat is that all callsites must be prepared for the object to disappear during the unbind - i.e. they must hold their own reference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-19 14:41:15 +02:00
Chris Wilson	21ab4e746d	drm/i915: Objects on the unbound list may still have an active reference Due to the lazy retirement semantics, even though we have unbound an object, it may still hold onto an active reference. So in the debug code, play safe. v2: Export i915_gem_shrink() rather than opencoding it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-19 14:41:14 +02:00
Dave Airlie	40d201af0b	Merge tag 'drm-intel-next-2014-09-05' of git://anongit.freedesktop.org/drm-intel into drm-next - final bits (again) for the rotation support (Sonika Jindal) - support bl_power in the intel backlight (Jani) - vdd handling improvements from Ville - i830M fixes from Ville - piles of prep work all over to make skl enabling just plug in (Damien, Sonika) - rename DP training defines to reflect latest edp standards, this touches all drm drivers supporting DP (Sonika Jindal) - cache edids during single detect cycle to avoid re-reading it for e.g. audio, from Chris - move w/a for registers which are stored in the hw context to the context init code (Arun&Damien) - edp panel power sequencer fixes, helps chv a lot (Ville) - piles of other chv fixes all over - much more paranoid pageflip handling with stall detection and better recovery from Chris - small things all over, as usual * tag 'drm-intel-next-2014-09-05' of git://anongit.freedesktop.org/drm-intel: (114 commits) drm/i915: Update DRIVER_DATE to 20140905 drm/i915: Decouple the stuck pageflip on modeset drm/i915: Check for a stalled page flip after each vblank drm/i915: Introduce a for_each_plane() macro drm/i915: Rewrite ABS_DIFF() in a safer manner drm/i915: Add comments explaining the vdd on/off functions drm/i915: Move DP port disable to post_disable for pch platforms drm/i915: Enable DP port earlier drm/i915: Turn on panel power before doing aux transfers drm/i915: Be more careful when picking the initial power sequencer pipe drm/i915: Reset power sequencer pipe tracking when disp2d is off drm/i915: Track which port is using which pipe's power sequencer drm/i915: Fix edp vdd locking drm/i915: Reset the HEAD pointer for the ring after writing START drm/i915: Fix unsafe vma iteration in i915_drop_caches drm/i915: init sprites with univeral plane init function drm/i915: Check of !HAS_PCH_SPLIT() in PCH transcoder funcs drm/i915: Use HAS_GMCH_DISPLAY un underrun reporting code drm/i915: Use IS_BROADWELL() instead of IS_GEN8() in forcewake code drm/i915: Don't call gen8_fbc_sw_flush() on chv ...	2014-09-16 16:02:09 +10:00
Dave Airlie	b2efb3f0a1	Linux 3.17-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUFjfVAAoJEHm+PkMAQRiGANkIAIU3PNrAz9dIItq8a/rEAhnx l2shHoOyEmyNR2apholM3BPUNX50cbsc/HGdi7lZKLkA/ifAj6B9nFD2NzVsIChD 1QWVcvdkKlVuxXCDd26qbijlfmbTOAWrLw9ntvM+J6ZtECM6zCAZF4MAV/FwogPq ETGKD76AxJtVIhBMS99troAiC1YxmQ7DKgEr8CraTOR1qwXEonnPCmN/IZA6x2/G EXiihOuQB5me1X7k4PI0V8CDscQOn+3B2CQHIrjRB+KiTF+iKIuI8n6ORC6bpFh+ U8UZP9wLlIG1BrUHG83pIndglIHotqPcjmtfl1WGrRr2hn7abzVSfV+g5Syo3Vg= =Ep+s -----END PGP SIGNATURE----- drm: backmerge tag 'v3.17-rc5' into drm-next This is requested to get the fixes for intel and radeon into the same tree for future development work. i915_display.c: fix missing dev_priv conflict.	2014-09-16 11:38:04 +10:00
Daniel Vetter	2232f0315c	drm/i915: Fix EIO/wedged handling in gem fault handler In commit `1f83fee08d` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Nov 15 17:17:22 2012 +0100 drm/i915: clear up wedged transitions I've accidentally inverted the EIO/wedged handling in the fault handler: We want to return the EIO as a SIGBUS only if it's not because of the gpu having died, to prevent userspace from unduly dying. In my defence the comment right above is completely misleading, so fix both. v2: Drop the WARN_ON, it's not actually a bug to e.g. receive an -EIO when swap-in fails. v3: Don't remove too much ... oops. Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-09-08 08:38:50 +03:00
Ville Syrjälä	81e7f2002b	drm/i915: Idle unused rings on gen2/3 during init/resume gen2/3 platforms have a boatload of rings we're not using. On my 830 the BIOS/hw can leave some of those "active" after resume which will prevent c3 entry. The ring is apparently considered active whenever head != tail even if the ring is disabled. Disable and clear all such unused ringbuffers on init/resume. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 11:05:06 +02:00
Thomas Daniel	ecdb5fd861	drm/i915/bdw: Don't execute context reset and switch with Execlists These two functions make no sense in an Logical Ring Context & Execlists world. v2: We got rid of lrc_enabled and centralized everything in the sanitized i915.enable_execlists instead. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> v3: Rebased. Corrected a typo in comment for i915_switch_context and added a comment that it should not be called in execlist mode. Added WARN_ON if i915_switch_context is called in execlist mode. Moved check for execlist mode out of i915_switch_context and into callers. Added comment in context_reset explaining why nothing is done in execlist mode. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> [danvet: Simplify the patch subject so I can understand it.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 11:04:17 +02:00
McAulay, Alistair	6689c167ae	drm/i915: Rework GPU reset sequence to match driver load & thaw This patch is to address Daniel's concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html "The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions." Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair <alistair.mcaulay@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: Rebase over ctx->ppgtt rework and extend the comment in check_wedge a bit.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 10:54:09 +02:00
Dave Airlie	a18b29f0c6	Merge tag 'drm-intel-next-2014-09-01' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2014-08-22: - basic code for execlist, which is the fancy new cmd submission on gen8. Still disabled by default (Ben, Oscar Mateo, Thomas Daniel et al) - remove the useless usage of console_lock for I915_FBDEV=n (Chris) - clean up relations between ctx and ppgtt - clean up ppgtt lifetime handling (Michel Thierry) - various cursor code improvements from Ville - execbuffer code cleanups and secure batch fixes (Chris) - prep work for dev -> dev_priv transition (Chris) - some of the prep patches for the seqno -> request object transition (Chris) - various small improvements all over * tag 'drm-intel-next-2014-09-01' of git://anongit.freedesktop.org/drm-intel: (86 commits) drm/i915: fix suspend/resume for GENs w/o runtime PM support drm/i915: Update DRIVER_DATE to 20140822 drm: fix plane rotation when restoring fbdev configuration drm/i915/bdw: Disable execlists by default drm/i915/bdw: Enable Logical Ring Contexts (hence, Execlists) drm/i915/bdw: Document Logical Rings, LR contexts and Execlists drm/i915/bdw: Print context state in debugfs drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs drm/i915/bdw: Display execlists info in debugfs drm/i915/bdw: Disable semaphores for Execlists drm/i915/bdw: Make sure gpu reset still works with Execlists drm/i915/bdw: Don't write PDP in the legacy way when using LRCs drm/i915: Track cursor changes as frontbuffer tracking flushes drm/i915/bdw: Help out the ctx switch interrupt handler drm/i915/bdw: Avoid non-lite-restore preemptions drm/i915/bdw: Handle context switch events drm/i915/bdw: Two-stage execlist submit process drm/i915/bdw: Write the tail pointer, LRC style drm/i915/bdw: Implement context switching (somewhat) drm/i915/bdw: Emission of requests with logical rings ... Conflicts: drivers/gpu/drm/i915/i915_drv.c	2014-09-03 08:30:48 +10:00
Oscar Mateo	cc9130be80	drm/i915/bdw: Make sure gpu reset still works with Execlists If we reset a ring after a hang, we have to make sure that we clear out all queued Execlists requests. v2: The ring is, at this point, already being correctly re-programmed for Execlists, and the hangcheck counters cleared. v3: Daniel suggests to drop the "if (execlists)" because the Execlists queue should be empty in legacy mode (which is true, if we do the INIT_LIST_HEAD). v4: Do the pending intel_runtime_pm_put Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-20 17:17:48 +02:00
Oscar Mateo	48e29f5535	drm/i915/bdw: Emission of requests with logical rings On a previous iteration of this patch, I created an Execlists version of __i915_add_request and abstracted it away as a vfunc. Daniel Vetter wondered then why that was needed: "with the clean split in command submission I expect every function to know wether it'll submit to an lrc (everything in intel_lrc.c) or wether it'll submit to a legacy ring (existing code), so I don't see a need for an add_request vfunc." The honest, hairy truth is that this patch is the glue keeping the whole logical ring puzzle together: - i915_add_request is used by intel_ring_idle, which in turn is used by i915_gpu_idle, which in turn is used in several places inside the eviction and gtt codes. - Also, it is used by i915_gem_check_olr, which is littered all over i915_gem.c - ... If I were to duplicate all the code that directly or indirectly uses __i915_add_request, I'll end up creating a separate driver. To show the differences between the existing legacy version and the new Execlists one, this time I have special-cased __i915_add_request instead of adding an add_request vfunc. I hope this helps to untangle this Gordian knot. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with Thomas Daniel.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 22:02:55 +02:00
Daniel Vetter	82460d9724	drm/i915: Rework ppgtt init to no require an aliasing ppgtt Currently we abuse the aliasing ppgtt to set up the ppgtt support in general. Which is a bit backwards since with full ppgtt we don't ever need the aliasing ppgtt. So untangle this and separate the ppgtt init from the aliasing ppgtt. While at it drag it out of the context enabling (which just does a switch to the default context). Note that we still have the differentiation between synchronous and asynchronous ppgtt setup, but that will soon vanish. So also correctly wire up the return value handling to be prepared for when ->switch_mm drops the synchronous parameter and could start to fail. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:31 +02:00
Daniel Vetter	896ab1a5d5	drm/i915: Fix up checks for aliasing ppgtt A subsequent patch will no longer initialize the aliasing ppgtt if we have full ppgtt enabled, since we simply don't need that any more. Unfortunately a few places check for the aliasing ppgtt instead of checking for ppgtt in general. Fix them up. One special case are the gtt offset and size macros, which have some code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt is _not_ a logical address space, so passing that in as the vm is plain and simple a bug. So just WARN about it and carry on - we have a graceful fall-through anyway if we can't find the vma. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:31 +02:00
Daniel Vetter	6c5566a82c	drm/i915: Allow i915_gem_setup_global_gtt to fail We already need this just as a safety check in case the preallocation reservation dance fails. But we definitely need this to be able to move the aliasing ppgtt setup back out of the context code to this place, where it belongs. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:30 +02:00
Daniel Vetter	5dc383b05a	drm/i915: Add proper prefix to obj_to_ggtt Stuff in headers really ought to have this. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:29 +02:00
Daniel Vetter	841cd77375	drm/i915: Only refcount ppgtt if it actually is one This essentially unbreaks non-ppgtt operation where we'd scribble over random memory. While at it give the vm_to_ppgtt function a proper prefix and make it a bit more paranoid. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:29 +02:00
Daniel Vetter	ee960be7bb	drm/i915: Some cleanups for the ppgtt lifetime handling So when reviewing Michel's patch I've noticed a few things and cleaned them up: - The early checks in ppgtt_release are now redundant: The inactive list should always be empty now, so we can ditch these checks. Even for the aliasing ppgtt (though that's a different confusion) since we tear that down after all the objects are gone. - The ppgtt handling functions are splattered all over. Consolidate them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for get/put. - There was a bit of confusion in ppgtt_release about whether it cares about the active or inactive list. It should care about them both, so augment the WARNINGs to check for both. There's still create_vm_for_ctx left to do, but that is blocked on the removal of ppgtt->ctx. Once that's done we can rename it to i915_ppgtt_create and move it to its siblings for handling ppgtts. v2: Move the ppgtt checks into the inline get/put functions as suggested by Chris. v3: Inline the now redundant ppgtt local variable. Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:24:04 +02:00
Michel Thierry	b9d06dd9d1	drm/i915: vma/ppgtt lifetime rules VMAs should take a reference of the address space they use. Now, when the fd is closed, it will release the ref that the context was holding, but it will still be referenced by any vmas that are still active. ppgtt_release() should then only be called when the last thing referencing it releases the ref, and it can just call the base cleanup and free the ppgtt. Note that with this we will extend the lifetime of ppgtts which contain shared objects. But all the non-shared objects will get removed as soon as they drop off the active list and for the shared ones the shrinker can eventually reap them. Since we currently can't evict ppgtt pagetables either I don't think that temporary leak is important. Signed-off-by: Michel Thierry <michel.thierry@intel.com> [danvet: Add note about potential ppgtt leak with this approach.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:22:26 +02:00
Oscar Mateo	454afebde8	drm/i915/bdw: Skeleton for the new logical rings submission path Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous versions of this series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with a reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:57 +02:00
Oscar Mateo	a83014d3f8	drm/i915: Abstract the legacy workload submission mechanism away As suggested by Daniel Vetter. The idea, in subsequent patches, is to provide an alternative to these vfuncs for the Execlists submission mechanism. v2: Split into two and reordered to illustrate our intentions, instead of showing it off. Also, remove the add_request vfunc and added the stop_ring one. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: - Make checkpatch happy. - Be grumpy about the excessive vtable. - Ditch gt->is_ring_initialized.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:32 +02:00
Oscar Mateo	127f100369	drm/i915/bdw: Macro for LRCs and module option for Execlists GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have things in place to hope to work. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlist are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:27 +02:00
Chris Wilson	82b6b6d786	drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access This migrates the fence tracking onto the existing seqno infrastructure so that the later conversion to tracking via requests is simplified. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:20:25 +02:00
Chris Wilson	e6a844687c	drm/i915: Force CPU relocations if not GTT mapped Move the decision on whether we need to have a mappable object during execbuffer to the fore and then reuse that decision by propagating the flag through to reservation. As a corollary, before doing the actual relocation through the GTT, we can make sure that we do have a GTT mapping through which to operate. Note that the key to make this work is to ditch the obj->map_and_fenceable unbind optimization - with full ppgtt it doesn't make a lot of sense any more anyway. v2: Revamp and resend to ease future patches. v3: Refresh patch rationale References: https://bugs.freedesktop.org/show_bug.cgi?id=81094 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Explain why obj->map_and_fenceable is key and split out the secure batch fix.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:01:29 +02:00
Chris Wilson	dc8cd1e790	drm/i915: Only perform set-to-gtt domain for objects bound to the global gtt If an object is not bound into the global GTT, then it cannot be accessed via the GTT. This restores the original code that was muddled by ppGTT. In the process, we remove a WARN that had long outlived its usefulness and was simply being coded around instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 11:36:12 +02:00
Linus Torvalds	889fa782bf	Merge tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel Pull intel drm fixes from Daniel Vetter: "So I heard that proper pull requests have a revert on top ;-) So here we go with my usual mid-merge-window pile of fixes. [ Ed. This revert thing had better not become the "in" thing ] Big fix is the duct-tape for ring init on g4x platforms, we seem to have found the magic again to make those machines as happy as before (not perfect though unfortunately, but that was never the case). Otherwise fixes all over: - tune down some overzealous debug output - VDD power sequencing fix after resume - bunch of dsi fixes for baytrail among them hw state checker de-noising - bunch of error state capture fixes for bdw - misc tiny fixes/workarounds for various platforms Last minute rebase was to kick out two patches that shouldn't have been in here - they're for the state checker, so 0 functional code affected. Jani's back from vacation, so he'll take over -fixes from here" * tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel: (21 commits) Revert "drm/i915: Enable semaphores on BDW" drm/i915: read HEAD register back in init_ring_common() to enforce ordering drm/i915: Fix crash when failing to parse MIPI VBT drm/i915: Bring GPU Freq to min while suspending. drm/i915: Fix DEIER and GTIER collecting for BDW. drm/i915: Don't accumulate hangcheck score on forward progress drm/i915: Add the WaCsStallBeforeStateCacheInvalidate:bdw workaround. drm/i915: Refactor Broadwell PIPE_CONTROL emission into a helper. drm/i915: Fix threshold for choosing 32 vs. 64 precisions for VLV DDL values drm/i915: Fix drain latency precision multipler for VLV drm/i915: Collect gtier properly on HSW. 
drm/i915: Tune down MCH_SSKPD values warning drm/i915: Tune done rc6 enabling output drm/i915: Don't require dev->struct_mutex in psr_match_conditions drm/i915: Fix error state collecting drm/i915: fix VDD state tracking after system resume drm/i915: Add correct hw/sw config check for DSI encoder drm/i915: factor out intel_edp_panel_vdd_sanitize drm/i915: wait for all DSI FIFOs to be empty drm/i915: work around warning in i915_gem_gtt ...	2014-08-08 10:24:36 -07:00
Linus Torvalds	a7d7a143d0	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull DRM updates from Dave Airlie: "Like all good pull reqs this ends with a revert, so it must mean we tested it, [ Ed. That's _one_ way of looking at it ] This pull is missing nouveau, Ben has been stuck trying to track down a very longstanding bug that revealed itself due to some other changes. I've asked him to send you a direct pull request for nouveau once he cleans things up. I'm away until Monday so don't want to delay things, you can make a decision on that when he sends it, I have my phone so I can ack things just not really merge much. It has one trivial conflict with your tree in armada_drv.c, and also the pull request contains some component changes that are already in your tree, the base tree from Russell went via Greg's tree already, but some stuff still shows up in here that doesn't when I merge my tree into yours. Otherwise all pretty standard graphics fare, one new driver and changes all over the place. New drivers: - sti kms driver for STMicroelectronics chipsets stih416 and stih407. core: - lots of cleanups to the drm core - DP MST helper code merged - universal cursor planes. - render nodes enabled by default panel: - better panel interfaces - new panel support - non-continuous cock advertising ability ttm: - shrinker fixes i915: - hopefully ditched UMS support - runtime pm fixes - psr tracking and locking - now enabled by default - userptr fixes - backlight brightness fixes - MST support merged - runtime PM for dpms - primary planes locking fixes - gen8 hw semaphore support - fbc fixes - runtime PM on SOix sleep state hw. - mmio base page flipping - lots of vlv/chv fixes. 
- universal cursor planes radeon: - Hawaii fixes - display scalar support for non-fixed mode displays - new firmware format support - dpm on more asics by default - GPUVM improvements - uncached and wc GTT buffers - BOs > visible VRAM exynos: - i80 interface support - module auto-loading - ipp driver consolidated. armada: - irq handling in crtc layer only - crtc renumbering - add component support - DT interaction changes. tegra: - load as module fixes - eDP bpp and sync polarity fixed - DSI non-continuous clock mode support - better support for importing buffers from nouveau msm: - mdp5/adq8084 v1.3 hw enablement - devicetree clk changse - ifc6410 board working tda998x: - component support - DT documentation update vmwgfx: - fix compat shader namespace" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (551 commits) Revert "drm: drop redundant drm_file->is_master" drm/panel: simple: Use devm_gpiod_get_optional() drm/dsi: Replace upcasting macro by function drm/panel: ld9040: Replace upcasting macro by function drm/exynos: dp: Modify driver to support drm_panel drm/exynos: Move DP setup into commit() drm/panel: simple: Add AUO B133HTN01 panel support drm/panel: simple: Support delays in panel functions drm/panel: simple: Add proper definition for prepare and unprepare drm/panel: s6e8aa0: Add proper definition for prepare and unprepare drm/panel: ld9040: Add proper definition for prepare and unprepare drm/tegra: Add support for panel prepare and unprepare routines drm/exynos: dsi: Add support for panel prepare and unprepare routines drm/exynos: dpi: Add support for panel prepare and unprepare routines drm/panel: simple: Add dummy prepare and unprepare routines drm/panel: s6e8aa0: Add dummy prepare and unprepare routines drm/panel: ld9040: Add dummy prepare and unprepare routines drm/panel: Provide convenience wrapper for .get_modes() drm/panel: add .prepare() and .unprepare() functions drm/panel: simple: Remove simple-panel compatible ...	2014-08-07 17:36:12 -07:00
Deepak S	274fa1c1ac	drm/i915: Bring GPU Freq to min while suspending. We might be leaving the GPU frequency (and thus vnn) high during the suspend. Flushing the delayed work queue should take care of this. Signed-off-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 14:04:09 +02:00
Linus Torvalds	e7fda6c4c3	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. 
- The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...	2014-08-05 17:46:42 -07:00
Thomas Gleixner	5ed0bdf21a	drm: i915: Use nsec based interfaces Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:50 -07:00
Chris Wilson	eedd10f45b	drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit `48018a57a8` Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [danvet: cherry-pick from -next to -fixes.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 16:09:51 +02:00
Armin Reese	9490edb588	drm/i915: Do not unmap object unless no other VMAs reference it When using an IOMMU, GEM objects are mapped by their DMA address as the physical address is unknown. This depends on the underlying IOMMU driver to map and unmap the physical pages properly as defined in intel_iommu.c. The current code will tell the IOMMU to unmap the GEM BO's pages on the destruction of the first VMA that "maps" that BO. This is clearly wrong as there may be other VMAs "mapping" that BO (using flink). The scanout is one such example. The patch fixes this issue by only unmapping the DMA maps when there are no more VMAs mapping that object. This is equivalent to when an object is considered unbound as can be seen by the code. On the first VMA that again becomes bound, we will remap. An alternate solution would be to move the dma mapping to object creation and destruction. I am not sure if this is considered an unfriendly thing to do. Some notes to backporters trying to backport full PPGTT: The bug can never be hit without enabling the IOMMU. The existing code will also do the right thing when the object is shared via dmabuf. The failure should be demonstrable with flink. In cases when not using intel_iommu_strict it is likely (likely, as defined by: off the top of my head) on current workloads to not hit this bug since we often teardown all VMAs for an object shared across multiple VMs. We also finish access to that object before the first dma_unmapping. intel_iommu_strict with flinked buffers is likely to hit this issue. Signed-off-by: Armin Reese <armin.c.reese@intel.com> [danvet: Add the excellent commit message provided by Ben.] Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:40 +02:00
Jesse Barnes	9df7575f1c	drm/i915: add helper for checking whether IRQs are enabled Now that we use the runtime IRQ enable/disable functions in our suspend path, we can simply check the pm._irqs_disabled flag everywhere. So rename it to catch the users, and add an inline for it to make the checks clear everywhere. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:34 +02:00
Chris Wilson	a1db2fa7c8	drm/i915: Abandon oom quickly if killed by a signal Whilst waiting to obtain our locks for the last resort shrinking before an oom, we check whether or not a fatal signal was pending. If there was, we do not need to keep waiting as the oom will be aborted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:28 +02:00
Oscar Mateo	1b5d063faf	drm/i915: Generalize intel_ring_get_tail to take a ringbuf Again, it's low-level enough to simply take a ringbuf and nothing else. Trivial change. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 12:31:02 +02:00
Chris Wilson	ec5cc0f9b0	drm/i915: Restrict GPU boost to the RCS engine Make the assumption that media workloads are not as latency sensitive for __wait_seqno, and that upclocking the GPU does not affect the BLT engine. Under that assumption, we only want to forcibly upclock the GPU when we are stalling for results from the render pipeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 10:25:17 +02:00
Rodrigo Vivi	ddd4dbc6c1	drm/i915: Updating comments. The ring index calculation table was out of date after other rings were added, although the formula is flexible and scales when adding new rings. So this patch just updates the comments and adds a brief explanation of why to use sync_seqno[ring index]. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 22:02:49 +02:00
Daniel Vetter	f99d70690e	drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scanned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the eventual flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. 
v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superfluous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedent for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implementation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: When retiring gpu buffers, only flush the frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redundancy in the modeset case with the initial enabling of all planes. 
- Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 18:14:47 +02:00
Daniel Vetter	a071fa0064	drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potentially chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments not drop over the 80 char limit.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 10:04:41 +02:00
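The bit tracking described in a071fa0064 can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the driver's code: `struct fake_obj` and `track_fb()` are hypothetical stand-ins for the gem object and for `i915_gem_track_fb`, and the bit values used by callers are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a gem object; only the tracking bits matter here. */
struct fake_obj {
	uint32_t frontbuffer_bits;
};

/*
 * Move the given frontbuffer bits from the old scanout object to the new
 * one. Either pointer may be NULL (plane being enabled or disabled).
 * Readers can later test the bits without taking any modeset lock.
 */
static void track_fb(struct fake_obj *old_obj, struct fake_obj *new_obj,
		     uint32_t bits)
{
	if (old_obj)
		old_obj->frontbuffer_bits &= ~bits;
	if (new_obj)
		new_obj->frontbuffer_bits |= bits;
}
```

An object whose `frontbuffer_bits` is still non-zero when it is freed would correspond to the leak the v2 note says is now caught with a WARN.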
Daniel Vetter	3108e99ea9	drm/i915: Drop schedule_back from psr_exit It doesn't make sense to never again schedule the work, since by the time we might want to re-enable psr the world might have changed and we can do it again. The only exception is when we shut down the pipe, but that's an entirely different thing and needs to be handled in psr_disable. Note that later patch will again split psr_exit into psr_invalidate and psr_flush. But the split is different and this simplification helps with the transition. v2: Improve the commit message a bit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 09:59:19 +02:00
Daniel Vetter	f25748ea73	drm/i915: Don't BUG_ON in i915_gem_obj_offset A WARN_ON is perfectly fine. The BUG in here seems to be the cause behind hard-hangs when I cat the i915_gem_pageflip debugfs file (which calls this from an irq spinlock). But only while running a full igt run after a while. I still need to root cause the underlying issue. I'll also start rejecting patches which add new BUG_ON but don't come with a really good justification for it. The general rule really should be to just WARN and hope the driver survives for long enough. v2: Make the WARN a bit more useful per Chris' suggestion. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:37 +02:00
Ville Syrjälä	beff0d0f61	drm/i915: Don't prefault the entire obj if the vma is smaller Take the minimum of the object size and the vma size and prefault only that much. Avoids a SIGBUS when mmapping only a portion of the object. Prefaulting was introduced here: commit `b90b91d870` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jun 10 12:14:40 2014 +0100 drm/i915: Prefault the entire object on first page fault Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Testcase: igt/gem_mmap/short-mmap Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:35 +02:00
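The clamp described in beff0d0f61 amounts to taking a minimum. A minimal sketch, assuming sizes in bytes; `prefault_len()` is a hypothetical helper name, not a function from the driver:

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes to prefault on the first fault.
 * Prefaulting past the end of the vma is what triggered the SIGBUS when
 * only part of the object was mmapped, so clamp to the smaller size.
 */
static size_t prefault_len(size_t obj_size, size_t vma_size)
{
	return obj_size < vma_size ? obj_size : vma_size;
}
```

For an 8192-byte object mapped through a 4096-byte vma, this yields 4096, so only the mapped portion is prefaulted.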
Sourab Gupta	84c33a64b4	drm/i915: Replaced Blitter ring based flips with MMIO flips This patch enables the framework for using MMIO based flip calls, in contrast with the CS based flip calls which are being used currently. MMIO based flip calls can be enabled on architectures where Render and Blitter engines reside in different power wells. The decision to use MMIO flips can be made based on workloads to give 100% residency for Media power well. v2: The MMIO flips now use the interrupt driven mechanism for issuing the flips when target seqno is reached. (Incorporating Ville's idea) v3: Rebasing on latest code. Code restructuring after incorporating Damien's comments v4: Addressing Ville's review comments -general cleanup -updating only base addr instead of calling update_primary_plane -extending patch for gen5+ platforms v5: Addressed Ville's review comments -Making mmio flip vs cs flip selection based on module parameter -Adding check for DRIVER_MODESET feature in notify_ring before calling notify mmio flip. -Other changes mostly in function arguments v6: -Having a separate function to check condition for using mmio flips (Ville) -propagating error code from i915_gem_check_olr (Ville) v7: -Adding __must_check with i915_gem_check_olr (Chris) -Renaming mmio_flip_data to mmio_flip (Chris) -Rebasing on latest nightly v8: -Rebasing on latest code -squash 3rd patch in series(mmio setbase vs page flip race) with this patch -Added new tiling mode update in intel_do_mmio_flip (Chris) v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in intel_postpone_flip, as this is a more restrictive condition (Chris) v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch. These patches make the selection of CS vs MMIO flip at the page flip time, and make the module parameter for using mmio flips as tristate, the states being 'force CS flips', 'force mmio flips', 'driver discretion'.
Changed the logic for driver discretion (Chris) v11: Minor code cleanup(better readability, fixing whitespace errors, using lockdep to check mutex locked status in postpone_flip, removal of __must_check in function definition) (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb [danvet: Fix up parameter alignment checkpatch spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-17 16:16:20 +02:00
Chris Wilson	6254b2042c	drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit `48018a57a8` Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-16 19:52:20 +02:00
Rodrigo Vivi	7c8f8a7007	drm/i915: Force PSR exit by inactivating it. The perfect solution for psr_exit is the hardware tracking the changes and doing the psr exit by itself. This scenario works for HSW and BDW with some environments like Gnome and Wayland. However there are many other scenarios where this isn't true. Mainly one right now is KDE users on HSW and BDW with PSR on. User would miss many screen updates. For instance, any key typed could be seen only when mouse cursor is moved. So this patch introduces the ability to trigger PSR exit on the kernel side in some common cases. Most of the cases are covered by psr_exit at set_domain. The remaining cases are covered by triggering it at set_domain, busy_ioctl, sw_finish and mark_busy. The downside here might be reducing the residency time in the cases where this already works very well, like the Gnome environment. But so far let's focus on fixing issues so PSR could be used by everybody and we could even get it enabled by default. Later we can add some alternatives to choose the level of PSR efficiency via a boot flag or even a crtc property. v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and also this isn't needed for BDW and HSW anyway. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 21:21:36 +02:00
Chris Wilson	b90b91d870	drm/i915: Prefault the entire object on first page fault Inserting additional PTEs has no side-effect for us as the pfn are fixed for the entire time the object is resident in the global GTT. The downside is that we pay the entire cost of faulting the object upon the first hit, for which we in return receive the benefit of removing the per-page faulting overhead. On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Testcase: igt/gem_fence_upload/performance Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:41 +02:00
Chris Wilson	ef0cf27c4d	drm/i915: Use the .release hook to drop the stolen drm_mm tracking Now that we have a release hook into i915_gem_object_free, we can move the explicit call to the internal stolen function and hook it up through the callback instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:36 +02:00
David Herrmann	f461d1be22	drm/i915: use shmem helpers if possible Instead of shuffling gfp-masks all the time, use the shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are set in mapping_gfp_mask() for i915, so the behavior is still the same. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:38 +02:00
Dave Airlie	ecb889e620	Merge tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel into drm-next > Bunch of stuff for 3.16 still: > - Mipi dsi panel support for byt. Finally! From Shobhit&others. I've > squeezed this in since it's a regression compared to vbios and we've > been ridiculed about it a bit too often ... > - connection_mutex deadlock fix in get_connector (only affects i915). > - Core patches from Matt's primary plane from Matt Roper, I've pushed the > i915 stuff to 3.17. > - vlv power well sequencing fixes from Jesse. > - Fix for cursor size changes from Chris. > - agpbusy fixes from Ville. > - A few smaller things. > * tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel: (32 commits) drm/i915: BDW: Adding missing cursor offsets. drm: Fix getconnector connection_mutex locking drm/i915/bdw: Only use 2g GGTT for 32b platforms drm/i915: Nuke pipe A quirk on i830M drm/i915: fix display power sw state reporting drm/i915: Always apply cursor width changes drm/i915: tell the user if both KMS and UMS are disabled drm/plane-helper: Add drm_plane_helper_check_update() (v3) drm: Check CRTC compatibility in setplane drm/i915: use VBT to determine whether to enumerate the VGA port drm/i915: Don't WARN about ring idle bit on gen2 drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS drm/i915: Enable interrupt-based AGPBUSY# enable on 85x drm/i915: Flip the sense of AGPBUSY_DIS bit drm/i915: Set AGPBUSY# bit in init_clock_gating drm/i915/vlv: add pll assertion when disabling DPIO common well drm/i915/vlv: move DPIO common reset de-assert into __vlv_set_power_well drm/i915/vlv: re-order power wells so DPIO common comes after TX drm/i915/vlv: move CRI refclk enable into __vlv_set_power_well ...	2014-06-06 19:07:09 +10:00
Dave Airlie	8d4ad9d4bb	Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next Merge drm-fixes into drm-next. Both i915 and radeon need this done for later patches. Conflicts: drivers/gpu/drm/drm_crtc_helper.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/i915_gem_gtt.c	2014-06-05 20:28:59 +10:00
Chris Wilson	ddeff6ee42	drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object If the user tries to mmap through the GTT an object that is marked as snooped, we report an error rather than allow the GPU to hang the machine. The choice of EINVAL, however, was unfortunate as we turn that into a WARN rather than a quiet SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:41 +02:00
Ville Syrjälä	dbb42748ac	drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS Move the MI_ARB_STATE MI_ARB_C3_LP_WRITE_ENABLE setup to gen3_init_clock_gating() from i915_gem_load() when KMS is enabled. Leave it in i915_gem_load() for the UMS case, but add an explicit check, just to make it easier to spot it when we eventually rip out UMS support. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:40 +02:00
Chris Wilson	d23db88c3a	drm/i915: Prevent negative relocation deltas from wrapping This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocations deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. 
v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:40 +03:00
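The wraparound described in d23db88c3a can be reproduced with plain unsigned arithmetic. The sketch below is illustrative only: it assumes 32-bit GTT offsets, takes the 256k figure and the `BATCH_OFFSET_BIAS` name from the commit message, and uses hypothetical helpers (`reloc_target()`, `batch_misplaced()`) that are not the driver's functions.

```c
#include <stdint.h>

/* "force all batches to be above 256k", per the commit message. */
#define BATCH_OFFSET_BIAS (256u * 1024u)

/*
 * Address written into the batch: base offset plus a possibly negative
 * delta, computed modulo 2^32 (assumption: 32-bit GTT offsets).
 */
static uint32_t reloc_target(uint32_t batch_offset, int32_t delta)
{
	return batch_offset + (uint32_t)delta;
}

/* Hypothetical placement check: a batch below the bias must be moved. */
static int batch_misplaced(uint32_t batch_offset)
{
	return batch_offset < BATCH_OFFSET_BIAS;
}
```

With the batch at 0x1000 and a delta of -0x2000, the target wraps to 0xFFFFF000, far beyond any real GTT. With the batch at the bias, the same delta gives 0x3E000, which stays in range as long as the negative delta is smaller than the bias.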
Chris Wilson	00731155a7	drm/i915: Fix dynamic allocation of physical handles A single object may be referenced by multiple registers fundamentally breaking the static allotment of ids in the current design. When the object is used the second time, the physical address of the first assignment is relinquished and a second one granted. However, the hardware is still reading (and possibly writing) to the old physical address now returned to the system. Eventually hilarity will ensue, but in the short term, it just means that cursors are broken when using more than one pipe. v2: Fix up leak of pci handle when handling an error during attachment, and avoid a double kmap/kunmap. (Ville) Rebase against -fixes. v3: And fix the error handling added in v2 (Ville) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:39 +03:00
Oscar Mateo	273497e5cd	drm/i915: s/i915_hw_context/intel_context Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:41:17 +02:00
Oscar Mateo	ee1b1e5ef3	drm/i915: Split the ringbuffers from the rings (2/3) This refactoring has been performed using the following Coccinelle semantic script: @@ struct intel_engine_cs r; @@ ( - (r).obj + r.buffer->obj \| - (r).virtual_start + r.buffer->virtual_start \| - (r).head + r.buffer->head \| - (r).tail + r.buffer->tail \| - (r).space + r.buffer->space \| - (r).size + r.buffer->size \| - (r).effective_size + r.buffer->effective_size \| - (r).last_retired_head + r.buffer->last_retired_head ) @@ struct intel_engine_cs *r; @@ ( - (r)->obj + r->buffer->obj \| - (r)->virtual_start + r->buffer->virtual_start \| - (r)->head + r->buffer->head \| - (r)->tail + r->buffer->tail \| - (r)->space + r->buffer->space \| - (r)->size + r->buffer->size \| - (r)->effective_size + r->buffer->effective_size \| - (r)->last_retired_head + r->buffer->last_retired_head ) @@ expression E; @@ ( - LP_RING(E)->obj + LP_RING(E)->buffer->obj \| - LP_RING(E)->virtual_start + LP_RING(E)->buffer->virtual_start \| - LP_RING(E)->head + LP_RING(E)->buffer->head \| - LP_RING(E)->tail + LP_RING(E)->buffer->tail \| - LP_RING(E)->space + LP_RING(E)->buffer->space \| - LP_RING(E)->size + LP_RING(E)->buffer->size \| - LP_RING(E)->effective_size + LP_RING(E)->buffer->effective_size \| - LP_RING(E)->last_retired_head + LP_RING(E)->buffer->last_retired_head ) Note: On top of this this patch also removes the now unused ringbuffer fields in intel_engine_cs. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: Add note about fixup patch included here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:27:25 +02:00
Oscar Mateo	a4872ba6d0	drm/i915: s/intel_ring_buffer/intel_engine_cs In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:01:05 +02:00
Chris Wilson	340fbd8ca1	drm/i915: Only discard backing storage on releasing the last ref Before purging our pages (as opposed to copying back the contents from the GPU), make sure that there is not an exposed CPU mmapping through which the user can inspect the results. Regression from commit `5537252b6b` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Mar 25 13:23:06 2014 +0000 drm/i915: Invalidate our pages under memory pressure Testcase: igt/gem_mmap/new-object Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79005 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Guo Jinxian <jinxianx.guo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 15:06:34 +02:00
Chris Wilson	2cfcd32a92	drm/i915: Implement an oom-notifier for last resort shrinking Before the process killer is invoked, oom-notifiers are executed for one last try at recovering pages. We can hook into this callback to be sure that everything that can be is purged from our page lists, and to give a summary of how much memory is still pinned by the GPU in the case of an oom. This should be really valuable for debugging OOM issues. Note that the last-ditch effort call to shrink_all we've previously called from our normal shrinker when we could free as much as the vm demanded is moved into the oom notifier. Since the shrinker accounting races against bind/unbind operations we might have called shrink_all prematurely, which this approach with an oom notifier avoids. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> [danvet: Bikeshed logical \| into \|\| and pimp commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 10:57:13 +02:00
Chris Wilson	5537252b6b	drm/i915: Invalidate our pages under memory pressure Try to flush out dirty pages into the swapcache (and from there into the swapfile) when under memory pressure and forced to drop GEM objects from memory. In effect, this should just allow us to discard unused pages for memory reclaim and to start writeback earlier. v2: Hugh Dickins warned that explicitly starting writeback from shrink_slab was prone to deadlocks within shmemfs. Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:51:18 +02:00
Chris Wilson	b453c4dbc3	drm/i915: Refactor common lock handling between shrinker count/scan We can share a few lines of tricky lock handling we need to use for both shrinker routines and in the process fix the return value for count() when reporting a deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:52 +02:00
Chris Wilson	ceabbba524	drm/i915: Include bound and active pages in the count of shrinkable objects When the machine is under a lot of memory pressure and being stressed by multiple GPU threads, we quite often report fewer than shrinker->batch (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to skip calling into i915.ko to release pages, despite the GPU holding onto most of the physical pages in its active lists. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:06 +02:00
Chris Wilson	0820baf39b	drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM shmemfs first checks if there is enough memory to allocate the page and reports ENOSPC should there be insufficient, along with the usual ENOMEM for a genuine allocation failure. We use ENOSPC in our driver to mean that we have run out of aperture space and so want to translate the error from shmemfs back to our usual understanding of ENOMEM. None of the other GEM users appear to distinguish between ENOMEM and ENOSPC in their error handling, hence it is easiest to do the fixup in i915.ko. Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:45:22 +02:00
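The error fixup in 0820baf39b is a one-line mapping. A minimal sketch, assuming the kernel convention of negative errno return values; `translate_shmem_error()` is a hypothetical name, not the driver's function:

```c
#include <errno.h>

/*
 * Hypothetical helper: map the -ENOSPC that shmemfs reports for "not
 * enough memory" onto -ENOMEM, so that -ENOSPC keeps meaning "out of
 * aperture space" inside the driver. Other values pass through unchanged.
 */
static int translate_shmem_error(int err)
{
	return err == -ENOSPC ? -ENOMEM : err;
}
```

Doing the translation at the point where the shmemfs result enters the driver keeps the rest of the error handling unaware of the shmemfs convention.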
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. 
v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Oscar Mateo	19656430a8	drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display Otherwise, we do a NULL pointer dereference. I've seen this happen while handling an error in i915_gem_object_pin_to_display_plane(): If i915_gem_object_set_cache_level() fails, we call is_pin_display() to handle the error. At this point, the object is still not pinned to GGTT and maybe not even bound, so we have to check before we dereference its GGTT vma. The IGT kms_flip/bo-too-big tests for this bug. v2: Chris Wilson says restoring the old value is easier, but that is_pin_display is useful as a theory of operation. Take the solomonic decision: at least this way is_pin_display is a little more robust (until Chris can kill it off). v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its usefulness: add a reminder to remove it. Issue: VIZ-3772 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Testcase: igt/kms_flip/bo-too-big Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 16:24:39 +02:00
Daniel Vetter	8b1bc9b4f1	drm/i915: Only do gtt cleanup in vma_unbind for the global vma Otherwise we end up tearing down fences when e.g. the client quits way too early. Might or might not fix a fence pin_count BUG Ville has reported. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:54 +02:00
Daniel Vetter	aff10b30a1	drm/i915: Don't drop pinned fences Userspace can currently provoke this when e.g. trying to use a pinned scanout as a cursor or overlay target. Later on that might lead to some fun fence pin count mayhem. Spurred by Ville's report that something goes wrong here and originally I've thought that this might slip through the pwrite gtt fastpath. But that one checks for obj tiling, so should be ok. But one thing that _does_ blow up is the vma unbinding with more than one address space. The next patch will fix this. v2: Use a WARN_ON - Chris pointed out that we already catch all cases so userspace can't provoke this like I've originally feared. While reviewing relevant code I've noticed a pile of DRM_ERROR in the overlay&cursor code which are all triggerable by userspace. Tune them down while at it. v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch, as requested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:31 +02:00
Daniel Vetter	d8ffa60b52	drm/i915: WARN_ON fence pin leaks The fence pin count should always be <= the bo pin count. If that's not the case then we have a funny problem and are leaking references somewhere. Which means we can catch fence pin leaks by checking for the same upper limit as we do for the bo pin count. Inspired by a discussion with Ville about a fence leak igt testcase. v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that might catch a leak even quicker. Also de-inline them, they're getting too big. v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count check will catch that already (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-13 17:16:12 +02:00
Chris Wilson	1cf0ba1474	drm/i915: Flush request queue when waiting for ring space During the review of commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Ville raised the point that our interaction with request->tail was likely to foul up other uses elsewhere (such as hang check comparing ACTHD against requests). However, we also need to restore the implicit retire requests that certain test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the spectre that the ppgtt will randomly call i915_gpu_idle() and recurse back into intel_ring_begin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023 Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Remove now unused 'tail' variable as spotted by Brad.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-08 01:23:34 +02:00
Ben Widawsky	6e7186af3b	drm/i915: Make aliasing a 2nd class VM There is a good debate to be had about how best to fit the aliasing PPGTT into the code. However, as it stands right now, getting aliasing PPGTT bindings is a hack, and done through implicit arguments. To make this absolutely clear, WARN and return an error if a driver writer tries to do something they shouldn't. I have no issue with an eventual revert of this patch. It makes sense for what we have today. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-07 10:01:41 +02:00
Ben Widawsky	ebc348b2ad	drm/i915: Move semaphore specific ring members to struct This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 10:56:52 +02:00
Chris Wilson	c8725f3dc0	drm/i915: Do not call retire_requests from wait_for_rendering A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_level() v4: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:15 +02:00
Imre Deak	981a5aead1	drm/i915: vlv: clean up GTLC wake control/status register macros These will be needed by the upcoming VLV RPM helpers. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:50 +02:00
Zhao Yakui	845f74a701	drm/i915:Initialize the second BSD ring on BDW GT3 machine Based on the hardware spec, the BDW GT3 machine has two independent BSD rings that can be used to dispatch the video commands. So just initialize it. V3->V4: Follow Imre's comment to do some minor updates. For example: more comments are added to describe the semaphore between rings. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> [danvet: Fix up checkpatch error.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:46 +02:00
Chris Wilson	6099032045	drm/i915: Allow the module to load even if we fail to setup rings Even without enabling the ringbuffers to allow command execution, we can still control the display engines to enable modesetting. So make the ringbuffer initialization failure soft, and mark the GPU as wedged instead. v2: Only treat an EIO from ring initialisation as a soft failure, and abort module load for any other failure, such as allocation failures. v3: Add an ERROR prior to declaring the GPU wedged so that it stands out like a sore thumb in the logs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:38 +02:00
Chris Wilson	e3efda49e7	drm/i915: Preserve ring buffers objects across resume Tearing down the ring buffers across resume is overkill, risks unnecessary failure and increases fragmentation. After failure, since the device is still active we may end up trying to write into the dangling iomapping and trigger an oops. v2: stop_ringbuffers() was meant to call stop(ring) not cleanup(ring) during resume! Reported-by: Jae-hyeon Park <jhyeon@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351 References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as suggested by Oscar.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:37 +02:00
Dave Airlie	444c9a08bf	Merge branch 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm into drm-next Next pull request, this time more of the drm de-midlayering work. The big thing is that this patch series here removes everything from drm_bus except the set_busid callback. Thierry has a few more patches on top of this to make that one optional too. With that we can ditch all the non-pci drm_bus implementations, which Thierry has already done for the fake tegra host1x drm_bus. Reviewed by Thierry, Laurent and David and now also survived some testing on my intel boxes to make sure the irq fumble is fixed correctly ;-) The last minute rebase was just to add the r-b tags from Thierry for the 2 patches I've redone. * 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm: drm/<drivers>: don't set driver->dev_priv_size to 0 drm: Remove dev->kdriver drm: remove drm_bus->get_name drm: rip out dev->devname drm: inline drm_pci_set_unique drm: remove bus->get_irq implementations drm: pass the irq explicitly to drm_irq_install drm/irq: Look up the pci irq directly in the drm_control ioctl drm/irq: track the irq installed in drm_irq_install in dev->irq drm: rename dev->count_lock to dev->buf_lock drm: Rip out totally bogus vga_switcheroo->can_switch locking drm: kill drm_bus->bus_type drm: remove drm_dev_to_irq from drivers drm/irq: remove cargo-culted locking from irq_install/uninstall drm/irq: drm_control is a legacy ioctl, so pci devices only drm/pci: fold in irq_by_busid support drm/irq: simplify irq checks in drm_wait_vblank	2014-05-01 09:32:21 +10:00
Dave Airlie	885ac04ab3	Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2014-04-16: - vlv infoframe fixes from Jesse - dsi/mipi fixes from Shobhit - gen8 pageflip fixes for LRI/SRM from Damien - cmd parser fixes from Brad Volkin - some prep patches for CHV, DRRS, ... - and tons of little things all over drm-intel-next-2014-04-04: - cmd parser for gen7 but only in enforcing and not yet granting mode - the batch copying stuff is still missing. Also performance is a bit ... rough (Brad Volkin + OACONTROL fix from Ken). - deprecate UMS harder (i.e. CONFIG_BROKEN) - interrupt rework from Paulo Zanoni - runtime PM support for bdw and snb, again from Paulo - a pile of refactorings from various people all over the place to prep for new stuff (irq reworks, power domain polish, ...) Conflicts: drivers/gpu/drm/i915/i915_gem_context.c	2014-05-01 09:11:37 +10:00
Daniel Vetter	bb0f1b5c16	drm: pass the irq explicitly to drm_irq_install Unfortunately this requires a drm-wide change, and I didn't see a sane way around that. Luckily it's fairly simple, we just need to inline the respective get_irq implementation from either drm_pci.c or drm_platform.c. With that we can now also remove drm_dev_to_irq from drm_irq.c. Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-23 10:32:50 +02:00
Daniel Vetter	e090c53b21	drm/irq: remove cargo-culted locking from irq_install/uninstall The dev->struct_mutex locking in drm_irq.c only protects dev->irq_enabled. Which isn't really much at all and only prevents especially nasty ums userspace from concurrently installing the interrupt handling a few times. Or at least trying. There are tons of unlocked readers of dev->irq_enabled in the vblank wait code (and by extension also in the pageflip code since that uses the same vblank timestamp engine). Real modesetting drivers should ensure that nothing can go haywire with a sane setup teardown sequence. So we only really need this for the drm_control ioctl, everywhere else this will just paper over nastiness. Note that drm/i915 is a bit special due to the gem+ums combination. So there we also need to properly protect the entervt and leavevt ioctls. But it's definitely saner to do everything in one go than to drop the lock in-between. Finally there's the gpu reset code in drm/i915. That one's just racy (concurrent userspace calls for vblank waits or pageflips could spuriously fail). So wrap it up with a nice comment since fixing this is more involved. v2: Rebase and fix commit message (Thierry) Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-22 11:41:12 +02:00
Chris Wilson	691e6415c8	drm/i915: Always use kref tracking for all contexts. If we always initialize kref for the context, even if we are using fake contexts for hangstats when there is no hw support, we can forgo the dance to dereference the ctx->obj and inspect whether we are permitted to use kref inside i915_gem_context_reference() and _unreference(). My ulterior motive here is to improve the debugging of a use-after-free of ctx->obj. This patch avoids the dereference here and instead forces the assertion checks associated with kref. v2: Refactor the fake contexts to being even more like the real contexts, so that there is much less duplicated and special case code. v3: Tweaks. v4: Tweaks, minor. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [Jani: tiny change to backport to drm-intel-fixes.] Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-04-11 13:29:51 +03:00
Mika Kuoppala	88b4aa8770	drm/i915: add flags to i915_ring_stop Piglit runner and QA are both looking at the dmesg for DRM_ERRORs with test cases. Add a flag to control those when they are expected from related test cases. Also add a flag to control if contexts that introduced the hang should be banned. Hangcheck is timer based and preventing bans by adding sleeps to testcases makes testing slower. v2: intel_ring_stopped(), readable comment (Chris) v3: keep compatibility (Daniel) References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-09 15:07:42 +02:00
Dave Airlie	9f97ba806a	Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next Merge window -fixes pull request as usual. Well, I did sneak in Jani's drm_i915_private_t typedef removal, need to have fun with a big sed job too ;-) Otherwise: - hdmi interlaced fixes (Jesse&Ville) - pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but not cc: stable since only really relevant for igt runs) - large cursor wm fixes (Chris) - fix gpu turbo boost/throttle again, was getting stuck due to vlv rps patches (Chris+Imre) - fix runtime pm fallout (Paulo) - bios framebuffer inherit fix (Chris) - a few smaller things * tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits) Skip intel_crt_init for Dell XPS 8700 drm/i915: vlv: fix RPS interrupt mask setting Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec" drm/i915: move power domain init earlier during system resume drm/i915: Fix the computation of required fb size for pipe drm/i915: don't get/put runtime PM at the debugfs forcewake file drm/i915: fix WARNs when reading DDI state while suspended drm/i915: don't read cursor registers on powered down pipes drm/i915: get runtime PM at i915_display_info drm/i915: don't read pp_ctrl_reg if we're suspended drm/i915: get runtime PM at i915_reg_read_ioctl drm/i915: don't schedule force_wake_timer at gen6_read drm/i915: vlv: reserve the GT power context only once during driver init drm/i915: prefer struct drm_i915_private to drm_i915_private_t drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t ...	2014-04-05 16:14:21 +10:00
Lauri Kasanen	62347f9e0f	drm: Add support for two-ended allocation, v3 Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Based on drm_mm work by Chris Wilson. v3: Changed to use a TTM placement flag v2: Updated kerneldoc Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Christian König <deathsimple@vodafone.de> Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: David Airlie <airlied@redhat.com>	2014-04-04 09:28:14 +10:00
Jani Nikula	3e31c6c017	drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t No functional changes. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-31 15:32:38 +02:00
Chris Wilson	df6f783a4e	drm/i915: Fix unsafe loop iteration over vma whilst unbinding them On non-LLC platforms, when changing the cache level of an object, we may need to unbind it so that prefetching across page boundaries does not cross into a different memory domain. This requires us to unbind conflicting vma, but we did so iterating over the objects vma in an unsafe manner (as the list was being modified as we iterated). The regression was introduced in commit `3089c6f239` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 17:00:03 2013 -0700 drm/i915: make caching operate on all address spaces apparently as far back as v3.12-rc1, but it has only just begun to trigger real world bug reports. Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-21 16:13:08 +01:00
Paulo Zanoni	5d584b2eca	drm/i915: move pc8.irqs_disabled to pm.irqs_disabled When other platforms add runtime PM support they will also need to disable interrupts, so move the variable to the runtime PM struct. Also notice that the longer-term goal is to completely kill the regsave struct, and I even have patches for that. v2: - Rebase. v3: - Rebase. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 16:39:46 +01:00
Daniel Vetter	b80d6c781e	Merge branch 'topic/dp-aux-rework' into drm-intel-next-queued Conflicts: drivers/gpu/drm/i915/intel_dp.c A bit of a mess with reverts which differ in details between -fixes and -next and some other unrelated shuffling. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 15:54:37 +01:00
Dave Airlie	d1583c9997	Merge branch 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux into drm-next This is the 3rd respin of the drm-anon patches. They allow module unloading, use the pin_fs_* helpers recommended by Al and are rebased on top of drm-next. Note that there are minor conflicts with the "drm-minor" branch. * 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux: drm: init TTM dev_mapping in ttm_bo_device_init() drm: use anon-inode instead of relying on cdevs drm: add pseudo filesystem for shared inodes	2014-03-18 19:17:02 +10:00
David Herrmann	6796cb16c0	drm: use anon-inode instead of relying on cdevs DRM drivers share a common address_space across all character-devices of a single DRM device. This allows simple buffer eviction and mapping-control. However, DRM core currently waits for the first ->open() on any char-dev to mark the underlying inode as backing inode of the device. This delayed initialization causes ugly conditions all over the place: if (dev->dev_mapping) do_sth(); To avoid delayed initialization and to stop reusing the inode of the char-dev, we allocate an anonymous inode for each DRM device and reset filp->f_mapping to it on ->open(). Signed-off-by: David Herrmann <dh.herrmann@gmail.com>	2014-03-16 12:23:33 +01:00
Ville Syrjälä	3ddffb7b8a	drm/i915: Unbind all vmas whose new cache_level doesn't agree with the neighbours When we change the cache_level for an object we need to make sure we don't put differing types of snoopable memory too close to each other on non-LLC machines. Currently i915_gem_object_set_cache_level() will stop looking when it finds just one vma that has such a conflict. Drop the bogus break statement to make sure it will unbind all vmas which need to be moved around to avoid the conflict. I suppose this is a theoretical issue as currently we don't enable ppgtt on non-LLC machines, so each object can only have one vma. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-13 12:22:44 +01:00
Chris Wilson	c2831a94b5	drm/i915: Do not force non-caching copies for pwrite along shmem path We don't always want to write into main memory with pwrite. The shmem fast path in particular is used for memory that is cacheable - under such circumstances forcing the cache eviction is undesirable. As we will always flush the cache when targeting incoherent buffers, we can rely on that second pass to apply the cache coherency rules and so benefit from in-cache copies otherwise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:26 +01:00
Chris Wilson	17793c9a46	drm/i915: Process page flags once rather than per pwrite/pread We used to lock individual pages inside the buffer object and so needed to update the page flags every time. However, we now pin the pages into the object for the duration of the pwrite/pread (and hopefully much longer) and so we can forgo the flag updates until we release all the pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:01 +01:00
Brad Volkin	4c914c0c7c	drm/i915: Refactor shmem pread setup The command parser is going to need the same synchronization and setup logic, so factor it out for reuse. v2: Add a check that the object is backed by shmem Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-07 22:36:59 +01:00
Damien Lespiau	cb216aa844	drm/i915: Make i915_gem_retire_requests_ring() static Its last usage outside of i915_gem.c was removed in: commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:39 +01:00
Chris Wilson	ab0e7ff9f2	drm/i915: Record pid/comm of hanging task After finding the guilty batch and request, we can use it to find the process that submitted the batch and then add the culprit into the error state. This is a slightly different approach from Ben's in that instead of adding the extra information into the struct i915_hw_context, we use the information already captured in struct drm_file which is then referenced from the request. v2: Also capture the workaround buffer for gen2, so that we can compare its contents against the intended batch for the active request. v3: Rebase (Mika) v4: Check for null context (Chris) checkpatch warnings fixed Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4) Acked-by: Ben Widawsky <ben@bwidawsk.net> Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	8d9fc7fd2d	drm/i915: Rely on accurate request tracking for finding hung batches In the past, it was possible to have multiple batches per request due to a stray signal or ENOMEM. As a result we had to scan each active object (filtered by those having the COMMAND domain) for the one that contained the ACTHD pointer. This was then made more complicated by the introduction of ppgtt, whereby ACTHD then pointed into the address space of the context and so also needed to be taken into account. This is a fairly robust approach (though the implementation is a little fragile and depends upon the per-generation setup, registers and parameters). However, due to the requirements for hangstats, we needed a robust method for associating batches with a particular request and having that we can rely upon it for finding the associated batch object for error capture. If the batch buffer tracking is not robust enough, that should become apparent quite quickly through an erroneous error capture. That should also help to make sure that the runtime reporting to userspace is robust. It also means that we then report the oldest incomplete batch on each ring, which can be useful for determining the state of userspace at the time of a hang. v2: Use i915_gem_find_active_request (Mika) v3: remove check for ring->get_seqno, split long lines (Ben) v4: check that context is available (Chris) checkpatch warnings fixed Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	64bf930379	drm/i915: Reset vma->mm_list after unbinding In place of true activity counting, we walk the list of vma associated with an object managing each on the vm's active/inactive list every time we call move-to-inactive. This depends upon the vma->mm_list being cleared after unbinding, or else we run into difficulty when tracking the object in multiple vm's - we see a use-after-free and corruption of the mm_list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:23 +01:00
Ville Syrjälä	ccc7bed05e	drm/i915: Don't ban default context when stop_rings!=0 If we've explicitly stopped the rings for testing purposes, don't ban the default context. Fixes kms_flip hang tests. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:14 +01:00
Chris Wilson	f62a007603	drm/i915: Accurately track when we mark the hardware as idle/busy We currently call intel_mark_idle() too often, as we do so as a side-effect of processing the request queue. However, the calls to intel_mark_idle() are expected to be paired with a call to intel_mark_busy() (or else we try to idle the hardware by accessing registers that are already disabled). Make the idle/busy tracking explicit to prevent the multiple calls. v2: We can drop some of the complexity in __i915_add_request() as queue_delayed_work() already behaves as we want (not requeuing the item if it is already in the queue) and mark_busy/mark_idle imply that the idle task is inactive. v3: We do still need to cancel the pending idle task so that it is sent again after the current busy load completes (not in the middle of it). Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:10 +01:00
Daniel Vetter	8ea99c9287	drm/i915: Only bind each object rather than for every execbuffer One side-effect of the introduction of ppgtt was that we needed to rebind the object into the appropriate vm (and global gtt in some peculiar cases). For simplicity this was done twice for every object on every call to execbuffer. However, that adds a tremendous amount of CPU overhead (rewriting all the PTE for all objects into WC memory) per draw. The fix is to push all the decision about which vm to bind into and when down into the low-level bind routines through hints rather than performing the bind unconditionally in the execbuffer routine. Note that this is a regression introduced in the full ppgtt feature branch, before this we've only done re-bound objects when the relevant has_(aliasing_ppgtt\|global_gtt)_mapping flag was clear. But since that's per-object and not per-vma that optimization broke. v2: Split out prep work and unrelated changes. v3: Bring back functional change around PIN_GLOBAL that I've accidentally split out. v4: Remove the temporary hack for the old binding logic to avoid bisection issues. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906 Tested-by: jianx.zhou@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:38 +01:00
Daniel Vetter	262de14531	drm/i915: Directly return the vma from bind_to_vm This is prep work for reworking the object_pin logic. Atm it still does a (now redundant) lookup of the vma. The next patch will fix this. Split out from Chris vma-bind rework. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:30 +01:00
Daniel Vetter	b287110e89	drm/i915: Simplify i915_gem_object_ggtt_unpin Split out from Chris' vma-bind rework. Jani wondered why this is safe, and the reason is that i915_vma_unbind does all these checks, too. So they're redundant. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:21 +01:00
Daniel Vetter	bf3d149b25	drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE With arbitrary pin flags it makes sense to split out a "please bind this into global gtt" from the "please allocate in the mappable range". Use this unconditionally in our global gtt pin helper since this is what its callers want. Later patches will drop PIN_MAPPABLE where it's not strictly needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:17:27 +01:00
Daniel Vetter	1ec9e26dda	drm/i915: Consolidate binding parameters into flags Anything more than just one bool parameter is just a pain to read, symbolic constants are much better. Split out from Chris' vma-binding rework patch. v2: Undo the behaviour change in object_pin that Chris spotted. v3: Split out misplaced hunk to handle set_cache_level errors, spotted by Jani. v4: Keep the current over-zealous binding logic in the execbuffer code working with a quick hack while the overall binding code gets shuffled around. v5: Reorder the PIN_ flags for more natural patch splitup. v6: Pull out the PIN_GLOBAL split-up again. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:16:58 +01:00
Chris Wilson	6e4930f6ee	drm/i915: Flush GPU rendering with a lockless wait during a pagefault Arjan van de Ven reported that on his test machine that he was seeing stalls of greater than 1 frame greatly impacting the user experience. He tracked this down to being the locked flush during a pagefault as being the culprit hogging the struct_mutex and so blocking any other user from proceeding. Stalling on a pagefault is bad behaviour on userspace's part, for one it means that they are ignoring the coherency rules on pointer access through the GTT, but fortunately we can apply the same trick as the set-to-domain ioctl to do a lightweight, nonblocking flush of outstanding rendering first. "Prior to the patch it looks like this (this one testrun does not show the 20ms+ I've seen occasionally) 4.99 ms 2.36 ms 31360 __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_ +pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault 4.99 ms 2.75 ms 107751 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.99 ms 1.63 ms 1666 i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa +ult 4.93 ms 2.45 ms 980 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 4.89 ms 2.20 ms 3283 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.34 ms 1.66 ms 1715 i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.73 ms 3.73 ms 49 i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.17 ms 0.33 ms 931 i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.97 ms 0.43 ms 1029 
i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.55 ms 0.51 ms 735 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret After the patch it looks like this: 4.99 ms 2.14 ms 22212 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.86 ms 0.99 ms 14170 __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_ +fault do_page_fault page_fault 3.59 ms 1.31 ms 325 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.37 ms 3.37 ms 65 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.58 ms 2.58 ms 65 i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl +ia32_sysret 2.19 ms 2.19 ms 65 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 2.18 ms 2.18 ms 65 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 1.66 ms 1.66 ms 65 i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret It may not look like it, but this is quite a large difference, and I've been unable to reproduce > 5 msec delays at all, while before they do happen (just not in the trace above)." gem_gtt_hog on an old Pineview (GMA3150), before: 4969.119ms after: 4122.749ms Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com> Testcase: igt/gem_gtt_hog Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:59 +01:00
Chris Wilson	bd9b6a4ec5	drm/i915: Downgrade ERROR message for invalid user input When we detect that the user passed along an invalid handle or object, we emit a warning as an aid for debugging. Since these are indeed only for debugging user triggerable errors (and the errors are reported back to userspace by the errno), the messages should only be at the debug level and not claiming that there is a catastrophic error in the driver/hardware. References: https://bugs.freedesktop.org/show_bug.cgi?id=74704 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:54 +01:00
Damien Lespiau	3d13ef2e2d	drm/i915: Always use INTEL_INFO() to access the device_info structure If we make sure that all the dev_priv->info usages are wrapped by INTEL_INFO(), we can easily modify the ->info field to be structure and not a pointer while keeping the const protection in the INTEL_INFO() macro. v2: Rebased onto latest drm-nightly Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:50 +01:00
Chris Wilson	8c99e57d39	drm/i915: Treat using a purged buffer as a source of EFAULT Since a purged buffer is one without any associated pages, attempting to use it should generate EFAULT rather than EINVAL, as it is not strictly an invalid parameter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:32 +01:00
Chris Wilson	45d678173a	drm/i915: Convert EFAULT into a silent SIGBUS EFAULT will be a possible return code where backing storage is transient, such as after it is purged by madvise. As such it is to be expected and so should not trigger a WARN inside i915_gem_fault() but be converted silently to SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:27 +01:00
Mika Kuoppala	e38486943e	drm/i915: release mutex in i915_gem_init()'s error path Found with smatch. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 12:10:45 +01:00
Mika Kuoppala	939fd76208	drm/i915: Get rid of acthd based guilty batch search As we seek the guilty batch using request and hangcheck score, this code is not needed anymore. v2: Rebase. Passing dev_priv instead of getting it from last_ring Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 11:57:29 +01:00
Mika Kuoppala	b6b0fac04d	drm/i915: Use hangcheck score to find guilty context With full ppgtt using acthd is not enough to find guilty batch buffer. We get multiple false positives as acthd is per vm. Instead of scanning which vm was running on a ring, to find corresponding context, use a different, simpler, strategy of finding batches that caused gpu hang: If hangcheck has declared ring to be hung, find first non complete request on that ring and claim it was guilty. v2: Rebase Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 11:57:24 +01:00
Mika Kuoppala	3fac8978f5	drm/i915: Tune down debug output when context is banned If we have stopped rings then we know that test is running so no need for spam. In addition, only spam when default context gets banned. v2: - make sure default context ban gets shown (Chris) - use helper for checking for default context, everywhere (Chris) v3: - don't be quiet when debug is set (Ben, Daniel) Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:25:38 +01:00
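The strategy described in the commit above (if hangcheck declares the ring hung, blame the first request on that ring that has not completed) might look roughly like the sketch below. All `sketch_*` names, the request layout, and the fixed demo queue are assumptions for illustration; the wrap-safe sequence-number comparison is likewise an assumed helper, not a quotation of the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request: a sequence number plus owning context. */
struct sketch_request {
    uint32_t seqno;
    int ctx_id;
};

/* True if seqno a is at or past seqno b, tolerating 32-bit wraparound. */
bool sketch_seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/*
 * Walk the ring's requests in submission order and return the first one
 * the GPU has not completed. Only meaningful once the ring is declared hung.
 */
const struct sketch_request *
sketch_find_guilty(const struct sketch_request *reqs, size_t count,
                   uint32_t completed_seqno, bool ring_hung)
{
    if (!ring_hung)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        if (!sketch_seqno_passed(completed_seqno, reqs[i].seqno))
            return &reqs[i];
    }
    return NULL;
}

/* Demo on a fixed queue: returns the guilty context id, or -1 if none. */
int sketch_demo_guilty_ctx(uint32_t completed_seqno, bool ring_hung)
{
    static const struct sketch_request queue[] = {
        { .seqno = 10, .ctx_id = 1 },
        { .seqno = 11, .ctx_id = 2 },
        { .seqno = 12, .ctx_id = 1 },
    };
    const struct sketch_request *guilty =
        sketch_find_guilty(queue, sizeof(queue) / sizeof(queue[0]),
                           completed_seqno, ring_hung);

    return guilty ? guilty->ctx_id : -1;
}
```

Because the search only needs the request list and the last completed sequence number, it does not depend on which address space was active, which is the property the commit message is after.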
Mika Kuoppala	44e2c0705a	drm/i915: Use i915_hw_context to set reset stats With full ppgtt support drm_i915_file_private gained knowledge about the default context. Also reset stats are now inside i915_hw_context so we can use proper abstraction. v2: Move BUG_ON and WARN_ON to more proper locations (Ben) v3: Pass dev directly to i915_context_is_banned to avoid the need to dereference ctx->last_ring. Spotted by Mika when checking my s/BUG/WARN/ change, I've missed this ->last_ring dereference. Suggested-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v2) Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v2) [danvet: s/BUG/WARN/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:24:36 +01:00
Daniel Vetter	6ba844b090	drm/i915: GEN7_MSG_CONTROL is ivb-only At least I couldn't find it in the Haswell Bspec any more and we've tried to test-boot a Haswell machine with num_pipes forced to 0 (i.e. hit the PCH_NOP path) and the unclaimed register logic complained. So restrict this dance to just ivb platforms. v2: Art pointed out that the bits simply moved on hsw+ v3: Buy code terseness with a notch of subtlety as suggested by Chris. v4: Frob the right bit, spotted by Art. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Arthur Ranyan <arthur.j.runyan@intel.com> Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Art Runyan <arthur.j.runyan@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:16:47 +01:00
Jani Nikula	d330a9530c	drm/i915: move module parameters into a struct, in a new file With 20+ module parameters, I think referring to them via a struct improves clarity over just having a bunch of globals. While at it, move the parameter initialization and definitions into a new file i915_params.c to reduce clutter in i915_drv.c. Apart from the ill-named i915_enable_rc6, i915_enable_fbc and i915_enable_ppgtt parameters, for which we lose the "i915_" prefix internally, the module parameters now look the same both on the kernel command line and in code. For example, "i915.modeset". The downsides of the change are losing static on a couple of variables and not having the initialization and module_param_named() right next to each other. On the other hand, all module parameters are now defined in one place at i915_params.c. Plus you can do this to find all module parameter references: $ git grep "i915\." -- drivers/gpu/drm/i915 v2: - move the definitions into a new file - s/i915_params/i915/ - make i915_try_reset i915.reset, for consistency Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:16:45 +01:00
Daniel Vetter	0e5539b923	Merge branch 'topic/ppgtt' into drm-intel-next-queued Because whatever.* * This should contain a fairly long list of issues and still unresolved regressions, but I didn't really get a vote. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-25 21:14:57 +01:00
Chris Wilson	f72d21eddf	drm/i915: Place the Global GTT VM first in the list of VM This is useful for debugging as we then know that the first entry is always the global GTT, and all later entries the per-process GTT VM. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-25 20:07:15 +01:00
Chris Wilson	5dce5b9387	drm/i915: Wait for completion of pending flips when starved of fences On older generations (gen2, gen3) the GPU requires fences for many operations, such as blits. The display hardware also requires fences for scanouts and this leads to a situation where an arbitrary number of fences may be pinned by old scanouts following a pageflip but before we have executed the unpin workqueue. This is unpredictable by userspace and leads to random EDEADLK when submitting an otherwise benign execbuffer. However, we can detect when we have an outstanding flip and so cause userspace to wait upon their completion before finally declaring that the system is starved of fences. This is really no worse than forcing the GPU to stall waiting for older execbuffer to retire and release their fences before we can reallocate them for the next execbuffer. v2: move the test for a pending fb unpin to a common routine for later reuse during eviction Reported-and-tested-by: dimon@gmx.net Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:40 +01:00
Ben Widawsky	1d62beeaeb	drm/i915/ppgtt: Defer request freeing on reset We need to defer the free request until the object/vma is capable of being freed - or else we have a problem when we try to destroy the context. The exact same issue is described and fixed here: commit `e20780439b` Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing I had this fix previously, but decided not to keep it for some reason I can no longer remember. gem_reset_stats is a really good test at hitting the problem. For the inquisitive: [ 170.516392] ------------[ cut here ]------------ [ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]() [ 170.518064] Memory manager not clean during takedown. [ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28 [ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012 [ 170.520662] Call Trace: [ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a [ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0 [ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50 [ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm] [ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110 [i915] [ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29 [i915] [ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0 [i915] [ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0 [i915] [ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915] [ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915] [ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120 [i915] [ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0 [ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0 [ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0 [ 170.532996] [<ffffffff8106c610>] ? 
process_one_work+0x6d0/0x6d0 [ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120 [ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0 [ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.536847] ---[ end trace 3d4c12892e42d58f ]--- v2: Whitespace fix. (Chris) Note: This is a bug that only hits the ppgtt topic branch but I've figured that doing the request cleanup in this order is generally the right thing to do. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Add a code comment to clarify what's actually going on since the lifetime rules around ppgtt cleanup are ... fuzzy at best atm. Also add a note about why we need this.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:37 +01:00
Daniel Vetter	8664850087	drm/i915: Tune down reset_stat output from ERROR to debug This is user-triggerable and hence we should not allow it to spam dmesg. Also, it upsets the nice dmesg tracking piglit does. Note that this is just extra debugging information, mostly unwanted, in case of a hang and that there is a separate message to the user giving instructions on how to report a bug for a GPU hang. v2: Add note as suggested in Chris' reply. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72740 Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:35 +01:00
Daniel Vetter	0d9d349d87	Merge commit origin/master into drm-intel-next Conflicts are getting out of hand, and now we have to shuffle even more in -next which was also shuffled in -fixes (the call for drm_mode_config_reset needs to move yet again). So do a proper backmerge. I wanted to wait with this for the 3.13 release, but alas let's just do this now. Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_ddi.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_pm.c Besides the conflict around the forcewake get/put (where we changed the called function in -fixes and added a new parameter in -next) code all the current conflicts are of the adjacent lines changed type. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-16 22:06:30 +01:00
Chris Wilson	e910303802	drm/i915: Free requests after object release when retiring requests Freeing a request triggers the destruction of the context. This needs to occur after all objects are themselves unbound from the context, and so the free request needs to occur after the object release during retire. This tidies up commit `e20780439b` Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing by simply swapping the order of operations rather than introducing further complexity - as noted during review. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-10 08:21:52 +01:00
Daniel Vetter	bfca05275a	Revert "drm/i915: Do not allow buffers at offset 0" This reverts commit `4fe9adbc36`. The patch completely lacks a detailed explanation of what exactly blows up and how, so is insufficiently justified as a band-aid. Otoh the justification as a safety measure against userspace botching up relocations is also fairly weak: If we want real protection we need to at least make the gap big enough that the gpu doesn't scribble over more important stuff. With 4k screens that would be 32MB. Also I think this would be much better in conjunction with a (debug) switch to disable our use of the scratch page. Hence revert this. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 17:50:39 +01:00
Daniel Vetter	02f6bcccf7	drm/i915: Reject the pin ioctl on gen6+ Especially with ppgtt this kinda stopped making sense. And if we indeed need this to hack around an issue, we need something that also works for non-root. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:30:22 +01:00
Ben Widawsky	7e0d96bc03	drm/i915: Use multiple VMs -- the point of no return As with processes which run on the CPU, the goal of multiple VMs is to provide process isolation. Specific to GEN, there is also the ability to map more objects per process (2GB each instead of 2Gb-2k total). For the most part, all the pipes have been laid, and all we need to do is remove asserts and actually start changing address spaces with the context switch. Since prior to this we've converted the setting of the page tables to a streamed version, this is quite easy. One important thing to point out (since it'd been hotly contested) is that with this patch, every context created will have its own address space (provided the HW can do it). v2: Disable BDW on rebase NOTE: I tried to make this commit as small as possible. I needed one place where I could "turn everything on" and that is here. It could be split into finer commits, but I didn't really see much point. Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:24:52 +01:00
Daniel Vetter	3d7f0f9dcc	Merge commit drm-intel-fixes into topic/ppgtt I need the tricky do_switch fix before I can merge the final piece of the ppgtt enabling puzzle. Otherwise the conflict will be a real pain to resolve since the do_switch hunk from -fixes must be placed at the exact right place within a hunk in the next patch. Conflicts: drivers/gpu/drm/i915/i915_gem_context.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/intel_display.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:23:37 +01:00
Ben Widawsky	4fe9adbc36	drm/i915: Do not allow buffers at offset 0 This is primarily a band aid for an unexplainable error in gem_reloc_vs_gpu/forked-faulting-reloc-thrashing. Essentially as soon as a relocated buffer (which had a non-zero presumed offset) moved to offset 0, something goes bad. Since I have been unable to solve this, and potentially this is a good thing to do anyway, since many things can accidentally write to offset 0, why not? Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:15:40 +01:00
Ben Widawsky	e20780439b	drm/i915: Defer request freeing With context destruction, we always want to be able to tear down the underlying address space. This is invoked on the last unreference to the context which could happen before we've moved all objects to the inactive list. To enable a clean tear down of the address space, make sure to process the request free last. Without this change, we cannot guarantee that we don't still have active objects in the VM. As an example of a failing case: CTX-A is created, count=1 CTX-A is used during execbuf does a context switch count = 2 and add_request count = 3 CTX B runs, switches, CTX-A count = 2 CTX-A is destroyed, count = 1 retire requests is called free_request from CTX-A, count = 0 <--- free context with active object As mentioned above, by doing the free request after processing the active list, we can avoid this case. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:52:51 +01:00
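The failing case in the commit above comes down to ordering: the last context reference is held by a request, so freeing that request before the active objects are retired tears the context down while objects are still active. A toy model of the two orderings, under the assumption of a single context with one active object and one remaining reference (all `sketch_*` names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified context: a refcount and its active objects. */
struct sketch_ctx {
    int refcount;
    int active_objects;
    bool freed;
    bool freed_while_active;
};

/* Drop one reference; on the last one, record whether teardown was clean. */
void sketch_ctx_unref(struct sketch_ctx *ctx)
{
    assert(ctx->refcount > 0);
    if (--ctx->refcount == 0) {
        ctx->freed = true;
        ctx->freed_while_active = ctx->active_objects > 0;
    }
}

/*
 * Model one retire pass at the point where the request holds the last
 * context reference (count = 1 in the commit's example).
 * Returns true if the context was freed with objects still active.
 */
bool sketch_retire(bool free_request_first)
{
    struct sketch_ctx ctx = { .refcount = 1, .active_objects = 1 };

    if (free_request_first)
        sketch_ctx_unref(&ctx);   /* problematic order: request freed first */

    ctx.active_objects = 0;       /* move objects to the inactive list */

    if (!free_request_first)
        sketch_ctx_unref(&ctx);   /* deferred order: request freed last */

    return ctx.freed_while_active;
}
```

In this model `sketch_retire(true)` reports a teardown with active objects, while `sketch_retire(false)` does not, which is the behaviour the deferred ordering aims for.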
Ben Widawsky	41bde5535a	drm/i915: Get context early in execbuf We need to have the address space when reserving space for the objects. Since the address space and context are tied together, and reserve occurs before context switch (for good reason), we must lookup our context earlier in the process. This leaves some room for optimizations where we no longer need to use ctx_id in certain places. This will be addressed in a subsequent patch. Important tricky bit: Because slow relocations during execbuffer drop struct_mutex Perhaps it would be best to acquire the reference when we get the context, but I'll save that for another day (note I have written the patch before, and I found the changes required to be uglier than this). Note that since we currently access everything via context id, and not the data structure this is fine, though not desirable. The next change attempts to get the context only once via the context ID idr lookup, and as such, the following can happen: CTX-A is created, refcount = 1 CTX-A execbuf, mutex dropped close IOCTL called on CTX-A, refcount = 0 CTX-A resumes in execbuf. v2: Rebased on top of commit `b6359918b8` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed Oct 30 15:44:16 2013 +0200 drm/i915: add i915_get_reset_stats_ioctl v3: Rebased on top of commit `25b3dfc87b` Author: Mika Westerberg <mika.westerberg@linux.intel.com> Date: Tue Nov 12 11:57:30 2013 +0200 Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Tue Nov 26 16:14:33 2013 +0200 drm/i915: check context reset stats before relocations Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:52:42 +01:00
Ben Widawsky	c482972a08	drm/i915: Piggy back hangstats off of contexts To simplify the codepaths somewhat, we can simply always create a context. Contexts already keep hangstat information. This prevents us from having to differentiate at other parts in the code. There is allocation overhead, but it should not be measurable. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:51:58 +01:00
Ben Widawsky	bdf4fd7ea0	drm/i915: Do aliasing PPGTT init with contexts We have a default context which suits the aliasing PPGTT well. Tie them together so it looks like any other context/PPGTT pair. This makes the code cleaner as it won't have to special case aliasing as often. The patch has one slightly tricky part in the default context creation function. In the future (and on aliased setup) we create a new VM for a context (potentially). However, if we have aliasing PPGTT, which occurs at this point in time for all platforms GEN6+, we can simply manage the refcounting to allow things to behave as normal. Now is a good time to recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT drm_mm. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:32:14 +01:00
Ben Widawsky	a3d67d2396	drm/i915: PPGTT vfuncs should take a ppgtt argument Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:56 +01:00
Ben Widawsky	2fa48d8d4a	drm/i915: Split context enabling from init We need to do this for exactly 1 reason, because we want to embed a PPGTT into the context, but we don't want to special case the default context. To achieve that, we must be able to initialize contexts after the GTT is setup (so we can allocate and pin the default context's BO), but before the PPGTT and rings are initialized. This is because, currently, context initialization requires ring usage. We don't have rings until after the GTT is setup. If we split the enabling part of context initialization, the part requiring the ringbuffer, we can untangle this, and then later embed the PPGTT Incidentally this allows us to also adhere to the original design of context init/fini in future patches: they were only ever meant to be called at driver load and unload. v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris) v3: BUG_ON after checking for disabled contexts. Or else it blows up pre gen6 (Ben) v4: Forward port Modified enable for each ring, since that patch is earlier in the series Dropped ring arg from create_default_context so it can be used by others Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:55 +01:00
Ben Widawsky	acce9ffa48	drm/i915: Better reset handling for contexts This patch adds two changes for contexts on reset: Sets last context to default - this will prevent the context switch happening after a reset. That switch is not possible because the rings are hung during reset and context switch requires reset. This behavior will need to be reworked in the future, but this is what we want for now. In the future, we'll also want to reset the guilty context to uninitialized. We should wait for ARB_Robustness related code to land for that. This is somewhat for paranoia. Because we really don't know what the GPU was doing when it hung, or the state it was in (mid context write, for example), later restoring the context is a bad idea. By setting the flag to not initialized, the next load of that context will not restore the state, and thus on the subsequent switch away from the context will overwrite the old data. NOTE: This code needs a fixup when we actually have multiple VMs. The issue that can occur is inactive objects in a VM will need to be destroyed before the last context unref. This can now happen via the fake switch introduced in this patch (and in other ways in the future) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:54 +01:00
Ben Widawsky	e422b888eb	drm/i915: Add a context open function We'll be doing a bit more stuff with each file, so having our own open function should make things clean. This also allows us to easily add conditionals for stuff we don't want to do when we don't have HW contexts. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:51 +01:00
Ben Widawsky	6f65e29aca	drm/i915: Create bind/unbind abstraction for VMAs To sum up what goes on here, we abstract the vma binding, similarly to the previous object binding. This helps for distinguishing legacy binding, versus modern binding. To keep the code churn as minimal as possible, I am leaving in insert_entries(). It serves as the per platform pte writing basically. bind_vma and insert_entries do share a lot of similarities, and I did have designs to combine the two, but as mentioned already... too much churn in an already massive patchset. What follows are the 3 commits which existed discretely in the original submissions. Upon rebasing on Broadwell support, it became clear that separation was not good, and only made for more error prone code. Below are the 3 commit messages with all their history. drm/i915: Add bind/unbind object functions to VMA drm/i915: Use the new vm [un]bind functions drm/i915: reduce vm->insert_entries() usage drm/i915: Add bind/unbind object functions to VMA As we plumb the code with more VM information, it has become more obvious that the easiest way to deal with bind and unbind is to simply put the function pointers in the vm, and let those choose the correct way to handle the page table updates. This change allows many places in the code to simply be vm->bind, and not have to worry about distinguishing PPGTT vs GGTT. Notice that this patch has no impact on functionality. I've decided to save the actual change until the next patch because I think it's easier to review that way. I'm happy to squash the two, or let Daniel do it on merge. v2: Make ggtt handle the quirky aliasing ppgtt Add flags to bind object to support above Don't ever call bind/unbind directly for PPGTT until we have real, full PPGTT (use NULLs to assert this) Make sure we rebind the ggtt if there already is a ggtt binding. This happens on set cache levels. 
Use VMA for bind/unbind (Daniel, Ben) v3: Reorganize ggtt_vma_bind to be more concise and easier to read (Ville). Change logic in unbind to only unbind ggtt when there is a global mapping, and to remove a redundant check if the aliasing ppgtt exists. v4: Make the bind function a bit smarter about the cache levels to avoid unnecessary multiple remaps. "I accept it is a wart, I think unifying the pin_vma / bind_vma could be unified later" (Chris) Removed the git notes, and put version info here. (Daniel) v5: Update the comment to not suck (Chris) v6: Move bind/unbind to the VMA. It makes more sense in the VMA structure (always has, but I was previously lazy). With this change, it will allow us to keep a distinct insert_entries. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> drm/i915: Use the new vm [un]bind functions Building on the last patch which created the new function pointers in the VM for bind/unbind, here we actually put those new function pointers to use. Split out as a separate patch to aid in review. I'm fine with squashing into the previous patch if people request it. v2: Updated to address the smart ggtt which can do aliasing as needed Make sure we bind to global gtt when mappable and fenceable. I thought we could get away without this initialy, but we cannot. v3: Make the global GTT binding explicitly use the ggtt VM for bind_vma(). While at it, use the new ggtt_vma helper (Chris) At this point the original mailing list thread diverges. ie. v4^: use target_obj instead of obj for gen6 relocate_entry vma->bind_vma() can be called safely during pin. So simply do that instead of the complicated conditionals. 
Don't restore PPGTT bound objects on resume path Bug fix in resume path for globally bound Bos Properly handle secure dispatch Rebased on vma bind/unbind conversion Signed-off-by: Ben Widawsky <ben@bwidawsk.net> drm/i915: reduce vm->insert_entries() usage FKA: drm/i915: eliminate vm->insert_entries() With bind/unbind function pointers in place, we no longer need insert_entries. We could, and want, to remove clear_range, however it's not totally easy at this point. Since it's used in a couple of place still that don't only deal in objects: setup, ppgtt init, and restore gtt mappings. v2: Don't actually remove insert_entries, just limit its usage. It will be useful when we introduce gen8. It will always be called from the vma bind/unbind. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:50 +01:00
Ben Widawsky	d7f46fc4e7	drm/i915: Make pin count per VMA Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:49 +01:00
Ben Widawsky	feb822cfc2	drm/i915: Handle inactivating objects for all VMAs This came from a patch called, "drm/i915: Move active to vma" When moving an object to the inactive list, we do it for all VMs for which the object is bound. The primary difference from that patch is this time around we don't track 'active' per vma, but rather by object. Therefore, we only need one unref. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:46 +01:00
Ben Widawsky	c39538a88d	drm/i915: Takedown drm_mm on failed gtt setup This was found by code inspection. If the GTT setup fails then we are left without properly tearing down the drm_mm. Hopefully this never happens. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:45 +01:00
Ben Widawsky	6e164c3382	drm/i915: Allow ggtt lookups to not WARN To be able to effectively use the GGTT object lookup function, we don't want to warn when there is no GGTT mapping. Let the caller deal with it instead. Originally, I had intended to have this behavior, and had not introduced the WARN. It was introduced during review with the addition of the following commit: commit `5c2abbeab7` Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Tue Sep 24 09:57:57 2013 -0700 drm/i915: Provide a cheap ggtt vma lookup Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:45 +01:00
Ben Widawsky	6f425321e0	drm/i915: Don't unconditionally try to deref aliasing ppgtt Since the beginning, the functions which try to properly reference the aliasing PPGTT have dereferenced a potentially null aliasing_ppgtt member. Since the accessors are meant to be global, this will not do. Introduced originally in: commit `a70a3148b0` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 16:59:56 2013 -0700 drm/i915: Make proper functions for VMs Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:44 +01:00
Paulo Zanoni	48018a57a8	drm/i915: release the GTT mmaps when going into D3 So we'll get a fault when someone tries to access the mmap, then we'll wake up from D3. v2: - Rebase v3: - Use gtt active/inactive Testcase: igt/pm_pc8/gem-mmap-gtt Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Add comment + WARN as discussed with Paulo on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-14 15:35:52 +01:00
Mika Kuoppala	168c3f2151	drm/i915: dont call irq_put when irq test is on If test is running, irq_get was not called so we should gain balance by not doing irq_put "So the rule is: if you access unlocked values, you use ACCESS_ONCE(). You don't say "but it can't matter". Because you simply don't know." -- Linus v2: use local variable so it can't change during test (Chris) v3: update commit msg and use ACCESS_ONCE (Ville) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 22:58:44 +01:00
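The v2 and v3 notes in the entry above (snapshot the test flag into a local, and read it through ACCESS_ONCE) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the driver's code: `irq_test_enabled`, `irq_refcount`, `irq_get`, `irq_put`, and `wait_once` are hypothetical names, and the macro is a userspace rendition of the classic kernel definition written with `__typeof__`.

```c
#include <stdbool.h>

/* Userspace rendition of the classic kernel macro: force exactly one load. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical state: a debug flag that may flip at any time, and an IRQ refcount. */
static bool irq_test_enabled;
static int irq_refcount;

static void irq_get(void) { irq_refcount++; }
static void irq_put(void) { irq_refcount--; }

/*
 * Snapshot the flag once so the get and put decisions always agree,
 * even if the flag changes while we wait. flip_during_wait simulates
 * another thread toggling the flag in the middle of the wait.
 */
static int wait_once(bool flip_during_wait)
{
	const bool test = ACCESS_ONCE(irq_test_enabled);

	if (!test)
		irq_get();

	if (flip_during_wait)
		irq_test_enabled = !irq_test_enabled;

	if (!test)
		irq_put();

	/* Balanced get/put leaves the refcount where it started. */
	return irq_refcount;
}
```

Because both branches test the same local, the refcount stays balanced regardless of when the flag flips; re-reading the flag for the put decision would not give that guarantee.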
Mika Kuoppala	47e9766df0	drm/i915: Fix timeout with missed interrupts in __wait_seqno Commit `094f9a54e3` ("drm/i915: Fix __wait_seqno to use true infinite timeouts") added support for __wait_seqno to detect missing interrupts and go around them by polling. As there is also timeout detection in __wait_seqno, the polling and timeout detection were done with the same timer. When there have been missed interrupts and polling is needed, the timer is set to trigger in (now + 1) jiffies in the future, instead of the caller specified timeout. Now when io_schedule() returns, we calculate the jiffies left to timeout using the timer expiration value. As the current jiffies is now bound to be always equal or greater than the expiration value, the timeout_jiffies will become zero or negative and we return -ETIME to caller even though the timeout was never reached. Fix this by decoupling timeout calculation from timer expiration. v2: Commit message with some sense in it (Chris Wilson) v3: add parenthesis on timeout_expire calculation v4: don't read jiffies without timeout (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 15:27:21 +01:00
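The arithmetic behind this fix can be sketched with plain integers standing in for jiffies. This is a simplified model, not the actual `__wait_seqno` code: the function names and the `missed_irq` parameter are hypothetical, and only the remaining-time calculation is modelled.

```c
#include <stdbool.h>

/*
 * Coupled (buggy) scheme: one timer serves both polling and timeout.
 * When an interrupt was missed, the timer is armed for start + 1
 * rather than for the caller's timeout.
 */
static long remaining_coupled(unsigned long start, unsigned long timeout,
			      bool missed_irq, unsigned long now)
{
	unsigned long timer_expire = missed_irq ? start + 1 : start + timeout;

	/* After wake-up, now >= timer_expire, so this is zero or negative. */
	return (long)(timer_expire - now);
}

/*
 * Decoupled scheme: the caller's deadline is tracked separately from
 * whatever the polling timer was armed for.
 */
static long remaining_decoupled(unsigned long start, unsigned long timeout,
				unsigned long now)
{
	unsigned long timeout_expire = start + timeout;

	return (long)(timeout_expire - now);
}
```

With a start of 100, a timeout of 1000 and a wake-up at 101, the coupled version reports no time left although 999 ticks of the caller's budget remain, which is the spurious -ETIME the message describes.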
Chris Wilson	4db080f9e9	drm/i915: Fix erroneous dereference of batch_obj inside reset_status As the rings may be processed and their requests deallocated in a different order to the natural retirement during a reset, /* Whilst this request exists, batch_obj will be on the * active_list, and so will hold the active reference. Only when this * request is retired will the batch_obj be moved onto the * inactive_list and lose its active reference. Hence we do not need * to explicitly hold another reference here. */ is violated, and the batch_obj may be dereferenced after it had been freed on another ring. This can be simply avoided by processing the status update prior to deallocating any requests. Fixes regression (a possible OOPS following a GPU hang) from commit `aa60c664e6` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed Jun 12 15:13:20 2013 +0300 drm/i915: find guilty batch buffer on ring resets Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: Add the code comment Chris supplied.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 10:49:05 +01:00
Paulo Zanoni	f65c916898	drm/i915: add runtime put/get calls at the basic places If I add code to enable runtime PM on my Haswell machine, start a desktop environment, then enable runtime PM, these functions will complain that they're trying to read/write registers while the graphics card is suspended. v2: - Simplify i915_gem_fault changes. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing to do.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-10 22:47:33 +01:00
Chris Wilson	70903c3ba8	drm/i915: Fix ordering of unbind vs unpin pages It is useful to assert that if the object is bound, then it must have its pages pinned to prevent the shrinker from reaping its backing store. This is even more useful with the introduction of real-ppgtt whereupon we may have the object bound into several vma, with each instance pinning the backing store. This assertion breaks down during unbind where we unpinned the backing store before decoupling the vma binding. This can be fixed with a trivial reordering of the unbind sequence, which reinforces the "pin pages, bind to vma, ..., unbind from vma, unpin pages" concept. v2: Bonus comment Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-04 12:10:50 +01:00
Ville Syrjälä	0bf2134780	drm/i915: MI_PREDICATE_RESULT_2 is HSW only The MI_PREDICATE_RESULT_2 register exists only on HSW. On other platforms the same offset is either reserved, or contains some other register. So write the register only on HSW. This regression has been introduced in commit `9435373ef8` Author: Rodrigo Vivi <rodrigo.vivi@gmail.com> Date: Wed Aug 28 16:45:46 2013 -0300 drm/i915: Report enabled slices on Haswell GT3 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Add regression notice.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-29 15:00:03 +01:00
Daniel Vetter	c09cd6e969	Merge branch 'backlight-rework' into drm-intel-next-queued Pull in Jani's backlight rework branch. This was merged through a separate branch to be able to sort out the Broadwell conflicts properly before pulling it into the main development branch. Conflicts: drivers/gpu/drm/i915/intel_display.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-15 10:02:39 +01:00
Ben Widawsky	31a5336e1c	drm/i915/bdw: Swizzling support Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:37 +01:00
Ben Widawsky	5ab31333ac	drm/i915/bdw: Fences on gen8 look just like gen7 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:36 +01:00
Ben Widawsky	8245be3139	drm/i915: Require HW contexts (when possible) v2: Fixed the botched locking on init_hw failure in i915_reset (Ville) Call cleanup_ringbuffer on failed context create in init_hw (Ville) v3: Add dev argument to clean_ringbuffer Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-07 09:35:44 +01:00
Paulo Zanoni	de45eaf7b9	drm/i915: fix open-coded DIV_ROUND_UP Use the nice Kernel macro, it makes the code much more readable. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-21 10:04:03 +02:00
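For readers unfamiliar with the macro named in the entry above, here is a small userspace sketch contrasting an open-coded round-up division with `DIV_ROUND_UP`. The macro body matches the well-known kernel definition; the two helper functions and their page-counting use case are hypothetical illustrations, not code from the patch.

```c
#include <assert.h>

/* Same shape as the kernel macro: integer division rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Open-coded variant of the kind the commit replaces (illustrative). */
static unsigned long pages_open_coded(unsigned long bytes, unsigned long page_size)
{
	unsigned long pages = bytes / page_size;

	if (bytes % page_size)
		pages++;

	return pages;
}

/* The same computation expressed with the macro. */
static unsigned long pages_needed(unsigned long bytes, unsigned long page_size)
{
	assert(page_size != 0);
	return DIV_ROUND_UP(bytes, page_size);
}
```

Note that the macro form can overflow when `n + d - 1` exceeds the type's range, so it is only a drop-in replacement when the operands are known to be far from that limit.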
Daniel Vetter	aa5f802181	drm/i915: Use unsigned long for obj->user_pin_count At least on linux sizeof(long) == sizeof(void*) and the thinking is that you can grab about as many references as there's memory. Doesn't really matter, just a bit of OCD since the fixed size data type in a pure in-kernel datastructure look off. v2: Ville asked for an overflow check since no one prevents userspace from incrementing the pin count forever. v3: s/INT/LONG/, noticed by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 22:06:39 +02:00
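The overflow check requested in v2 of the entry above might look roughly like the following userspace sketch. The struct, the function name, and the choice of `-EINVAL` as the error code are assumptions made for illustration; the commit message only says that a check against the `LONG`-sized limit was added.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for the object carrying a userspace pin count. */
struct gem_object_sketch {
	unsigned long user_pin_count;
};

/*
 * Refuse to increment once the counter is saturated, so userspace
 * cannot wrap it back to zero by pinning forever.
 * The error code is an assumption, not taken from the patch.
 */
static int user_pin(struct gem_object_sketch *obj)
{
	if (obj->user_pin_count == ULONG_MAX)
		return -EINVAL;

	obj->user_pin_count++;
	return 0;
}
```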
Chris Wilson	45c5f2022c	drm/i915: Disable all GEM timers and work on unload We have two once very similar functions, i915_gpu_idle() and i915_gem_idle(). The former is used as the lower level operation to flush work on the GPU, whereas the latter is the high level interface to flush the GEM bookkeeping in addition to flushing the GPU. As such i915_gem_idle() also clears out the request and activity lists and cancels the delayed work. This is what we need for unloading the driver, unfortunately we called i915_gpu_idle() instead. In the process, make sure that when cancelling the delayed work and timer, which is synchronous, that we do not hold any locks to prevent a deadlock if the work item is already waiting upon the mutex. This requires us to push the mutex down from the caller to i915_gem_idle(). v2: s/i915_gem_idle/i915_gem_suspend/ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: xunx.fang@intel.com [danvet: Only set ums.suspended for !kms as discussed earlier. Chris noticed that this slipped through.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 19:42:14 +02:00
Ben Widawsky	3d57e5bd12	drm/i915: Do a fuller init after reset I had this lying around from the original PPGTT series, and thought we might try to get it in by itself. It's convenient to just call i915_gem_init_hw at reset because we'll be adding new things to that function, and having just one function to call instead of reimplementing it in two places is nice. In order to accommodate we cleanup ringbuffers in order to bring them back up cleanly. Optionally, we could also teardown/reinitialize the default context but this was causing some problems on reset which I wasn't able to fully debug, and is unnecessary with the previous context init/enable split. This essentially reverts: commit `8e88a2bd59` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jun 19 18:40:00 2012 +0200 drm/i915: don't call modeset_init_hw in i915_reset It seems to work for me on ILK now. Perhaps it's due to: commit `8a5c2ae753` Author: Jesse Barnes <jbarnes@virtuousgeek.org> Date: Thu Mar 28 13:57:19 2013 -0700 drm/i915: fix ILK GPU reset for render Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 11:08:08 +02:00
Daniel Vetter	3bbbe706e8	drm/i915: check that the i965g/gm 4G limit is really obeyed In truly crazy circumstances shmem might give us the wrong type of page. So be a bit paranoid and double check this. Reviewer: Damien Lespiau <damien.lespiau@intel.com> Cc: Rob Clark <robdclark@gmail.com> References: http://lkml.org/lkml/2011/7/11/238 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:47:05 +02:00
Chris Wilson	d9973b4356	drm/i915: Fix type mismatch and accounting in i915_gem_shrink The interface uses an unsigned long, and we can use the unsigned counter throughout our code, so do so. In the process, we notice one instance where the shrink count is based on a heuristic rather than the result, and another where we ask for too many pages to be purged. v2: nr_to_scan needs to be promoted to a long as well, so just use sc->nr_to_scan directly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:46:48 +02:00
Chris Wilson	5035c275af	drm/i915: Call io_schedule() whilst waiting for the GPU Since we are waiting upon IO completion, inform the kernel through use of the io_schedule() call rather than the regular schedule(). This should allow the kernel to make better decisions regarding scheduling and power management. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:46:47 +02:00
Daniel Vetter	967ad7f148	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next The conflict in intel_drv.h tripped me up a bit since a patch in dinq moves all the functions around, but another one in drm-next removes a single function. So I've figured backing this into a backmerge would be good. i915_dma.c is just adjacent lines changed, nothing nefarious there. Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/intel_drv.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:44:43 +02:00
David Herrmann	16eb5f4379	drm: kill ->gem_init_object() and friends All drivers embed gem-objects into their own buffer objects. There is no reason to keep drm_gem_object_alloc(), gem->driver_private and ->gem_init_object() anymore. New drivers are highly encouraged to do the same. There is no benefit in allocating gem-objects separately. Cc: Dave Airlie <airlied@gmail.com> Cc: Alex Deucher <alexdeucher@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Ben Skeggs <skeggsb@gmail.com> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-10-09 14:38:02 +10:00
Chris Wilson	b29c19b645	drm/i915: Boost RPS frequency for CPU stalls If we encounter a situation where the CPU blocks waiting for results from the GPU, give the GPU a kick to boost its frequency. This should work to reduce user interface stalls and to quickly promote mesa to high frequencies - but the cost is that our requested frequency stalls high (as we do not idle for long enough before rc6 to start reducing frequencies, nor are we aggressive at down clocking an underused GPU). However, this should be mitigated by rc6 itself powering off the GPU when idle, and that energy use is dependent upon the workload of the GPU in addition to its frequency (e.g. the math or sampler functions only consume power when used). Still, this is likely to adversely affect light workloads. In particular, this nearly eliminates the highly noticeable wake-up lag in animations from idle. For example, expose or workspace transitions. (However, given the situation where we fail to downclock, our requested frequency is almost always the maximum, except for Baytrail where we manually downclock upon idling. This often masks the latency of upclocking after being idle, so animations are typically smooth - at the cost of increased power consumption.) Stéphane raised the concern that this will punish good applications and reward bad applications - but due to the nature of how mesa performs its client throttling, I believe all mesa applications will be roughly equally affected. To address this concern, and to prevent applications like compositors from permanently boosting the RPS state, we ratelimit the frequency of the wait-boosts each client receives. Unfortunately, this technique is ineffective with Ironlake - which also has dynamic render power states and suffers just as dramatically. For Ironlake, the thermal/power headroom is shared with the CPU through Intelligent Power Sharing and the intel-ips module. 
This leaves us with no GPU boost frequencies available when coming out of idle, and due to hardware limitations we cannot change the arbitration between the CPU and GPU quickly enough to be effective. v2: Limit each client to receiving a single boost for each active period. Tested by QA to only marginally increase power, and to demonstrably increase throughput in games. No latency measurements yet. v3: Cater for front-buffer rendering with manual throttling. v4: Tidy up. v5: Sadly the compositor needs frequent boosts as it may never idle, but due to its picking mechanism (using ReadPixels) may require frequent waits. Those waits, along with the waits for the vrefresh swap, conspire to keep the GPU at low frequencies despite the interactive latency. To overcome this we ditch the one-boost-per-active-period and just ratelimit the number of wait-boosts each client can receive. Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Stéphane Marchesin <stephane.marchesin@gmail.com> Cc: Owen Taylor <otaylor@redhat.com> Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com> Cc: "Zhuang, Lena" <lena.zhuang@intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> [danvet: No extern for function prototypes in headers.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-03 20:01:31 +02:00
Chris Wilson	094f9a54e3	drm/i915: Fix __wait_seqno to use true infinite timeouts When we switched to always using a timeout in conjunction with wait_seqno, we lost the ability to detect missed interrupts. Since we have had issues with interrupts on a number of generations, and they are required to be delivered in a timely fashion for a smooth UX, it is important that we do log errors found in the wild and prevent the display stalling for upwards of 1s every time the seqno interrupt is missed. Rather than continue to fix up the timeouts to work around the interface impedance in wait_event_*(), open code the combination of wait_event[_interruptible][_timeout], and use the exposed timer to poll for seqno should we detect a lost interrupt. v2: In order to satisfy the debug requirement of logging missed interrupts with the real world requirements of making machines work even if interrupts are hosed, we revert to polling after detecting a missed interrupt. v3: Throw in a debugfs interface to simulate broken hw not reporting interrupts. v4: s/EGAIN/EAGAIN/ (Imre) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Don't use the struct typedef in new code.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-03 20:01:30 +02:00
Chris Wilson	b52b89da09	drm/i915: Add a tracepoint for using a semaphore So that we can find the callers who introduce a ring stall. A single ring stall is not too unwelcome, the right issue becomes when they start to interlock and prevent any concurrent work. That, however, is a little trickier to detect with a mere tracepoint! v2: Rebrand it as a ring event, rather than an object event. v3: Include the seqno in the tracepoint for posterity or something. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:24 +02:00
Ben Widawsky	e2d05a8b1e	drm/i915: Convert active API to VMA Even though we track object activity and not VMA, because we have the active_list be based on the VM, it makes the most sense to use VMAs in the APIs. NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for now, leave them be. v2: Remove leftover hunk from the previous patch which didn't keep i915_gem_object_move_to_active. That patch had to rely on the ring to get the dev instead of the obj. (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:21 +02:00
Ben Widawsky	5c2abbeab7	drm/i915: Provide a cheap ggtt vma lookup "We do fairly often lookup the ggtt vma for an obj." - Chris Wilson. As such, provide a function to offer slightly cheaper access to the vma. Not performance tested. By my quick estimation it saves at least 3 pointer dereferences from the existing mechanism. This patch mostly matches code from Chris in <20130911221430.GB7825@nuc-i3427.alporthouse.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:21 +02:00
Chris Wilson	f740334775	drm/i915: Do not unlock upon error in i915_gem_idle() We never took the lock ourselves and all callers expect the struct_mutex to be locked upon return (be it success or error), therefore dropping the lock along the error paths looks to be a vestigial error from commit `db1b76ca6a` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jul 9 16:51:37 2013 +0200 drm/i915: don't frob mm.suspended when not using ums Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:03 +02:00
Daniel Vetter	b14c5679dd	drm/i915: use pointer = k[cmz...]alloc(sizeof(*pointer), ...) pattern Done while reviewing all our allocations for fubar. Also a few errant cases of lacking () for the sizeof operator - just a bit of OCD. I've left out all the conversions that also should use kcalloc from this patch (it's only 2). Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-01 07:45:01 +02:00
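The allocation idiom named in the entry above can be shown in userspace with `calloc` standing in for the kernel's `kzalloc`. The struct and function names are hypothetical; the point is only that `sizeof(*ring)` follows the pointer's type automatically, so changing the type of `ring` later cannot leave a stale size behind.

```c
#include <stdlib.h>

/* Hypothetical structure used only to demonstrate the idiom. */
struct ring_sketch {
	int id;
	unsigned int size;
};

static struct ring_sketch *ring_create(int id)
{
	struct ring_sketch *ring;

	/* Size is derived from the pointer, not from a repeated type name. */
	ring = calloc(1, sizeof(*ring));
	if (!ring)
		return NULL;

	ring->id = id;
	return ring;
}
```

The caller owns the returned pointer and releases it with `free()`.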
Dave Airlie	4821ff14a3	Merge tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next drm-intel-next-2013-09-21: - clock state handling rework from Ville - l3 parity handling fixes for hsw from Ben - some more watermark improvements from Ville - ban badly behaved context from Mika - a few vlv improvements from Jesse - VGA power domain handling from Ville drm-intel-next-2013-09-06: - Basic mipi dsi support from Jani. Not yet converted over to drm_bridge since that was too fresh, but the porting is in progress already. - More vma patches from Ben, this time the code to convert the execbuffer code. Now that the shrinker recursion bug is tracked down we can move ahead here again. Yay! - Optimize hw context switching to not generate needless interrupts (Chris Wilson). Also some shuffling for the outstanding request allocation. - Opregion support for SWSCI, although not yet fully wired up (we need a bit of runtime D3 support for that apparently, due to Windows design deficiencies), from Jani Nikula. - A few smaller changes all over. 
[airlied: merge conflict fix in i9xx_set_pipeconf] * tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel: (119 commits) drm/i915: assume all GM45 Acer laptops use inverted backlight PWM drm/i915: cleanup a min_t() cast drm/i915: Pull intel_init_power_well() out of intel_modeset_init_hw() drm/i915: Add POWER_DOMAIN_VGA drm/i915: Refactor power well refcount inc/dec operations drm/i915: Add intel_display_power_{get, put} to request power for specific domains drm/i915: Change i915_request power well handling drm/i915: POSTING_READ IPS_CTL before waiting for the vblank drm/i915: don't disable ERR_INT on the IRQ handler drm/i915/vlv: disable rc6p and rc6pp residency reporting on BYT drm/i915/vlv: honor i915_enable_rc6 boot param on VLV drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF drm/i915: Do remaps for all contexts drm/i915: Keep a list of all contexts drm/i915: Make l3 remapping use the ring drm/i915: Add second slice l3 remapping drm/i915: Fix HSW parity test drm/i915: dump crtc timings from the pipe config drm/i915: register backlight device also when backlight class is a module drm/i915: write D_COMP using the mailbox ... Conflicts: drivers/gpu/drm/i915/intel_display.c	2013-10-01 10:00:50 +10:00
Daniel Vetter	d32270460f	drm/i915: Fix up usage of SHRINK_STOP In commit `81e49f8114` Author: Glauber Costa <glommer@openvz.org> Date: Wed Aug 28 10:18:13 2013 +1000 i915: bail out earlier when shrinker cannot acquire mutex SHRINK_STOP was added to tell the core shrinker code to bail out and go to the next shrinker since the i915 shrinker couldn't acquire required locks. But the SHRINK_STOP return code was added to the ->count_objects callback and not the ->scan_objects callback as it should have been, resulting in tons of dmesg noise like shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx Fix discussed with Dave Chinner. References: http://www.spinics.net/lists/intel-gfx/msg33597.html Reported-by: Knut Petersen <Knut_Petersen@t-online.de> Cc: Knut Petersen <Knut_Petersen@t-online.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Glauber Costa <glommer@openvz.org> Cc: Glauber Costa <glommer@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-26 00:31:51 +02:00
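The count/scan split described in the entry above can be modelled in userspace as follows. This is a sketch under assumptions, not the i915 shrinker: the state struct and both callbacks are hypothetical, `SHRINK_STOP` is defined here as an all-ones `unsigned long` sentinel, and the count callback is assumed to report zero objects when the lock is unavailable.

```c
#include <stdbool.h>

/* All-ones sentinel; read as a signed long it looks like a negative count. */
#define SHRINK_STOP (~0UL)

/* Hypothetical shrinker state for illustration. */
struct shrinker_state {
	bool lock_available;
	unsigned long inactive_pages;
};

/*
 * Count callback: must return a plain object count. Returning the
 * sentinel here is what would produce a "negative objects" report.
 */
static unsigned long count_objects(const struct shrinker_state *s)
{
	if (!s->lock_available)
		return 0;

	return s->inactive_pages;
}

/* Scan callback: the place where the sentinel is meaningful. */
static unsigned long scan_objects(struct shrinker_state *s,
				  unsigned long nr_to_scan)
{
	unsigned long freed;

	if (!s->lock_available)
		return SHRINK_STOP;

	freed = nr_to_scan < s->inactive_pages ? nr_to_scan : s->inactive_pages;
	s->inactive_pages -= freed;
	return freed;
}
```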
Daniel Vetter	b599c89e8c	Linux 3.12-rc2 Merge tag 'v3.12-rc2' into drm-intel-next Backmerge Linux 3.12-rc2 to prep for a bunch of -next patches: - Header cleanup in intel_drv.h, both changed in -fixes and my current -next pile. - Cursor handling cleanup for -next which depends upon the cursor handling fix merged into -rc2. All just trivial conflicts of the "changed adjacent lines" type: drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_drv.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-24 09:32:53 +02:00
Linus Torvalds	d8524ae9d6	Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: - some small fixes for msm and exynos - a regression revert affecting nouveau users with old userspace - intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits) Revert "drm: mark context support as a legacy subsystem" drm/i915: Don't enable the cursor on a disable pipe drm/i915: do not update cursor in crtc mode set drm/exynos: fix return value check in lowlevel_buffer_allocate() drm/exynos: Fix address space warnings in exynos_drm_fbdev.c drm/exynos: Fix address space warning in exynos_drm_buf.c drm/exynos: Remove redundant OF dependency drm/msm: drop unnecessary set_need_resched() drm/i915: kill set_need_resched drm/msm: fix potential NULL pointer dereference drm/i915/dvo: set crtc timings again for panel fixed modes drm/i915/sdvo: Robustify the dtd<->drm_mode conversions drm/msm: workaround for missing irq drm/msm: return -EBUSY if bo still active drm/msm: fix return value check in ERR_PTR() drm/msm: fix cmdstream size check drm/msm: hangcheck harder drm/msm: handle read vs write fences drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion drm/i915: Use proper print format for debug prints ...	2013-09-22 19:51:49 -07:00
Ben Widawsky	040d2baa62	drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF We'd only ever used this define to denote whether or not we have the dynamic parity feature (DPF) and never to determine whether or not L3 exists. Baytrail is a good example of where L3 exists, and not DPF. This patch provides clarity in the code for future use cases which might want to actually query whether or not L3 exists. v2: Add /* DPF == dynamic parity feature */ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-19 20:41:00 +02:00
Ben Widawsky	a33afea5ff	drm/i915: Keep a list of all contexts I have implemented this patch before without creating a separate list (I'm having trouble finding the links, but the messages ids are: <1364942743-6041-2-git-send-email-ben@bwidawsk.net> <1365118914-15753-9-git-send-email-ben@bwidawsk.net>) However, the code is much simpler to just use a list and it makes the code from the next patch a lot more pretty. As you'll see in the next patch, the reason for this is to be able to specify when a context needs to get L3 remapping. More details there. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-19 20:39:43 +02:00
Ben Widawsky	c3787e2eac	drm/i915: Make l3 remapping use the ring Using LRI for setting the remapping registers allows us to stream l3 remapping information. This is necessary to handle per context remaps as we'll see implemented in an upcoming patch. Using the ring also means we don't need to frob the DOP clock gating bits. v2: Add comment about lack of worry for concurrent register access (Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's nothing to fix.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-19 20:38:00 +02:00
Ben Widawsky	35a85ac606	drm/i915: Add second slice l3 remapping Certain HSW SKUs have a second bank of L3. This L3 remapping has a separate register set, and interrupt from the first "slice". A slice is simply a term to define some subset of the GPU's l3 cache. This patch implements both the interrupt handler, and ability to communicate with userspace about this second slice. v2: Remove redundant check about non-existent slice. Change warning about interrupts of unknown slices to WARN_ON_ONCE Handle the case where we get 2 slice interrupts concurrently, and switch the tracking of interrupts to be non-destructive (all Ville) Don't enable/mask the second slice parity interrupt for ivb/vlv (even though all docs I can find claim it's rsvd) (Ville + Bryan) Keep BYT excluded from L3 parity v3: Fix the slice = ffs to be decremented by one (found by Ville). When I initially did my testing on the series, I was using 1-based slice counting, so this code was correct. Not sure why my simpler tests that I've been running since then didn't pick it up sooner. Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-19 20:37:04 +02:00
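The v3 note in the entry above concerns an off-by-one: `ffs()` returns a 1-based bit position, while slices are counted from zero. A minimal userspace sketch, with a hypothetical function name and mask layout (bit 0 for slice 0, bit 1 for slice 1):

```c
#include <strings.h>

/*
 * Return the zero-based index of the lowest pending slice, or -1 if
 * no bit is set. ffs() is 1-based and returns 0 for an empty mask,
 * hence the decrement.
 */
static int first_pending_slice(int pending_mask)
{
	return ffs(pending_mask) - 1;
}
```

Without the decrement, a mask with only slice 0 pending would be reported as slice 1, which is the mismatch the v3 fix addresses.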
Linus Torvalds	26935fb06e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...	2013-09-12 15:01:38 -07:00
Daniel Vetter	571c608d06	drm/i915: kill set_need_resched This is just a remnant from the old days when our reset handling was horribly racy, suffered from terribly locking issues and often happily live-locked. Those days are now gone so we can drop the hacks and just rip the reschedule-point out. Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-12 22:40:36 +02:00
Ben Widawsky	23f5448398	drm/i915: Synchronize pread/pwrite with wait_rendering lifted from Daniel: pread/pwrite isn't about the object's domain at all, but purely about synchronizing for outstanding rendering. Replacing the call to set_to_gtt_domain with a wait_rendering would imo improve code readability. Furthermore we could pimp pread to only block for outstanding writes and not for reads. Since you're not the first one to trip over this: Can I volunteer you for a follow-up patch to fix this? v2: Switch the pwrite patch to use \!read_only. This was a typo in the original code. (Chris, Daniel) Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Fix up the logic fumble - wait_rendering has a bool readonly paramater, set_to_gtt_domain otoh has bool write. Breakage reported by Jani Nikula, I've double-checked that igt/gem_concurrent_blt/prw-* would have caught this.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-12 21:56:52 +02:00
Glauber Costa	81e49f8114	i915: bail out earlier when shrinker cannot acquire mutex The main shrinker driver will keep trying for a while to free objects if the returned value from the shrink scan procedure is 0. That means "no objects now", but a retry could very well succeed. But what we should say here is a different thing: that it is impossible to shrink, and we would better bail out soon. We find this behavior more appropriate for the case where the lock cannot be taken. Specially given the hammer behavior of the i915: if another thread is already shrinking, we are likely not to be able to shrink anything anyway when we finally acquire the mutex. Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Chinner <dchinner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kent Overstreet <koverstreet@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-10 18:56:32 -04:00
Dave Chinner	7dc19d5aff	drivers: convert shrinkers to new count/scan API Convert the driver shrinkers to the new API. Most changes are compile tested only because I either don't have the hardware or it's staging stuff. FWIW, the md and android code is pretty good, but the rest of it makes me want to claw my eyes out. The amount of broken code I just encountered is mind boggling. I've added comments explaining what is broken, but I fear that some of the code would be best dealt with by being dragged behind the bike shed, burying in mud up to it's neck and then run over repeatedly with a blunt lawn mower. Special mention goes to the zcache/zcache2 drivers. They can't co-exist in the build at the same time, they are under different menu options in menuconfig, they only show up when you've got the right set of mm subsystem options configured and so even compile testing is an exercise in pulling teeth. And that doesn't even take into account the horrible, broken code... [glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Kent Overstreet <koverstreet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: David Rientjes <rientjes@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-09-10 18:56:32 -04:00
Chris Wilson	5a1d5eb020	drm/i915: Remove the double-list iteration from bound_any() The purpose of the function is to find out whether the object is still bound in any address space. This can be easily checked by looking at the vma currently associated with the object, rather than asking if any of the global address spaces have an active vma on the object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-10 16:14:06 +02:00
Mika Kuoppala	be62acb4cc	drm/i915: ban badly behaving contexts Now when we have mechanism in place to track which context was guilty of hanging the gpu, it is possible to punish for bad behaviour. If context has recently submitted a faulty batchbuffers guilty of gpu hang and submits another batch which hangs gpu in quick succession, ban it permanently. If ctx is banned, no more batchbuffers will be queued for execution. There is no need for global wedge machinery anymore and it would be unwise to wedge the whole gpu if we have multiple hanging batches queued for execution. Instead just ban the guilty ones and carry on. v2: Store guilty ban status bool in gpu_error instead of pointers that might become danling before hang is declared. v3: Use return value for banned status instead of stashing state into gpu_error (Chris Wilson) v4: - rebase on top of fixed hang stats api - add define for ban period - rename commit and improve commit msg v5: - rely context banning instead of wedging the gpu - beautification and fix for ban calculation (Chris) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-06 17:55:50 +02:00
Chris Wilson	57094f8246	drm/i915: Hold an object reference whilst we shrink it Whilst running the shrinker, we need to hold a reference as we unbind the objects, or else we may end up waiting for and retiring requests, which in turn may result in this object being freed. This is very similar to the eviction code which also has to be very careful to keep a reference to its objects as it retires and unbinds them. Another similarity, that Ben pointed out, is that as we may call retire-requests, the unbound_list is outside of our control. We must only process a single element of that list at a time, that is we can not rely on the "safe" next pointer being valid after a call to i915_vma_unbind(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915] PGD 758d3067 PUD ac0d6067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977 task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000 RIP: 0010:[<ffffffffa0082892>] [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915] RSP: 0018:ffff880028e4b9e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328 RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00 RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00 R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000 R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00 FS: 00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000 ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900 0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183 Call Trace: [<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915] [<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915] [<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915] [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915] [<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915] [<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58 [<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915] [<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915] [<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915] [<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915] [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915] [<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915] [<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915] [<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm] [<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915] [<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449 [<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341 [<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31 [<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef [<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e [<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98 RIP [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915] RSP <ffff880028e4b9e8> CR2: 0000000000000008 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org [danvet: Bikeshed the comments a bit as discussed with Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-05 14:47:59 +02:00
Chris Wilson	3c0e234c84	drm/i915; Preallocate the lazy request It is possible for us to be forced to perform an allocation for the lazy request whilst running the shrinker. This allocation may fail, leaving us unable to reclaim any memory leading to premature OOM. A neat solution to the problem is to preallocate the request at the same time as acquiring the seqno for the ring transaction. This means that we can report ENOMEM prior to touching the rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-05 12:03:53 +02:00
Chris Wilson	1823521d2b	drm/i915: Rename ring->outstanding_lazy_request Prior to preallocating an request for lazy emission, rename the existing field to make way (and differentiate the seqno from the request struct). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-05 12:03:12 +02:00
Chris Wilson	9a7e0c2a1b	drm/i915: Rearrange the comments in i915_add_request() The comments were a little out-of-sequence with the code, forcing the reader to jump around whilst reading. Whilst moving the comments around, add one to explain the context reference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:54 +02:00
Chris Wilson	c0321e2c5a	drm/i915: Do not add an interrupt for a context switch We use the request to ensure we hold a reference to the context for the duration that it remains in use by the ring. Each request only holds a reference to the current context, hence we emit a request after switching contexts with the final reference to the old context. However, the extra interrupt caused by that request is not useful (no timing critical function will wait for the context object), instead the overhead of servicing the IRQ shows up in some (lightweight) benchmarks. In order to keep the useful property of using the request to manage the context lifetime, we want to add a dummy request that is associated with the interrupt from the subsequent real request following the batch. The extra interrupt was added as a side-effect of using i915_add_request() in commit `112522f678` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu May 2 16:48:07 2013 +0300 drm/i915: put context upon switching v2: Daniel convinced me that the request here was solely for context lifetime tracking and that we have the active ref to keep the object alive whilst the MI_SET_CONTEXT. So the only concern then is which context should get the blame for MI_SET_CONTEXT failing. The old scheme added a request for the old context so that any hang upto and including the switch away would mark the old context as guilty. Now any hang here implicates the new context. However since we have already gone through a complete flush with the last context in its last request, and all that lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we should be safe in not unduly placing blame on the new context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:53 +02:00
Daniel Vetter	0ff501cbb5	drm/i915: Fix list corruption in vma_unbind The saga around the breadcrumb vmas used by execbuf continues ... This time around we've managed to unconditionally move the object to the unbound list on the last vma unbind even though it might never have been on either the bound or unbound list. Hilarity ensued. Chris Wilson tracked this one down but compared to his patches I've simply opted to completely separate the unbound case for not-yet bound vmas. Otherwise we imo end up with semantically hard to parse checks around the list_move_tail(global_list, ...). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68462 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:52 +02:00
Rodrigo Vivi	9435373ef8	drm/i915: Report enabled slices on Haswell GT3 Batchbuffers constructed by userspace can conditionalise their URB allocations through the use of the MI_SET_PREDICATE command. This command can read the MI_PREDICATE_RESULT_2 register to see how many slices are enabled on GT3, and by virtue of the result, scale their memory allocations to fit enabled memory. Of course, this only works if the kernel sets the appropriate bit in the register first. v2: Better commit subject and message by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Credits-to: Yejun Guo <yejun.guo@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:51 +02:00
Daniel Vetter	b93dab6e9d	drm/i915: More vma fixups around unbind/destroy The important bugfix here is that we must not unlink the vma when we keep it around as a placeholder for the execbuf code. Since then we won't find it again when execbuf gets interrupt and restarted and create a 2nd vma. And since the code as-is isn't fit yet to deal with more than one vma, hilarity ensues. Specifically the dma map/unmap of the sg table isn't adjusted for multiple vmas yet and will blow up like this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] PGD 56bb5067 PUD ad3dd067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957 Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012 task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000 RIP: 0010:[<ffffffffa008fb37>] [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] RSP: 0018:ffff88004bdf5958 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0 RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780 RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000 R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780 R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780 FS: 00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0 Stack: ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000 ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026 Call Trace: [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915] [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915] [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915] [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915] [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915] [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915] [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915] [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915] [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915] [<ffffffff810fc823>] ? might_fault+0x40/0x90 [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915] [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm] [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915] [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481 [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39 [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451 [<ffffffff817deda7>] ? sysret_check+0x1b/0x56 [<ffffffff8113073c>] SyS_ioctl+0x57/0x87 [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00 RIP [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915] RSP <ffff88004bdf5958> CR2: 0000000000000008 As a consequence we need to change the "only one vma for now" check in vma_unbind - since vma_destroy isn't always called the obj->vma_list might not be empty. Instead check that the vma list is singular at the beginning of vma_unbind. This is also more symmetric with bind_to_vm. This fixes the igt/gem_evict_everything\|alignment testcases. v2: - Add a paranoid WARN to mark_free in the eviction code to make sure we never try to evict a vma used by the execbuf code right now. - Move the check for a temporary execbuf vma into vma_destroy - otherwise the failure path cleanup in bind_to_vm will blow up. Our first attempting at fixing this was commit 1be81a2f2cfd8789a627401d470423358fba2d76 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Aug 20 12:56:40 2013 +0100 drm/i915: Don't destroy the vma placeholder during execbuffer reservation Squash with this when merging! v3: Improvements suggested in Chris' review: - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm allocation before the early return. - Bail out if we hit the WARN in mark_free to hopefully make the kernel survive for long enough to capture it. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171 Tested-by: lu hua <huax.lu@intel.com> (v2) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:42 +02:00
Chris Wilson	aaa0566792	drm/i915: Don't destroy the vma placeholder during execbuffer reservation The execbuffer handle and exec_link were moved from the object into the vma. As the vma may be unbound and destroyed whilst attempting to reserve the execbuffer objects (either through a forced unbind to fix up a misalignment or through an evict-everything call) we need to prevent the free of the i915_vma itself. Otherwise not only is the list of objects to reserve corrupt, but we continue to reference stale vma entries. Fixes kernel crash with i-g-t/gem_evict_everything This regression has been introduced in commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0 Author: Ben Widawsky <ben@bwidawsk.net> AuthorDate: Wed Aug 14 11:38:36 2013 +0200 drm/i915: Convert execbuf code to use vmas Reported-by: Dan Carpenter <dan.carpenter@oracle.com> References: http://www.spinics.net/lists/intel-gfx/msg32038.html Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:42 +02:00
Daniel Vetter	e656a6cba0	drm/i915: inline vma_create into lookup_or_create_vma In the execbuf code we don't clean up any vmas which ended up not getting bound for code simplicity. To make sure that we don't end up creating multiple vma for the same vm kill the somewhat dangerous vma_create function and inline it into lookup_or_create. This is just a safety measure to prevent surprises in the future. Also update the somewhat confused comment in the execbuf code and clarify what kind of magic is going on with a new one. v2: Keep the function separate as requested by Chris. But give it a __ prefix for paranoia and move it tighter together with the other vma stuff. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:41 +02:00
Ben Widawsky	27173f1f95	drm/i915: Convert execbuf code to use vmas In order to transition more of our code over to using a VMA instead of an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up until now, we've only had a VMA when actually binding an object. The previous patch helped handle the distinction on bound vs. unbound. This patch will help us catch leaks, and other issues before we actually shuffle a bunch of stuff around. This attempts to convert all the execbuf code to speak in vmas. Since the execbuf code is very self contained it was a nice isolated conversion. The meat of the code is about turning eb_objects into eb_vma, and then wiring up the rest of the code to use vmas instead of obj, vm pairs. Unfortunately, to do this, we must move the exec_list link from the obj structure. This list is reused in the eviction code, so we must also modify the eviction code to make this work. WARNING: This patch makes an already hotly profiled path slower. The cost is unavoidable. In reply to this mail, I will attach the extra data. v2: Release table lock early, and two a 2 phase vma lookup to avoid having to use a GFP_ATOMIC. (Chris) v3: s/obj_exec_list/obj_exec_link/ Updates to address commit `6d2b888569` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Aug 7 18:30:54 2013 +0100 drm/i915: List objects allocated from stolen memory in debugfs v4: Use obj = vma->obj for neatness in some places (Chris) need_reloc_mappable() should return false if ppgtt (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out prep patches. Also remove a FIXME comment which is now taken care of.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-04 17:34:41 +02:00
Damien Lespiau	d2933a5b8f	drm/i915: Don't call sg_free_table() if sg_alloc_table() fails One needs to call __sg_free_table() if __sg_alloc_table() fails, but sg_alloc_table() does that for us already. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Reviewd-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-03 19:18:00 +02:00
Joe Perches	fac15c1082	i915_gem: Convert kmem_cache_alloc(...GFP_ZERO) to kmem_cache_zalloc The helper exists, might as well use it instead of __GFP_ZERO. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-09-03 19:17:56 +02:00
Dave Airlie	efa27f9cec	Merge tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Need to get my stuff out the door ;-) Highlights: - pc8+ support from Paulo - more vma patches from Ben. - Kconfig option to enable preliminary support by default (Josh Triplett) - Optimized cpu cache flush handling and support for write-through caching of display planes on Iris (Chris) - rc6 tuning from Stéphane Marchesin for more stability - VECS seqno wrap/semaphores fix (Ben) - a pile of smaller cleanups and improvements all over Note that I've ditched Ben's execbuf vma conversion for 3.12 since not yet ready. But there's still other vma conversion stuff in here. * tag 'drm-intel-next-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel: (62 commits) drm/i915: Print seqnos as unsigned in debugfs drm/i915: Fix context size calculation on SNB/IVB/VLV drm/i915: Use POSTING_READ in lcpll code drm/i915: enable Package C8+ by default drm/i915: add i915.pc8_timeout function drm/i915: add i915_pc8_status debugfs file drm/i915: allow package C8+ states on Haswell (disabled) drm/i915: fix SDEIMR assertion when disabling LCPLL drm/i915: grab force_wake when restoring LCPLL drm/i915: drop WaMbcDriverBootEnable workaround drm/i915: Cleaning up the relocate entry function drm/i915: merge HSW and SNB PM irq handlers drm/i915: fix how we mask PMIMR when adding work to the queue drm/i915: don't queue PM events we won't process drm/i915: don't disable/reenable IVB error interrupts when not needed drm/i915: add dev_priv->pm_irq_mask drm/i915: don't update GEN6_PMIMR when it's not needed drm/i915: wrap GEN6_PMIMR changes drm/i915: wrap GTIMR changes drm/i915: add the FCLK case to intel_ddi_get_cdclk_freq ...	2013-08-30 09:47:41 +10:00
Paulo Zanoni	c67a470b1d	drm/i915: allow package C8+ states on Haswell (disabled) This patch allows PC8+ states on Haswell. These states can only be reached when all the display outputs are disabled, and they allow some more power savings. The fact that the graphics device is allowing PC8+ doesn't mean that the machine will actually enter PC8+: all the other devices also need to allow PC8+. For now this option is disabled by default. You need i915.allow_pc8=1 if you want it. This patch adds a big comment inside i915_drv.h explaining how it works and how it tracks things. Read it. v2: (this is not really v2, many previous versions were already sent, but they had different names) - Use the new functions to enable/disable GTIMR and GEN6_PMIMR - Rename almost all variables and functions to names suggested by Chris - More WARNs on the IRQ handling code - Also disable PC8 when there's GPU work to do (thanks to Ben for the help on this), so apps can run caster - Enable PC8 on a delayed work function that is delayed for 5 seconds. This makes sure we only enable PC8+ if we're really idle - Make sure we're not in PC8+ when suspending v3: - WARN if IRQs are disabled on __wait_seqno - Replace some DRM_ERRORs with WARNs - Fix calls to restore GT and PM interrupts - Use intel_mark_busy instead of intel_ring_advance to disable PC8 v4: - Use the force_wake, Luke! v5: - Remove the "IIR is not zero" WARNs - Move the force_wake chunk to its own patch - Only restore what's missing from RC6, not everything Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-23 14:52:33 +02:00
Ben Widawsky	accfef2e5a	drm/i915: prepare bind_to_vm for preallocated vma In the new execbuf code we want to track buffers using the vmas even before they're all properly mapped. Which means that bind_to_vm needs to deal with buffers which have preallocated vmas which aren't yet bound. This patch implements this prep work and adjusts our WARN/BUG checks. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out from Ben's big execbuf patch. Also move one BUG back to its original place to deflate the diff a notch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:53 +02:00
Ben Widawsky	82a55ad1a0	drm/i915: Switch eviction code to use vmas The execbuf wants to do relocations using vmas, so we need a vma->exec_list. The eviction code also uses the old obj execbuf list for its own book-keeping, but would really prefer to deal in vmas only. So switch it over to the new list. Again this is just a prep patch for the big execbuf vma conversion. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out from Ben's big execbuf vma patch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:52 +02:00
Ben Widawsky	b25cb2f882	drm/i915: s/obj->exec_list/obj->obj_exec_link in debugfs To convert the execbuf code over to use vmas natively we need to shuffle the exec_list a bit. This patch here just prepares things with the debugfs code, which also uses the old exec_list list_head, newly called obj_exec_link. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Split out from Ben's big patch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:51 +02:00
Chris Wilson	4b6d846e9a	drm/i915: Drop the overzealous warning from i915_gem_set_cache_level By our earlier reckoning, move from a snooped/llc setting to an uncached setting, leaves the CPU cache in a consistent state irrespective of our domain tracking - so we can forgo the warning about the lack of invalidation. Similarly for any writes posted to the snooped CPU domain, we know will be safely clflushed to the uncached PTEs after forcing the domain change. This WARN started to pop up with commit `d46f1c3f13` Author: Chris Wilson <chris@chris-wilson.co.uk> AuthorDate: Thu Aug 8 14:41:06 2013 +0100 drm/i915: Allow the GPU to cache stolen memory Ville brought up a scenario where the interaction of a set_caching ioctl call from userspace on a scanout buffer (i.e. obj->pin_display is set) resulted in the code getting confused and not properly flushing stale cpu cachelines. Luckily we already prevent this by rejecting caching changes when obj->pin_count is set. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040 Tested-by: cancan,feng <cancan.feng@intel.com> [danvet: Add buglink, bisect result and explain why Ville's scenario is already taken care of.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:46 +02:00
Daniel Vetter	49987099e2	drm/i915: use vma->node directly and rewrap map&fence in bind Use () to make for neater alignment of the split lines, too. With this we ditch another jump through the obj_gtt_size/offset indirection maze. Cc: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:45 +02:00
Ben Widawsky	4bd561b3e8	drm/i915: cleanup map&fence in bind Cleanup the map and fenceable setting during bind to make more sense, and not check i915_is_ggtt() 2 unnecessary times v2: Move the bools into the if block (Chris) - There are ways to tidy this function (fence calculations for instance) even further, but they are quite invasive, so I am punting on those unless specifically asked. v3: Add newline between variable declaration and logic (Chris) Recommended-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:45 +02:00
Ben Widawsky	433544bd25	drm/i915: Remove node only when allocated VMAs can be created and not bound. One may think of it as lazy cleanup, and safely gloss over the conditions which manufacture it. In either case, when the object backing the i915 vma is destroyed, we must cleanup the vma without stumbling into a bunch of pitfalls that assume the vma is bound. NOTE: I was pretty certain the above condition could only happen when we introduced the use of VMAs being looked up at execbuf, and already existing. Paulo has hit this though, so I must be missing something. As I believe the patch is correct anyway, therefore I won't scratch my head too hard. v2: use goto destroy as a compromise (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:44 +02:00
Chris Wilson	4257d3ba3b	drm/i915: Allow the user to set bo into the DISPLAY cache domain This is primarily for the benefit of the create2 ioctl so that the caller can avoid the later step of rebinding the bo with new PTE bits. After introducing WT (and possibly GFDT) cacheing for display targets, not everything in the display is earmarked as UC, and more importantly what is is controlled by the kernel. Note that set_cache_level/get_cache_level for DISPLAY is not necessarily idempotent; get_cache_level may return UC for architectures that have no special cache domain for the display engine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:39 +02:00
Chris Wilson	651d794fae	drm/i915: Use Write-Through cacheing for the display plane on Iris Haswell GT3e has the unique feature of supporting Write-Through cacheing of objects within the eLLC/LLC. The purpose of this is to enable the display plane to remain coherent whilst objects lie resident in the eLLC/LLC - so that we, in theory, get the best of both worlds, perfect display and fast access. However, we still need to be careful as the CPU does not see the WT when accessing the cache. In particular, this means that we need to flush the cache lines after writing to an object through the CPU, and on transitioning from a cached state to WT. v2: Actually do the clflush on transition to WT, nagging by Ville. v3: Flush the CPU cache after writes into WT objects. v4: Rebase onto LLC updates and report WT as "uncached" for get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:38 +02:00
Jani Nikula	f2f4d82faf	drm/i915: give more distinctive names to ring hangcheck action enums The short lowercase names are bound to collide. The default warnings don't even warn about shadowing. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:37 +02:00
Ben Widawsky	7ace7ef2f5	drm/i915: WARN_ON failed map_and_fenceable I just noticed in our code we don't really check the assertion, and given some of the code I am changing in this area, I feel a WARN is very nice to have. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: s/&/&&/ to fix typo on the check.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:35 +02:00
Chris Wilson	000433b67e	drm/i915: Only do a chipset flush after a clflush Now that we skip clflushes more often, return a boolean indicating whether the clflush was actually performed, and only if it was do the chipset flush. (Though on most of the architectures where the clflush will be skipped, the chipset flush is a no-op!) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-22 13:31:34 +02:00
Dave Airlie	9712def2b3	Merge tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: New pile of stuff for -next: - Cleanup of the old crtc helper callbacks, all encoders are now converted to the i915 modeset infrastructure. - Massive amount of wm patches from Ville for ilk, snb, ivb, hsw, this is prep work to eventually get things going for nuclear pageflips where we need to adjust watermarks on the fly. - More vm/vma patches from Ben. This refactoring isn't yet fully rolled out, we miss the execbuf conversion and some of the low-level bind/unbind support code. - Convert our hdmi infoframe code to use the new common helper functions (Damien). This contains some bugfixes for the common infoframe helpers. - Some cruft removal from Damien. - Various smaller bits&pieces all over, as usual. * tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel: (105 commits) drm/i915: Fix FB WM for HSW drm/i915: expose HDMI connectors on port C on BYT drm/i915: fix a limit check in hsw_compute_wm_results() drm/i915: unbreak i915_gem_object_ggtt_unbind() drm/i915: Make intel_set_mode() static drm/i915: Remove intel_modeset_disable() drm/i915: Make intel_encoder_dpms() static drm/i915: Make i915_hangcheck_elapsed() static drm/i915: Fix #endif comment drm/i915: Remove i915_gem_object_check_coherency() drm/i915: Remove stale prototypes drm/i915: List objects allocated from stolen memory in debugfs drm/i915: Always call intel_update_sprite_watermarks() when disabling a plane drm/i915: Pass plane and crtc to intel_update_sprite_watermarks drm/i915: Don't try to disable plane if it's already disabled drm/i915: Pass crtc to our update/disable_plane hooks drm/i915: Split plane watermark parameters into a separate struct drm/i915: Pull some watermarks state into a separate structure drm/i915: Calculate max watermark levels for ILK+ drm/i915: Rename hsw_lp_wm_result to intel_wm_level ...	2013-08-21 12:48:59 +10:00
Chris Wilson	2c22569bba	drm/i915: Update rules for writing through the LLC with the cpu As mentioned in the previous commit, reads and writes from both the CPU and GPU go through the LLC. This gives us coherency between the CPU and GPU irrespective of the attribute settings either device sets. We can use this to avoid having to clflush even uncached memory. Except for the scanout. The scanout resides within another functional block that does not use the LLC but reads directly from main memory. So in order to maintain coherency with the scanout, writes to uncached memory must be flushed. In order to optimize writes elsewhere, we start tracking whether a framebuffer is attached to an object. v2: Use pin_display tracking rather than fb_count (to ensure we flush cursors as well etc) and only force the clflush along explicit writes to the scanout paths (i.e. pin_to_display_plane and pwrite into scanout). v3: Force the flush after hitting the slowpath in pwrite, as after dropping the lock the object's cache domain may be invalidated. (Ville) Based on a patch by Ville Syrjälä. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-10 11:20:49 +02:00
Chris Wilson	cc98b413c1	drm/i915: Track when an object is pinned for use by the display engine The display engine has unique coherency rules such that it requires special handling to ensure that all writes to cursors, scanouts and sprites are clflushed. This patch introduces the infrastructure to simply track when an object is being accessed by the display engine. v2: Explain the is_pin_display() magic as the sources for obj->pin_count and their individual rules is not obvious. (Ville) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-10 11:19:51 +02:00
Chris Wilson	c76ce038e3	drm/i915: Update rules for reading cache lines through the LLC The LLC is a fun device. The cache is a distinct functional block within the SA that arbitrates access from both the CPU and GPU cores. As such all writes to memory land first in the LLC before further action is taken. For example, an uncached write from either the CPU or GPU will then proceed to memory and evict the cacheline from the LLC. This means that a read from the LLC always returns the correct information even if the PTE bit in the GPU differs from the PAT bit in the CPU. For the older snooping architecture on non-LLC, the fundamental principle still holds except that some coordination is required between the CPU and GPU to explicitly perform the snooping (which is handled by our request tracking). The upshot of this is that we know that we can issue a read from either LLC devices or snoopable memory and trust the contents of the cache - i.e. we can forgo a clflush before a read in these circumstances. Writing to memory from the CPU is a little more tricky as we have to consider that the scanout does not read from the CPU cache at all, but from main memory. So we have to currently treat all requests to write to uncached memory as having to be flushed to main memory for coherency with all consumers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-10 11:19:50 +02:00
Dan Carpenter	58e73e1570	drm/i915: unbreak i915_gem_object_ggtt_unbind() There is an extra semi-colon here so we just leak and never unbind anything. This regression has been introduced in commit `07fe0b1280` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 17:00:10 2013 -0700 drm/i915: plumb VM into bind/unbind code Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-09 12:04:53 +02:00
Ben Widawsky	8b9c2b9411	drm/i915: Add vma to list at creation With the current code there shouldn't be a distinction - however with an upcoming change we intend to allocate a vma much earlier, before it's actually bound anywhere. To do this we have to check node allocation as well for the _bound() check. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy, again fallout from the loss of "drm/i915: Cleanup more of VMA in destroy".] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> fixup for drm/i915: Add vma to list at creation	2013-08-08 14:10:20 +02:00
Ben Widawsky	ca191b1313	drm/i915: mm_list is per VMA formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMA. Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actually kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discussed with Ben. Also drop a needless BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:06:58 +02:00
Ben Widawsky	5cacaac77c	drm/i915: Fix up map and fenceable for VMA formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable tracking" The map_and_fenceable tracking is per object. GTT mapping, and fences only apply to global GTT. As such, object operations which are not performed on the global GTT should not affect mappable or fenceable characteristics. Functionally, this commit could very well be squashed in to a previous patch which updated object operations to take a VM argument. This commit is split out because it's a bit tricky (or at least it was for me). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Drop the bogus hunk in i915_vma_unbind as discussed with Ben.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:55 +02:00
Ben Widawsky	9843877d10	drm/i915: turn bound_ggtt checks to bound_any In some places, we want to know if an object is bound in any address space, and not just the global GTT. This often applies when there is a single global resource (object, pages, etc.) function \| reason -------------------------------------------------- i915_gem_object_is_inactive \| global object i915_gem_object_put_pages \| object's pages i915_gem_object_unpin \| global object i915_gem_execbuffer_unreserve_object \| temporary until we plumb vma pread/pwrite \| see the note below Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering call - but that once only worked if the object is bound. We really should replace this with a plain wait_rendering call, which would have the upside that in pread it would be clearer that we actually only wait for outstanding gpu writes. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer Ben to replace those with wait_rendering calls.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:43 +02:00
Ben Widawsky	f6cd1f15d3	drm/i915: Use new bind/unbind in eviction code Eviction code, like the rest of the converted code needs to be aware of the address space for which it is evicting (or the everything case, all addresses). With the updated bind/unbind interfaces of the last patch, we can now safely move the eviction code over. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:43 +02:00
Ben Widawsky	07fe0b1280	drm/i915: plumb VM into bind/unbind code As alluded to in several patches, and it will be reiterated later... A VMA is an abstraction for a GEM BO bound into an address space. Therefore it stands to reason, that the existing bind, and unbind are the ones which will be the most impacted. This patch implements this, and updates all callers which weren't already updated in the series (because it was too messy). This patch represents the bulk of an earlier, larger patch. I've pulled out a bunch of things by the request of Daniel. The history is preserved for posterity with the email convention of ">" One big change from the original patch aside from a bunch of cropping is I've created an i915_vma_unbind() function. That is because we always have the VMA anyway, and doing an extra lookup is useful. There is a caveat, we retain an i915_gem_object_ggtt_unbind, for the global cases which might not talk in VMAs. > drm/i915: plumb VM into object operations > > This patch was formerly known as: > "drm/i915: Create VMAs (part 3) - plumbing" > > This patch adds a VM argument, bind/unbind, and the object > offset/size/color getters/setters. It preserves the old ggtt helper > functions because things still need, and will continue to need them. > > Some code will still need to be ported over after this. > > v2: Fix purge to pick an object and unbind all vmas > This was doable because of the global bound list change. > > v3: With the commit to actually pin/unpin pages in place, there is no > longer a need to check if unbind succeeded before calling put_pages(). > Make put_pages only BUG() after checking pin count. > > v4: Rebased on top of the new hangcheck work by Mika > plumbed eb_destroy also > Many checkpatch related fixes > > v5: Very large rebase > > v6: > Change BUG_ON to WARN_ON (Daniel) > Rename vm to ggtt in preallocate stolen, since it is always ggtt when > dealing with stolen memory. (Daniel) > list_for_each will short-circuit already (Daniel) > remove superfluous space (Daniel) > Use per object list of vmas (Daniel) > Make obj_bound_any() use obj_bound for each vm (Ben) > s/bind_to_gtt/bind_to_vm/ (Ben) > > Fixed up the inactive shrinker. As Daniel noticed the code could > potentially count the same object multiple times. While it's not > possible in the current case, since 1 object can only ever be bound into > 1 address space thus far - we may as well try to get something more > future proof in place now. With a prep patch before this to switch over > to using the bound list + inactive check, we're now able to carry that > forward for every address space an object is bound into. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA in destroy".] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:04:20 +02:00
Ben Widawsky	80dcfdbd68	drm/i915: Rework __i915_gem_shrink In order to do this for all VMs, it's convenient to rework the logic a bit. This should have no functional impact. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-08 14:02:41 +02:00
Dave Airlie	32c913e436	Merge tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Neat that QA (and Ben) keeps on humming along while I'm on vacation, so you already get the next feature pull request: - proper eLLC support for HSW from Ben - more interrupt refactoring - add w/a tags where we implement them already (Damien) - hangcheck fixes (Chris) + hangcheck stats (Mika) - flesh out the new vm structs for ppgtt and ggtt (Ben) - PSR for Haswell, still disabled by default (Rodrigo et al.) - pc8+ refclock sequence code from Paulo - more interrupt refactoring from Paulo, unifying ilk/snb with the ivb/hsw interrupt code - full solution for the Haswell concurrent reg access issues (Chris) - fix racy object accounting, used by some new leak tests - fix sync polarity settings on ch7xxx dvo encoder - random bits&pieces, little fixes and better debug output all over [airlied: fix conflict with drm_mm cleanups] * tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel: (289 commits) drm/i915: Do not dereference NULL crtc or fb until after checking drm/i915: fix pnv display core clock readout out drm/i915: Replace open-coded offset_in_page() drm/i915: Retry DP aux_ch communications with a different clock after failure drm/i915: Add messages useful for HPD storm detection debugging (v2) drm/i915: dvo_ch7xxx: fix vsync polarity setting drm/i915: fix the racy object accounting drm/i915: Convert the register access tracepoint to be conditional drm/i915: Squash gen lookup through multiple indirections inside GT access drm/i915: Use the common register access functions for NOTRACE variants drm/i915: Use a private interface for register access within GT drm/i915: Colocate all GT access routines in the same file drm/i915: fix reference counting in i915_gem_create drm/i915: Use Graphics Base of Stolen Memory on all gen3+ drm/i915: disable stolen mem for OVERLAY_NEEDS_PHYSICAL drm/i915: add functions to disable and restore LCPLL drm/i915: disable CLKOUT_DP when it's not needed drm/i915: extend lpt_enable_clkout_dp drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt drm/i915: Add some debug breadcrumbs to connector detection ...	2013-08-07 18:11:35 +10:00
David Herrmann	31e5d7c67b	drm/mm: add "best_match" flag to drm_mm_insert_node() Add a "best_match" flag similar to the drm_mm_search_() helpers so we can convert TTM to use them in follow up patches. We can also inline the non-generic helpers and move them into the header to allow compile-time optimizations. To make calls to drm_mm_{search,insert}_node() more readable, this converts the boolean argument to a flagset. There are pending patches that add additional flags for top-down allocators and more. v2: - use flag parameter instead of boolean "best_match" - convert _search_free() helpers to also use flags argument Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-08-07 10:08:58 +10:00
Daniel Vetter	43387b37fa	drm/gem: create drm_gem_dumb_destroy All the gem based kms drivers really want the same function to destroy a dumb framebuffer backing storage object. So give it to them and roll it out in all drivers. This still leaves the option open for kms drivers which don't use GEM for backing storage, but it does decently simplify matters for gem drivers. Acked-by: Inki Dae <inki.dae@samsung.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org> Cc: Ben Skeggs <skeggsb@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Cc: Alex Deucher <alexdeucher@gmail.com> Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-08-07 09:59:24 +10:00
Ben Widawsky	637efacf8f	drm/i915: eliminate dead domain clearing on reset The code itself is no longer accurate without updating once we have multiple address space since clearing the domains of every object requires scanning the inactive list for all VMs. "This code is dead. Just remove it rather than port it to vma." - Chris Wilson Recommended-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:12:25 +02:00
Ben Widawsky	d1ccbb5d71	drm/i915: make reset&hangcheck code VM aware Hangcheck, and some of the recent reset code for guilty batches need to know which address space the object was in at the time of a hangcheck. This is because we use offsets in the (PP\|G)GTT to determine this information, and those offsets can differ depending on which VM they are bound into. Since we still only have 1 VM ever, this code shouldn't yet have any impact. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:13 +02:00
Ben Widawsky	3e12302705	drm/i915: BUG_ON put_pages later With multiple VMs, the eviction code benefits from being able to blindly put pages without needing to know if there are any entities still holding on to those pages. As such it's preferable to return the -EBUSY before the BUG. Eviction code is the only user for now, but overall it makes sense anyway, IMO. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:13 +02:00
Ben Widawsky	3089c6f239	drm/i915: make caching operate on all address spaces For now, objects will maintain the same cache levels amongst all address spaces. This is to limit the risk of bugs, as playing with cacheability in the different domains can be very error prone. In the future, it may be optimal to allow setting domains per VMA (ie. an object bound into an address space). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:12 +02:00
Ben Widawsky	c37e220461	drm/i915: Add VM to pin To verbalize it, one can say, "pin an object into the given address space." The semantics of pinning remain the same otherwise. Certain objects will always have to be bound into the global GTT. Therefore, global GTT is a special case, and keep a special interface around for it (i915_gem_obj_ggtt_pin). v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:09 +02:00
Ben Widawsky	fcb4a57805	drm/i915: Use bound list for inactive shrink Due to the move of the active/inactive lists, it no longer makes sense to use them for shrinking, since shrinking isn't VM specific (such a need may also exist, but doesn't yet). What we can do instead is use the global bound list to find all objects which aren't active. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:09 +02:00
Ben Widawsky	a70a3148b0	drm/i915: Make proper functions for VMs Earlier in the conversion sequence we attempted to quickly wedge in the transitional interface as static inlines. Now that we're sure these interfaces are sane, for easier debug and to decrease code size (since many of these functions may be called quite a bit), make them real functions While at it, kill off the set_color interface. We'll always have the VMA, or easily get to it. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:08 +02:00
Ben Widawsky	fc8c067eee	drm/i915: Create an init vm Move all the similar address space (VM) initialization code to one function. Until we have multiple VMs, there should only ever be 1 VM. The aliasing ppgtt is a special case without its own VM (since it doesn't need its own address space management). Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-08-05 19:04:07 +02:00
Daniel Vetter	c20e835586	drm/i915: fix the racy object accounting Just use a spinlock to protect them. v2: Rebase onto the new object create refcount fix patch. v3: Don't kill dev_priv->mm.object_memory as requested by Chris and hence just use a spinlock instead of atomic_t. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67287 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-25 15:30:54 +02:00
Daniel Vetter	cb54b53ada	Merge commit 'Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux' This backmerges Linus' merge commit of the latest drm-fixes pull: commit `549f3a1218` Merge: `42577ca` `058ca4a` Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue Jul 23 15:47:08 2013 -0700 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux We've accrued a few too many conflicts, but the real reason is that I want to merge the 100% solution for Haswell concurrent registers writes into drm-intel-next. But that depends upon the 90% bandaid merged into -fixes: commit `a7cd1b8fea` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access Also, we can roll up on accrued conflicts. Usually I'd backmerge a tagged -rc, but I want to get this done before heading off to vacations next week ;-) Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/i915_gem.c v2: For added hilarity we have a init sequence conflict around the gt_lock, so need to move that one, too. Spotted by Jani Nikula. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-25 15:18:41 +02:00
David Herrmann	51335df9f0	drm/vma: provide drm_vma_node_unmap() helper Instead of unmapping the nodes in TTM and GEM users manually, we provide a generic wrapper which does the correct thing for all vma-nodes. v2: remove bdev->dev_mapping test in ttm_bo_unmap_virtual_unlocked() as ttm_mem_io_free_vm() does nothing in that case (io_reserved_vm is 0). v4: Fix docbook comments v5: use drm_vma_node_size() Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@gmail.com>	2013-07-25 20:47:08 +10:00
David Herrmann	0de23977cf	drm/gem: convert to new unified vma manager Use the new vma manager instead of the old hashtable. Also convert all drivers to use the new convenience helpers. This drops all the (map_list.hash.key << PAGE_SHIFT) non-sense. Locking and access-management is exactly the same as before with an additional lock inside of the vma-manager, which strictly wouldn't be needed for gem. v2: - rebase on drm-next - init nodes via drm_vma_node_reset() in drm_gem.c v3: - fix tegra v4: - remove duplicate if (drm_vma_node_has_offset()) checks - inline now trivial drm_vma_node_offset_addr() calls v5: - skip node-reset on gem-init due to kzalloc() - do not allow mapping gem-objects with offsets (backwards compat) - remove unnecessary casts Cc: Inki Dae <inki.dae@samsung.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@gmail.com>	2013-07-25 20:47:06 +10:00
Daniel Vetter	d861e33876	drm/i915: fix reference counting in i915_gem_create This function is called without the dev->struct_mutex held, hence we need to use the _unlocked unreference variants. As soon as the object is registered userspace can sneak in here with a gem_close ioctl call, so the object can (and with my new evil tests actually does) get the final unreference in this place. The lack of locking then results in hilarity and some good leakage. To fix this we simply need to revert Chris Wilson <chris@chris-wilson.co.uk> v2: We need to make the trace call _before_ we drop our ref - the object might very well be gone by then already. v3: Just revert the original patch as suggested by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Remove the added white line again to tighten the return block, requested by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-24 23:25:03 +02:00
Daniel Vetter	bc6bc15bd7	drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt This has been broken in commit `2f63315692` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 17 12:19:03 2013 -0700 drm/i915: Create VMAs which resulted in an OOPS the first time around we've hit -ENOSPC. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67156 Cc: Imre Deak <imre.deak@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net> Tested-by: meng <mengmeng.meng@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-24 10:37:08 +02:00
Xiong Zhang	0b74b508f7	drm/i915: add prefault_disable module option Prefault is still enabled by default, which prevents most pwrite/pread/reloc calls from taking the slow path. To verify these slow paths, prefault needs to be disabled. Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> [danvet: Make checkpatch happy and bikeshed the module option help text a bit.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 09:29:26 +02:00
Dan Carpenter	6286ef9b56	drm/i915: use after free on error path i915_gem_vma_destroy() frees its argument so we have to move the drm_mm_remove_node() call up a few lines. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 08:58:42 +02:00
Dan Carpenter	db473b36d4	drm/i915: checking for NULL instead of IS_ERR() i915_gem_vma_create() returns an ERR_PTR() or a valid pointer; it never returns NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 08:58:33 +02:00
Dave Airlie	e13af9a834	Merge tag 'drm-intel-next-2013-07-12' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Highlights: - follow-up refactoring after the shared dpll rework that landed in 3.11 - oddball prep cleanups from Ben for ppgtt - encoder->get_config state tracking infrastructure from Jesse - used by the experimental fastboot support from Jesse (disabled by default) - make the error state file official and add it to our sysfs interface (Mika) - drm_mm prep changes from Ben, prepares to embedd the drm_mm_node (which will be used by the vma rework later on) - interrupt handling rework, follow up cleanups to the VECS enabling, hpd storm handling and fifo underrun reporting. - Big pile of smaller cleanups, code improvements and related stuff. * tag 'drm-intel-next-2013-07-12' of git://people.freedesktop.org/~danvet/drm-intel: (72 commits) drm/i915: clear DPLL reg when disabling i9xx dplls drm/i915: Fix up cpt pixel multiplier enable sequence drm/i915: clean up vlv ->pre_pll_enable and pll enable sequence drm/i915: move error state to own compilation unit drm/i915: Don't attempt to read an unitialized stack value drm/i915: Use for_each_pipe() when possible drm/i915: don't enable PM_VEBOX_CS_ERROR_INTERRUPT drm/i915: unify ring irq refcounts (again) drm/i915: kill dev_priv->rps.lock drm/i915: queue work outside spinlock in hsw_pm_irq_handler drm/i915: streamline hsw_pm_irq_handler drm/i915: irq handlers don't need interrupt-safe spinlocks drm/i915: kill lpt pch transcoder->crtc mapping code for fifo underruns drm/i915: improve GEN7_ERR_INT clearing for fifo underrun reporting drm/i915: improve SERR_INT clearing for fifo underrun reporting drm/i915: extract ibx_display_interrupt_update drm/i915: remove unused members from drm_i915_private drm/i915: don't frob mm.suspended when not using ums drm/i915: Fix VLV DP RBR/HDMI/DAC PLL LPF coefficients drm/i915: WARN if the bios reserved range is bigger than stolen size ... 
Conflicts: drivers/gpu/drm/i915/i915_gem.c	2013-07-19 12:12:21 +10:00
Daniel Vetter	94a335dba3	drm/i915: correctly restore fences with objects attached To avoid stalls we delay tiling changes and especially hold off committing the new fence state for as long as possible. Synchronization points are in the execbuf code and in our gtt fault handler. Unfortunately we've missed that tricky detail when adding proper fence restore code in commit `19b2dbde57` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 12 10:15:12 2013 +0100 drm/i915: Restore fences after resume and GPU resets The result was that we've restored fences for objects with no tiling, since the object<->fence link still existed after resume. Now that wouldn't have been too bad since any subsequent access would have fixed things up, but if we've changed from tiled to untiled real havoc happened: The tiling stride is stored -1 in the fence register, so a stride of 0 resulted in all 1s in the top 32bits, and so a completely bogus fence spanning everything from the start of the object to the top of the GTT. The tell-tale in the register dumps looks like: FENCE START 2: 0x0214d001 FENCE END 2: 0xfffff3ff Bit 11 isn't set since the hw doesn't store it, even when writing all 1s (at least on my snb here). To prevent such a gaffe in the future add a sanity check for fences with an untiled object attached in i915_gem_write_fence. v2: Fix the WARN, spotted by Chris. v3: Trying to reuse get_fences looked ugly and obfuscated the code. Instead reuse update_fence and to make it really dtrt also move the fence dirty state clearing into update_fence. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Stéphane Marchesin <marcheu@chromium.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60530 Cc: stable@vger.kernel.org (for 3.10 only) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Matthew Garrett <matthew.garrett@nebula.com> Tested-by: Björn Bidar <theodorstormgrade@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-19 00:08:16 +02:00
Daniel Vetter	8157ee2115	Linux 3.10 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ= =Ob6Z -----END PGP SIGNATURE----- Merge tag 'v3.10' into drm-intel-fixes Backmerge Linux 3.10 to get at commit `19b2dbde57` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 12 10:15:12 2013 +0100 drm/i915: Restore fences after resume and GPU resets That commit is not in my current -fixes pile since that's based on my -next queue for 3.11. And the above mentioned fix was merged really late into 3.10 (and blew up, bad me) so was on a diverging branch. Option B would have been to rebase my current pile of fixes onto Dave's drm-fixes branch. But since some of the patches here are a bit tricky I've decided not to void all the testing by moving over the entire merge window. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-18 12:03:29 +02:00
Ben Widawsky	2f63315692	drm/i915: Create VMAs Formerly: "drm/i915: Create VMAs (part 1)" In a previous patch, the notion of a VM was introduced. A VMA describes an area of part of the VM address space. A VMA is similar to the concept in the linux mm. However, instead of representing regular memory, a VMA is backed by a GEM BO. There may be many VMAs for a given object, one for each VM the object is to be used in. This may occur through flink, dma-buf, or a number of other transient states. Currently the code depends on only 1 VMA per object, for the global GTT (and aliasing PPGTT). The following patches will address this and make the rest of the infrastructure more suited v2: s/i915_obj/i915_gem_obj (Chris) v3: Only move an object to the now global unbound list if there are no more VMAs for the object which are bound into a VM (ie. the list is empty). v4: killed obj->gtt_space some reworks due to rebase v5: Free vma on error path (Imre) v6: Another missed vma free in i915_gem_object_bind_to_gtt error path (Imre) Fixed vma freeing in stolen preallocation (Imre) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Squash in fixup from Ben to not deref a non-existing vma in set_cache_level, reported by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-18 08:46:13 +02:00
Ben Widawsky	5cef07e162	drm/i915: Move active/inactive lists to new mm Shamelessly manipulated out of Daniel :-) "When moving the lists around explain that the active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actually kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: Leave the bound list as a global one. (Chris, indirectly) v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local, since it will eventually be replaced by a vm argument. Put comment back inline, since it no longer makes sense to do otherwise. v4: Rebased on hangcheck/error state movement Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-17 22:24:32 +02:00
Ben Widawsky	93bd8649db	drm/i915: Put the mm in the parent address space Every address space should support object allocation. It therefore makes sense to have the allocator be part of the "superclass" from which GGTT and PPGTT will derive. Since our maximum address space size is only 2GB we're not yet able to avoid doing allocation/eviction; but we'd hope one day this becomes almost irrelevant. v2: Rebased Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-17 22:23:43 +02:00
Ben Widawsky	853ba5d223	drm/i915: Move gtt and ppgtt under address space umbrella The GTT and PPGTT can be thought of more generally as GPU address spaces. Many of their actions (insert entries), state (LRU lists), and many of their characteristics (size) can be shared. Do that. The change itself doesn't actually impact most of the VMA/VM rework coming up, it just fits in with the grand scheme of abstracting the GPU VM operations. GGTT will usually be a special case where we either know an object must be in the GGTT (display engine, workarounds, etc.). The scratch page is left as part of the VM (even though it's currently shared with the ppgtt code) because in the future when we have Full PPGTT, I intend to create a separate scratch page for each. v2: Drop usage of i915_gtt_vm (Daniel) Make cleanup also part of the parent class (Ben) Modified commit msg Rebased v3: Properly share scratch page (Imre) Finish commit message (Daniel, Imre) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-17 22:21:47 +02:00
Dave Airlie	6bd2cab2c1	Merge tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel One feature latecomer, I've forgotten to merge the patch to reeanble the Haswell power well feature now that the audio interaction is fixed up. Since that was the only unfixed issue with it I've figured I could throw it in a bit late, and it's trivial to revert in case I'm wrong. Otherwise all bug/regression fixes: - Fix status page reinit after gpu hangs, spotted by more paranoid igt checks. - Fix object list walking fumble regression in the shrinker (only the counting part, the actual shrinking code was correct so no Oops potential), from Xiong Zhang. - Fix DP 1.2 bw limits (Imre). - Restore legacy forcewake on ivb, too many broken biosen out there. We dump a warn though that recent userspace might fall over with that config (Guenter Roeck). - Patch up the gen2 cs tlb w/a. - Improve the fence coherency w/a now that we have a better understanding what's going on. The removed wbinvd+ipi should make -rt folks happy. Big thanks to Jon Bloomfield for figuring this out, patches from Chris. - Fix write-read race when switching ring (Chris). Spotted with code inspection, but now we also have an igt for it. There's an ugly regression we're still working on introduced between 3.10-rc7 and 3.10.0. Unfortunately we can't just revert the offender since that one fixes another regression :( I've asked Steven to include my -fixes branch into linux-next to prevent such fallout in the future, hopefully. 
* tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel: Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" drm/i915: Fix incoherence with fence updates on Sandybridge+ drm/i915: Fix write-read race with multiple rings Partially revert "drm/i915: unconditionally use mt forcewake on hsw/ivb" drm/i915: fix lane bandwidth capping for DP 1.2 sinks drm/i915: fix up ring cleanup for the i830/i845 CS tlb w/a drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list drm/i915: switch disable_power_well default value to 1 drm/i915: reinit status page registers after gpu reset	2013-07-17 08:40:49 +10:00
Mika Kuoppala	10cd45b6e8	drm/i915: introduce i915_queue_hangcheck To run hangcheck in near future. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-16 12:44:02 +02:00
Ben Widawsky	59124506ba	drm/i915: store eLLC size The eLLC cannot be determined by PCIID because as far as we know, even machines supporting eLLC may not have it enabled, or fused off or whatever. It's possible this isn't actually true, and at that point we can switch to a DEV_INFO flag instead. I've defined everything where the docs are clear, and left the rest as magic. But we need it before we set the pte_encode function pointers, which happens really early, in gtt_init. The problem with just doing the normal sequence earlier is we don't have the ability to use forcewake until after the pte functions have been set up. Since all solutions are somewhat ugly (barring rewriting all the init ordering), I've opted to do the detection really early, and the enabling later - since the register to detect doesn't require forcewake. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Move dev_priv->ellc_size away from the dri1 dungeon to a nice place right next to the l3 parity stuff. Also squash in the follow-up commit to read out the eLLC size a bit earlier.] Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-16 08:08:21 +02:00
Ben Widawsky	05e21cc43d	drm/i915: Define some of the eLLC magic The EDRAM present register isn't really defined in the docs. It just says check to see if it's set to 1. So I haven't defined the 1 value not knowing what it actually means. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-16 08:00:52 +02:00
Chris Wilson	46a0b638f3	Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" This reverts commit `25ff119` and the follow on for Valleyview commit `2dc8aae`. commit `25ff1195f8` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Apr 4 21:31:03 2013 +0100 drm/i915: Workaround incoherence between fences and LLC across multiple CPUs commit `2dc8aae06d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed May 22 17:08:06 2013 +0100 drm/i915: Workaround incoherence with fence updates on Valleyview Jon Bloomfield came up with a plausible explanation and cheap fix (drm/i915: Fix incoherence with fence updates on Sandybridge+) for the race condition, so lets run with it. This is a candidate for stable as the old workaround incurs a significant cost (calling wbinvd on all CPUs before performing the register write) for some workloads as noted by Carsten Emde. Link: http://lists.freedesktop.org/archives/intel-gfx/2013-June/028819.html References: https://www.osadl.org/?id=1543#c7602 References: https://bugs.freedesktop.org/show_bug.cgi?id=63825 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Carsten Emde <C.Emde@osadl.org> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-10 15:31:12 +02:00
Chris Wilson	d18b961903	drm/i915: Fix incoherence with fence updates on Sandybridge+ This hopefully fixes the root cause behind the workaround added in commit `25ff1195f8` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Apr 4 21:31:03 2013 +0100 drm/i915: Workaround incoherence between fences and LLC across multiple CPUs Thanks to further investigation by Jon Bloomfield, he realised that the 64-bit register might be broken up by the hardware into two 32-bit writes (a problem we have encountered elsewhere). This non-atomicity would then cause an issue where a second thread would see an intermediate register state (new high dword, old low dword), and this register would randomly be used in preference to its own thread register. This would cause the second thread to read from and write into a fairly random tiled location. Breaking the operation into 3 explicit 32-bit updates (first disable the fence, poke the upper bits, then poke the lower bits and enable) ensures that, given proper serialisation between the 32-bit register write and the memory transfer, that the fence value is always consistent. Armed with this knowledge, we can explain how the previous workaround work. The key to the corruption is that a second thread sees an erroneous fence register that conflicts and overrides its own. By serialising the fence update across all CPUs, we have a small window where no GTT access is occurring and so hide the potential corruption. This also leads to the conclusion that the earlier workaround was incomplete. v2: Be overly paranoid about the order in which fence updates become visible to the GPU to make really sure that we turn the fence off before doing the update, and then only switch the fence on afterwards. 
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Carsten Emde <C.Emde@osadl.org> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-10 14:41:46 +02:00
Daniel Vetter	db1b76ca6a	drm/i915: don't frob mm.suspended when not using ums In kernel modeset driver mode we're in full control of the chip, always. So there's no need at all to set mm.suspended in i915_gem_idle. Hence move that out into the leavevt ioctl. Since i915_gem_idle doesn't suspend gem any more we can also drop the re-enabling for KMS in the thaw function. Also clean up the handling of mm.suspend at driver load by coalescing all the assignments. Stumbled over while reading through our resume code for unrelated reasons. v2: Shovel mm.suspended into the (newly created) ums dungeon as suggested by Chris Wilson. The plan is that once we've completely stopped relying on the register save/restore code we could shovel even that in there. v3: Improve the locking for the entervt/leavevt ioctls a bit by moving the dev->struct_mutex locking outside of i915_gem_idle. Also don't clear dev_priv->ums.mm_suspended for the kms case, we allocate it with kzalloc. Both suggested by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-10 14:30:25 +02:00
Chris Wilson	02978ff57a	drm/i915: Fix write-read race with multiple rings Daniel noticed a problem where if we wrote to an object with ring A in the middle of a very long running batch, then executed a quick batch on ring B before a batch that reads from the same object, its obj->ring would now point to ring B, but its last_write_seqno would be still relative to ring A. This would allow for the user to read from the object before the GPU had completed the write, as set_domain would only check that ring B had passed the last_write_seqno. To fix this simply (and inelegantly), we bump the last_write_seqno when switching rings so that the last_write_seqno is always relative to the current obj->ring. This fixes igt/tests/gem_write_read_ring_switch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org [danvet: Add note about the newly created igt which exercises this bug.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-10 10:41:55 +02:00
Linus Torvalds	2e17c5a97e	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Okay this is the big one, I was stalled on the fbdev pull req as I stupidly let fbdev guys merge a patch I required to fix a warning with some patches I had, they ended up merging the patch from the wrong place, but the warning should be fixed. In future I'll just take the patch myself! Outside drm: There are some snd changes for the HDMI audio interactions on haswell, they've been acked for inclusion via my tree. This relies on the wound/wait tree from Ingo which is already merged. Major changes: AMD finally released the dynamic power management code for all their GPUs from r600->present day, this is great, off by default for now but also a huge amount of code, in fact it is most of this pull request. Since it landed there has been a lot of community testing and Alex has sent a lot of fixes for any bugs found so far. I suspect radeon might now be the biggest kernel driver ever :-P p.s. radeon.dpm=1 to enable dynamic powermanagement for anyone. New drivers: Renesas r-car display unit. Other highlights: - core: GEM CMA prime support, use new w/w mutexs for TTM reservations, cursor hotspot, doc updates - dvo chips: chrontel 7010B support - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell), Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp support (this time for sure) - nouveau: async buffer object deletion, context/register init updates, kernel vp2 engine support, GF117 support, GK110 accel support (with external nvidia ucode), context cleanups. 
- exynos: memory leak fixes, Add S3C64XX SoC series support, device tree updates, common clock framework support, - qxl: cursor hotspot support, multi-monitor support, suspend/resume support - mgag200: hw cursor support, g200 mode limiting - shmobile: prime support - tegra: fixes mostly I've been banging on this quite a lot due to the size of it, and it seems to okay on everything I've tested it on." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits) drm/radeon/dpm: implement vblank_too_short callback for si drm/radeon/dpm: implement vblank_too_short callback for cayman drm/radeon/dpm: implement vblank_too_short callback for btc drm/radeon/dpm: implement vblank_too_short callback for evergreen drm/radeon/dpm: implement vblank_too_short callback for 7xx drm/radeon/dpm: add checks against vblank time drm/radeon/dpm: add helper to calculate vblank time drm/radeon: remove stray line in old pm code drm/radeon/dpm: fix display_gap programming on rv7xx drm/nvc0/gr: fix gpc firmware regression drm/nouveau: fix minor thinko causing bo moves to not be async on kepler drm/radeon/dpm: implement force performance level for TN drm/radeon/dpm: implement force performance level for ON/LN drm/radeon/dpm: implement force performance level for SI drm/radeon/dpm: implement force performance level for cayman drm/radeon/dpm: implement force performance levels for 7xx/eg/btc drm/radeon/dpm: add infrastructure to force performance levels drm/radeon: fix surface setup on r1xx drm/radeon: add support for 3d perf states on older asics drm/radeon: set default clocks for SI when DPM is disabled ...	2013-07-09 16:04:31 -07:00
Xiong Zhang	067556084a	drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list obj->mm_list link to dev_priv->mm.inactive_list/active_list obj->global_list link to dev_priv->mm.unbound_list/bound_list This regression has been introduced in commit `93927ca52a` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Jan 10 18:03:00 2013 +0100 drm/i915: Revert shrinker changes from "Track unbound pages" Cc: stable@vger.kernel.org Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> [danvet: Add regression notice.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-09 16:31:48 +02:00
Ben Widawsky	c6cfb32567	drm/i915: Embed drm_mm_node in i915 gem obj Embedding the node in the obj is more natural in the transition to VMAs which will also have embedded nodes. This change also helps transition away from put_block to remove node. Though it's quite an uncommon occurrence, it's somewhat convenient to not fail at bind time because we cannot allocate the node. Though in practice there are other allocations (like the request structure) which would probably make this point not terribly useful. Quoting Daniel: Note that the only difference between put_block and remove_node is that the former fills up the preallocation cache. Which we don't need anyway and hence is just wasted space. v2: Clean up the stolen preallocation code. Rebased on the reserve_node patches renames ggtt_ stuff to gtt_ stuff WARN_ON if the object is already bound (which doesn't mean it's in the bound list, tricky) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-08 22:04:36 +02:00
Ben Widawsky	edd41a870f	drm/i915: Kill obj->gtt_offset With the getters in place from the previous patch this member serves no purpose other than saving one spare pointer chase, which will be killed in the next patch anyway. Moving to VMAs, this member adds unnecessary confusion since an object may exist at different offsets in different VMs. v2: Properly preserve the stolen offset. This code is a bit hacky but it all goes away when we embed the drm_mm_node and removes the need for the incorrect patch I submitted previously: "Use gtt_space->start for stolen reservation" Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-08 22:04:35 +02:00
Ben Widawsky	f343c5f647	drm/i915: Getter/setter for object attributes Soon we want to gut a lot of our existing assumptions how many address spaces an object can live in, and in doing so, embed the drm_mm_node in the object (and later the VMA). It's possible in the future we'll want to add more getter/setter methods, but for now this is enough to enable the VMAs. v2: Reworked commit message (Ben) Added comments to the main functions (Ben) sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/.[ch] sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/.[ch] (Daniel) v3: Rebased on new reserve_node patch Changed DRM_DEBUG_KMS to actually work (will need fixing later) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-08 22:04:34 +02:00
Chris Wilson	d26e3af842	drm/i915: Refactor the wait_rendering completion into a common routine Harmonise the completion logic between the non-blocking and normal wait_rendering paths, and move that logic into a common function. In the process, we note that the last_write_seqno is by definition the earlier of the two read/write seqnos and so all successful waits will have passed the last_write_seqno. Therefore we can unconditionally clear the write seqno and its domains in the completion logic. v2: Add the missing ring parameter, because sometimes it is good to have things compile. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-01 11:15:01 +02:00
Chris Wilson	daa13e1ca5	drm/i915: Only clear write-domains after a successful wait-seqno In the introduction of the non-blocking wait, I cut'n'pasted the wait completion code from normal locked path. Unfortunately, this neglected that the normal path returned early if the wait returned early. The result is that read-only waits may return whilst the GPU is still writing to the bo. Fixes regression from commit `3236f57a01` [v3.7] Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Aug 24 09:35:09 2012 +0100 drm/i915: Use a non-blocking wait for set-to-domain ioctl Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163 Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-01 11:15:00 +02:00
Jani Nikula	3765f30486	drm/i915: fix build warning on format specifier mismatch drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_object_bind_to_gtt’: drivers/gpu/drm/i915/i915_gem.c:3002:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘size_t’ [-Wformat] v2: Use %zu instead of %d. Two char patch, and 100% wrong. (Ville) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-01 11:14:43 +02:00
Konrad Rzeszutek Wilk	1625e7e549	drm/i915: make compact dma scatter lists creation work with SWIOTLB backend. Git commit `90797e6d1e` ("drm/i915: create compact dma scatter lists for gem objects") makes certain assumptions about the underlying DMA API that are not always correct. On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup I see: [drm:intel_pipe_set_base] ERROR pin & fence failed [drm:intel_crtc_set_config] ERROR failed to set mode on [CRTC:3], err = -28 A bit of debugging traced it down to dma_map_sg failing (in i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB). Unfortunately, those are sizes that the SWIOTLB is incapable of handling - the maximum it can handle is an entry of 512KB of virtually contiguous memory for its bounce buffer. (See IO_TLB_SEGSIZE). Prior to the above-mentioned git commit the SG entries were of 4KB, and the code introduced by the above git commit squashed the CPU contiguous PFNs in one big virtual address provided to the DMA API. This patch is a simple semi-revert - where we emulate the old behavior if we detect that SWIOTLB is online. If it is not online then we continue on with the new compact scatter gather mechanism. An alternative solution would be for the '.get_pages' and the i915_gem_gtt_prepare_object to retry with smaller max gap of the amount of PFNs that can be combined together - but with this issue discovered during rc7 that might be too risky. Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Chris Wilson <chris@chris-wilson.co.uk> CC: Imre Deak <imre.deak@intel.com> CC: Daniel Vetter <daniel.vetter@ffwll.ch> CC: David Airlie <airlied@linux.ie> CC: <dri-devel@lists.freedesktop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-07-01 11:14:42 +02:00
Dave Airlie	28419261b0	Merge tag 'drm-intel-next-2013-06-18' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Last 3.11 feature pull. I have a few odds bits and pieces and fixes in my queue, I'll sort them out later on to see what's for 3.11-fixes and what's for 3.12. But nothing to hold this here up imo. Highlights: - more hangcheck work from Mika and Chris to prepare for arb robustness - trickle feed fixes from Ville - first parts of the shared pch pll rework, with some basic hw state readout and cross-checking (this shuts up the confused pch pll refcount WARN that Linus just recently forwarded) - Haswell audio power well support from Wang Xingchao (alsa bits acked by Takashi) - some cleanups and asserts sprinkling around the plane/gamma enabling sequence from Ville - more gtt refactoring from Ben - clear up the adjusted->mode vs. pixel clock vs. port clock confusion - 30bpp support, this time for real hopefully * tag 'drm-intel-next-2013-06-18' of git://people.freedesktop.org/~danvet/drm-intel: (97 commits) drm/i915: remove a superflous semi-colon drm/i915: Kill useless "Enable panel fitter" comments drm/i915: Remove extra "ring" from error message drm/i915: simplify the reduced clock handling for pch plls drm/i915: stop killing pfit on i9xx drm/i915: explicitly set up PIPECONF (and gamma table) on haswell drm/i915: set up PIPECONF explicitly for i9xx/vlv platforms drm/i915: set up PIPECONF explicitly on ilk-ivb drm/i915: find guilty batch buffer on ring resets drm/i915: store ring hangcheck action drm/i915: add batch bo to i915_add_request() drm/i915: change i915_add_request to macro drm/i915: add i915_gem_context_get_hang_stats() drm/i915: add struct i915_ctx_hang_stats drm/i915: Try harder to disable trickle feed on VLV drm/i915: fix up pch pll enabling for pixel multipliers drm/i915: hw state readout and cross-checking for shared dplls drm/i915: WARN on lack of shared dpll drm/i915: split up intel_modeset_check_state drm/i915: extract readout_hw_state from setup_hw_state ... Conflicts: drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_fb.c drivers/gpu/drm/i915/intel_sdvo.c	2013-06-28 09:50:34 +10:00
Dave Airlie	4300a0f8bd	Linux 3.10-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA== =uUH5 -----END PGP SIGNATURE----- Merge tag 'v3.10-rc7' into drm-next Linux 3.10-rc7 The sdvo lvds fix in this -fixes pull commit `c3456fb3e4` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Mon Jun 10 09:47:58 2013 +0200 drm/i915: prefer VBT modes for SVDO-LVDS over EDID has a silent functional conflict with commit `990256aec2` Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Fri May 31 12:17:07 2013 +0000 drm: Add probed modes in probe order in drm-next. We simply need to add the vbt modes before edid modes, i.e. the other way round than now. Conflicts: drivers/gpu/drm/drm_prime.c drivers/gpu/drm/i915/intel_sdvo.c	2013-06-27 20:40:44 +10:00
Konrad Rzeszutek Wilk	426729dcc7	drm/i915: make compact dma scatter lists creation work with SWIOTLB backend. Git commit `90797e6d1e` ("drm/i915: create compact dma scatter lists for gem objects") makes certain assumptions about the underlying DMA API that are not always correct. On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup I see: [drm:intel_pipe_set_base] ERROR pin & fence failed [drm:intel_crtc_set_config] ERROR failed to set mode on [CRTC:3], err = -28 A bit of debugging traced it down to dma_map_sg failing (in i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB). Unfortunately, those are sizes that the SWIOTLB is incapable of handling - the maximum it can handle is an entry of 512KB of virtually contiguous memory for its bounce buffer. (See IO_TLB_SEGSIZE). Prior to the above-mentioned git commit the SG entries were 4KB each, and the code introduced by that commit squashed the CPU contiguous PFNs into one big virtual address provided to the DMA API. This patch is a simple semi-revert - we emulate the old behavior if we detect that SWIOTLB is online. If it is not online then we continue on with the new compact scatter gather mechanism. An alternative solution would be for the '.get_pages' and the i915_gem_gtt_prepare_object to retry with a smaller max gap of the amount of PFNs that can be combined together - but with this issue discovered during rc7 that might be too risky. Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Chris Wilson <chris@chris-wilson.co.uk> CC: Imre Deak <imre.deak@intel.com> CC: Daniel Vetter <daniel.vetter@ffwll.ch> CC: David Airlie <airlied@linux.ie> CC: <dri-devel@lists.freedesktop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-06-25 10:39:57 +10:00
Chris Wilson	19b2dbde57	drm/i915: Restore fences after resume and GPU resets Stéphane Marchesin found that fences for pinned objects (i.e. the scanout) were not being restored upon resume, leading to corruption on the display and reference counting issues. This is due to a bug in commit `312817a39f` [2.6.38] Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Nov 22 11:50:11 2010 +0000 drm/i915: Only save and restore fences for UMS that zapped the pinned fences even though they were in use. Fortuitously, whilst we forced a VT switch during suspend and resume, no fences were ever pinned at the time. However, we now can do switchless S3 transitions and so the old bug finally surfaces. Reported-by: Stéphane Marchesin <marcheu@chromium.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Stéphane Marchesin <marcheu@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-16 01:10:45 +02:00
Mika Kuoppala	aa60c664e6	drm/i915: find guilty batch buffer on ring resets After hang check timer has declared gpu to be hung, rings are reset. In ring reset, when clearing request list, do post mortem analysis to find out the guilty batch buffer. Select requests for further analysis by inspecting the completed sequence number which has been updated into the HWS page. If request was completed, it can't be related to the hang. For noncompleted requests mark the batch as guilty if the ring was not waiting and the ring head was stuck inside the buffer object or in the flush region right after the batch. For everything else, mark them as innocents. v2: Fixed a typo in commit message (Ville Syrjälä) v3: - more descriptive function parameters (Chris Wilson) - use masked head address when inspecting if request is in ring - s/hangcheck.last_action/hangcheck.action - added comment about unmasked head hitting batch_obj range Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-13 17:42:17 +02:00
Mika Kuoppala	7d736f4f0b	drm/i915: add batch bo to i915_add_request() In order to track down a batch buffer and context which caused the ring to hang, store reference to bo into the request struct. Request can also cause gpu to hang after the batch in the flush section in the ring. To detect this add start of the flush portion offset into the request. v2: Included comment about request vs batch_obj lifetimes (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-13 17:42:16 +02:00
Mika Kuoppala	0025c0772d	drm/i915: change i915_add_request to macro Only execbuffer needed all the parameters on i915_add_request(). By putting __i915_add_request behind macro, all current callsites become cleaner. Following patch will introduce a new parameter for __i915_add_request. With this patch, only the relevant callsite will reflect the change making commit smaller and easier to understand. v2: _i915_add_request as function name (Chris Wilson) v3: change name __i915_add_request and fix ordering of params (Ben Widawsky) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-13 17:42:15 +02:00
Dave Airlie	e6dfcc5303	Merge tag 'drm-intel-next-2013-06-01' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Another round of drm-intel-next for 3.11. Highlights: - Haswell IPS support (Paulo Zanoni) - VECS support on Haswell (Ben Widawsky, Xiang Haihao, ...) - Haswell watermark fixes (Paulo Zanoni) - "Make the gun bigger again" multithread fence fix from Chris. - i915_error_state finnally no longer fails with -ENOMEM! Big thanks to Mika for tackling this. - vlv sideband locking fixes from Jani - Hangcheck prep work for arb_robustness support (Mika&Chris) - edp vs cpu port confusion clean-up from Imre - pile of smaller fixes and cleanups all over. * tag 'drm-intel-next-2013-06-01' of git://people.freedesktop.org/~danvet/drm-intel: (70 commits) drm/i915: add i915_ips_status debugfs entry drm/i915: add enable_ips module option drm/i915: implement IPS feature drm/i915: fix up the edp power well check drm/i915: add I915_PARAM_HAS_VEBOX to i915_getparam drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer() drm/i915: add VEBOX into debugfs drm/i915: Enable vebox interrupts drm/i915: vebox interrupt get/put drm/i915: consolidate interrupt naming scheme drm/i915: Convert irq_refounct to struct drm/i915: make PM interrupt writes non-destructive drm/i915: Add PM regs to pre/post install drm/i915: Create an ivybridge_irq_preinstall drm/i915: Create a more generic pm handler for hsw+ drm/i915: add support for 5/6 data buffer partitioning on Haswell drm/i915: properly set HSW WM_LP watermarks drm/i915: properly set HSW WM_PIPE registers drm/i915: fix pch_nop support drm/i915: Vebox ringbuffer init ...	2013-06-11 08:38:56 +10:00
Daniel Vetter	7abb690a0e	drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus Chris Wilson noticed that since commit `1f83fee08d` [v3.9] Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Nov 15 17:17:22 2012 +0100 drm/i915: clear up wedged transitions X can again get -EIO when it does not expect it. And even worse score a SIGBUS when accessing gtt mmaps. The established ABI is that we _only_ return an -EIO from execbuf - all other ioctls should just work. And since the reset code moves all bos out of gpu domains and clears out all the last_seqno/ring tracking there really shouldn't be any reason for non-execbuf code to ever touch the hw and see an -EIO. After some extensive discussions we've noticed that these spurious -EIO are caused by i915_gem_wait_for_error: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html That is easy to fix by returning 0 instead of -EIO, since grabbing the dev->struct_mutex does not yet mean that we actually want to touch the hw. And so there is no reason at all to fail with -EIO. But that's not the entire story, since often (at least it's easily googleable) dmesg indicates that the reset fails and we declare the gpu wedged. Then, quite a bit later X wakes up with the "Timed out waiting for the gpu reset to complete" DRM_ERROR message in wait_for_error and brings down the desktop with an -EIO/SIGBUS. So clearly we're missing a wakeup somewhere, since the gpu reset just doesn't take 10 seconds to complete. And indeed we do handle the terminally wedged state wrong. Fix this all up. References: https://bugs.freedesktop.org/show_bug.cgi?id=63921 References: https://bugs.freedesktop.org/show_bug.cgi?id=64073 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-03 14:35:18 +02:00
Ben Widawsky	35c20a60c7	drm/i915: Rename the gtt_list to global_list Since it will be used for the global bound/unbound list with full PPGTT, this helps clarify things for upcoming code rework. Recommended-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-03 10:51:14 +02:00
Ben Widawsky	401c29f607	drm/i915: unpin pages at unbind If we properly keep track of the pages_pin_count, then when we later add multiple address spaces, the put_pages doesn't need any special checks to be able to perform it's job. CC: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Rebased on top of the fix for stolen memory pinning.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-03 10:50:22 +02:00
Ben Widawsky	1d64ae719b	drm/i915: Unpin stolen pages The way the stolen handling works is we take a pin on the backing pages, but we never actually get a reference to the bo. On freeing objects allocated with stolen memory, the final unref will end up freeing the object with pinned pages count left. To enable an assertion to catch bugs in this code path, this patch cleans up that remaining pin. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-06-03 10:49:08 +02:00
Ben Widawsky	9a8a2213a7	drm/i915: Vebox ringbuffer init v2: Add set_seqno which didn't exist before rebase (Haihao) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-31 20:54:12 +02:00

1380 Commits