OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Tomas Elf	94f7bbe150	drm/i915: Snapshot seqno of most recently submitted request. The hang checker needs to inspect whether or not the ring request list is empty as well as if the given engine has reached or passed the most recently submitted request. The problem with this is that the hang checker cannot grab the struct_mutex, which is required in order to safely inspect requests since requests might be deallocated during inspection. In the past we've had kernel panics due to this very unsynchronized access in the hang checker. One solution to this problem is to not inspect the requests directly since we're only interested in the seqno of the most recently submitted request - not the request itself. Instead the seqno of the most recently submitted request is stored separately, which the hang checker then inspects, circumventing the issue of synchronization from the hang checker entirely. This fixes a regression introduced in commit `44cdd6d219` Author: John Harrison <John.C.Harrison@Intel.com> Date: Mon Nov 24 18:49:40 2014 +0000 drm/i915: Convert 'ring_idle()' to use requests not seqnos v2 (Chris Wilson): - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather than compute it over again. - Remove extra whitespace. Issue: VIZ-5998 Signed-off-by: Tomas Elf <tomas.elf@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add regressing commit citation provided by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-07-13 22:42:39 +02:00
Linus Torvalds	099bfbfc7f	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.2. I've one other new driver from freescale on my radar, it's been posted and reviewed, I'd just like to get someone to give it a last look, so maybe I'll send it or maybe I'll leave it. There is no major nouveau changes in here, Ben was working on something big, and we agreed it was a bit late, there wasn't anything else he considered urgent to merge. There might be another msm pull for some bits that are waiting on arm-soc, I'll see how we time it. This touches some "of" stuff, acks are in place except for the fixes to the build in various configs,t hat I just applied. Summary: New drivers: - virtio-gpu: KMS only pieces of driver for virtio-gpu in qemu. This is just the first part of this driver, enough to run unaccelerated userspace on. As qemu merges more we'll start adding the 3D features for the virgl 3d work. - amdgpu: a new driver from AMD to driver their newer GPUs. (VI+) It contains a new cleaner userspace API, and is a clean break from radeon moving forward, that AMD are going to concentrate on. It also contains a set of register headers auto generated from AMD internal database. core: - atomic modesetting API completed, enabled by default now. - Add support for mode_id blob to atomic ioctl to complete interface. - bunch of Displayport MST fixes - lots of misc fixes. panel: - new simple panels - fix some long-standing build issues with bridge drivers radeon: - VCE1 support - add a GPU reset counter for userspace - lots of fixes. amdkfd: - H/W debugger support module - static user-mode queues - support killing all the waves when a process terminates - use standard DECLARE_BITMAP i915: - Add Broxton support - S3, rotation support for Skylake - RPS booting tuning - CPT modeset sequence fixes - ns2501 dither support - enable cmd parser on haswell - cdclk handling fixes - gen8 dynamic pte allocation - lots of atomic conversion work exynos: - Add atomic modesetting support - Add iommu support - Consolidate drm driver initialization - and MIC, DECON and MIPI-DSI support for exynos5433 omapdrm: - atomic modesetting support (fixes lots of things in rewrite) tegra: - DP aux transaction fixes - iommu support fix msm: - adreno a306 support - various dsi bits - various 64-bit fixes - NV12MT support rcar-du: - atomic and misc fixes sti: - fix HDMI timing complaince tilcdc: - use drm component API to access tda998x driver - fix module unloading qxl: - stability fixes" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits) drm/nouveau: Pause between setting gpu to D3hot and cutting the power drm/dp/mst: close deadlock in connector destruction. drm: Always enable atomic API drm/vgem: Set unique to "vgem" of: fix a build error to of_graph_get_endpoint_by_regs function drm/dp/mst: take lock around looking up the branch device on hpd irq drm/dp/mst: make sure mst_primary mstb is valid in work function of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi' drm/atomic: Don't set crtc_state->enable manually drm/exynos: dsi: do not set TE GPIO direction by input drm/exynos: dsi: add support for MIC driver as a bridge drm/exynos: dsi: add support for Exynos5433 drm/exynos: dsi: make use of array for clock access drm/exynos: dsi: make use of driver data for static values drm/exynos: dsi: add macros for register access drm/exynos: dsi: rename pll_clk to sclk_clk drm/exynos: mic: add MIC driver of: add helper for getting endpoint node of specific identifiers drm/exynos: add Exynos5433 decon driver ...	2015-06-26 13:18:51 -07:00
Jani Nikula	245ec9d856	Revert "drm/i915: Don't skip request retirement if the active list is empty" This reverts commit `0aedb16265`. I messed things up while applying [1] to drm-intel-fixes. Rectify. [1] http://mid.gmane.org/1432827156-9605-1-git-send-email-ville.syrjala@linux.intel.com Fixes: `0aedb16265` ("drm/i915: Don't skip request retirement if the active list is empty") Cc: stable@vger.kernel.org Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-06-15 14:32:54 +03:00
Ville Syrjälä	11ee9615f9	drm/i915: Don't skip request retirement if the active list is empty Apparently we can have requests even if though the active list is empty, so do the request retirement regardless of whether there's anything on the active list. The way it happened here is that during suspend intel_ring_idle() notices the olr hanging around and then proceeds to get rid of it by adding a request. However since there was nothing on the active lists i915_gem_retire_requests() didn't clean those up, and so the idle work never runs, and we leave the GPU "busy" during suspend resulting in a WARN later. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-06-15 12:21:16 +03:00
Chris Wilson	016a65a391	drm/i915: Always reset vma->ggtt_view.pages cache on unbinding With the introduction of multiple views of an obj in the same vm, each vma was taught to cache its copy of the pages (so that different views could have different page arrangements). However, this missed decoupling those vma->ggtt_view.pages when the vma released its reference on the obj->pages. As we don't always free the vma, this leads to a possible scenario (e.g. execbuffer interrupted by the shrinker) where the vma points to a stale obj->pages, and explodes. Fixes regression from commit `fe14d5f4e5` Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Wed Dec 10 17:27:58 2014 +0000 drm/i915: Infrastructure for supporting different GGTT views per object Tvrtko says, if someone else will be confused how this can happen, key is the reservation execbuffer path. That puts the VMA on the exec_list which prevents i915_vma_unbind and i915_gem_vma_destroy from fully destroying the VMA. So the VMA is left existing as an empty object in the list - unbound and disassociated with the backing store. Kind of a cached memory object. And then re-using it needs to clear the cached pages pointer which is fixed above. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1227892 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Michel Thierry <michel.thierry@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [Jani: Added Tvrtko's explanation to commit message.] Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-06-15 09:49:59 +03:00
Ville Syrjälä	0aedb16265	drm/i915: Don't skip request retirement if the active list is empty Apparently we can have requests even if though the active list is empty, so do the request retirement regardless of whether there's anything on the active list. The way it happened here is that during suspend intel_ring_idle() notices the olr hanging around and then proceeds to get rid of it by adding a request. However since there was nothing on the active lists i915_gem_retire_requests() didn't clean those up, and so the idle work never runs, and we leave the GPU "busy" during suspend resulting in a WARN later. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-06-01 10:55:51 +03:00
Chris Wilson	8d3afd7d0e	drm/i915: Use spinlocks for checking when to waitboost In commit `1854d5ca0d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:32 2015 +0100 drm/i915: Deminish contribution of wait-boosting from clients we removed an atomic timer based check for allowing waitboosting and moved it below the mutex taken during RPS. However, that mutex can be held for long periods of time on Vallyview/Cherryview as communication with the PCU is slow. As clients may frequently wait for results (e.g. such as tranform feedback) we introduced contention between the client and the RPS worker. We can take advantage of the RPS worker, by switching the wait boost decision to use spin locks and defer the actual reclocking to the worker. Fixes a regression of up to 45% on Baytrail and Baswell! v2 (Daniel): - Use max_freq_softlimit instead of the not-yet-merged boost frequency. - Don't inject a fake irq into the boost work, instead treat client_boost as just another legit waker. v3: Drop the now unused mask (Chris). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-26 19:16:12 +02:00
Chris Wilson	d0bc54f2f0	drm/i915: Introduce DRM_I915_THROTTLE_JIFFIES As Daniel commented on commit b7ffe1362c5f468b853223acc9268804aa92afc8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Apr 27 13:41:24 2015 +0100 drm/i915: Free RPS boosts for all laggards it is better to be explicit when sharing hardcoded values such as throttle/boost timeouts. Make it so! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-22 08:59:52 +02:00
Chris Wilson	9a0c1e2770	drm/i915: Use the correct destructor for freeing requests on error After allocating from the slab cache, we then need to free the request back into the slab cache upon error (and not call kfree as that leads to eventual memory corruption). Fixes regression from commit `efab6d8dd1` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:57 2015 +0100 drm/i915: Use a separate slab for requests Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-22 08:53:48 +02:00
Chris Wilson	e61b995841	drm/i915: Free RPS boosts for all laggards If the client stalls on a congested request, chosen to be 20ms old to match throttling, allow the client a free RPS boost. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] [danvet: s/0/NULL/ reported by 0-day build] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 22:50:05 +02:00
Chris Wilson	2e1b873072	drm/i915: Convert RPS tracking to a intel_rps_client struct Now that we have internal clients, rather than faking a whole drm_i915_file_private just for tracking RPS boosts, create a new struct intel_rps_client and pass it along when waiting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 15:11:44 +02:00
Chris Wilson	a6f766f397	drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts Ring switches can occur many times per frame, and are often out of control, causing frequent RPS boosting for no practical benefit. Treat the sw semaphore synchronisation as a separate client and only allow it to boost once per busy/idle cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 15:11:43 +02:00
Chris Wilson	b47161858b	drm/i915: Implement inter-engine read-read optimisations Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 15:11:42 +02:00
Daniel Vetter	eed29a5b21	drm/i915: s/\<rq\>/req/g The merged seqno->request conversion from John called request variables req, but some (not all) of Chris' recent patches changed those to just rq. We've had a lenghty (and inconclusive) discussion on irc which is the more meaningful name with maybe at most a slight bias towards req. Given that the "don't change names without good reason to avoid conflicts" rule applies, so lets go back to a req everywhere for consistency. I'll sed any patches for which this will cause conflicts before applying. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> [danvet: s/origina/merged/ as pointed out by Chris - the first mass-conversion patch was from Chris, the merged one from John.] Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-05-21 15:10:48 +02:00
Chris Wilson	2e2f351dbf	drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() We no longer interpolate domains in the same manner, and even if we did, we should trust setting either of the other write domains would trigger an invalidation rather than force it. Remove the tweaking of the read_domains since it serves no purpose and use i915_gem_object_wait_rendering() directly. Note that this goes back to commit `a8198eea15` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Apr 13 22:04:09 2011 +0100 drm/i915: Introduce i915_gem_object_finish_gpu() and gpu domain tracking died in commit `cc889e0f6c` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list which is more than 1 year older. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add notes with information dug out of git history.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-20 11:25:45 +02:00
Daniel Vetter	5e024f31be	drm/i915: Remove unused variable from i915_gem_mmap_gtt Lost in commit `c5ad54cf7d` Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Date: Wed May 6 14:36:09 2015 +0300 drm/i915: Use partial view in mmap fault handler Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-05-20 11:25:38 +02:00
Joonas Lahtinen	e7ded2d746	drm/i915: Reject huge tiled objects We do not yet support tiled objects bigger than the mappable aperture size so reject them. Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Rework the check a bit to avoid warnings.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 17:25:29 +02:00
Joonas Lahtinen	c5ad54cf7d	drm/i915: Use partial view in mmap fault handler Use partial view for huge BOs (bigger than half the mappable aperture) in fault handler so that they can be accessed withough trying to make room for them by evicting other objects. v2: - Only use partial views in the case where early rejection was previously done. - Account variable type changes from previous reroll. v3: - Add a comment about overwriting existing page entries. (Tvrtko Ursulin) - Whitespace fixes. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:04:18 +02:00
Joonas Lahtinen	a6631ae1fd	drm/i915: Consider object pinned if any VMA is pinned Do not skip special GGTT views when considering whether an object is pinned or not. Wrong behaviour was introduced in; commit `ec7adb6ee7` Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Date: Mon Mar 16 14:11:13 2015 +0200 drm/i915: Do not use ggtt_view with (aliasing) PPGTT Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:04:17 +02:00
Joonas Lahtinen	91e6711e30	drm/i915: Do not make assumptions on GGTT VMA sizes GGTT VMA sizes might be smaller than the whole object size due to different GGTT views. v2: - Separate GGTT view constraint calculations from normal view constraint calculations (Chris Wilson) v3: - Do not bother with debug wording. (Tvrtko Ursulin) v4: - Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [danvet: Drop BUG_ON, it's redudant.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:04:17 +02:00
Chris Wilson	432be69d9d	drm/i915: Remove locking for get-caching query Reading a single value from the object, the locking only provides futile protection against userspace races. The locking is useless so remove it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:04:12 +02:00
Mika Kuoppala	5e562f1ddd	drm/i915: Clear vma->bound on unbinding Unbinding doesn't always lead to unconditional destruction of vma. This destruction avoidance happens if vma is part of execbuffer relocation list or if vma is being considered for eviction in i915_gem_evict_something(). For those other users, mark the vma unbound so that the correct state of this vma is preserved. Reported-by: Chris Wilson <chris@chris-wilson.co.ok> Cc: Chris Wilson <chris@chris-wilson.co.ok> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:03:29 +02:00
Dave Airlie	e1dee1973c	Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2015-04-23: - dither support for ns2501 dvo (Thomas Richter) - some polish for the gtt code and fixes to finally enable the cmd parser on hsw - first pile of bxt stage 1 enabling (too many different people to list ...) - more psr fixes from Rodrigo - skl rotation support from Chandra - more atomic work from Ander and Matt - pile of cleanups and micro-ops for execlist from Chris drm-intel-next-2015-04-10: - cdclk handling cleanup and fixes from Ville - more prep patches for olr removal from John Harrison - gmbus pin naming rework from Jani (prep for bxt) - remove ->new_config from Ander (more atomic conversion work) - rps (boost) tuning and unification with byt/bsw from Chris - cmd parser batch bool tuning from Chris - gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky) - execlist tuning (not yet all of it) from Chris - add drm_plane_from_index (Chandra) - various small things all over * tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits) drm/i915/gtt: Allocate va range only if vma is not bound drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt drm/i915: fix intel_prepare_ddi drm/i915: factor out ddi_get_encoder_port drm/i915/hdmi: check port in ibx_infoframe_enabled drm/i915/hdmi: fix vlv infoframe port check drm/i915: Silence compiler warning in dvo drm/i915: Update DRIVER_DATE to 20150423 drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010 rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c drm/i915: Unduplicate i915_ggtt_unbind/bind_vma drm/i915: Move ppgtt_bind/unbind around drm/i915: move i915_gem_restore_gtt_mappings around drm/i915: Fix up the vma aliasing ppgtt binding drm/i915: Remove misleading comment around bind_to_vm drm/i915: Don't use atomics for pg_dirty_rings drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt drm/i915/skl: Support Y tiling in MMIO flips drm/i915: Fixup kerneldoc for struct intel_context ... Conflicts: drivers/gpu/drm/i915/i915_drv.c	2015-05-08 20:51:06 +10:00
Michel Thierry	53292cdb06	drm/i915: Workaround to avoid lite restore with HEAD==TAIL WaIdleLiteRestore is an execlists-only workaround, and requires the driver to ensure that any context always has HEAD!=TAIL when attempting lite restore. Add two extra MI_NOOP instructions at the end of each request, but keep the requests tail pointing before the MI_NOOPs. We may not need to executed them, and this is why request->tail is sampled before adding these extra instructions. If we submit a context to the ELSP which has previously been submitted, move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL. v2: Move overallocation to gen8_emit_request, and added note about sampling request->tail in commit message (Chris). v3: Remove redundant request->tail assignment in __i915_add_request, in lrc mode this is already set in execlists_context_queue. Do not add wa implementation details inside gem (Chris). v4: Apply the wa whenever the req has been resubmitted and update comment (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-04-23 23:56:52 +03:00
Daniel Vetter	0875546c53	drm/i915: Fix up the vma aliasing ppgtt binding Currently we have the problem that the decision whether ptes need to be (re)written is splattered all over the codebase. Move all that into i915_vma_bind. This needs a few changes: - Just reuse the PIN_* flags for i915_vma_bind and do the conversion to vma->bound in there to avoid duplicating the conversion code all over. - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if around) explicit, add PIN_USER for that. - Two callers want to update ptes, give them a PIN_UPDATE for that. Of course we still want to avoid double-binding, but that should be taken care of: - A ppgtt vma will only ever see PIN_USER, so no issue with double-binding. - A ggtt vma with aliasing ppgtt needs both types of binding, and we track that properly now. - A ggtt vma without aliasing ppgtt could be bound twice. In the lower-level ->bind_vma functions hence unconditionally set GLOBAL_BIND when writing the ggtt ptes. There's still a bit room for cleanup, but that's for follow-up patches. v2: Fixup fumbles. v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-04-23 21:06:39 +02:00
Daniel Vetter	cd102a687b	drm/i915: Remove misleading comment around bind_to_vm It's true that we might need to context switch, but both the signalling and implementation of the same are a few source files away. Remove it. Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-23 21:06:03 +02:00
Daniel Vetter	777dc5bb26	drm/i915: Move vma vfuns to adddress_space They change with the address space and not with each vma, so move them into the right pile of vfuncs. Save 2 pointers per vma and clarifies the code. Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-04-20 08:54:29 -07:00
Tvrtko Ursulin	8a0c39b162	drm/i915: Simplify and fix object to display tracking Purpose of this tracking is to know when to flush the cache between the CPU and the non-coherent display engine. Prior to: commit `121920faf2` Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Mon Mar 23 11:10:37 2015 +0000 drm/i915/skl: Query display address through a wrapper This worked by a mix of direct flag manipulation and checking for existence of a pinned GGTT VMA. With the introduction of rotated display mappings this approach is no longer correct. New simpler approach is to just keep this count over calls which pin and unpin objects to and from display, at the slight cost of extra space in every bo. (Inspired and extracted code from a larger rework by Chris Wilson.) v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä) v3: Commit message corrections. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-20 08:51:45 -07:00
Tvrtko Ursulin	5678ad7367	drm/i915: Fix view type in warning message One month passed between posting a patch and it getting merged, and unfortunately even though it still applies, it needs fixing to account for changes in function parameters since: commit d385612e15b8b6eb3db328d83f1872ef8a381788 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Tue Mar 17 14:45:29 2015 +0000 drm/i915: Log view type when printing warnings Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Squash in fixup from Tvrtko to fix the rebase conflict.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-16 11:20:07 +02:00
Chris Wilson	30154650b8	drm/i915: Remove obj->pin_mappable The obj->pin_mappable flag only exists for debug purposes and is a hindrance that is mistreated with rotated GGTT views. For debug purposes, it suffices to mark objects with pin_display as being of note. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 14:25:36 +02:00
Chris Wilson	2def4ad99b	drm/i915: Optimistically spin for the request completion This provides a nice boost to mesa in swap bound scenarios (as mesa throttles itself to the previous frame and given the scenario that will complete shortly). It will also provide a good boost to systems running with semaphores disabled and so frequently waiting on the GPU as it switches rings. In the most favourable of microbenchmarks, this can increase performance by around 15% - though in practice improvements will be marginal and rarely noticeable. v2: Account for user timeouts v3: Limit the spinning to a single jiffie (~1us) at most. On an otherwise idle system, there is no scheduler contention and so without a limit we would spin until the GPU is ready. v4: Drop forcewake - the lazy coherent access doesn't require it, and we have no reason to believe that the forcewake itself improves seqno coherency - it only adds delay. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 14:24:36 +02:00
Mika Kuoppala	1d335d1b62	drm/i915: Move vm page allocation in proper place Move to i915_vma_bind as it is part of the binding. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 16:18:02 +02:00
Chris Wilson	d7b9ca2f7a	drm/i915: Remove request->uniq We already assign a unique identifier to every request: seqno. That someone felt like adding a second one without even mentioning why and tweaking ABI smells very fishy. Fixes regression from commit `b3a38998f0` Author: Nick Hoath <nicholas.hoath@intel.com> Date: Thu Feb 19 16:30:47 2015 +0000 drm/i915: Fix a use after free, and unbalanced refcounting v2: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Nick Hoath <nicholas.hoath@intel.com> Cc: Thomas Daniel <thomas.daniel@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> [danvet: Fixup because different merge order.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 10:41:11 +02:00
Chris Wilson	423795cbac	drm/i915: Prefer to check for idleness in worker rather than sync-flush Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 10:37:31 +02:00
Chris Wilson	e20d2ab741	drm/i915: Use a separate slab for vmas vma are more frequently allocated than objects and so should equally benefit from having a dedicated slab. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 10:17:13 +02:00
Chris Wilson	efab6d8dd1	drm/i915: Use a separate slab for requests requests are even more frequently allocated than objects and equally benefit from having a dedicated slab. v2: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 10:17:07 +02:00
Chris Wilson	4bb1bedb28	drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists When we submit a request to the GPU, we first take the rpm wakelock, and only release it once the GPU has been idle for a small period of time after all requests have been complete. This means that we are sure no new interrupt can arrive whilst we do not hold the rpm wakelock and so can drop the individual get/put around every single request inside execlists. Note: to close one potential issue we should mark the GPU as busy earlier in __i915_add_request. To elaborate: The issue is that we emit the irq signalling sequence before we grab the rpm reference, which means we could miss the resulting interrupt (since that's not set up when suspended). The only bad side effect is a missed interrupt, gt mmio writes automatically wake up the hw itself. But otoh we have an umbrella rpm reference for the entirety of execbuf, as long as that's there we're covered. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Explain a bit more about the add_request issue, which after some irc chatting with Chris turns out to not be an issue really.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:14 +02:00
Chris Wilson	8d9d5744c6	drm/i915: Split batch pool into size buckets Now with the trimmed memcpy before the command parser, we try to allocate many different sizes of batches, predominantly one or two pages. We can therefore speed up searching for a good sized batch by keeping the objects of buckets of roughly the same size. v2: Add a comment about bucket sizes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:05 +02:00
Chris Wilson	35c94185c5	drm/i915: Free batch pool when idle At runtime, this helps ensure that the batch pools are kept trim and fast. Then at suspend, this releases memory that we do not need to restore. It also ties into the oom-notifier to ensure that we recover as much kernel memory as possible during OOM. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:05 +02:00
Chris Wilson	06fbca713e	drm/i915: Split the batch pool by engine I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:04 +02:00
Chris Wilson	7c27f52509	drm/i915: Re-enable RPS wait-boosting for all engines This reverts commit `ec5cc0f9b0` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 12 10:28:55 2014 +0100 drm/i915: Restrict GPU boost to the RCS engine The premise that media/blitter workloads are not affected by boosting is patently false with a trip through igt. The question that remains is what exactly is going wrong with the media workload that prompted this? Hopefully that would be fixed by the missing agressive downclocking, in addition to the extra restrictions imposed on how frequent a process is allowed to boost. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll> Acked-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:03 +02:00
Chris Wilson	1854d5ca0d	drm/i915: Deminish contribution of wait-boosting from clients With boosting for missed pageflips, we have a much stronger indication of when we need to (temporarily) boost GPU frequency to ensure smooth delivery of frames. So now only allow each client to perform one RPS boost in each period of GPU activity due to stalling on results. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:02 +02:00
Chris Wilson	ee286370d6	drm/i915: Cache last obj->pages location for i915_gem_object_get_page() The biggest user of i915_gem_object_get_page() is the relocation processing during execbuffer. Typically userspace passes in a set of relocations in sorted order. Sadly, we alternate between relocations increasing from the start of the buffers, and relocations decreasing from the end. However the majority of consecutive lookups will still be in the same page. We could cache the start of the last sg chain, however for most callers, the entire sgl is inside a single chain and so we see no improve from the extra layer of caching. v2: Avoid the double increment inside unlikely() References: https://bugs.freedesktop.org/show_bug.cgi?id=88308 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:00 +02:00
John Harrison	6689cb2b62	drm/i915: Move common request allocation code into a common function The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-01 07:54:30 +02:00
John Harrison	f3dc74c0e1	drm/i915: Rename 'do_execbuf' to 'execbuf_submit' The submission portion of the execbuffer code path was abstracted into a function pointer indirection as part of the legacy vs execlist work. The two implementation functions are called 'i915_gem_ringbuffer_submission' and 'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is already a 'i915_gem_do_execbuffer' function (which is what calls the pointer indirection). The name of the pointer is therefore considered to be backwards and should be changed. This patch renames it to 'execbuf_submit' which is hopefully a bit clearer. For: VIZ-5115 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-01 07:53:41 +02:00
Chris Wilson	41037f9f0e	drm/i915: Add i915_gem_request_unreference__unlocked We were missing a convenience stub to aquire the right mutex whilst dropping the request, so add it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-30 16:39:30 +02:00
Daniel Vetter	6e0aa8018f	Linux 4.0-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVGHwjAAoJEHm+PkMAQRiG8rcIAJ6cEJ6mbqLpyz5XrGf4yNp0 +wG/QlEpT8rgrxe9wSjB3lfW3kR2Pe69b9fVVCdiklygdkmva5vfmDrVGGzYfe3M QrFSSlMVBplvh6IiM/L1mVMtr3DSmCO23YZZ9R5b7FoEYatNHRpNWBCBpuXpd4aD sLuIvO3L/S7LqeOAFkkYWv6AuL9umicmjR8u+nsmCSRJom7At/aJ6R66WIp9vxho Rn7r6wcUk6B2Q/gYNjdSE8SIwdyKhuBGyvqQ9U9s6Btg9DQfM/b0vG5kw9hqeAq/ 9445jqVDP1whA2vz6GjnvltidxrqRvuDPBwzOnFmY5U+KZz4lS3x2mnWAAJ3xWs= =TqVJ -----END PGP SIGNATURE----- Merge tag 'v4.0-rc6' into drm-intel-next Backmerge Linux 4.0-rc6 because conflicts are (again) getting out of hand. To make sure we don't lose any bugfixes from the 4.0-rc5-rc6 flurry of patches we've applied them all to -next too. Conflicts: drivers/gpu/drm/i915/intel_display.c Always take the version from -next, we've already handled all conflicts with explicit cherrypicking. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2015-03-30 16:37:08 +02:00
Joonas Lahtinen	9abc464854	drm/i915: Compare GGTT view structs instead of types To allow for views where the view type is not defined by the view type only, like it is in stereo or rotated 90 degree view, change the semantic to require the whole view structure for comparison when we match a GGTT view. This allows including parameters like offset to be included in the view which is useful for eg. partial views. v3: - Rely on ggtt_view type being 0 for non-GGTT vma's, which equals to I915_GGTT_VIEW_NORMAL. (Daniel Vetter) - Do not use potentially slower comparison when we only want to know if something is or is not a normal view. - Rebase on top of rotated view patches. Add rotated view singleton. - If one view is missing in comparison they're equal only if both are missing. v4: - Use comparison helper in obj_to_ggtt_view too. (Tvrtko Ursulin) - Do WARN_ON if one view is NULL. (Tvrtko Ursulin) Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-27 15:05:22 +01:00
Michel Thierry	72744cb13c	drm/i915: Add dynamic page trace events Traces for page directories and tables allocation and map. v2: Removed references to teardown. v3: bitmap_scnprintf has been deprecated. v4: Replace bitmap_scnprintf with scnprintf correctly, and get right range lengths. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-27 09:25:44 +01:00
Chris Wilson	832a3aad1e	drm/i915: Keep ring->active_list and ring->requests_list consistent If we retire requests last, we may use a later seqno and so clear the requests lists without clearing the active list, leading to confusion. Hence we should retire requests first for consistency with the early return. The order used to be important as the lifecycle for the object on the active list was determined by request->seqno. However, the requests themselves are now reference counted removing the constraint from the order of retirement. Fixes regression from commit `1b5a433a4d` Author: John Harrison <John.C.Harrison@Intel.com> Date: Mon Nov 24 18:49:42 2014 +0000 drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed ' and a WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140() WARN_ON(!list_empty(&vm->active_list)) Identified by updating WATCH_LISTS: [drm:i915_verify_lists] ERROR blitter ring: active list not empty, but no requests WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230() WARN_ON(i915_verify_lists(ring->dev)) Note that this is only a problem in evict_vm where the following happens after a retire_request has cleaned out all requests, but not all active bo: - intel_ring_idle called from i915_gpu_idle notices that no requests are outstanding and immediately returns. - i915_gem_retire_requests_ring called from i915_gem_retire_requests also immediately returns when there's no request, still leaving the bo on the active list. - evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting all active objects that there's still stuff left that shouldn't be there. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-03-26 11:05:54 +02:00

1 2 3 4 5 ...

1157 Commits