Commit Graph

1578 Commits

Author SHA1 Message Date
Imre Deak 93c97dc17f drm/i915: Avoid early GPU idling due to race with new request
There is a small race where a new request can be submitted and retired
after the idle worker started to run which leads to idling the GPU too
early. Fix this by deferring the idling to the pending instance of the
worker.

This scenario was pointed out by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com
2016-11-07 14:48:04 +02:00
Chris Wilson 767a222e47 drm/i915: Limit Valleyview and earlier to only using mappable scanout
Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Lets presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scanout from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview which does not appear to be limited to the low
aperture for its scanout.)

v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.

v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well

Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036
Fixes: 2efb813d53 ("drm/i915: Fallback to using unmappable memory for scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com
2016-11-07 12:25:50 +00:00
Chris Wilson 0ef723cbce drm/i915: Round tile chunks up for constructing partial VMAs
When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.

Reported-by: Andreas Reis <andreas.reis@gmail.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 03af84fe7f ("drm/i915: Choose partial chunksize based on tile row size")
Fixes: a61007a83a ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-07 11:39:59 +00:00
Chris Wilson 2c3a3f44dc drm/i915: Fix pages pin counting around swizzle quirk
commit bc0629a767 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanitychecks, the fix is to replace the explicit atomic_set with an
atomic_inc.

v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.

Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-04 11:55:39 +00:00
Tvrtko Ursulin a933568eb6 drm/i915: Tidy slab cache allocations
We can use the preferred KMEM_CACHE helper for brevity.

Also simplifiy error unwind by only setting the ENOMEM
error code once.

v2: Add forgotten changes. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478099699-28652-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-03 13:52:57 +00:00
Joonas Lahtinen 56cea32382 drm/i915: Unify global_list into global_link
$ sed -i -r 's/\bglobal_list\b/global_link/g' *.c *.h

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478081764-8058-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-11-02 15:17:13 +02:00
Tvrtko Ursulin 3599a91cc8 drm/i915: Allow shrinking of userptr objects once again
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.

Restore that so idle userptr objects can be discarded in order
to free up memory.

One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.

v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-01 16:35:26 +00:00
Ville Syrjälä a26e523921 drm/i915: Pass dev_priv to IS_BROADWATER/IS_CRESTLINE
Unify our approach to things by passing around dev_priv instead of dev.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477946245-14134-21-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-11-01 16:40:38 +02:00
Chris Wilson 548625ee8f drm/i915: Improve lockdep tracking for obj->mm.lock
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-01 13:01:44 +00:00
Chris Wilson db6c2b4151 drm/i915: Store the vma in an rbtree under the object
With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worse case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for
rhashtable). The major of such tree will be very shallow and so not much
slower, and still scales much, much better than the current unsorted
list.

v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
2016-11-01 13:00:40 +00:00
Chris Wilson bc0629a767 drm/i915: Track pages pinned due to swizzling quirk
If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36f ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a worker, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug where the pages were no
longer being unpinned if they were marked as unneeded.

[  231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[  231.800405] Modules linked in:
[  231.800406]  snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[  231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G     U          4.9.0-rc2-CI-CI_DRM_1780+ #1
[  231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[  231.800456] Workqueue: events __i915_gem_free_work [i915]
[  231.800459]  ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[  231.800465]  ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[  231.800469]  ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[  231.800474] Call Trace:
[  231.800479]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  231.800484]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  231.800487]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  231.800491]  [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[  231.800520]  [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800548]  [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[  231.800552]  [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[  231.800555]  [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[  231.800558]  [<ffffffff8109c789>] worker_thread+0x49/0x490
[  231.800561]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800563]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800566]  [<ffffffff810a2aab>] kthread+0xeb/0x110
[  231.800569]  [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[  231.800573]  [<ffffffff818164a7>] ret_from_fork+0x27/0x40

Moving to a separate flag for tracking the quirked pin is overkill for
the bug (since we only have to interchange the two tests in
i915_gem_free_object) but it does reduce a complicated test on all
objects and provide a sanitycheck for uncommon code paths.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk
2016-11-01 10:48:41 +00:00
Chris Wilson cb399eabc4 drm/i915: Avoid accessing request->timeline outside of its lifetime
Whilst waiting on a request, we may do so without holding any locks or
any guards beyond a reference to the request. In order to avoid taking
locks within request deallocation, we drop references to its timeline
(via the context and ppgtt) upon retirement. We should avoid chasing
such pointers outside of their control, in particular we inspect the
request->timeline to see if we may restore the RPS waitboost for a
client. If we instead look at the engine->timeline, we will have similar
behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
amount of reward we give towards stalling clients (i.e. only if the
client stalls and the GPU is uncontended does it reclaim its boost).
This restores behaviour back to pre-timelines, whilst fixing:

[  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
[  645.078577] Read of size 4 by task gem_exec_schedu/28408
[  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
[  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
[  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
[  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
[  645.079345] Call Trace:
[  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
[  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
[  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
[  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
[  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
[  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
[  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
[  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
[  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
[  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
[  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
[  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
[  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
[  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
[  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
[  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
[  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
[  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
[  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
[  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
[  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
[  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
[  645.082431] Allocated:
[  645.082480] PID = 28408
[  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
[  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
[  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
[  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
[  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.083551] Freed:
[  645.083599] PID = 27629
[  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
[  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
[  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
[  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
[  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
[  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
[  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
[  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
[  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
[  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
[  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
[  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
[  645.085811] Memory state around the buggy address:
[  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086138]                                ^
[  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

v2: Add a comment to document the hint like nature of
 intel_engine_last_submit()

Fixes: 73cb97010d ("drm/i915: Combine seqno + tracking into a global timeline struct")
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk
2016-11-01 10:48:40 +00:00
Chris Wilson 7d5d59e527 drm/i915: Use the full hammer when shutting down the rcu tasks
To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
still have objects being freed as we continue to teardown the driver -
in particular, the recently released rings may race with the memory
manager shutdown resulting in sporadic:

[  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
[  142.217187] Memory manager not clean during takedown.
[  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
[  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
[  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
[  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
[  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
[  142.217206] Call Trace:
[  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
[  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
[  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
[  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
[  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
[  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
[  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
[  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
[  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
[  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
[  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
[  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
[  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
[  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
[  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  142.217292] ---[ end trace 6fd164859c154772 ]---
[  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
                [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
                [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
                [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
                [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
                [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
                [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
                [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
                [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
                [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
                [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
                [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
                [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
                [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
                [<ffffffff8157baee>] __driver_attach+0xde/0xe0

In particular note that the node was being poisoned as we inspected the
list, a  clear indication that the object is being freed as we make the
assertion.

v2: Don't loop, just assert that we do all the work required as that
will be better at detecting further errors.

Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk
2016-11-01 09:30:08 +00:00
Chris Wilson 80b204bce8 drm/i915: Enable multiple timelines
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.

v2: Add a comment to indicate the xfer between timelines upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
2016-10-28 20:53:57 +01:00
Chris Wilson 28176ef4cf drm/i915: Reserve space in the global seqno during request allocation
A restriction on our global seqno is that they cannot wrap, and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted, its global seqno is still 0, and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
2016-10-28 20:53:56 +01:00
Chris Wilson 65e4760e39 drm/i915: Introduce a global_seqno for each request
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28 20:53:53 +01:00
Chris Wilson 3033acab07 drm/i915: Queue the idling context switch after all other timelines
Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk
2016-10-28 20:53:52 +01:00
Chris Wilson 73cb97010d drm/i915: Combine seqno + tracking into a global timeline struct
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.

Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-10-28 20:53:51 +01:00
Chris Wilson d07f0e59b2 drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)

v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.

Caveats:

 * busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.

 * non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"

 * dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).

 * loss of object-level retirement callbacks, emulated by VMA retirement
tracking.

 * minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 20:53:50 +01:00
Chris Wilson f0cd518206 drm/i915: Use lockless object free
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
2016-10-28 20:53:50 +01:00
Chris Wilson fbbd37b36f drm/i915: Move object release to a freelist + worker
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.

Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object til after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).

v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
2016-10-28 20:53:49 +01:00
Chris Wilson 40e62d5d6b drm/i915: Acquire the backing storage outside of struct_mutex in set-domain
As we can locklessly (well struct_mutex-lessly) acquire the backing
storage, do so in set-domain-ioctl to reduce the contention on the
struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk
2016-10-28 20:53:49 +01:00
Chris Wilson fe115628d5 drm/i915: Implement pwrite without struct-mutex
We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk
2016-10-28 20:53:48 +01:00
Chris Wilson bb6dc8d96b drm/i915: Implement pread without struct-mutex
We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk
2016-10-28 20:53:48 +01:00
Chris Wilson 1233e2db19 drm/i915: Move object backing storage manipulation to its own locking
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson 03ac84f183 drm/i915: Pass around sg_table to get_pages/put_pages backend
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
2016-10-28 20:53:47 +01:00
Chris Wilson a4f5ea64f0 drm/i915: Refactor object page API
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
2016-10-28 20:53:46 +01:00
Chris Wilson 96d7763452 drm/i915: Use a radixtree for random access to the object's backing storage
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.

v2: Rewrite building the radixtree for clarity, hopefully.

v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson 4c7d62c6b8 drm/i915: Markup GEM API with lockdep asserts
Add lockdep_assert_held(struct_mutex) to the API preamble of the
internal GEM interfaces.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson f8a7fde456 drm/i915: Defer active reference until required
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Chris Wilson e95433c73a drm/i915: Rearrange i915_wait_request() accounting with callers
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28 20:53:43 +01:00
Chris Wilson c92ac094a9 drm/i915: Remove superfluous wait_for_error() from throttle-ioctl
The throttle-ioctl never touches the struct_mutex. It does, however, as
part of its ABI report whether the hardware is terminally wedged. For
that purposes, it only has to report the current state and not incur the
cost of checking/waiting every invocation, as we do not have to wait for
a reset before waiting on a request to ensure completion (that is baked
into the wait request implementation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk
2016-10-28 20:53:42 +01:00
Chris Wilson b52992c06c drm/i915: Support asynchronous waits on struct fence from i915_gem_request
We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.

v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
2016-10-28 20:53:41 +01:00
Tvrtko Ursulin 3299e7e434 drm/i915: Remove two invalid warns
Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.

v2: Commit message update. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c433 ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-10-26 09:04:56 +01:00
Tvrtko Ursulin 07ee2bce6a drm/i915: Rotated view does not need a fence
We do not need to set up a fence for the rotated view.

Display does not need it and no one can access it.

v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098d ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-10-26 09:04:55 +01:00
Chris Wilson de867c20b9 drm/i915: Include the kernel uptime in the error state
As well as knowing when the error occurred, it is more interesting to me
to know how long after booting the error occurred, and for good measure
record the time since last hw initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk
2016-10-25 13:22:43 +01:00
Daniel Vetter f9bf1d97e8 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because Chris Wilson needs the very latest&greates of
Gustavo Padovan's sync_file work, specifically the refcounting changes
from:

commit 30cd85dd6e
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date:   Wed Oct 19 15:48:32 2016 -0200

    dma-buf/sync_file: hold reference to fence when creating sync_file

Also good to sync in general since git tends to get confused with the
cherry-picking going on.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-10-25 08:57:53 +02:00
Dave Airlie 5481e27f6f Merge tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel into drm-next
- first slice of the gvt device model (Zhenyu et al)
- compression support for gpu error states (Chris)
- sunset clause on gpu errors resulting in dmesg noise telling users
  how to report them
- .rodata diet from Tvrtko
- switch over lots of macros to only take dev_priv (Tvrtko)
- underrun suppression for dp link training (Ville)
- lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish
  from Jani
- gen9 wm fixes from Paulo&Lyude
- updated ddi programming for kbl (Rodrigo)
- respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville)

* tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits)
  drm/i915: Update DRIVER_DATE to 20161024
  drm/i915: Stop setting SNB min-freq-table 0 on powersave setup
  drm/i915/dp: add lane_count check in intel_dp_check_link_status
  drm/i915: Fix whitespace issues
  drm/i915: Clean up DDI DDC/AUX CH sanitation
  drm/i915: Respect alternate_ddc_pin for all DDI ports
  drm/i915: Respect alternate_aux_channel for all DDI ports
  drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7
  drm/i915: KBL - Recommended buffer translation programming for DisplayPort
  drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup
  drm/i915: Add a sunset clause to GPU hang logging
  drm/i915: Stop reporting error details in dmesg as well as the error-state
  drm/i915/gvt: do not ignore return value of create_scratch_page
  drm/i915/gvt: fix spare warnings on odd constant _Bool cast
  drm/i915/gvt: mark symbols static where possible
  drm/i915/gvt: fix sparse warnings on different address spaces
  drm/i915/gvt: properly access enabled intel_engine_cs
  drm/i915/gvt: Remove defunct vmap_batch()
  drm/i915/gvt: Use common mapping routines for shadow_bb object
  drm/i915/gvt: Use common mapping routines for indirect_ctx object
  ...
2016-10-25 16:39:43 +10:00
Chris Wilson 7c108fd8fe drm/i915: Move fence cancellation to runtime suspend
At the moment, we have dependency on the RPM as a barrier itself in both
i915_gem_release_all_mmaps() and i915_gem_restore_fences().
i915_gem_restore_fences() is also called along !runtime pm paths, but we
can move the markup of lost fences alongside releasing the mmaps into a
common i915_gem_runtime_suspend(). This has the advantage of locating
all the tricky barrier dependencies into one location.

v2: Just mark the fence as invalid (fence->dirty) so that upon waking we
will be sure to clear the fence after use, or restore it to the correct
value before use. This makes sure that if the fence is left intact
across the sleep, we do not leave it pointing to a region of GTT for the
next unsuspecting user.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-5-chris@chris-wilson.co.uk
2016-10-24 13:45:38 +01:00
Chris Wilson 3594a3e21f drm/i915: Remove superfluous locking around userfault_list
Now that we have reduced the access to the list to either (a) under the
struct_mutex whilst holding the RPM wakeref (so that concurrent writers to
the list are serialised by struct_mutex) and (b) under the atomic
runtime suspend (which cannot run concurrently with any other accessor due
to the atomic nature of the runtime suspend) we can remove the extra
locking around the list itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-3-chris@chris-wilson.co.uk
2016-10-24 13:45:36 +01:00
Chris Wilson 9c870d0367 drm/i915: Use RPM as the barrier for controlling user mmap access
We can remove the false coupling between RPM and struct mutex by the
observation that we can use the RPM wakeref as the barrier around user
mmap access. That is as we tear down the user's PTE atomically from
within rpm suspend and then to fault in new PTE requires the rpm
wakeref, means that no user access is possible through those PTE without
RPM being awake. Having made that observation, we can then remove the
presumption of having to take rpm outside of struct_mutex and so allow
fine grained acquisition of a wakeref around hw access rather than
having to remember to acquire the wakeref early on.

v2: Rejig placement of the new intel_runtime_pm_get() to be as tight
as possible around the GTT pread/pwrite.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-2-chris@chris-wilson.co.uk
2016-10-24 13:45:35 +01:00
Chris Wilson 275f039db5 drm/i915: Move user fault tracking to a separate list
We want to decouple RPM and struct_mutex, but currently RPM has to walk
the list of bound objects and remove userspace mmapping before we
suspend (otherwise userspace may continue to access the GTT whilst it is
powered down). This currently requires the struct_mutex to walk the
bound_list, but if we move that to a separate list and lock we can take
the first step towards removing the struct_mutex.

v2: Split runtime suspend unmapping vs regular unmapping, to make the
locking (and barriers) clearer. Add the object to the userfault_list
prior to inserting the first PTE, the race between add/revoke depends
upon struct_mutex for regular unmappings and rpm for runtime-suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk
2016-10-24 13:45:35 +01:00
Chris Wilson 4ff340f061 drm/i915: Limit the scattergather coalescing to 32bits
The scattergather list uses a 32bit size counter, we should avoid
exceeding it.

v2: Also we should use unsigned int to match sg->length.

Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-3-chris@chris-wilson.co.uk
2016-10-18 14:22:27 +01:00
Chris Wilson b4bcbe2a90 drm/i915: Document our internal limit on object size
In many places, we try to count pages using a 32 bit integer. That
implies if we are asked to create an object larger than 43bits, we will
subtly crash much later. Catch this on the boundary, and add a warning
to remind ourselves later on our exabyte systems.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-2-chris@chris-wilson.co.uk
2016-10-18 14:22:26 +01:00
Chris Wilson 3ef7f22893 drm/i915: Bump object bookkeeping to u64 from size_t
Internally we allow for using more objects than a single process can
allocate, i.e. we allow for a 64bit GPU address space even on a 32bit
system. Using size_t may oveerflow.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-1-chris@chris-wilson.co.uk
2016-10-18 14:22:26 +01:00
Michał Winiarski 4fb84d991e drm/i915: Remove unused "valid" parameter from pte_encode
We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.

Remove the valid flag as it is always true.

v2: Expand commit msg, patch reorder (Mika)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
2016-10-14 12:40:32 +01:00
Tvrtko Ursulin 5db9401983 drm/i915: Make IS_GEN macros only take dev_priv
Saves 1416 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-10-14 12:23:22 +01:00
Tvrtko Ursulin 772c2a519c drm/i915: Make IS_HASWELL only take dev_priv
Saves 2432 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin 8652744b64 drm/i915: Make IS_BROADWELL only take dev_priv
Saves 1808 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00
Tvrtko Ursulin fd6b8f43c9 drm/i915: Make IS_IVYBRIDGE only take dev_priv
Saves 848 bytes of .rodata strings.

v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2016-10-14 12:23:19 +01:00