-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
=OhGe
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc2' into drm-intel-next-queued
Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.
Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.
The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.
This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.
v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk
Since commit db6c2b4151 ("drm/i915: Store the vma in an rbtree under
the object") the vma are once again sorted into GGTT first, then ppGTT
so that the typical case of walking the GGTT vma can stop as soon as we
find a non-ppGTT. Apply that optimisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk
First set of i915 fixes for code in next.
* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: skip the first 4k of stolen memory on everything >= gen8
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
drm/i915: Fix use after free in logical_render_ring_init
drm/i915: disable PSR by default on HSW/BDW
drm/i915: Fix setting of boost freq tunable
drm/i915: tune down the fast link training vs boot fail
drm/i915: Reorder phys backing storage release
drm/i915/gen9: Fix PCODE polling during SAGV disabling
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
drm/i915: drop the struct_mutex when wedged or trying to reset
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:
i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d766ef5300)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.
v2: Add commentary about always aligning to page size.
Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
(cherry picked from commit dbb4351bab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In commit 0c40ce130e ("drm/i915: Trim the object sg table"), we expect
to copy exactly orig_st->nents across and allocate the table thusly.
The copy loop should therefore end with the new_sg being NULL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:
i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.
We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.
We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.
And finally an ulterior motive for unifying context handling was to
prepare for mock requests.
v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
Convert some of the obvious hand-rolled ranged overflow sanity checks to
our shiny new range_overflows macro.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety. Just use masked
vmf->address which already has the appropriate type.
Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.
v2: Add commentary about always aligning to page size.
Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).
v2...v200: New name for -1.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
(cherry picked from commit 3dcf93f7f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Simplifies the code to pass the right parameter in.
v2: Commit message. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Like GEM init, GUC init, MOCS init and context creation.
Enables them to lose dev_priv locals.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Makes all GEM object constructors consistent.
v2: Fix compilation in GVT code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Where it is more appropriate and also to be consistent with
the direction of the driver.
v2: Leave out object alloc/free inlining. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
In order to prevent a race between the old callback submitting an
incomplete request and i915_gem_set_wedged() installing its nop handler,
we must ensure that the swap occurs when the machine is idle
(stop_machine).
v2: move context lost from out of BKL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
When a user context is closed, it's file_priv backpointer is replaced by
ERR_PTR(-EBADF); be careful not to chase this invalid pointer after a
hang and a GPU reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-1-chris@chris-wilson.co.uk
If we have a bad client submitting unfavourably across different
contexts, creating new ones, the per context scoring of badness
doesn't remove the root cause, the offending client.
To counter, keep track of per client context bans. Deny access if
client is responsible for more than 3 context bans in
it's lifetime.
v2: move ban check to context create ioctl (Chris)
v3: add commentary about hangs needed to reach client ban (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Now when driver has per context scoring of 'hanging badness'
and also subsequent hangs during short windows are allowed,
if there is progress made in between, it does not make sense
to expose a ban timing window as a context parameter anymore.
Let the scoring be the sole indicator for ban policy and substitute
ban period context parameter as a boolean to get/set context
bannable property.
v2: allow non root to opt into being banned (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
As hangcheck score was removed, the active decay of score
was removed also. This removed feature for hangcheck to detect
if the gpu client was accidentally or maliciously causing intermittent
hangs. Reinstate the scoring as a per context property, so that if
one context starts to act unfavourably, ban it.
v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot. scores as tunables (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Hangcheck state accumulation has gained more steps
along the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.
Asymmetric engine states led to different actual weight of
'one hangcheck unit' and it was demonstrated in some
hangs that due to difference in stages, simpler engines
were accused falsely of a hang as their scoring was much
more quicker to accumulate above the hang treshold.
To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.
We lose the ability to detect cross batch malicious attempts
to hinder the progress. Plan is to move this functionality
to be part of context banning which is more natural fit,
later in the series.
v2: use time_before macros (Chris)
reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Currently we only clflush the scanout if it is in the CPU domain. Also
flush if we have a pending CPU clflush. We also want to treat the
dirtyfb path similar, and flush any pending writes there as well.
v2: Only send the fb flush message if flushing the dirt on flip
v3: Make flush-for-flip and dirtyfb look more alike since they serve
similar roles as end-of-frame marker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk
On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
I tried to avoid having to track the write for every VMA by only
tracking writes to the ggtt. However, for the purposes of frontbuffer
tracking this is insufficient as we need to invalidate around writes not
just to the the ggtt but all aliased ppgtt views of the framebuffer. By
moving the critical section to the object and only doing so for
framebuffer writes we can reduce the tracking even further by only
watching framebuffers and not vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We use DRM_DEBUG() when reporting on user actions, to try and keep
intentional errors out of the CI dmesg. Demote the debug from
i915_gem_open() similarly so that it is only apparent with drm.debug & 1
like its brethren.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Started with removing INTEL_INFO(dev) and cascaded into a quite
big trickle of function prototype changes. Still, I think it is
for the better.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Plus a small cascade of function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.
v2: Descend dma_fence_array when boosting priorities.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk
Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.
The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.
When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.
One more limitation, is that this is first implementation is for
execlists only so currently limited to gen8/gen9.
v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.
One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).
To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.
v2: Create a dedicated slab for i915_dependency.
Rename the lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
In order to simplify the lockdep annotation, as they become more complex
in the future with deferred execution and multiple paths through the
same functions, create a separate lockclass for the user timeline and
the hardware execution timeline.
We should only ever be locking the user timeline and the execution
timeline in parallel so we only need to create two lock classes, rather
than a separate class for every timeline.
v2: Rename the lock classes to be more consistent with other lockdep.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk
When we release the shmem backing storage, we make sure that the pages
are coherent with the cpu cache. However, our clflush routine was
skipping the flush as the object had no pages at release time. Fix this by
explicitly flushing the sg_table we are decoupling.
Fixes: 03ac84f183 ("drm/i915: Pass around sg_table to get_pages/put_pages backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>