The shrinker is prepared to be called unlocked (and at other times with
struct_mutex held for DIRECT_RECLAIM) so we can skip acquiring the
struct_mutex prior to calling the shrinker during freeze. This improves
our ability to shrink as we can be more aggressive when we know the
caller isn't holding struct_mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask() which are
the exact gfp_t we wanted in the first place.
We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.
Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().
v2: Commit message update. (Chris, Daniele)
Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Make i915_gem_wait_for_idle() be a little heavier in order to try and
guarantee that the GPU is indeed idle (by checking each engine
individually is idle, i.e. all writes are complete and the rings
stopped) after waiting for in-flight requests to be completed.
v2: And return the final error.
v3: Break the wait_for() out from under the WARN -- the macro expansion
is hideous and unreadable in the warning message
v4: If wait_for_engine() fails the result is catastrophic, mark the
device as wedged and wait for the repair team.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98836
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If the driver is wedged, HW state may be very inconsistent and
report that it is still busy, even though we have stopped using it. This
can lead to a double *ERROR* rather than a graceful cleanup after
wedging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-2-chris@chris-wilson.co.uk
As we declare an engine as wedged, we mark all of its active requests as
in error. However, we don't want to mark successfully completed requests
as in error, which requires us to retire those requests first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-1-chris@chris-wilson.co.uk
The should happen as soon as possible, but always within the logic that
depends on it (and not interrupting the top-level driver control flow).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1490720027-23234-1-git-send-email-oscar.mateo@intel.com
Starting with intel_guc_loader, down to intel_guc_submission
and finally to intel_guc_log.
v2:
- Null execbuf client outside guc_client_free (Daniele)
- Assert if things try to get allocated twice (Daniele/Joonas)
- Null guc->log.buf_addr when destroyed (Daniele)
- Newline between returning success and error labels (Joonas)
- Remove some unnecessary comments (Joonas)
- Keep guc_log_create_extras naming convention (Joonas)
- Helper function guc_log_has_extras (Joonas)
- No need for separate relay_channel create/destroy. It's just another extra.
- No need to nullify guc->log.flush_wq when destroyed (Joonas)
- Hoist the check for has_extras out of guc_log_create_extras (Joonas)
- Try to do i915_guc_log_register/unregister calls (kind of) symmetric (Daniele)
- Make sure initel_guc_fini is not called before init is ever called (Daniele)
v3:
- Remove unnecessary parenthesis (Joonas)
- Check for logs enabled on debugfs registration
- Rebase on top of Tvrtko's "Fix request re-submission after reset"
v4:
- Rebased
- Comment around enabling/disabling interrupts inside GuC logging (Joonas)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Words cannot describe the embarrassment at creating a new gfp_t relaim to
only prevent the oomkiller but allow direct|kswapd reclaim, and then not
use it in the shmem_read_mapping_page_gfp().
Fixes: 24f8e00a8a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322223447.7493-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since gfx allocations tend to be large, unmovable and disposable, report
the allocation failure back to userspace as an ENOMEM rather than incur
the oomkiller. We have already tried to make room by purging our own
cached gfx objects, and the oomkiller doesn't attribute ownership of gfx
objects so will likely pick the wrong candidate. Instead, let userspace
see the ENOMEM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322110521.29930-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The trick of using an uncached mmio read to ensure that the GGTT writes
are flushed does not require us to do the forcewake dance, so avoid it
in the hope of reducing the frequency that we do keep the device forced
awake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318104257.694-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
When we wedge the device, we override engine->submit_request with a nop
to ensure that all in-flight requests are marked in error. However, igt
would like to unwedge the device to test -EIO handling. This requires us
to flush those in-flight requests and restore the original
engine->submit_request.
v2: Use a vfunc to unify enabling request submission to engines
v3: Split new vfunc to a separate patch.
v4: Make the wait interruptible -- the third party fences we wait upon
may be indefinitely broken, so allow the reset to be aborted.
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-3-chris@chris-wilson.co.uk
I915_RESET_IN_PROGRESS is being used for both signaling the requirement
to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
to instruct a waiter (already holding the struct_mutex) to perform the
reset. To allow for a little more coordination, split these two meaning
into a couple of distinct flags. I915_RESET_BACKOFF tells
i915_mutex_lock_interruptible() not to acquire the mutex and
I915_RESET_HANDOFF tells the waiter to call i915_reset().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk
Current version of intel_guc_init_hw() does a lot:
- cares about submission
- loads huc
- implement WA
This change offloads some of the logic to intel_uc_init_hw(), which now
cares about the above.
v2: rename guc_hw_reset and fix typo in define name (M. Wajdeczko)
v3: rename once again
v4: remove spurious comments and add some style (J. Lahtinen)
v5: flow changes, got rid of dead checks (M. Wajdeczko)
v6: rebase
v7: rebase & onion teardown (J. Lahtinen)
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
GuC historically has two "startup" functions called _init() and _setup()
Then HuC came with it's _init() and _load().
This commit renames intel_guc_setup() and intel_huc_load() to
*uc_init_hw() as they called from the i915_gem_init_hw().
The aim is to be consistent in that entry points called during
particular driver init phases (e.g. init_hw) are all suffixed by that
phase. When reading the leaf functions, it should be clear at what stage
during the driver load it is called and therefore what operations are
legal at that point.
Also, since the functions start with intel_guc and intel_huc they take
appropiate structure.
v2: commit message update (Chris Wilson)
v3: change taken parameters to be more "semantic" (M. Wajdeczko)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.
Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags. Kernel commit
72bfa19c8d apparently introduced the feature prematurely. According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.
'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value. This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.
These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.
On Gen8+, userspace can write these registers directly, achieving the
same effect. On Gen6-7.5, it likely makes sense to extend the command
parser to support them. I don't think anyone wants this on Gen4-5.
Based on a patch by Dave Gordon.
v3: Return -ENODEV for the getparam, as this is what we do for other
obsolete features. Suggested by Chris Wilson.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313170433.26843-1-chris@chris-wilson.co.uk
(cherry picked from commit ef0f411f51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).
Include fixup 1d309634bc ("drm/i915: Kill the tasklet then disable")
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-2-chris@chris-wilson.co.uk
(cherry picked from commit 1f7b847d72)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:
prepare - before the reset, check if GEM recovered, then stop GEM
reset - after the reset, update GEM bookkeeping
finish - after the re-initialisation following the reset, restart GEM
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-1-chris@chris-wilson.co.uk
(cherry picked from commit d802709313)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If the object is coherent, we can simply update the cache domain on the
whole object rather than calculate the before/after clflushes. The
advantage is that we then get correct tracking of ellided flushes when
changing coherency later.
Testcase: igt/gem_pwrite_snooped
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170310000942.11661-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.
This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.
v2: Smelling fixes.
Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
(cherry picked from commit 7c55e2c577)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.
This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.
v2: Avoid dereferencing the ERR_PTR when freeing the object.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 4e5462ee84)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.
This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").
Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit c1d2061b28)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Backmerge drm-next to get at all the good stuff in drm-misc. We need
that because:
- drm_connector_list_iter conversion for i915 needs the core patches.
- Maarten's patches to use the new atomic state iterators also need
the core patches.
- We need the new link status property to complete the DP retraining
work, merging through 2 branches wasn't a good idea and we had to
partially backtrack.
- Chris needs reservation_object_trylock and we want to roll out
kref_read everywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
4 weeks worth of stuff since I was traveling&lazy:
- lspcon improvements (Imre)
- proper atomic state for cdclk handling (Ville)
- gpu reset improvements (Chris)
- lots and lots of polish around fences, requests, waiting and
everything related all over (both gem and modeset code), from Chris
- atomic by default on gen5+ minus byt/bsw (Maarten did the patch to
flip the default, really this is a massive joint team effort)
- moar power domains, now 64bit (Ander)
- big pile of in-kernel unit tests for various gem subsystems (Chris),
including simple mock objects for i915 device and and the ggtt
manager.
- i915_gpu_info in debugfs, for taking a snapshot of the current gpu
state. Same thing as i915_error_state, but useful if the kernel didn't
notice something is stick. From Chris.
- bxt dsi fixes (Umar Shankar)
- bxt w/a updates (Jani)
- no more struct_mutex for gem object unreference (Chris)
- some execlist refactoring (Tvrtko)
- color manager support for glk (Ander)
- improve the power-well sync code to better take over from the
firmware (Imre)
- gem tracepoint polish (Tvrtko)
- lots of glk fixes all around (Ander)
- ctx switch improvements (Chris)
- glk dsi support&fixes (Deepak M)
- dsi fixes for vlv and clanups, lots of them (Hans de Goede)
- switch to i915.ko types in lots of our internal modeset code (Ander)
- byt/bsw atomic wm update code, yay (Ville)
* tag 'drm-intel-next-2017-03-06' of git://anongit.freedesktop.org/git/drm-intel: (432 commits)
drm/i915: Update DRIVER_DATE to 20170306
drm/i915: Don't use enums for hardware engine id
drm/i915: Split breadcrumbs spinlock into two
drm/i915: Refactor wakeup of the next breadcrumb waiter
drm/i915: Take reference for signaling the request from hardirq
drm/i915: Add FIFO underrun tracepoints
drm/i915: Add cxsr toggle tracepoint
drm/i915: Add VLV/CHV watermark/FIFO programming tracepoints
drm/i915: Add plane update/disable tracepoints
drm/i915: Kill level 0 wm hack for VLV/CHV
drm/i915: Workaround VLV/CHV sprite1->sprite0 enable underrun
drm/i915: Sanitize VLV/CHV watermarks properly
drm/i915: Only use update_wm_{pre,post} for pre-ilk platforms
drm/i915: Nuke crtc->wm.cxsr_allowed
drm/i915: Compute proper intermediate wms for vlv/cvh
drm/i915: Skip useless watermark/FIFO related work on VLV/CHV when not needed
drm/i915: Compute vlv/chv wms the atomic way
drm/i915: Compute VLV/CHV FIFO sizes based on the PM2 watermarks
drm/i915: Plop vlv/chv fifo sizes into crtc state
drm/i915: Plop vlv wm state into crtc_state
...
Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows it to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.
This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.
v2: Smelling fixes.
Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.
This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.
v2: Avoid dereferencing the ERR_PTR when freeing the object.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Adding to the tail of the client request list as the only other user is
in the throttle ioctl that iterates forwards over the list. It only
needs protection against deletion of a request as it reads it, it simply
won't see a new request added to the end of the list, or it would be too
early and rejected. We can further reduce the number of spinlocks
required when throttling by removing stale requests from the client_list
as we throttle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk
A significant cost in setting up a wait is the overhead of enabling the
interrupt. As we disable the interrupt whenever the queue of waiters is
empty, if we are frequently waiting on alternating batches, we end up
re-enabling the interrupt on a frequent basis. We do want to disable the
interrupt during normal operations as under high load it may add several
thousand interrupts/s - we have been known in the past to occupy whole
cores with our interrupt handler after accidentally leaving user
interrupts enabled. As a compromise, leave the interrupt enabled until
the next IRQ, or the system is idle. This gives a small window for a
waiter to keep the interrupt active and not be delayed by having to
re-enable the interrupt.
v2: Restore hangcheck/missed-irq detection for continuations
v3: Be more careful restoring the hangcheck timer after reset
v4: Be more careful restoring the fake irq after reset (if required!)
v5: Redo changes to intel_engine_wakeup()
v6: Factor out __intel_engine_wakeup()
v7: Improve commentary for declaring a missed wakeup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.
Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.
Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags. Kernel commit
72bfa19c8d apparently introduced the feature prematurely. According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.
'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value. This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.
These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.
On Gen8+, userspace can write these registers directly, achieving the
same effect. On Gen6-7.5, it likely makes sense to extend the command
parser to support them. I don't think anyone wants this on Gen4-5.
Based on a patch by Dave Gordon.
v3: Return -ENODEV for the getparam, as this is what we do for other
obsolete features. Suggested by Chris Wilson.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A request is assigned a global seqno only when it is on the hardware
execution queue. The global seqno can be used to maintain a list of
requests on the same engine in retirement order, for example for
constructing a priority queue for waiting. Prior to its execution, or
if it is subsequently removed in the event of preemption, its global
seqno is zero. As both insertion and removal from the execution queue
may operate in IRQ context, it is not guarded by the usual struct_mutex
BKL. Instead those relying on the global seqno must be prepared for its
value to change between reads. Only when the request is complete can
the global seqno be stable (due to the memory barriers on submitting
the commands to the hardware to write the breadcrumb, if the HWS shows
that it has passed the global seqno and the global seqno is unchanged
after the read, it is indeed complete).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
=VMy8
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8
Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
Setting retire=true is identical to using origin=ORIGIN_CS, so make the
same simplification to intel_fb_obj_flush() as already employed for
intel_fb_obj_invalidate().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-6-chris@chris-wilson.co.uk
Flushing the cachelines for an object is slow, can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.
v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk
We have three different paths by which userspace wants to flush the
display plane (i.e. objects with obj->pin_display). Use a common helper
to identify those paths and to simplify a later change.
v2: Include the conditional in the name, i915_gem_object_flush_if_display
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-3-chris@chris-wilson.co.uk
The change_domain tracepoint has been inaccurate for a few years - it
doesn't fully capture the domains, especially with userspace bypassing
them. It is defunct, misleading and time to be removed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-1-chris@chris-wilson.co.uk
If we wait upon the full (i.e. all shared fences, or upon an exclusive
fence) reservation object successfully, we know that all fences beneath
it have been signaled, so long as no new fences were added whilst we
slept. If the reservation_object remains the same, as detected by its
seqcount, we can then reap all the fences upon completion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-6-chris@chris-wilson.co.uk
The physical object is treated as permanently pinned. If we fail to take
this initial pin during i915_gem_object_attach_phys() we need to revert
it back to an ordinary shmemfs object before reporting the failure.
v2: git-add
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215163900.11606-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.
This restores the squelching lost in commit e95433c73a ("drm/i915:
Rearrange i915_wait_request() accounting with callers").
Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.
Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit d892e9398e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).
v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit c0dcb203fb)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.
Fixes: e95433c73a ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Write into an object using WB, WC, GTT, and GPU paths and make sure that
our internal API is sufficient to ensure coherent reads and writes.
v2: Avoid invalid free upon allocation error
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-21-chris@chris-wilson.co.uk
The phys object is a rarely used device (only very old machines require
a chunk of physically contiguous pages for a few hardware interactions).
As such, it is not exercised by CI and to combat that we want to add a
test that exercises the phys object on all platforms.
v2: Always set err on error paths and not rely on inheriting the err.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-17-chris@chris-wilson.co.uk
We would like to be able to exercise huge allocations even on memory
constrained devices. To do this we create an object that allocates only
a few pages and remaps them across its whole range - each page is reused
multiple times. We can therefore pretend we are rendering into a much
larger object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-9-chris@chris-wilson.co.uk
Start exercising the scattergather lists, especially looking at
iteration after coalescing.
v2: Comment on the peculiarity of table construction (i.e. why this
sg_table might be interesting).
v3: Added one __func__ to identify expect_pfn_sg()
v4: Loop until we have crossed the chain boundary (forcing sg_table to
do multiple allocations) before squelching a potential ENOMEM from oom.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-2-chris@chris-wilson.co.uk
Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-5-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The signal threads may be running concurrently with the GPU reset. The
completion from the GPU run asynchronous with the reset and two threads
may see different snapshots of the state, and the signaler may mark a
request as complete as we try to reset it. We don't tolerate 2 different
views of the same state and complain if we try to mark a request as
failed if it is already complete. Disable the signal threads during
reset to prevent this conflict (even though the conflict implies that
the state we resetting to is invalid, we have already made our
decision!).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99733
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Disabling the tasklet leaves it if scheduled on the ready to run list
until it is re-enabled. This will leave the ksoftird thread spinning
until satisfied. To prevent this situation on starting the GPU reset, we
want to kill the tasklet first and then disable. The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
When updating the bookkeeping following the reset, we need the seqno to
be coherent on the CPU prior to trusting its result for deciding whether
any request is completed. We need the irq_barrier before we start making
these decisions, i.e. in reset_prepare.
References: https://bugs.freedesktop.org/show_bug.cgi?id=99733
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210185214.23463-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
As dmabufs may live beyond the PCI device removal, we need to flush the
freed object worker on device release, and include a warning in case
there is a leak.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210163523.17533-3-chris@chris-wilson.co.uk
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifyin the execlists_port[]
at the same time).
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:
prepare - before the reset, check if GEM recovered, then stop GEM
reset - after the reset, update GEM bookkeeping
finish - after the re-initialisation following the reset, restart GEM
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
Just a simple refactor to move a loop and some support code out of
i915_gem_init_hw(). This is in preparation for avoiding a race between
the tasklet writing to ELSP whilst simultaneously being written by
engine->init_hw() following a GPU reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-1-chris@chris-wilson.co.uk
Since to unbind an object, we may need a powered up device to access the
GTT entries, we only shrink bound objects if awake. Callers to
i915_gem_shrink_all() had to take this into account and take the rpm
wakeref, but we can move this wakeref into the shrink_all itself for
convenience and making the function live up to its name.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208104710.18089-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.
I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:
[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G U W 4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845] ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890] ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937] ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013] [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038] [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060] [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158] [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251] [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277] [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298] [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339] [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356] [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392] [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415] [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435] [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455] [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477] [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501] [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559] [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580] [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600] [<ffffffffac69fe62>] ret_from_fork+0x22/0x30
In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.
Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit e0ec3ec698)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.
Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 69aeafeae9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).
v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish between exclusive ownership of the
fence register from activity.
I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:
[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G U W 4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845] ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890] ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937] ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013] [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038] [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060] [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158] [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251] [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277] [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298] [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339] [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356] [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392] [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415] [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435] [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455] [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477] [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501] [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559] [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580] [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600] [<ffffffffac69fe62>] ret_from_fork+0x22/0x30
In the case the register is pinned, it should be present and we will
need to invalidate them to be restored upon resume as we cannot expect
the owner of the pin to call get_fence prior to use after resume.
Fixes: 7c108fd8fe ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.
In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.
v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.
Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The GPU may be in an unknown state following resume and module load. The
previous occupant may have left contexts loaded, or other dangerous
state, which can cause an immediate GPU hang for us. The only save
course of action is to reset the GPU prior to using it - similarly to
how we reset the GPU prior to unload (before a second user may be
affected by our leftover state).
We need to reset the GPU very early in our load/resume sequence so that
any stale HW pointers are revoked prior to any resource allocations we
make (that may conflict).
A reset should only be a couple of milliseconds on a slow device, a cost
we should easily be able to absorb into our initialisation times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-2-chris@chris-wilson.co.uk
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.
s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
Explain in a comment how and why we treat hung context like we do.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-7-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we can't recover the GPU after the reset, mark it as wedged to cancel
the outstanding tasks and to prevent new users from trying to use the
broken GPU.
v2: Check the same ring is hung again before declaring the reset broken.
v3: use engine_stalled (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-6-git-send-email-mika.kuoppala@intel.com
Split engine reset for engine and request specific parts.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-5-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the engine stalled/pardoned check into a helper function.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-4-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since we now only skip banned contexts, preventing the skip of default
contexts is no longer sensible. For a similar argument as before
'commit 7ec73b7e36 ("drm/i915: Only skip requests once a context is banned")'
we end up with an inconsistent API if we only mark future execbufs from
the default ctx as banned but fail to mark those currently executing
as failed.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-3-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add a new function for skipping all pending requests
for a context in order to make engine reset flow more
readable.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-2-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now that we have prepare/finish routines for the GEM reset, move the
disabling of the engine->irq_tasklet into them to reduce repetition. The
device irq enable/disable is split out to ensure it is run first and
last always (even if the GPU reset fails).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-1-git-send-email-mika.kuoppala@intel.com
Some engines are never user or already sitting idle in the kernel
context and for those we can skip flushing the current context for
i915_gem_switch_to_kernel_context(). We used to perform this
optimisation but that was removed for convenience of converting over to
multiple timelines and handling the pending request queues.
From the perspective of writing selftests, reducing the number of
background operations on the engines makes defining assertions easier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reading the ggtt_views is much more pleasant without the extra
characters from specifying the union (i.e. ggtt_view.partial rather than
ggtt_view.params.partial). To make this work inside i915_vma_compare()
with only a single memcmp requires us to ensure that there are no
uninitialised bytes within each branch of the union (we make sure the
structs are packed) and we need to store the size of each branch.
v4: Rewrite changelog and add comments explaining the assert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.
Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 10466d2a59)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.
v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
It has been some time since i915_gem_engine_cleanup was only called from
the module unload path, and now it is only called when the GPU is
wedged. Mika complained that the name is confusing, especially in light
of the existence of i915_gem_cleanup_engines().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-5-chris@chris-wilson.co.uk
Similarly to a normal reset, after we mark the GPU as wedged (completely
fubar and no more requests can be executed), set the error status on all
the in flight requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-4-chris@chris-wilson.co.uk
Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we can not guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state not their initial state), to let userspace know that it has been
corrupted set a special value on the fence->error, -EAGAIN.
If the request does hang on resubmission, the error will be overwritten
with -EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-3-chris@chris-wilson.co.uk
The struct dma_fence carries a status field exposed to userspace by
sync_file. This is inspected after the fence is signaled and can convey
whether or not the request completed successfully, or in our case if we
detected a hang during the request (signaled via -EIO in
SYNC_IOC_FILE_INFO).
v2: Mark all cancelled requests as failed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-2-chris@chris-wilson.co.uk
Always reset the requests of the guilty context, including the hung
request that we tell the hardware to skip. This should help if the
reprogram fails entirely, but more importantly makes the guilty path
more uniform (and simplifies the subsequent patch to tweak the cancelled
requests).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-1-chris@chris-wilson.co.uk
Pull in latest drm-next from Dave Airlie to get at all the drm-misc
goodies, specifically:
- dma_fence error state handling rework (Chris needs that for error
recovery)
- crc support locking changes (Tomeu's i915 crc patches need that).
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The VMA is later clipped against the vm_area_struct before insertion of
the faulting PTE so we are free to create the partial view as we desire.
If we use the object as the extents rather than the area, this partial
can then be used for other areas.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-2-chris@chris-wilson.co.uk
Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to
i915_gem_fence_size() and i915_gem_fence_alignment() respectively to
better match usage. Similarly move the pair of functions into
i915_gem_tiling.c next to the fence restrictions.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignemnt roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
Ensure the view occupies the full tile row so that reads/writes into the
VMA do not escape (via fenced detiling) into neighbouring objects - we
will pad the object with scratch pages to satisfy the fence. This
applies the lazy-tiling we employed on gen2/3 to gen4+.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk
More 4.11 stuff, holidays edition (i.e. not much):
- docs and cleanups for shared dpll code (Ander)
- some kerneldoc work (Chris)
- fbc by default on gen9+ too, yeah! (Paulo)
- fixes, polish and other small things all over gem code (Chris)
- and a few small things on top
Plus a backmerge, because Dave was enjoying time off too.
* tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel: (275 commits)
drm/i915: Update DRIVER_DATE to 20170109
drm/i915: Drain freed objects for mmap space exhaustion
drm/i915: Purge loose pages if we run out of DMA remap space
drm/i915: Fix phys pwrite for struct_mutex-less operation
drm/i915: Simplify testing for am-I-the-kernel-context?
drm/i915: Use range_overflows()
drm/i915: Use fixed-sized types for stolen
drm/i915: Use phys_addr_t for the address of stolen memory
drm/i915: Consolidate checks for memcpy-from-wc support
drm/i915: Only skip requests once a context is banned
drm/i915: Move a few more utility macros to i915_utils.h
drm/i915: Clear ret before unbinding in i915_gem_evict_something()
drm/i915/guc: Exclude the upper end of the Global GTT for the GuC
drm/i915: Move a few utility macros into a separate header
drm/i915/execlists: Reorder execlists register enabling
drm/i915: Assert that we do create the deferred context
drm/i915: Assert all timeline requests are gone before fini
drm/i915: Revoke fenced GTT mmapings across GPU reset
drm/i915: enable FBC on gen9+ too
drm/i915: actually drive the BDW reserved IDs
...
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.
The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.
I figured it would just easier than me trying than to cherry-pick the
Documentation patches to satisfy git.
The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching an compiler error and adding an
Tested-and-Reported-by tag"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()
So they can figure out what is the optimal number of pages
that can be contingously stitched together without fear of
bounce buffer.
We also expose an mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
As we now use a deferred free queue for objects, simply retiring the
active objects is not enough to immediately free them and recover their
mmap space - we must now also drain the freed object list.
Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Since commit fe115628d5 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.
Fixes: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.
In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
If we skip before banning, we have an inconsistent interface between
execbuf still queueing valid request but those requests already queued
being cancelled. If we only cancel the pending requests once we stop
accepting new requests, the user interface is more consistent.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105170059.344-1-chris@chris-wilson.co.uk
In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk
The fence registers are clobbered by a GPU reset. If there is concurrent
user access to a fenced region via a GTT mmaping, the access will not be
fenced during the reset (until we restore the fences afterwards). In order
to prevent invalid access during the reset, before we clobber the fences
first we must invalidate the GTT mmapings. Access to the mmap will then
be forced to fault in the page, and in handling the fault, i915_gem_fault()
will take the struct_mutex and wait upon the reset to complete.
v2: Fix up commentary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
Testcase: igt/gem_mmap_gtt/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
=OhGe
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc2' into drm-intel-next-queued
Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
(cherry picked from commit 00c25e3f40)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.
Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
(cherry picked from commit 8bfc478fa4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
(cherry picked from commit e5facdf964)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.
Fixes: 0c40ce130e ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oops and would
be redundant (due to it already being coherent, and clflushed
afterwards).
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.
The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.
This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.
v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk
Since commit db6c2b4151 ("drm/i915: Store the vma in an rbtree under
the object") the vma are once again sorted into GGTT first, then ppGTT
so that the typical case of walking the GGTT vma can stop as soon as we
find a non-ppGTT. Apply that optimisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk
First set of i915 fixes for code in next.
* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: skip the first 4k of stolen memory on everything >= gen8
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
drm/i915: Fix use after free in logical_render_ring_init
drm/i915: disable PSR by default on HSW/BDW
drm/i915: Fix setting of boost freq tunable
drm/i915: tune down the fast link training vs boot fail
drm/i915: Reorder phys backing storage release
drm/i915/gen9: Fix PCODE polling during SAGV disabling
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
drm/i915: drop the struct_mutex when wedged or trying to reset
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:
i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d766ef5300)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.
v2: Add commentary about always aligning to page size.
Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
(cherry picked from commit dbb4351bab)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In commit 0c40ce130e ("drm/i915: Trim the object sg table"), we expect
to copy exactly orig_st->nents across and allocate the table thusly.
The copy loop should therefore end with the new_sg being NULL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:
i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.
We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.
We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.
And finally an ulterior motive for unifying context handling was to
prepare for mock requests.
v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
Convert some of the obvious hand-rolled ranged overflow sanity checks to
our shiny new range_overflows macro.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety. Just use masked
vmf->address which already has the appropriate type.
Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit a4f5ea64f0 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.
v2: Add commentary about always aligning to page size.
Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).
v2...v200: New name for -1.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
(cherry picked from commit 3dcf93f7f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Simplifies the code to pass the right parameter in.
v2: Commit message. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Like GEM init, GUC init, MOCS init and context creation.
Enables them to lose dev_priv locals.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Makes all GEM object constructors consistent.
v2: Fix compilation in GVT code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Where it is more appropriate and also to be consistent with
the direction of the driver.
v2: Leave out object alloc/free inlining. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
In order to prevent a race between the old callback submitting an
incomplete request and i915_gem_set_wedged() installing its nop handler,
we must ensure that the swap occurs when the machine is idle
(stop_machine).
v2: move context lost from out of BKL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk
Since the submit/execute split in commit d55ac5bf97 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
When a user context is closed, it's file_priv backpointer is replaced by
ERR_PTR(-EBADF); be careful not to chase this invalid pointer after a
hang and a GPU reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-1-chris@chris-wilson.co.uk
If we have a bad client submitting unfavourably across different
contexts, creating new ones, the per context scoring of badness
doesn't remove the root cause, the offending client.
To counter, keep track of per client context bans. Deny access if
client is responsible for more than 3 context bans in
it's lifetime.
v2: move ban check to context create ioctl (Chris)
v3: add commentary about hangs needed to reach client ban (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Now when driver has per context scoring of 'hanging badness'
and also subsequent hangs during short windows are allowed,
if there is progress made in between, it does not make sense
to expose a ban timing window as a context parameter anymore.
Let the scoring be the sole indicator for ban policy and substitute
ban period context parameter as a boolean to get/set context
bannable property.
v2: allow non root to opt into being banned (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
As hangcheck score was removed, the active decay of score
was removed also. This removed feature for hangcheck to detect
if the gpu client was accidentally or maliciously causing intermittent
hangs. Reinstate the scoring as a per context property, so that if
one context starts to act unfavourably, ban it.
v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot. scores as tunables (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Hangcheck state accumulation has gained more steps
along the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.
Asymmetric engine states led to different actual weight of
'one hangcheck unit' and it was demonstrated in some
hangs that due to difference in stages, simpler engines
were accused falsely of a hang as their scoring was much
more quicker to accumulate above the hang treshold.
To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.
We lose the ability to detect cross batch malicious attempts
to hinder the progress. Plan is to move this functionality
to be part of context banning which is more natural fit,
later in the series.
v2: use time_before macros (Chris)
reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Currently we only clflush the scanout if it is in the CPU domain. Also
flush if we have a pending CPU clflush. We also want to treat the
dirtyfb path similar, and flush any pending writes there as well.
v2: Only send the fb flush message if flushing the dirt on flip
v3: Make flush-for-flip and dirtyfb look more alike since they serve
similar roles as end-of-frame marker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk
On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
I tried to avoid having to track the write for every VMA by only
tracking writes to the ggtt. However, for the purposes of frontbuffer
tracking this is insufficient as we need to invalidate around writes not
just to the the ggtt but all aliased ppgtt views of the framebuffer. By
moving the critical section to the object and only doing so for
framebuffer writes we can reduce the tracking even further by only
watching framebuffers and not vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We use DRM_DEBUG() when reporting on user actions, to try and keep
intentional errors out of the CI dmesg. Demote the debug from
i915_gem_open() similarly so that it is only apparent with drm.debug & 1
like its brethren.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Started with removing INTEL_INFO(dev) and cascaded into a quite
big trickle of function prototype changes. Still, I think it is
for the better.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Plus a small cascade of function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.
v2: Descend dma_fence_array when boosting priorities.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk
Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.
The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.
When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.
One more limitation, is that this is first implementation is for
execlists only so currently limited to gen8/gen9.
v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.
One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).
To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.
v2: Create a dedicated slab for i915_dependency.
Rename the lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
In order to simplify the lockdep annotation, as they become more complex
in the future with deferred execution and multiple paths through the
same functions, create a separate lockclass for the user timeline and
the hardware execution timeline.
We should only ever be locking the user timeline and the execution
timeline in parallel so we only need to create two lock classes, rather
than a separate class for every timeline.
v2: Rename the lock classes to be more consistent with other lockdep.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk
When we release the shmem backing storage, we make sure that the pages
are coherent with the cpu cache. However, our clflush routine was
skipping the flush as the object had no pages at release time. Fix this by
explicitly flushing the sg_table we are decoupling.
Fixes: 03ac84f183 ("drm/i915: Pass around sg_table to get_pages/put_pages backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
As a side product, had to split two other files;
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)
I tried to move code in as big chunks as possible, to make review
easier. i915_vma_compare was moved to a header temporarily.
v2:
- Use i915_gem_fence_reg.{c,h}
v3:
- Rebased
v4:
- Fix building when DEBUG_GEM is enabled by reordering a bit.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
At the moment we allocate enough sg table entries assuming we
will not be able to do any coalescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.
A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.
Experiments on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.
More complicated techniques such as only copying the last used
page and freeing the rest are left to the reader.
v2:
* Update commit message.
* Use temporary sg_table on stack. (Chris Wilson)
v3:
* Commit message update.
* Comment added.
* Replace memcpy with copy assignment.
(Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com
We always flush the chipset prior to executing with the GPU, so we can
skip the flush during ordinary domain management.
This should help mitigate some of the potential performance regressions,
but likely trivial, from doing the flush unconditionally before execbuf
introduced in commit dcd79934b0 ("drm/i915: Unconditionally flush any
chipset buffers before execbuf")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
During resume we will reset the SW/HW tracking for each ring head/tail
pointers and so are not prepared to replay any pending requests (as
opposed to GPU reset time). Add an assert for this both to the suspend
and the resume code.
v2:
- Check for ELSP port idle already during suspend and check !gt.awake
during resume. (Chris)
v3:
- Move the !gt.awake check to i915_gem_resume().
v4:
- s/intel_lr_engines_idle/intel_execlists_idle/ (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-4-git-send-email-imre.deak@intel.com
We assume that the GPU is idle once receiving the seqno via the last
request's user interrupt. In execlist mode the corresponding context
completed interrupt can be delayed though and until this latter
interrupt arrives we consider the request to be pending on the ELSP
submit port. This can cause a problem during system suspend where this
last request will be seen by the resume code as still pending. Such
pending requests are normally replayed after a GPU reset, but during
resume we reset both SW and HW tracking of the ring head/tail pointers,
so replaying the pending request with its stale tail pointer will leave
the ring in an inconsistent state. A subsequent request submission can
lead then to the GPU executing from uninitialized area in the ring
behind the above stale tail pointer.
Fix this by making sure any pending request on the ELSP port is
completed before suspending. I used a polling wait since the completion
time I measured was <1ms and since normally we only need to wait during
system suspend. GPU idling during runtime suspend is scheduled with a
delay (currently 50-100ms) after the retirement of the last request at
which point the context completed interrupt must have arrived already.
The chance of this bug was increased by
commit 1c777c5d1d
Author: Imre Deak <imre.deak@intel.com>
Date: Wed Oct 12 17:46:37 2016 +0300
drm/i915/hsw: Fix GPU hang during resume from S3-devices state
but it could happen even without the explicit GPU reset, since we
disable interrupts afterwards during the suspend sequence.
v2:
- Do an unlocked poll-wait first. (Chris)
v3-4:
- s/intel_lr_engines_idle/intel_execlists_idle/ and move
i915.enable_execlists check to the new helper. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com
There is a small race where a new request can be submitted and retired
after the idle worker started to run which leads to idling the GPU too
early. Fix this by deferring the idling to the pending instance of the
worker.
This scenario was pointed out by Chris.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com
Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Lets presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scanout from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview which does not appear to be limited to the low
aperture for its scanout.)
v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.
v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well
Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036
Fixes: 2efb813d53 ("drm/i915: Fallback to using unmappable memory for scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com
When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.
Reported-by: Andreas Reis <andreas.reis@gmail.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 03af84fe7f ("drm/i915: Choose partial chunksize based on tile row size")
Fixes: a61007a83a ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
commit bc0629a767 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanitychecks, the fix is to replace the explicit atomic_set with an
atomic_inc.
v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.
Restore that so idle userptr objects can be discarded in order
to free up memory.
One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.
v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worse case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for
rhashtable). The major of such tree will be very shallow and so not much
slower, and still scales much, much better than the current unsorted
list.
v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36f ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a worker, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug where the pages were no
longer being unpinned if they were marked as unneeded.
[ 231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[ 231.800405] Modules linked in:
[ 231.800406] snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[ 231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G U 4.9.0-rc2-CI-CI_DRM_1780+ #1
[ 231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[ 231.800456] Workqueue: events __i915_gem_free_work [i915]
[ 231.800459] ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[ 231.800465] ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[ 231.800469] ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[ 231.800474] Call Trace:
[ 231.800479] [<ffffffff8142dd65>] dump_stack+0x67/0x92
[ 231.800484] [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[ 231.800487] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[ 231.800491] [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[ 231.800520] [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800548] [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[ 231.800552] [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[ 231.800555] [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[ 231.800558] [<ffffffff8109c789>] worker_thread+0x49/0x490
[ 231.800561] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800563] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800566] [<ffffffff810a2aab>] kthread+0xeb/0x110
[ 231.800569] [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[ 231.800573] [<ffffffff818164a7>] ret_from_fork+0x27/0x40
Moving to a separate flag for tracking the quirked pin is overkill for
the bug (since we only have to interchange the two tests in
i915_gem_free_object) but it does reduce a complicated test on all
objects and provide a sanitycheck for uncommon code paths.
Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.
v2: Add a comment to indicate the xfer between timelines upon submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
A restriction on our global seqno is that they cannot wrap, and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted, its global seqno is still 0, and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.
Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.
Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object til after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).
v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk
We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.
v2: Rewrite building the radixtree for clarity, hopefully.
v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk