Piglit runner and QA are both looking at the dmesg for
DRM_ERRORs with test cases. Add a flag to control those
when we they are expected from related test cases.
Also add flag to control if contexts should be banned
that introduced the hang. Hangcheck is timer based and
preventing bans by adding sleeps to testcases makes
testing slower.
v2: intel_ring_stopped(), readable comment (Chris)
v3: keep compatibility (Daniel)
References: https://bugs.freedesktop.org/show_bug.cgi?id=75876
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Merge window -fixes pull request as usual. Well, I did sneak in Jani's
drm_i915_private_t typedef removal, need to have fun with a big sed job
too ;-)
Otherwise:
- hdmi interlaced fixes (Jesse&Ville)
- pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but
not cc: stable since only really relevant for igt runs)
- large cursor wm fixes (Chris)
- fix gpu turbo boost/throttle again, was getting stuck due to vlv rps
patches (Chris+Imre)
- fix runtime pm fallout (Paulo)
- bios framebuffer inherit fix (Chris)
- a few smaller things
* tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits)
Skip intel_crt_init for Dell XPS 8700
drm/i915: vlv: fix RPS interrupt mask setting
Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec"
drm/i915: move power domain init earlier during system resume
drm/i915: Fix the computation of required fb size for pipe
drm/i915: don't get/put runtime PM at the debugfs forcewake file
drm/i915: fix WARNs when reading DDI state while suspended
drm/i915: don't read cursor registers on powered down pipes
drm/i915: get runtime PM at i915_display_info
drm/i915: don't read pp_ctrl_reg if we're suspended
drm/i915: get runtime PM at i915_reg_read_ioctl
drm/i915: don't schedule force_wake_timer at gen6_read
drm/i915: vlv: reserve the GT power context only once during driver init
drm/i915: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t
drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t
...
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.
For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.
Based on drm_mm work by Chris Wilson.
v3: Changed to use a TTM placement flag
v2: Updated kerneldoc
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Christian König <deathsimple@vodafone.de>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: David Airlie <airlied@redhat.com>
On non-LLC platforms, when changing the cache level of an object, we may
need to unbind it so that prefetching across page boundaries does not
cross into a different memory domain. This requires us to unbind
conflicting vma, but we did so iterating over the objects vma in an
unsafe manner (as the list was being modified as we iterated).
The regression was introduced in
commit 3089c6f239
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 17:00:03 2013 -0700
drm/i915: make caching operate on all address spaces
apparently as far back as v3.12-rc1, but it has only just begun to
trigger real world bug reports.
Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When other platforms add runtime PM support they will also need to
disable interrupts, so move the variable to the runtime PM struct.
Also notice that the longer-term goal is to completely kill the
regsave struct, and I even have patches for that.
v2: - Rebase.
v3: - Rebase.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Conflicts:
drivers/gpu/drm/i915/intel_dp.c
A bit a mess with reverts which differe in details between -fixes and
-next and some other unrelated shuffling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is the 3rd respin of the drm-anon patches. They allow module unloading, use
the pin_fs_* helpers recommended by Al and are rebased on top of drm-next. Note
that there are minor conflicts with the "drm-minor" branch.
* 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux:
drm: init TTM dev_mapping in ttm_bo_device_init()
drm: use anon-inode instead of relying on cdevs
drm: add pseudo filesystem for shared inodes
DRM drivers share a common address_space across all character-devices of a
single DRM device. This allows simple buffer eviction and mapping-control.
However, DRM core currently waits for the first ->open() on any char-dev
to mark the underlying inode as backing inode of the device. This delayed
initialization causes ugly conditions all over the place:
if (dev->dev_mapping)
do_sth();
To avoid delayed initialization and to stop reusing the inode of the
char-dev, we allocate an anonymous inode for each DRM device and reset
filp->f_mapping to it on ->open().
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
When we change the cache_level for an object we need to make sure
we don't put differing types of snoopable memory too close to each
other on non-LLC machines.
Currently i915_gem_object_set_cache_level() will stop looking when
it finds just one vma that has such a conflict. Drop the bogus break
statement to make sure it will unbind all vmas which need to be moved
around to avoid the conflict.
I suppose this is a theoretical issue as currently we don't enable
ppgtt on non-LLC machines, so each object can only have one vma.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We don't always want to write into main memory with pwrite. The shmem
fast path in particular is used for memory that is cacheable - under
such circumstances forcing the cache eviction is undesirable. As we will
always flush the cache when targeting incoherent buffers, we can rely on
that second pass to apply the cache coherency rules and so benefit from
in-cache copies otherwise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We used to lock individual pages inside the buffer object and so needed
to update the page flags every time. However, we now pin the pages into
the object for the duration of the pwrite/pread (and hopefully much
longer) and so we can forgo the flag updates until we release all the
pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The command parser is going to need the same synchronization and
setup logic, so factor it out for reuse.
v2: Add a check that the object is backed by shmem
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Its last usage outside of i915_gem.c was removed in:
commit 1f70999f90
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 27 22:43:07 2014 +0000
drm/i915: Prevent recursion by retiring requests when the ring is full
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After finding the guilty batch and request, we can use it to find the
process that submitted the batch and then add the culprit into the error
state.
This is a slightly different approach from Ben's in that instead of
adding the extra information into the struct i915_hw_context, we use the
information already captured in struct drm_file which is then referenced
from the request.
v2: Also capture the workaround buffer for gen2, so that we can compare
its contents against the intended batch for the active request.
v3: Rebase (Mika)
v4: Check for null context (Chris)
checkpatch warnings fixed
Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the past, it was possible to have multiple batches per request due to
a stray signal or ENOMEM. As a result we had to scan each active object
(filtered by those having the COMMAND domain) for the one that contained
the ACTHD pointer. This was then made more complicated by the
introduction of ppgtt, whereby ACTHD then pointed into the address space
of the context and so also needed to be taken into account.
This is a fairly robust approach (though the implementation is a little
fragile and depends upon the per-generation setup, registers and
parameters). However, due to the requirements for hangstats, we needed a
robust method for associating batches with a particular request and
having that we can rely upon it for finding the associated batch object
for error capture.
If the batch buffer tracking is not robust enough, that should become
apparent quite quickly through an erroneous error capture. That should
also help to make sure that the runtime reporting to userspace is
robust. It also means that we then report the oldest incomplete batch on
each ring, which can be useful for determining the state of userspace at
the time of a hang.
v2: Use i915_gem_find_active_request (Mika)
v3: remove check for ring->get_seqno, split long lines (Ben)
v4: check that context is available (Chris)
checkpatch warnings fixed
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In place of true activity counting, we walk the list of vma associated
with an object managing each on the vm's active/inactive list everytime
we call move-to-inactive. This depends upon the vma->mm_list being
cleared after unbinding, or else we run into difficulty when tracking
the object in multiple vm's - we see a use-after free and corruption of
the mm_list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we've explicitly stopped the rings for testing purposes, don't ban
the default context. Fixes kms_flip hang tests.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We currently call intel_mark_idle() too often, as we do so as a
side-effect of processing the request queue. However, we the calls to
intel_mark_idle() are expected to be paired with a call to
intel_mark_busy() (or else we try to idle the hardware by accessing
registers that are already disabled). Make the idle/busy tracking
explicit to prevent the multiple calls.
v2: We can drop some of the complexity in __i915_add_request() as
queue_delayed_work() already behaves as we want (not requeuing the item
if it is already in the queue) and mark_busy/mark_idle imply that the
idle task is inactive.
v3: We do still need to cancel the pending idle task so that it is sent
again after the current busy load completes (not in the middle of it).
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.
Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.
v2: Split out prep work and unrelated changes.
v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.
v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is prep work for reworking the object_pin logic. Atm
it still does a (now redundant) lookup of the vma. The next
patch will fix this.
Split out from Chris vma-bind rework.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Split out from Chris vma-bind rework.
Jani wondered why this is save, and the reason is that i915_vma_unbind
does all these checks, too. So they're redundant.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With abitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".
Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.
Split out from Chris' vma-binding rework patch.
v2: Undo the behaviour change in object_pin that Chris spotted.
v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.
v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.
v5: Reorder the PIN_ flags for more natural patch splitup.
v6: Pull out the PIN_GLOBAL split-up again.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Arjan van de Ven reported that on his test machine that he was seeing
stalls of greater than 1 frame greatly impacting the user experience. He
tracked this down to being the locked flush during a pagefault as being
the culprit hogging the struct_mutex and so blocking any other user from
proceeding. Stalling on a pagefault is bad behaviour on userspace's
part, for one it means that they are ignoring the coherency rules on
pointer access through the GTT, but fortunately we can apply the same
trick as the set-to-domain ioctl to do a lightweight, nonblocking flush
of outstanding rendering first.
"Prior to the patch it looks like this
(this one testrun does not show the 20ms+ I've seen occasionally)
4.99 ms 2.36 ms 31360 __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_
+pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault
4.99 ms 2.75 ms 107751 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.99 ms 1.63 ms 1666 i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa
+ult
4.93 ms 2.45 ms 980 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
4.89 ms 2.20 ms 3283 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.34 ms 1.66 ms 1715 i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.73 ms 3.73 ms 49 i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.17 ms 0.33 ms 931 i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.97 ms 0.43 ms 1029 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.55 ms 0.51 ms 735 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
After the patch it looks like this:
4.99 ms 2.14 ms 22212 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.86 ms 0.99 ms 14170 __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_
+fault do_page_fault page_fault
3.59 ms 1.31 ms 325 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.37 ms 3.37 ms 65 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.58 ms 2.58 ms 65 i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl
+ia32_sysret
2.19 ms 2.19 ms 65 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
2.18 ms 2.18 ms 65 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
1.66 ms 1.66 ms 65 i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
It may not look like it, but this is quite a large difference, and I've
been unable to reproduce > 5 msec delays at all, while before they do
happen (just not in the trace above)."
gem_gtt_hog on an old Pineview (GMA3150),
before: 4969.119ms
after: 4122.749ms
Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com>
Testcase: igt/gem_gtt_hog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we detect that the user passed along an invalid handle or object,
we emit a warning as an aide for debugging. Since these are indeed only
for debugging user triggerable errors (and the errors are reported back
to userspace by the errno), the messages should only be at the debug
level and not claiming that there is a catastrophic error in the
driver/hardware.
References: https://bugs.freedesktop.org/show_bug.cgi?id=74704
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we make sure that all the dev_priv->info usages are wrapped by
INTEL_INFO(), we can easily modify the ->info field to be structure and
not a pointer while keeping the const protection in the INTEL_INFO()
macro.
v2: Rebased onto latest drm-nightly
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since a purged buffer is one without any associated pages, attempting to
use it should generate EFAULT rather than EINVAL, as it is not strictly
an invalid parameter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
EFAULT will be a possible return code where backing storage is
transient, such after it is purged by madvise. As such it is to be
expected and so should not trigger a WARN inside i915_gem_fault() but be
converted silently to SIGBUS.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we seek the guilty batch using request and hangcheck
score, this code is not needed anymore.
v2: Rebase. Passing dev_priv instead of getting it from last_ring
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With full ppgtt using acthd is not enough to find guilty
batch buffer. We get multiple false positives as acthd is
per vm.
Instead of scanning which vm was running on a ring,
to find corressponding context, use a different, simpler,
strategy of finding batches that caused gpu hang:
If hangcheck has declared ring to be hung, find first non complete
request on that ring and claim it was guilty.
v2: Rebase
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we have stopped rings then we know that test is running
so no need for spam. In addition, only spam when default
context gets banned.
v2: - make sure default context ban gets shown (Chris)
- use helper for checking for default context, everywhere (Chris)
v3: - dont be quiet when debug is set (Ben, Daniel)
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With full ppgtt support drm_i915_file_private gained knowledge
about the default context. Also reset stats are now inside
i915_hw_context so we can use proper abstraction.
v2: Move BUG_ON and WARN_ON to more proper locations (Ben)
v3: Pass dev directly to i915_context_is_banned to avoid the need to
dereference ctx->last_ring. Spotted by Mika when checking my
s/BUG/WARN/ change, I've missed this ->last_ring dereference.
Suggested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v2)
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v2)
[danvet: s/BUG/WARN/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At least I couldn't find it in the Haswell Bspec any more and we've
tried to test-boot a Haswell machine with num_pipes forced to 0 (i.e.
hit the PCH_NOP path) and the unclaimed register logic complained.
So restrict this dance to just ivb platforms.
v2: Art pointed out that the bits simply moved on hsw+
v3: Buy code terseneness with a notch of sublety as suggested by
Chris.
v4: Frob the right bit, spotted by Art.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arthur Ranyan <arthur.j.runyan@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Art Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.
Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".
The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:
$ git grep "i915\." -- drivers/gpu/drm/i915
v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because whatever.*
* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is useful for debugging as we then know that the first entry is
always the global GTT, and all later entries the per-process GTT VM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On older generations (gen2, gen3) the GPU requires fences for many
operations, such as blits. The display hardware also requires fences for
scanouts and this leads to a situation where an arbitrary number of
fences may be pinned by old scanouts following a pageflip but before we
have executed the unpin workqueue. This is unpredictable by userspace
and leads to random EDEADLK when submitting an otherwise benign
execbuffer. However, we can detect when we have an outstanding flip and
so cause userspace to wait upon their completion before finally
declaring that the system is starved of fences. This is really no worse
than forcing the GPU to stall waiting for older execbuffer to retire and
release their fences before we can reallocate them for the next
execbuffer.
v2: move the test for a pending fb unpin to a common routine for
later reuse during eviction
Reported-and-tested-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to defer the free request until the object/vma is capable of
being freed - or else we have a problem when we try to destroy the
context.
The exact same issue is described and fixed here:
commit e20780439b
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:22 2013 -0800
drm/i915: Defer request freeing
I had this fix previously, but decided not to keep it for some reason I
can no longer remember.
gem_reset_stats is a really good test at hitting the problem.
For the inquisitive:
[ 170.516392] ------------[ cut here ]------------
[ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
[ 170.518064] Memory manager not clean during takedown.
[ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28
[ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
[ 170.520662] Call Trace:
[ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a
[ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0
[ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50
[ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm]
[ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110
[i915]
[ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29
[i915]
[ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0
[i915]
[ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0
[i915]
[ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915]
[ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915]
[ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120
[i915]
[ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0
[ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0
[ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0
[ 170.532996] [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0
[ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120
[ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0
[ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[ 170.536847] ---[ end trace 3d4c12892e42d58f ]---
v2: Whitespace fix. (Chris)
Note: This is a bug that only hits the ppgtt topic branch but I've
figured that doing the request cleanup in this order is generally the
right thing to do.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add a code comment to clarify what's actually going on since
the lifetime rules aroung ppgtt cleanup are ... fuzzy a best atm. Also
add a note about why we need this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is user-triggerable and hence we should not allow it to spam
dmesg. Also, it upsets the nice dmesg tracking piglit does.
Note that this is just extra debugging information, mostly
unwanted, in case of a hang and that there is a separate message to the
user giving instructions on how to report a bug for a GPU hang.
v2: Add note as suggests in Chris' reply.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72740
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Conflicts are getting out of hand, and now we have to shuffle even
more in -next which was also shuffled in -fixes (the call for
drm_mode_config_reset needs to move yet again).
So do a proper backmerge. I wanted to wait with this for the 3.13
relaese, but alas let's just do this now.
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_ddi.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_pm.c
Besides the conflict around the forcewake get/put (where we chaged the
called function in -fixes and added a new parameter in -next) code all
the current conflicts are of the adjacent lines changed type.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Freeing a request triggers the destruction of the context. This needs to
occur after all objects are themselves unbound from the context, and so
the free request needs to occur after the object release during retire.
This tidies up
commit e20780439b
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:22 2013 -0800
drm/i915: Defer request freeing
by simply swapping the order of operations rather than introducing
further complexity - as noted during review.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 4fe9adbc36.
The patch completely lacks a detailed explanation of what exactly
blows up and how, so is insufficiently justified as a band-aid.
Otoh the justification as a safety measure against userspace botching
up relocations is also fairly weak: If we want real project we need to
at least make the gab big enough that the gpu doesn't scribble over
more important stuff. With 4k screens that would be 32MB.
Also I think this would be much better in conjunction with a (debug)
switch to disable our use of the scratch page.
Hence revert this.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Especially with ppgtt this kinda stopped making sense. And if we
indeed need this to hack around an issue, we need something that also
works for non-root.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).
For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.
One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).
v2: Disable BDW on rebase
NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.
Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.
Conflicts:
drivers/gpu/drm/i915/i915_gem_context.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/intel_display.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is primarily a band aid for an unexplainable error in
gem_reloc_vs_gpu/forked-faulting-reloc-thrashing. Essentially as soon as
a relocated buffer (which had a non-zero presumed offset) moved to
offset 0, something goes bad. Since I have been unable to solve this,
and potentially this is a good thing to do anyway, since many things can
accidentally write to offset 0, why not?
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With context destruction, we always want to be able to tear down the
underlying address space. This is invoked on the last unreference to the
context which could happen before we've moved all objects to the
inactive list. To enable a clean tear down the address space, make sure
to process the request free lastly.
Without this change, we cannot guarantee to we don't still have active
objects in the VM.
As an example of a failing case:
CTX-A is created, count=1
CTX-A is used during execbuf
does a context switch count = 2
and add_request count = 3
CTX B runs, switches, CTX-A count = 2
CTX-A is destroyed, count = 1
retire requests is called
free_request from CTX-A, count = 0 <--- free context with active object
As mentioned above, by doing the free request after processing the
active list, we can avoid this case.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.
This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.
Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex
Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).
Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:
CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.
v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Oct 30 15:44:16 2013 +0200
drm/i915: add i915_get_reset_stats_ioctl
v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Tue Nov 12 11:57:30 2013 +0200
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Nov 26 16:14:33 2013 +0200
drm/i915: check context reset stats before relocations
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To simplify the codepaths somewhat, we can simply always create a
context. Contexts already keep hangstat information. This prevents us
from having to differentiate at other parts in the code.
There is allocation overhead, but it should not be measurable.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have a default context which suits the aliasing PPGTT well. Tie them
together so it looks like any other context/PPGTT pair. This makes the
code cleaner as it won't have to special case aliasing as often.
The patch has one slightly tricky part in the default context creation
function. In the future (and on aliased setup) we create a new VM for a
context (potentially). However, if we have aliasing PPGTT, which occurs
at this point in time for all platforms GEN6+, we can simply manage the
refcounting to allow things to behave as normal. Now is a good time to
recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT
drm_mm.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch adds to changes for contexts on reset:
Sets last context to default - this will prevent the context switch
happening after a reset. That switch is not possible because the
rings are hung during reset and context switch requires reset. This
behavior will need to be reworked in the future, but this is what we
want for now.
In the future, we'll also want to reset the guilty context to
uninitialized. We should wait for ARB_Robustness related code to land
for that.
This is somewhat for paranoia. Because we really don't know what the
GPU was doing when it hung, or the state it was in (mid context write,
for example), later restoring the context is a bad idea. By setting the
flag to not initialized, the next load of that context will not restore
the state, and thus on the subsequent switch away from the context will
overwrite the old data.
NOTE: This code needs a fixup when we actually have multiple VMs. The
issue that can occur is inactive objects in a VM will need to be
destroyed before the last context unref. This can now happen via the
fake switch introduced in this patch (and it other ways in the future)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We'll be doing a bit more stuff with each file, so having our own open
function should make things clean.
This also allows us to easily add conditionals for stuff we don't want
to do when we don't have HW contexts.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This came from a patch called, "drm/i915: Move active to vma"
When moving an object to the inactive list, we do it for all VMs for
which the object is bound.
The primary difference from that patch is this time around we don't not
track 'active' per vma, but rather by object. Therefore, we only need
one unref.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was found by code inspection. If the GTT setup fails then we are
left without properly tearing down the drm_mm.
Hopefully this never happens.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To be able to effectively use the GGTT object lookup function, we don't
want to warn when there is no GGTT mapping. Let the caller deal with it
instead.
Originally, I had intended to have this behavior, and has not
introduced the WARN. It was introduced during review with the addition
of the follow commit
commit 5c2abbeab7
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Tue Sep 24 09:57:57 2013 -0700
drm/i915: Provide a cheap ggtt vma lookup
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since the beginning, the functions which try to properly reference the
aliasing PPGTT have deferences a potentially null aliasing_ppgtt member.
Since the accessors are meant to be global, this will not do.
Introduced originally in:
commit a70a3148b0
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 16:59:56 2013 -0700
drm/i915: Make proper functions for VMs
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So we'll get a fault when someone tries to access the mmap, then we'll
wake up from D3.
v2: - Rebase
v3: - Use gtt active/inactive
Testcase: igt/pm_pc8/gem-mmap-gtt
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Add comment + WARN as discussed with Paulo on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If test is running, irq_get was not called so we should gain
balance by not doing irq_put
"So the rule is: if you access unlocked values, you use ACCESS_ONCE().
You don't say "but it can't matter". Because you simply don't know."
-- Linus
v2: use local variable so it can't change during test (Chris)
v3: update commit msg and use ACCESS_ONCE (Ville)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Commit 094f9a54e3 ("drm/i915: Fix __wait_seqno to use true infinite
timeouts") added support for __wait_seqno to detect missing interrupts and
go around them by polling. As there is also timeout detection in
__wait_seqno, the polling and timeout detection were done with the same
timer.
When there has been missed interrupts and polling is needed, the timer is
set to trigger in (now + 1) jiffies in future, instead of the caller
specified timeout.
Now when io_schedule() returns, we calculate the jiffies left to timeout
using the timer expiration value. As the current jiffies is now bound to be
always equal or greater than the expiration value, the timeout_jiffies will
become zero or negative and we return -ETIME to caller even tho the
timeout was never reached.
Fix this by decoupling timeout calculation from timer expiration.
v2: Commit message with some sense in it (Chris Wilson)
v3: add parenthesis on timeout_expire calculation
v4: don't read jiffies without timeout (Chris Wilson)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As the rings may be processed and their requests deallocated in a
different order to the natural retirement during a reset,
/* Whilst this request exists, batch_obj will be on the
* active_list, and so will hold the active reference. Only when this
* request is retired will the the batch_obj be moved onto the
* inactive_list and lose its active reference. Hence we do not need
* to explicitly hold another reference here.
*/
is violated, and the batch_obj may be dereferenced after it had been
freed on another ring. This can be simply avoided by processing the
status update prior to deallocating any requests.
Fixes regression (a possible OOPS following a GPU hang) from
commit aa60c664e6
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Jun 12 15:13:20 2013 +0300
drm/i915: find guilty batch buffer on ring resets
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Add the code comment Chris supplied.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.
v2: - Simplify i915_gem_fault changes.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It is useful to assert that if the object is bound, then it must have
its pages pinned to prevent the shrinker from reaping its backing store.
This is even more useful with the introduction of real-ppgtt whereupon
we may have the object bound into several vma, with each instance
pinning the backing store. This assertion breaks down during unbind
where we unpinned the backing store before decoupling the vma binding.
This can be fixed with a trivial reording of the unbind sequence, which
reinforces the
pin pages
bind to vma
...
unbind from vma
unpin pages
concept.
v2: Bonus comment
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The MI_PREDICATE_RESULT_2 register exits only on HSW. On other
platforms the same offset is either reserved, or contains some
other register. So write the register only on HSW.
This regression has been introduced in
commit 9435373ef8
Author: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Date: Wed Aug 28 16:45:46 2013 -0300
drm/i915: Report enabled slices on Haswell GT3
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull in Jani's backlight rework branch. This was merged through a
separate branch to be able to sort out the Broadwell conflicts
properly before pulling it into the main development branch.
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Fixed the botched locking on init_hw failure in i915_reset (Ville)
Call cleanup_ringbuffer on failed context create in init_hw (Ville)
v3: Add dev argument ti clean_ringbuffer
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use the nice Kernel macro, it makes the code much more readable.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At least on linux sizeof(long) == sizeof(void*) and the thinking
is that you can grab about as many references as there's memory.
Doesn't really matter, just a bit of OCD since the fixed size data
type in a pure in-kernel datastructure look off.
v2: Ville asked for an overflow check since no one prevents userspace
from incrementing the pin count forever.
v3: s/INT/LONG/, noticed by Chris.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have two once very similar functions, i915_gpu_idle() and
i915_gem_idle(). The former is used as the lower level operation to
flush work on the GPU, whereas the latter is the high level interface to
flush the GEM bookkeeping in addition to flushing the GPU. As such
i915_gem_idle() also clears out the request and activity lists and
cancels the delayed work. This is what we need for unloading the driver,
unfortunately we called i915_gpu_idle() instead.
In the process, make sure that when cancelling the delayed work and
timer, which is synchronous, that we do not hold any locks to prevent a
deadlock if the work item is already waiting upon the mutex. This
requires us to push the mutex down from the caller to i915_gem_idle().
v2: s/i915_gem_idle/i915_gem_suspend/
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: xunx.fang@intel.com
[danvet: Only set ums.suspended for !kms as discussed earlier. Chris
noticed that this slipped through.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I had this lying around from he original PPGTT series, and thought we
might try to get it in by itself.
It's convenient to just call i915_gem_init_hw at reset because we'll be
adding new things to that function, and having just one function to call
instead of reimplementing it in two places is nice.
In order to accommodate we cleanup ringbuffers in order to bring them
back up cleanly. Optionally, we could also teardown/re initialize the
default context but this was causing some problems on reset which I
wasn't able to fully debug, and is unnecessary with the previous context
init/enable split.
This essentially reverts:
commit 8e88a2bd59
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 19 18:40:00 2012 +0200
drm/i915: don't call modeset_init_hw in i915_reset
It seems to work for me on ILK now. Perhaps it's due to:
commit 8a5c2ae753
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Mar 28 13:57:19 2013 -0700
drm/i915: fix ILK GPU reset for render
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.
Reviewer: Damien Lespiau <damien.lespiau@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The interface uses an unsigned long, and we can use the unsigned counter
throughout our code, so do so. In the process, we notice one instance
where the shrink count is based on a heuristic rather than the result,
and another where we ask for too many pages to be purged.
v2: nr_to_scan needs to be promoted to a long as well, so just use
sc->nr_to_scan directly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since we are waiting upon IO completion, inform the kernel through use
of the io_schedule() call rather than the regular schedule(). This
should allow the kernel to make better decisions regarding scheduling
and power management.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The conflict in intel_drv.h tripped me up a bit since a patch in dinq
moves all the functions around, but another one in drm-next removes a
single function. So I'ev figured backing this into a backmerge would
be good.
i915_dma.c is just adjacent lines changed, nothing nefarious there.
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/intel_drv.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All drivers embed gem-objects into their own buffer objects. There is no
reason to keep drm_gem_object_alloc(), gem->driver_private and
->gem_init_object() anymore.
New drivers are highly encouraged to do the same. There is no benefit in
allocating gem-objects separately.
Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since, we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.
Rather than continue to fix up the timeouts to work around the interface
impedence in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.
v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirments of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.
v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.
v4: s/EGAIN/EAGAIN/ (Imre)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!
v2: Rebrand it as a ring event, rather than an object event.
v3: Include the seqno in the tracepoint for posterity or something.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.
NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.
v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
"We do fairly often lookup the ggtt vma for an obj." - Chris Wilson. As
such, provide a function to offer slightly cheaper access to the vma.
Not performance tested. By my quick estimation it saves at least 3
pointer dereferences from the existing mechanism.
This patch mostly matches code from Chris in
<20130911221430.GB7825@nuc-i3427.alporthouse.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We never took the lock ourselves and all callers expect the struct_mutex
to be locked upon return (be it success or error), thereore dropping the
lock along the error paths looks to be a vestigial error from
commit db1b76ca6a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jul 9 16:51:37 2013 +0200
drm/i915: don't frob mm.suspended when not using ums
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Done while reviewing all our allocations for fubar. Also a few errant
cases of lacking () for the sizeof operator - just a bit of OCD.
I've left out all the conversions that also should use kcalloc from
this patch (it's only 2).
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm-intel-next-2013-09-21:
- clock state handling rework from Ville
- l3 parity handling fixes for hsw from Ben
- some more watermark improvements from Ville
- ban badly behaved context from Mika
- a few vlv improvements from Jesse
- VGA power domain handling from Ville
drm-intel-next-2013-09-06:
- Basic mipi dsi support from Jani. Not yet converted over to drm_bridge
since that was too fresh, but the porting is in progress already.
- More vma patches from Ben, this time the code to convert the execbuffer
code. Now that the shrinker recursion bug is tracked down we can move
ahead here again. Yay!
- Optimize hw context switching to not generate needless interrupts (Chris
Wilson). Also some shuffling for the oustanding request allocation.
- Opregion support for SWSCI, although not yet fully wired up (we need a
bit of runtime D3 support for that apparently, due to Windows design
deficiencies), from Jani Nikula.
- A few smaller changes all over.
[airlied: merge conflict fix in i9xx_set_pipeconf]
* tag 'drm-intel-next-2013-09-21-merged' of git://people.freedesktop.org/~danvet/drm-intel: (119 commits)
drm/i915: assume all GM45 Acer laptops use inverted backlight PWM
drm/i915: cleanup a min_t() cast
drm/i915: Pull intel_init_power_well() out of intel_modeset_init_hw()
drm/i915: Add POWER_DOMAIN_VGA
drm/i915: Refactor power well refcount inc/dec operations
drm/i915: Add intel_display_power_{get, put} to request power for specific domains
drm/i915: Change i915_request power well handling
drm/i915: POSTING_READ IPS_CTL before waiting for the vblank
drm/i915: don't disable ERR_INT on the IRQ handler
drm/i915/vlv: disable rc6p and rc6pp residency reporting on BYT
drm/i915/vlv: honor i915_enable_rc6 boot param on VLV
drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF
drm/i915: Do remaps for all contexts
drm/i915: Keep a list of all contexts
drm/i915: Make l3 remapping use the ring
drm/i915: Add second slice l3 remapping
drm/i915: Fix HSW parity test
drm/i915: dump crtc timings from the pipe config
drm/i915: register backlight device also when backlight class is a module
drm/i915: write D_COMP using the mailbox
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
In
commit 81e49f8114
Author: Glauber Costa <glommer@openvz.org>
Date: Wed Aug 28 10:18:13 2013 +1000
i915: bail out earlier when shrinker cannot acquire mutex
SHRINK_STOP was added to tell the core shrinker code to bail out and
go to the next shrinker since the i915 shrinker couldn't acquire
required locks. But the SHRINK_STOP return code was added to the
->count_objects callback and not the ->scan_objects callback as it
should have been, resulting in tons of dmesg noise like
shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
Fix discusssed with Dave Chinner.
References: http://www.spinics.net/lists/intel-gfx/msg33597.html
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSQMORAAoJEHm+PkMAQRiGj14H/1bjhtfNjPdX7MVQAzA+WpwX
s7h1IQu2Si9S5S1lBiM2sBTOssVcmfheO9x4yqm7JNOD1RnssWKOM3q+zVOLstwd
GD3gluJPeraD5EyYSqEJ9ILPQ3gbxb4wOlT0Z291TW6E8XhLRr0RTOJPksRsgvLH
Ckm9uJh6ArS6ZXfXiaDQfd+xHAQJkUfW6nMSA0g9ZO9C6KIDRvcbUmrY3m4HhfIk
mK0TXCBs+AXGDIjTEB8JgIQL/5y1Qn0c4R+2uTU/4YWwyLvJTV1e44kGoleukMMT
6Pw/TNlUEN161dbSaqCyF3sfXHDYQ5valycI2PDgitMtPSxbzsU1VDizS8+daRg=
=lEmF
-----END PGP SIGNATURE-----
Merge tag 'v3.12-rc2' into drm-intel-next
Backmerge Linux 3.12-rc2 to prep for a bunch of -next patches:
- Header cleanup in intel_drv.h, both changed in -fixes and my current
-next pile.
- Cursor handling cleanup for -next which depends upon the cursor
handling fix merged into -rc2.
All just trivial conflicts of the "changed adjacent lines" type:
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_drv.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm fixes from Dave Airlie:
- some small fixes for msm and exynos
- a regression revert affecting nouveau users with old userspace
- intel pageflip deadlock and gpu hang fixes, hsw modesetting hangs
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
Revert "drm: mark context support as a legacy subsystem"
drm/i915: Don't enable the cursor on a disable pipe
drm/i915: do not update cursor in crtc mode set
drm/exynos: fix return value check in lowlevel_buffer_allocate()
drm/exynos: Fix address space warnings in exynos_drm_fbdev.c
drm/exynos: Fix address space warning in exynos_drm_buf.c
drm/exynos: Remove redundant OF dependency
drm/msm: drop unnecessary set_need_resched()
drm/i915: kill set_need_resched
drm/msm: fix potential NULL pointer dereference
drm/i915/dvo: set crtc timings again for panel fixed modes
drm/i915/sdvo: Robustify the dtd<->drm_mode conversions
drm/msm: workaround for missing irq
drm/msm: return -EBUSY if bo still active
drm/msm: fix return value check in ERR_PTR()
drm/msm: fix cmdstream size check
drm/msm: hangcheck harder
drm/msm: handle read vs write fences
drm/i915/sdvo: Fully translate sync flags in the dtd->mode conversion
drm/i915: Use proper print format for debug prints
...
We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.
This patch provides clarify in the code for future use cases which might
want to actually query whether or not L3 exists.
v2: Add /* DPF == dynamic parity feature */
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I have implemented this patch before without creating a separate list
(I'm having trouble finding the links, but the messages ids are:
<1364942743-6041-2-git-send-email-ben@bwidawsk.net>
<1365118914-15753-9-git-send-email-ben@bwidawsk.net>)
However, the code is much simpler to just use a list and it makes the
code from the next patch a lot more pretty.
As you'll see in the next patch, the reason for this is to be able to
specify when a context needs to get L3 remapping. More details there.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Using LRI for setting the remapping registers allows us to stream l3
remapping information. This is necessary to handle per context remaps as
we'll see implemented in an upcoming patch.
Using the ring also means we don't need to frob the DOP clock gating
bits.
v2: Add comment about lack of worry for concurrent register access
(Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's
nothing to fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.
v2: Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity
v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"
This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.
Additionally, a few fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...
This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terribly locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
lifted from Daniel:
pread/pwrite isn't about the object's domain at all, but purely about
synchronizing for outstanding rendering. Replacing the call to
set_to_gtt_domain with a wait_rendering would imo improve code
readability. Furthermore we could pimp pread to only block for
outstanding writes and not for reads.
Since you're not the first one to trip over this: Can I volunteer you
for a follow-up patch to fix this?
v2: Switch the pwrite patch to use \!read_only. This was a typo in the
original code. (Chris, Daniel)
Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fix up the logic fumble - wait_rendering has a bool readonly
paramater, set_to_gtt_domain otoh has bool write. Breakage reported by
Jani Nikula, I've double-checked that igt/gem_concurrent_blt/prw-*
would have caught this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>