Daniel writes:
Highlights:
- Imre's for_each_sg_pages rework (now also with the stolen mem backed
case fixed with a hack) plus the drm prime sg list coalescing patch from
Rahul Sharma. I have some follow-up cleanups pending, already acked by
Andrew Morton.
- Some prep-work for the crazy no-pch/display-less platform by Ben.
- Some vlv patches, by far not all (Jesse et al).
- Clean up the HDMI/SDVO #define confusion (Paulo)
- gen2-4 vblank fixes from Ville.
- Unclaimed register warning fixes for hsw (Paulo). More still to come ...
- Complete pageflips which have been stuck in a gpu hang, should prevent
stuck gl compositors (Ville).
- pm patches for vt-switchless resume (Jesse). Note that the i915 enabling
is not (yet) included, that took a bit longer to settle. PM patches are
acked by Rafael Wysocki.
- Minor fixlets all over from various people.
* tag 'drm-intel-next-2013-03-23' of git://people.freedesktop.org/~danvet/drm-intel: (79 commits)
drm/i915: Implement WaSwitchSolVfFArbitrationPriority
drm/i915: Set the VIC in AVI infoframe for SDVO
drm/i915: Kill a strange comment about DPMS functions
drm/i915: Correct sandybrige overclocking
drm/i915: Introduce GEN7_FEATURES for device info
drm/i915: Move num_pipes to intel info
drm/i915: fixup pd vs pt confusion in gen6 ppgtt code
style nit: Align function parameter continuation properly.
drm/i915: VLV doesn't have HDMI on port C
drm/i915: DSPFW and BLC regs are in the display offset range
drm/i915: set conservative clock gating values on VLV v2
drm/i915: fix WaDisablePSDDualDispatchEnable on VLV v2
drm/i915: add more VLV IDs
drm/i915: use VLV DIP routines on VLV v2
drm/i915: add media well to VLV force wake routines v2
drm/i915: don't use plane pipe select on VLV
drm: modify pages_to_sg prime helper to create optimized SG table
drm/i915: use for_each_sg_page for setting up the gtt ptes
drm/i915: create compact dma scatter lists for gem objects
drm/i915: handle walking compact dma scatter lists
...
It is possible to wrap the counter used to allocate the buffer for
relocation copies. This could lead to heap writing overflows.
CVE-2013-0913
v3: collapse test, improve comment
v2: move check into validate_exec_list
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Pinkie Pie
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This clarifies the comment above the access_ok check so a missing
VERIFY_READ doesn't alarm anyone.
v2:
- rewrote comment, thanks to Chris Wilson
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: add patch history log to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
to_user_ptr() simply casts a pointer passed as u64 from user space
to void __user * correctly. Using this lets us get rid of all the
tiresome casts.
The idea came from Chris Wilson <chris@chris-wilson.co.uk>.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This pulls in most of Linus tree up to -rc6, this fixes the worst lockdep
reported issues and re-enables fbcon lockdep.
(not the fbcon maintainer)
* 'fbcon-locking-fixes' of ssh://people.freedesktop.org/~airlied/linux: (529 commits)
Revert "Revert "console: implement lockdep support for console_lock""
fbcon: fix locking harder
fb: Yet another band-aid for fixing lockdep mess
fb: rework locking to fix lock ordering on takeover
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).
The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm
The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:
c2d/gm45: 618000.0/sec to 623000.0/sec.
i3-330m: 748000.0/sec to 789000.0/sec.
(measured relative to a baseline with neither optimisations applied).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Userspace is able to hint to the kernel that its command stream and
auxiliary state buffers already hold the correct presumed addresses and
so the relocation process may be skipped if the kernel does not need to
move any buffers in preparation for the execbuffer. Thus for the common
case where the allotment of buffers is static between batches, we can
avoid the overhead of individually checking the relocation entries.
Note that this requires userspace to supply the domain tracking and
requests for workarounds itself that would otherwise be computed based
upon the relocation entries.
Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:
c2d/gm45: 618000.0/sec to 632000.0/sec.
i3-330m: 748000.0/sec to 830000.0/sec.
(measured relative to a baseline with neither optimisations applied).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Fixup merge conflict in userspace header due to different
baseline trees.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of passing around the eb-objects hashtable and a separate object
list, we can include the object list into the eb-objects structure for
convenience.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The difference is that the kernel will then know that this memory will
be reclaimable in the near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces
* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
drm/i915: Return the real error code from intel_set_mode()
drm/i915: Make GSM void
drm/i915: Move GSM mapping into dev_priv
drm/i915: Move even more gtt code to i915_gem_gtt
drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
drm/i915: Introduce i915_gem_set_seqno()
drm/i915: Always clear semaphore mboxes on seqno wrap
drm/i915: Initialize hardware semaphore state on ring init
drm/i915: Introduce ring set_seqno
drm/i915: Missed conversion to gtt_pte_t
drm/i915: Bug on unsupported swizzled platforms
drm/i915: BUG() if fences are used on unsupported platform
drm/i915: fixup overlay stolen memory leak
drm/i915: clean up PIPECONF bpc #defines
drm/i915: add intel_dp_set_signal_levels
drm/i915: remove leftover display.update_wm assignment
drm/i915: check for the PCH when setting pch_transcoder
drm/i915: Clear the stolen fb before enabling
drm/i915: Access to snooped system memory through the GTT is incoherent
drm/i915: Remove stale comment about intel_dp_detect()
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.
Fixes regression from commit bcf50e2775
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Nov 21 22:07:12 2010 +0000
drm/i915: Handle pagefaults in execbuffer user relocations
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.
v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
dumping, so that we still catch batches when userspace opts out of
the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Simply use the last write-domain set for the object in the batch,
trusting userspace to have correctly flushed the caches between usage as
a write target. This check dates back from the golden age of having only
a single operation per batch with the kernel repeating it for each
cliprect, and conflicts both with userspace trying to efficiently batch
multiple operations and with reducing the kernel overhead of relocation
processing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As per Chris Wilson's suggestion make
i915_gem_execbuffer_wait_for_flips() go away.
This was used to stall the GPU ring while there are pending
page flips involving the relevant BO. Ie. while the BO is still
being scanned out by the display controller.
The recommended alternative is to use the page flip events to
wait for the page flips to fully complete before reusing the BO
of the old front buffer. Or use more buffers.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: don't remove obj->pending_flips, still required due to
reorder patches.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.
This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.
v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.
v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The intention of checking obj->gtt_offset!=0 is to verify that the
target object was listed in the execbuffer and had been bound into the
GTT. This is guarranteed by the earlier rearrangement to split the
execbuffer operation into reserve and relocation phases and then
verified by the check that the target handle had been processed during
the reservation phase.
However, the actual checking of obj->gtt_offset==0 is bogus as we can
indeed reference an object at offset 0. For instance, the framebuffer
installed by the BIOS often resides at offset 0 - causing EINVAL as we
legimately try to render using the stolen fb.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.
This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.
The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.
v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback
v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch
v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted
v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
=+861
-----END PGP SIGNATURE-----
Merge tag 'v3.7-rc2' into drm-intel-next-queued
Linux 3.7-rc2
Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
also fix a few other things in the code. Now we know how to make it
work on snb+, but to avoid losing the other fixes do the backmerge
first before re-enabling wc gtt ptes on snb+.
And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_modes.c
include/drm/i915_drm.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.
v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm merge (part 1) from Dave Airlie:
"So first of all my tree and uapi stuff has a conflict mess, its my
fault as the nouveau stuff didn't hit -next as were trying to rebase
regressions out of it before we merged.
Highlights:
- SH mobile modesetting driver and associated helpers
- some DRM core documentation
- i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
combined pte writing, ilk rc6 support,
- nouveau: major driver rework into a hw core driver, makes features
like SLI a lot saner to implement,
- psb: add eDP/DP support for Cedarview
- radeon: 2 layer page tables, async VM pte updates, better PLL
selection for > 2 screens, better ACPI interactions
The rest is general grab bag of fixes.
So why part 1? well I have the exynos pull req which came in a bit
late but was waiting for me to do something they shouldn't have and it
looks fairly safe, and David Howells has some more header cleanups
he'd like me to pull, that seem like a good idea, but I'd like to get
this merge out of the way so -next dosen't get blocked."
Tons of conflicts mostly due to silly include line changes, but mostly
mindless. A few other small semantic conflicts too, noted from Dave's
pre-merged branch.
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits)
drm/nv98/crypt: fix fuc build with latest envyas
drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering
drm/nv41/vm: fix and enable use of "real" pciegart
drm/nv44/vm: fix and enable use of "real" pciegart
drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie
drm/nouveau: store supported dma mask in vmmgr
drm/nvc0/ibus: initial implementation of subdev
drm/nouveau/therm: add support for fan-control modes
drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules
drm/nouveau/therm: calculate the pwm divisor on nv50+
drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster
drm/nouveau/therm: move thermal-related functions to the therm subdev
drm/nouveau/bios: parse the pwm divisor from the perf table
drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices
drm/nouveau/therm: rework thermal table parsing
drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table
drm/nouveau: fix pm initialization order
drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it
drm/nouveau: log channel debug/error messages from client object rather than drm client
drm/nouveau: have drm debugging macros build on top of core macros
...
Convert #include "..." to #include <path/...> in drivers/gpu/.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Remove redundant DRM UAPI header #inclusions from drivers/gpu/.
Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding
patch.
Without this patch and the patch to make include the UAPI headers from the core
headers, after the UAPI split, the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..." work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
As we make the simplification of using a power-of-two size for the
execbuffer handle-to-object TLB, we should validate that this is actually
true and so clarify that premise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The exec_list is of type drm_i915_gem_exec_object2 and so casting it to
a drm_i915_gem_relocation_entry is very confusing!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.
One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.
The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.
v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we need to stall in order to complete the pin_and_fence operation
during execbuffer reservation, there is a high likelihood that the
operation will be interrupted by a signal (thanks X!). In order to
simplify the cleanup along that error path, the object was
unconditionally unbound and the error propagated. However, being
interrupted here is far more common than I would like and so we can
strive to avoid the extra work by eliminating the forced unbind.
v2: In discussion over the indecent colour of the new functions and
unwind path, we realised that we can use the new unreserve function to
clean up the code even further.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
rebind penalty.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJQLWtvAAoJEHm+PkMAQRiG/DYH+wd0FqfEuYkYk4KPyAPuhKpX
zX7HYfLvyJE/ZYIdrhjq1E6Xm2KNr7gtX7/Rdzi2W38M9sjbYzwG1UGIw51qnxWy
yZJH9BGkfyQgQPeuDGohfB6DkDy2JWr2eqMDvakjOwgBsIzji0PQD/f3UvndhtUa
c+tTj/kjavHE1Yr2Wy6OnRZz3Uc0hIMn/Q0JqtbCs3LUgEV1KA4OEAe56XNz4Ku4
WE+FFaGFPvtriQsQON+ohPS5IC8jzQGK/0vbrJ4lWjFnZy4gvZXnborTOwD0WSQG
fbsNuxp1AaM2/pqfMwXm1w0ADvwOITHNiwwXf9id6DoK81QwTFpUdvKpn6yB6gQ=
=rurr
-----END PGP SIGNATURE-----
Merge tag 'v3.6-rc2' into drm-intel-next
Backmerge Linux 3.6-rc2 to resolve a few funny conflicts before we put
even more madness on top:
- drivers/gpu/drm/i915/i915_irq.c: Just a spurious WARN removed in
-fixes, that has been changed in a variable-rename in -next, too.
- drivers/gpu/drm/i915/intel_ringbuffer.c: -next remove scratch_addr
(since all their users have been extracted in another fucntion),
-fixes added another user for a hw workaroudn.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If a buffer that was the target of a PIPE_CONTROL from userland was a
reused one that hadn't been evicted which had not previously had this
workaround applied, then the early return for a correct
presumed_offset in this function meant we would not bind it into the
GTT and the write would land somewhere else.
Fixes reproducible failures with GL_EXT_timer_query usage in apitrace,
and I also expect it to fix the intermittent OQ issues on snb that
danvet's been working on.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Carl Worth <cworth@cworth.org>
Tested-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.
v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise once we use the buffer with a BLT command on gen2/3, we will
always regard future command submissions as continuing the fenced
access. However, now that we flush/invalidate between every batch we can
drop this pessimism.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.
v2 (danvet): Keep around the flip waiting logic. It's gross and
broken, I know, but we can't just kill that thing ... even if we just
keep it around as a reminder that things are broken.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.
By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.
v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)
Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:
commit de2b998552
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jul 4 22:52:50 2012 +0200
drm/i915: don't return a spurious -EIO from intel_ring_begin
which allowed intel_ring_begin to return -ERESTARTSYS and
commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jun 13 20:45:19 2012 +0200
drm/i915: disable flushing_list/gpu_write_list
which essentially disabled the flushing list.
The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already. Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.
As soon as we try to unbind that object, things blow up.
Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.
Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.
The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).
Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.
Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.
I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.
Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.
v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.
v3: Added some comments to explain the new flushing behaviour.
Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use the rsvd1 field in execbuf2 to specify the context ID associated
with the workload. This will allow the driver to do the proper context
switch when/if needed.
v2: Add checks for context switches on rings not supporting contexts.
Before the code would silently ignore such requests.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Rather than use the magic feature tests HAS_BLT/HAS_BSD just check
whether the ring we are about to dispatch the execbuffer on is
initialised.
v2: Use intel_ring_initialized()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The principle of intel_mark_busy() is that we want to spot the
transition of when the display engine is being used in order to bump
powersaving modes and increase display clocks. As such it is only
important when the display is changing, i.e. when rendering to the
scanout or other sprite/plane, and these are characterised by being
pinned.
v2: Mark the whole device as busy on execbuffer and pageflips as well
and rebase against dinq for the minor bug fix to be immediately
applicable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: fix compile fail.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge of drm-next to resolve a few ugly conflicts and to get a few
fixes from 3.4-rc6 (which drm-next has already merged). Note that this
merge also restricts the stencil cache lra evict policy workaround to
snb (as it should) - I had to frob the code anyway because the
CM0_MASK_SHIFT define died in the masked bit cleanups.
We need the backmerge to get Paulo Zanoni's infoframe regression fix
for gm45 - further bugfixes from him touch the same area and would
needlessly conflict.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>