Commit Graph

1157 Commits

Author SHA1 Message Date
Daniel Vetter 49987099e2 drm/i915: use vma->node directly and rewrap map&fence in bind
Use () to make for neater alignment of the split lines, too. With this
we ditch another jump through the obj_gtt_size/offset indirection
maze.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:45 +02:00
Ben Widawsky 4bd561b3e8 drm/i915: cleanup map&fence in bind
Cleanup the map and fenceable setting during bind to make more sense,
and not check i915_is_ggtt() 2 unnecessary times

v2: Move the bools into the if block (Chris) - There are ways to tidy
this function (fence calculations for instance) even further, but they
are quite invasive, so I am punting on those unless specifically asked.

v3: Add newline between variable declaration and logic (Chris)

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:45 +02:00
Ben Widawsky 433544bd25 drm/i915: Remove node only when allocated
VMAs can be created and not bound. One may think of it as lazy cleanup,
and safely gloss over the conditions which manufacture it. In either
case, when the object backing the i915 vma is destroyed, we must cleanup
the vma without stumbling into a bunch of pitfalls that assume the vma
is bound.

NOTE: I was pretty certain the above condition could only happen when we
introduced the use of VMAs being looked up at execbuf, and already
existing. Paulo has hit this though, so I must be missing something. As
I believe the patch is correct anyway, therefore I won't scratch my head
too hard.

v2: use goto destroy as a compromise (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:44 +02:00
Chris Wilson 4257d3ba3b drm/i915: Allow the user to set bo into the DISPLAY cache domain
This is primarily for the benefit of the create2 ioctl so that the
caller can avoid the later step of rebinding the bo with new PTE bits.
After introducing WT (and possibly GFDT) cacheing for display targets,
not everything in the display is earmarked as UC, and more importantly
what is is controlled by the kernel.

Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
idempotent; get_cache_level may return UC for architectures that have no
special cache domain for the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:39 +02:00
Chris Wilson 651d794fae drm/i915: Use Write-Through cacheing for the display plane on Iris
Haswell GT3e has the unique feature of supporting Write-Through cacheing
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.

However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rease onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:38 +02:00
Jani Nikula f2f4d82faf drm/i915: give more distinctive names to ring hangcheck action enums
The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:37 +02:00
Ben Widawsky 7ace7ef2f5 drm/i915: WARN_ON failed map_and_fenceable
I just noticed in our code we don't really check the assertion, and
given some of the code I am changing in this area, I feel a WARN is very
nice to have.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/&/&&/ to fix typo on the check.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:35 +02:00
Chris Wilson 000433b67e drm/i915: Only do a chipset flush after a clflush
Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:34 +02:00
Dave Airlie 9712def2b3 Merge tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
New pile of stuff for -next:
- Cleanup of the old crtc helper callbacks, all encoders are now converted
  to the i915 modeset infrastructure.
- Massive amount of wm patches from Ville for ilk, snb, ivb, hsw, this is
  prep work to eventually get things going for nuclear pageflips where we
  need to adjust watermarks on the fly.
- More vm/vma patches from Ben. This refactoring isn't yet fully rolled
  out, we miss the execbuf conversion and some of the low-level
  bind/unbind support code.
- Convert our hdmi infoframe code to use the new common helper functions
  (Damien). This contains some bugfixes for the common infoframe helpers.
- Some cruft removal from Damien.
- Various smaller bits&pieces all over, as usual.

* tag 'drm-intel-next-2013-08-09' of git://people.freedesktop.org/~danvet/drm-intel: (105 commits)
  drm/i915: Fix FB WM for HSW
  drm/i915: expose HDMI connectors on port C on BYT
  drm/i915: fix a limit check in hsw_compute_wm_results()
  drm/i915: unbreak i915_gem_object_ggtt_unbind()
  drm/i915: Make intel_set_mode() static
  drm/i915: Remove intel_modeset_disable()
  drm/i915: Make intel_encoder_dpms() static
  drm/i915: Make i915_hangcheck_elapsed() static
  drm/i915: Fix #endif comment
  drm/i915: Remove i915_gem_object_check_coherency()
  drm/i915: Remove stale prototypes
  drm/i915: List objects allocated from stolen memory in debugfs
  drm/i915: Always call intel_update_sprite_watermarks() when disabling a plane
  drm/i915: Pass plane and crtc to intel_update_sprite_watermarks
  drm/i915: Don't try to disable plane if it's already disabled
  drm/i915: Pass crtc to our update/disable_plane hooks
  drm/i915: Split plane watermark parameters into a separate struct
  drm/i915: Pull some watermarks state into a separate structure
  drm/i915: Calculate max watermark levels for ILK+
  drm/i915: Rename hsw_lp_wm_result to intel_wm_level
  ...
2013-08-21 12:48:59 +10:00
Chris Wilson 2c22569bba drm/i915: Update rules for writing through the LLC with the cpu
As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether an
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:20:49 +02:00
Chris Wilson cc98b413c1 drm/i915: Track when an object is pinned for use by the display engine
The display engine has unique coherency rules such that it requires
special handling to ensure that all writes to cursors, scanouts and
sprites are clflushed. This patch introduces the infrastructure to
simply track when an object is being accessed by the display engine.

v2: Explain the is_pin_display() magic as the sources for obj->pin_count
and their individual rules is not obvious. (Ville)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:19:51 +02:00
Chris Wilson c76ce038e3 drm/i915: Update rules for reading cache lines through the LLC
The LLC is a fun device. The cache is a distinct functional block within
the SA that arbitrates access from both the CPU and GPU cores. As such
all writes to memory land first in the LLC before further action is
taken. For example, an uncached write from either the CPU or GPU will
then proceed to memory and evict the cacheline from the LLC. This means that
a read from the LLC always returns the correct information even if the PTE
bit in the GPU differs from the PAT bit in the CPU. For the older
snooping architecture on non-LLC, the fundamental principle still holds
except that some coordination is required between the CPU and GPU to
explicitly perform the snooping (which is handled by our request
tracking).

The upshot of this is that we know that we can issue a read from either
LLC devices or snoopable memory and trust the contents of the cache -
i.e. we can forgo a clflush before a read in these circumstances.
Writing to memory from the CPU is a little more tricky as we have to
consider that the scanout does not read from the CPU cache at all, but
from main memory. So we have to currently treat all requests to write to
uncached memory as having to be flushed to main memory for coherency
with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-10 11:19:50 +02:00
Dan Carpenter 58e73e1570 drm/i915: unbreak i915_gem_object_ggtt_unbind()
There is an extra semi-colon here so we just leak and never unbind
anything.

This regression has been introduced in

commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-09 12:04:53 +02:00
Ben Widawsky 8b9c2b9411 drm/i915: Add vma to list at creation
With the current code there shouldn't be a distinction - however with an
upcoming change we intend to allocate a vma much earlier, before it's
actually bound anywhere.

To do this we have to check node allocation as well for the _bound()
check.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy,
again fallout from the loss of "rm/i915: Cleanup more of VMA in
destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

fixup for drm/i915: Add vma to list at creation
2013-08-08 14:10:20 +02:00
Ben Widawsky ca191b1313 drm/i915: mm_list is per VMA
formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMx .

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actual kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discused with Ben. Also drop a needles BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:06:58 +02:00
Ben Widawsky 5cacaac77c drm/i915: Fix up map and fenceable for VMA
formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable
tracking"

The map_and_fenceable tracking is per object. GTT mapping, and fences
only apply to global GTT. As such,  object operations which are not
performed on the global GTT should not effect mappable or fenceable
characteristics.

Functionally, this commit could very well be squashed in to a previous
patch which updated object operations to take a VM argument.  This
commit is split out because it's a bit tricky (or at least it was for
me).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop the bogus hunk in i915_vma_unbind as discussed with
Ben.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:55 +02:00
Ben Widawsky 9843877d10 drm/i915: turn bound_ggtt checks to bound_any
In some places, we want to know if an object is bound in any address
space, and not just the global GTT. This often applies when there is a
single global resource (object, pages, etc.)

function                             |      reason
--------------------------------------------------
i915_gem_object_is_inactive          | global object
i915_gem_object_put_pages            | object's pages
915_gem_object_unpin                 | global object
i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
pread/pwrite                         | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
call - but that once only worked if the object is bound. We really
should replace this with a plain wait_rendering call, which would have
the upside that in pread it would be clearer that we actually only
wait for oustanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky f6cd1f15d3 drm/i915: Use new bind/unbind in eviction code
Eviction code, like the rest of the converted code needs to be aware of
the address space for which it is evicting (or the everything case, all
addresses). With the updated bind/unbind interfaces of the last patch,
we can now safely move the eviction code over.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:43 +02:00
Ben Widawsky 07fe0b1280 drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:04:20 +02:00
Ben Widawsky 80dcfdbd68 drm/i915: Rework __i915_gem_shrink
In order to do this for all VMs, it's convenient to rework the logic a
bit. This should have no functional impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-08 14:02:41 +02:00
Dave Airlie 32c913e436 Merge tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Neat that QA (and Ben) keeps on humming along while I'm on vacation, so
you already get the next feature pull request:
- proper eLLC support for HSW from Ben
- more interrupt refactoring
- add w/a tags where we implement them already (Damien)
- hangcheck fixes (Chris) + hangcheck stats (Mika)
- flesh out the new vm structs for ppgtt and ggtt (Ben)
- PSR for Haswell, still disabled by default (Rodrigo et al.)
- pc8+ refclock sequence code from Paulo
- more interrupt refactoring from Paulo, unifying ilk/snb with the ivb/hsw
  interrupt code
- full solution for the Haswell concurrent reg access issues (Chris)
- fix racy object accounting, used by some new leak tests
- fix sync polarity settings on ch7xxx dvo encoder
- random bits&pieces, little fixes and better debug output all over

[airlied: fix conflict with drm_mm cleanups]

* tag 'drm-intel-next-2013-07-26-fixed' of git://people.freedesktop.org/~danvet/drm-intel: (289 commits)
  drm/i915: Do not dereference NULL crtc or fb until after checking
  drm/i915: fix pnv display core clock readout out
  drm/i915: Replace open-coded offset_in_page()
  drm/i915: Retry DP aux_ch communications with a different clock after failure
  drm/i915: Add messages useful for HPD storm detection debugging (v2)
  drm/i915: dvo_ch7xxx: fix vsync polarity setting
  drm/i915: fix the racy object accounting
  drm/i915: Convert the register access tracepoint to be conditional
  drm/i915: Squash gen lookup through multiple indirections inside GT access
  drm/i915: Use the common register access functions for NOTRACE variants
  drm/i915: Use a private interface for register access within GT
  drm/i915: Colocate all GT access routines in the same file
  drm/i915: fix reference counting in i915_gem_create
  drm/i915: Use Graphics Base of Stolen Memory on all gen3+
  drm/i915: disable stolen mem for OVERLAY_NEEDS_PHYSICAL
  drm/i915: add functions to disable and restore LCPLL
  drm/i915: disable CLKOUT_DP when it's not needed
  drm/i915: extend lpt_enable_clkout_dp
  drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt
  drm/i915: Add some debug breadcrumbs to connector detection
  ...
2013-08-07 18:11:35 +10:00
David Herrmann 31e5d7c67b drm/mm: add "best_match" flag to drm_mm_insert_node()
Add a "best_match" flag similar to the drm_mm_search_*() helpers so we
can convert TTM to use them in follow up patches. We can also inline the
non-generic helpers and move them into the header to allow compile-time
optimizations.

To make calls to drm_mm_{search,insert}_node() more readable, this
converts the boolean argument to a flagset. There are pending patches that
add additional flags for top-down allocators and more.

v2:
 - use flag parameter instead of boolean "best_match"
 - convert *_search_free() helpers to also use flags argument

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-07 10:08:58 +10:00
Daniel Vetter 43387b37fa drm/gem: create drm_gem_dumb_destroy
All the gem based kms drivers really want the same function to
destroy a dumb framebuffer backing storage object.

So give it to them and roll it out in all drivers.

This still leaves the option open for kms drivers which don't use GEM
for backing storage, but it does decently simplify matters for gem
drivers.

Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviwed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-08-07 09:59:24 +10:00
Ben Widawsky 637efacf8f drm/i915: eliminate dead domain clearing on reset
The code itself is no longer accurate without updating once we have
multiple address space since clearing the domains of every object
requires scanning the inactive list for all VMs.

"This code is dead. Just remove it rather than port it to vma." - Chris
Wilson

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:12:25 +02:00
Ben Widawsky d1ccbb5d71 drm/i915: make reset&hangcheck code VM aware
Hangcheck, and some of the recent reset code for guilty batches need to
know which address space the object was in at the time of a hangcheck.
This is because we use offsets in the (PP|G)GTT to determine this
information, and those offsets can differ depending on which VM they are
bound into.

Since we still only have 1 VM ever, this code shouldn't yet have any
impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:13 +02:00
Ben Widawsky 3e12302705 drm/i915: BUG_ON put_pages later
With multiple VMs, the eviction code benefits from being able to blindly
put pages without needing to know if there are any entities still
holding on to those pages. As such it's preferable to return the -EBUSY
before the BUG.

Eviction code is the only user for now, but overall it makes sense
anyway, IMO.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:13 +02:00
Ben Widawsky 3089c6f239 drm/i915: make caching operate on all address spaces
For now, objects will maintain the same cache levels amongst all address
spaces. This is to limit the risk of bugs, as playing with cacheability
in the different domains can be very error prone.

In the future, it may be optimal to allow setting domains per VMA (ie.
an object bound into an address space).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:12 +02:00
Ben Widawsky c37e220461 drm/i915: Add VM to pin
To verbalize it, one can say, "pin an object into the given address
space." The semantics of pinning remain the same otherwise.

Certain objects will always have to be bound into the global GTT.
Therefore, global GTT is a special case, and keep a special interface
around for it (i915_gem_obj_ggtt_pin).

v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:09 +02:00
Ben Widawsky fcb4a57805 drm/i915: Use bound list for inactive shrink
Do to the move active/inactive lists, it no longer makes sense to use
them for shrinking, since shrinking isn't VM specific (such a need may
also exist, but doesn't yet).

What we can do instead is use the global bound list to find all objects
which aren't active.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:09 +02:00
Ben Widawsky a70a3148b0 drm/i915: Make proper functions for VMs
Earlier in the conversion sequence we attempted to quickly wedge in the
transitional interface as static inlines.

Now that we're sure these interfaces are sane, for easier debug and to
decrease code size (since many of these functions may be called quite a
bit), make them real functions

While at it, kill off the set_color interface. We'll always have the
VMA, or easily get to it.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:08 +02:00
Ben Widawsky fc8c067eee drm/i915: Create an init vm
Move all the similar address space (VM) initialization code to one
function. Until we have multiple VMs, there should only ever be 1 VM.
The aliasing ppgtt is a special case without it's own VM (since it
doesn't need it's own address space management).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-05 19:04:07 +02:00
Daniel Vetter c20e835586 drm/i915: fix the racy object accounting
Just use a spinlock to protect them.

v2: Rebase onto the new object create refcount fix patch.

v3: Don't kill dev_priv->mm.object_memory as requested by Chris and
hence just use a spinlock instead of atomic_t.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67287
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-25 15:30:54 +02:00
Daniel Vetter cb54b53ada Merge commit 'Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux'
This backmerges Linus' merge commit of the latest drm-fixes pull:

commit 549f3a1218
Merge: 42577ca 058ca4a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jul 23 15:47:08 2013 -0700

    Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

We've accrued a few too many conflicts, but the real reason is that I
want to merge the 100% solution for Haswell concurrent registers
writes into drm-intel-next. But that depends upon the 90% bandaid
merged into -fixes:

commit a7cd1b8fea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 19 20:36:51 2013 +0100

    drm/i915: Serialize almost all register access

Also, we can roll up on accrued conflicts.

Usually I'd backmerge a tagged -rc, but I want to get this done before
heading off to vacations next week ;-)

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/i915_gem.c

v2: For added hilarity we have a init sequence conflict around the
gt_lock, so need to move that one, too. Spotted by Jani Nikula.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-25 15:18:41 +02:00
David Herrmann 51335df9f0 drm/vma: provide drm_vma_node_unmap() helper
Instead of unmapping the nodes in TTM and GEM users manually, we provide
a generic wrapper which does the correct thing for all vma-nodes.

v2: remove bdev->dev_mapping test in ttm_bo_unmap_virtual_unlocked() as
ttm_mem_io_free_vm() does nothing in that case (io_reserved_vm is 0).
v4: Fix docbook comments
v5: use drm_vma_node_size()

Cc: Dave Airlie <airlied@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2013-07-25 20:47:08 +10:00
David Herrmann 0de23977cf drm/gem: convert to new unified vma manager
Use the new vma manager instead of the old hashtable. Also convert all
drivers to use the new convenience helpers. This drops all the
(map_list.hash.key << PAGE_SHIFT) non-sense.

Locking and access-management is exactly the same as before with an
additional lock inside of the vma-manager, which strictly wouldn't be
needed for gem.

v2:
 - rebase on drm-next
 - init nodes via drm_vma_node_reset() in drm_gem.c
v3:
 - fix tegra
v4:
 - remove duplicate if (drm_vma_node_has_offset()) checks
 - inline now trivial drm_vma_node_offset_addr() calls
v5:
 - skip node-reset on gem-init due to kzalloc()
 - do not allow mapping gem-objects with offsets (backwards compat)
 - remove unneccessary casts

Cc: Inki Dae <inki.dae@samsung.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2013-07-25 20:47:06 +10:00
Daniel Vetter d861e33876 drm/i915: fix reference counting in i915_gem_create
This function is called without the dev->struct_mutex held, hence we
need to use the _unlocked unreference variants.

As soon as the object is registered userspace can sneak in here with a
gem_close ioctl call, so the object can (and with my new evil tests
actually does) get the final unreference in this place. The lack of
locking then results in hilarity and some good leakage.

To fix this we simply need to revert

Chris Wilson <chris@chris-wilson.co.uk>

v2: We need to make the trace call _before_ we drop our ref - the
object might very well be gone by then already.

v3: Just revert the original patch as suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Remove the added white line again to tighten the return
block, requested by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-24 23:25:03 +02:00
Daniel Vetter bc6bc15bd7 drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt
This has been broken in

commit 2f63315692
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 17 12:19:03 2013 -0700

    drm/i915: Create VMAs

which resulted in an OOPS the first time around we've hit -ENOSPC.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67156
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Tested-by: meng <mengmeng.meng@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-24 10:37:08 +02:00
Xiong Zhang 0b74b508f7 drm/i915: add prefault_disable module option
prefault is stll enabled by default which prevent most of pwrite/pread/reloc
from running slow path, in order to verify these slow pathes, prefault need
to be disabled.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Make checkpatch happy and bikeshed the module option help
text a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 09:29:26 +02:00
Dan Carpenter 6286ef9b56 drm/i915: use after free on error path
i915_gem_vma_destroy() frees its argument so we have to move the
drm_mm_remove_node() call up a few lines.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 08:58:42 +02:00
Dan Carpenter db473b36d4 drm/i915: checking for NULL instead of IS_ERR()
i915_gem_vma_create() returns and ERR_PTR() or a valid pointer, it never
returns NULL.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 08:58:33 +02:00
Dave Airlie e13af9a834 Merge tag 'drm-intel-next-2013-07-12' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Highlights:
- follow-up refactoring after the shared dpll rework that landed in 3.11
- oddball prep cleanups from Ben for ppgtt
- encoder->get_config state tracking infrastructure from Jesse
- used by the experimental fastboot support from Jesse (disabled by
  default)
- make the error state file official and add it to our sysfs interface
  (Mika)
- drm_mm prep changes from Ben, prepares to embedd the drm_mm_node (which
  will be used by the vma rework later on)
- interrupt handling rework, follow up cleanups to the VECS enabling, hpd
  storm handling and fifo underrun reporting.
- Big pile of smaller cleanups, code improvements and related stuff.

* tag 'drm-intel-next-2013-07-12' of git://people.freedesktop.org/~danvet/drm-intel: (72 commits)
  drm/i915: clear DPLL reg when disabling i9xx dplls
  drm/i915: Fix up cpt pixel multiplier enable sequence
  drm/i915: clean up vlv ->pre_pll_enable and pll enable sequence
  drm/i915: move error state to own compilation unit
  drm/i915: Don't attempt to read an unitialized stack value
  drm/i915: Use for_each_pipe() when possible
  drm/i915: don't enable PM_VEBOX_CS_ERROR_INTERRUPT
  drm/i915: unify ring irq refcounts (again)
  drm/i915: kill dev_priv->rps.lock
  drm/i915: queue work outside spinlock in hsw_pm_irq_handler
  drm/i915: streamline hsw_pm_irq_handler
  drm/i915: irq handlers don't need interrupt-safe spinlocks
  drm/i915: kill lpt pch transcoder->crtc mapping code for fifo underruns
  drm/i915: improve GEN7_ERR_INT clearing for fifo underrun reporting
  drm/i915: improve SERR_INT clearing for fifo underrun reporting
  drm/i915: extract ibx_display_interrupt_update
  drm/i915: remove unused members from drm_i915_private
  drm/i915: don't frob mm.suspended when not using ums
  drm/i915: Fix VLV DP RBR/HDMI/DAC PLL LPF coefficients
  drm/i915: WARN if the bios reserved range is bigger than stolen size
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_gem.c
2013-07-19 12:12:21 +10:00
Daniel Vetter 94a335dba3 drm/i915: correctly restore fences with objects attached
To avoid stalls we delay tiling changes and especially hold of
committing the new fence state for as long as possible.
Synchronization points are in the execbuf code and in our gtt fault
handler.

Unfortunately we've missed that tricky detail when adding proper fence
restore code in

commit 19b2dbde57
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 12 10:15:12 2013 +0100

    drm/i915: Restore fences after resume and GPU resets

The result was that we've restored fences for objects with no tiling,
since the object<->fence link still existed after resume. Now that
wouldn't have been too bad since any subsequent access would have
fixed things up, but if we've changed from tiled to untiled real havoc
happened:

The tiling stride is stored -1 in the fence register, so a stride of 0
resulted in all 1s in the top 32bits, and so a completely bogus fence
spanning everything from the start of the object to the top of the
GTT. The tell-tale in the register dumps looks like:

                 FENCE START 2: 0x0214d001
                 FENCE END 2: 0xfffff3ff

Bit 11 isn't set since the hw doesn't store it, even when writing all
1s (at least on my snb here).

To prevent such a gaffle in the future add a sanity check for fences
with an untiled object attached in i915_gem_write_fence.

v2: Fix the WARN, spotted by Chris.

v3: Trying to reuse get_fences looked ugly and obfuscated the code.
Instead reuse update_fence and to make it really dtrt also move the
fence dirty state clearing into update_fence.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60530
Cc: stable@vger.kernel.org (for 3.10 only)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Matthew Garrett <matthew.garrett@nebula.com>
Tested-by: Björn Bidar <theodorstormgrade@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-19 00:08:16 +02:00
Daniel Vetter 8157ee2115 Linux 3.10
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL
 Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO
 Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9
 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS
 bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10
 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ=
 =Ob6Z
 -----END PGP SIGNATURE-----

Merge tag 'v3.10' into drm-intel-fixes

Backmerge Linux 3.10 to get at

commit 19b2dbde57
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 12 10:15:12 2013 +0100

    drm/i915: Restore fences after resume and GPU resets

That commit is not in my current -fixes pile since that's based on my
-next queue for 3.11. And the above mentioned fix was merged really
late into 3.10 (and blew up, bad me) so was on a diverging branch.

Option B would have been to rebase my current pile of fixes onto
Dave's drm-fixes branch. But since some of the patches here are a bit
tricky I've decided not to void all the testing by moving over the
entire merge window.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-18 12:03:29 +02:00
Ben Widawsky 2f63315692 drm/i915: Create VMAs
Formerly: "drm/i915: Create VMAs (part 1)"

In a previous patch, the notion of a VM was introduced. A VMA describes
an area of part of the VM address space. A VMA is similar to the concept
in the linux mm. However, instead of representing regular memory, a VMA
is backed by a GEM BO. There may be many VMAs for a given object, one
for each VM the object is to be used in. This may occur through flink,
dma-buf, or a number of other transient states.

Currently the code depends on only 1 VMA per object, for the global GTT
(and aliasing PPGTT). The following patches will address this and make
the rest of the infrastructure more suited

v2: s/i915_obj/i915_gem_obj (Chris)

v3: Only move an object to the now global unbound list if there are no
more VMAs for the object which are bound into a VM (ie. the list is
empty).

v4: killed obj->gtt_space
some reworks due to rebase

v5: Free vma on error path (Imre)

v6: Another missed vma free in i915_gem_object_bind_to_gtt error path
(Imre)
Fixed vma freeing in stolen preallocation (Imre)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Squash in fixup from Ben to not deref a non-existing vma in
set_cache_level, reported by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-18 08:46:13 +02:00
Ben Widawsky 5cef07e162 drm/i915: Move active/inactive lists to new mm
Shamelessly manipulated out of Daniel :-)
"When moving the lists around explain that the active/inactive stuff is
used by eviction when we run out of address space, so needs to be
per-vma and per-address space. Bound/unbound otoh is used by the
shrinker which only cares about the amount of memory used and not one
bit about in which address space this memory is all used in. Of course
to actual kick out an object we need to unbind it from every address
space, but for that we have the per-object list of vmas."

v2: Leave the bound list as a global one. (Chris, indirectly)

v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local,
since it will eventually be replaces by a vm argument.
Put comment back inline, since it no longer makes sense to do otherwise.

v4: Rebased on hangcheck/error state movement

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-17 22:24:32 +02:00
Ben Widawsky 93bd8649db drm/i915: Put the mm in the parent address space
Every address space should support object allocation. It therefore makes
sense to have the allocator be part of the "superclass" which GGTT and
PPGTT will derive.

Since our maximum address space size is only 2GB we're not yet able to
avoid doing allocation/eviction; but we'd hope one day this becomes
almost irrelvant.

v2: Rebased

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-17 22:23:43 +02:00
Ben Widawsky 853ba5d223 drm/i915: Move gtt and ppgtt under address space umbrella
The GTT and PPGTT can be thought of more generally as GPU address
spaces. Many of their actions (insert entries), state (LRU lists), and
many of their characteristics (size) can be shared. Do that.

The change itself doesn't actually impact most of the VMA/VM rework
coming up, it just fits in with the grand scheme of abstracting the GPU
VM operations. GGTT will usually be a special case where we either know
an object must be in the GGTT (dislay engine, workarounds, etc.).

The scratch page is left as part of the VM (even though it's currently
shared with the ppgtt code) because in the future when we have Full
PPGTT, I intend to create a separate scratch page for each.

v2: Drop usage of i915_gtt_vm (Daniel)
Make cleanup also part of the parent class (Ben)
Modified commit msg
Rebased

v3: Properly share scratch page (Imre)
Finish commit message (Daniel, Imre)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-17 22:21:47 +02:00
Dave Airlie 6bd2cab2c1 Merge tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel
One feature latecomer, I've forgotten to merge the patch to reeanble the
Haswell power well feature now that the audio interaction is fixed up.
Since that was the only unfixed issue with it I've figured I could throw
it in a bit late, and it's trivial to revert in case I'm wrong.

Otherwise all bug/regression fixes:
- Fix status page reinit after gpu hangs, spotted by more paranoid igt
  checks.
- Fix object list walking fumble regression in the shrinker (only the
  counting part, the actual shrinking code was correct so no Oops
  potential), from Xiong Zhang.
- Fix DP 1.2 bw limits (Imre).
- Restore legacy forcewake on ivb, too many broken biosen out there. We
  dump a warn though that recent userspace might fall over with that
  config (Guenter Roeck).
- Patch up the gen2 cs tlb w/a.
- Improve the fence coherency w/a now that we have a better understanding
  what's going on. The removed wbinvd+ipi should make -rt folks happy. Big
  thanks to Jon Bloomfield for figuring this out, patches from Chris.
- Fix write-read race when switching ring (Chris). Spotted with code
  inspection, but now we also have an igt for it.

There's an ugly regression we're still working on introduced between
3.10-rc7 and 3.10.0. Unfortunately we can't just revert the offender since
that one fixes another regression :( I've asked Steven to include my
-fixes branch into linux-next to prevent such fallout in the future,
hopefully.

* tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel:
  Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
  drm/i915: Fix incoherence with fence updates on Sandybridge+
  drm/i915: Fix write-read race with multiple rings
  Partially revert "drm/i915: unconditionally use mt forcewake on hsw/ivb"
  drm/i915: fix lane bandwidth capping for DP 1.2 sinks
  drm/i915: fix up ring cleanup for the i830/i845 CS tlb w/a
  drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list
  drm/i915: switch disable_power_well default value to 1
  drm/i915: reinit status page registers after gpu reset
2013-07-17 08:40:49 +10:00
Mika Kuoppala 10cd45b6e8 drm/i915: introduce i915_queue_hangcheck
To run hangcheck in near future.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-16 12:44:02 +02:00
Ben Widawsky 59124506ba drm/i915: store eLLC size
The eLLC cannot be determined by PCIID because as far as we know, even
machines supporting eLLC may not have it enabled, or fused off or
whatever. It's possible this isn't actually true, and at that point we
can switch to a DEV_INFO flag instead.

I've defined everything where the docs are clear, and left the rest as
magic.

But we need it before we set the pte_encode function pointers, which
happens really early, in gtt_init.

The problem with just doing the normal sequence earlier is we don't have
the ability to use forcewake until after the pte functions have been set
up.

Since all solutions are somewhat ugly (barring rewriting all the init
ordering), I've opted to do the detection really early, and the enabling
later - since the register to detect doesn't require forcewake.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Move dev_priv->ellc_size away from the dri1 dungeon to a nice
place right next to the l3 parity stuff. Also squash in the follow-up
commit to read out the eLLC size a bit earlier.]
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-16 08:08:21 +02:00
Ben Widawsky 05e21cc43d drm/i915: Define some of the eLLC magic
The EDRAM present register isn't really defined in the docs. It just
says check to see if it's set to 1. So I haven't defined the 1 value not
knowing what it actually means.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-16 08:00:52 +02:00
Chris Wilson 46a0b638f3 Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
This reverts commit 25ff119 and the follow on for Valleyview commit 2dc8aae.

commit 25ff1195f8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

commit 2dc8aae06d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 22 17:08:06 2013 +0100

    drm/i915: Workaround incoherence with fence updates on Valleyview

Jon Bloomfield came up with a plausible explanation and cheap fix
(drm/i915: Fix incoherence with fence updates on Sandybridge+) for the
race condition, so lets run with it.

This is a candidate for stable as the old workaround incurs a
significant cost (calling wbinvd on all CPUs before performing the
register write) for some workloads as noted by Carsten Emde.

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-June/028819.html
References: https://www.osadl.org/?id=1543#c7602
References: https://bugs.freedesktop.org/show_bug.cgi?id=63825
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 15:31:12 +02:00
Chris Wilson d18b961903 drm/i915: Fix incoherence with fence updates on Sandybridge+
This hopefully fixes the root cause behind the workaround added in

commit 25ff1195f8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Thanks to further investigation by Jon Bloomfield, he realised that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see an
intermediate register state (new high dword, old low dword), and this
register would randomly be used in preference to its own thread register.
This would cause the second thread to read from and write into a fairly
random tiled location.  Breaking the operation into 3 explicit 32-bit
updates (first disable the fence, poke the upper bits, then poke the lower
bits and enable) ensures that, given proper serialisation between the
32-bit register write and the memory transfer, that the fence value is
always consistent.

Armed with this knowledge, we can explain how the previous workaround
work. The key to the corruption is that a second thread sees an
erroneous fence register that conflicts and overrides its own. By
serialising the fence update across all CPUs, we have a small window
where no GTT access is occurring and so hide the potential corruption.
This also leads to the conclusion that the earlier workaround was
incomplete.

v2: Be overly paranoid about the order in which fence updates become
visible to the GPU to make really sure that we turn the fence off before
doing the update, and then only switch the fence on afterwards.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 14:41:46 +02:00
Daniel Vetter db1b76ca6a drm/i915: don't frob mm.suspended when not using ums
In kernel modeset driver mode we're in full control of the chip,
always. So there's no need at all to set mm.suspended in
i915_gem_idle. Hence move that out into the leavevt ioctl. Since
i915_gem_idle doesn't suspend gem any more we can also drop the
re-enabling for KMS in the thaw function.

Also clean up the handling of mm.suspend at driver load by coalescing
all the assignments.

Stumbled over while reading through our resume code for unrelated
reasons.

v2: Shovel mm.suspended into the (newly created) ums dungeon as
suggested by Chris Wilson. The plan is that once we've completely
stopped relying on the register save/restore code we could shovel even
that in there.

v3: Improve the locking for the entervt/leavevt ioctls a bit by moving
the dev->struct_mutex locking outside of i915_gem_idle. Also don't
clear dev_priv->ums.mm_suspended for the kms case, we allocate it with
kzalloc. Both suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 14:30:25 +02:00
Chris Wilson 02978ff57a drm/i915: Fix write-read race with multiple rings
Daniel noticed a problem where is we wrote to an object with ring A in
the middle of a very long running batch, then executed a quick batch on
ring B before a batch that reads from the same object, its obj->ring would
now point to ring B, but its last_write_seqno would be still relative to
ring A. This would allow for the user to read from the object before the
GPU had completed the write, as set_domain would only check that ring B
had passed the last_write_seqno.

To fix this simply (and inelegantly), we bump the last_write_seqno when
switching rings so that the last_write_seqno is always relative to the
current obj->ring.

This fixes igt/tests/gem_write_read_ring_switch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[danvet: Add note about the newly created igt which exercises this
bug.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-10 10:41:55 +02:00
Linus Torvalds 2e17c5a97e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Okay this is the big one, I was stalled on the fbdev pull req as I
  stupidly let fbdev guys merge a patch I required to fix a warning with
  some patches I had, they ended up merging the patch from the wrong
  place, but the warning should be fixed.  In future I'll just take the
  patch myself!

  Outside drm:

  There are some snd changes for the HDMI audio interactions on haswell,
  they've been acked for inclusion via my tree.  This relies on the
  wound/wait tree from Ingo which is already merged.

  Major changes:

  AMD finally released the dynamic power management code for all their
  GPUs from r600->present day, this is great, off by default for now but
  also a huge amount of code, in fact it is most of this pull request.

  Since it landed there has been a lot of community testing and Alex has
  sent a lot of fixes for any bugs found so far.  I suspect radeon might
  now be the biggest kernel driver ever :-P p.s.  radeon.dpm=1 to enable
  dynamic powermanagement for anyone.

  New drivers:

  Renesas r-car display unit.

  Other highlights:

   - core: GEM CMA prime support, use new w/w mutexs for TTM
     reservations, cursor hotspot, doc updates
   - dvo chips: chrontel 7010B support
   - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
     Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
     support (this time for sure)
   - nouveau: async buffer object deletion, context/register init
     updates, kernel vp2 engine support, GF117 support, GK110 accel
     support (with external nvidia ucode), context cleanups.
   - exynos: memory leak fixes, Add S3C64XX SoC series support, device
     tree updates, common clock framework support,
   - qxl: cursor hotspot support, multi-monitor support, suspend/resume
     support
   - mgag200: hw cursor support, g200 mode limiting
   - shmobile: prime support
   - tegra: fixes mostly

  I've been banging on this quite a lot due to the size of it, and it
  seems to okay on everything I've tested it on."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
  drm/radeon/dpm: implement vblank_too_short callback for si
  drm/radeon/dpm: implement vblank_too_short callback for cayman
  drm/radeon/dpm: implement vblank_too_short callback for btc
  drm/radeon/dpm: implement vblank_too_short callback for evergreen
  drm/radeon/dpm: implement vblank_too_short callback for 7xx
  drm/radeon/dpm: add checks against vblank time
  drm/radeon/dpm: add helper to calculate vblank time
  drm/radeon: remove stray line in old pm code
  drm/radeon/dpm: fix display_gap programming on rv7xx
  drm/nvc0/gr: fix gpc firmware regression
  drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
  drm/radeon/dpm: implement force performance level for TN
  drm/radeon/dpm: implement force performance level for ON/LN
  drm/radeon/dpm: implement force performance level for SI
  drm/radeon/dpm: implement force performance level for cayman
  drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
  drm/radeon/dpm: add infrastructure to force performance levels
  drm/radeon: fix surface setup on r1xx
  drm/radeon: add support for 3d perf states on older asics
  drm/radeon: set default clocks for SI when DPM is disabled
  ...
2013-07-09 16:04:31 -07:00
Xiong Zhang 067556084a drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list
obj->mm_list link to dev_priv->mm.inactive_list/active_list
obj->global_list link to dev_priv->mm.unbound_list/bound_list

This regression has been introduced in

commit 93927ca52a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 10 18:03:00 2013 +0100

    drm/i915: Revert shrinker changes from "Track unbound pages"

Cc: stable@vger.kernel.org
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-09 16:31:48 +02:00
Ben Widawsky c6cfb32567 drm/i915: Embed drm_mm_node in i915 gem obj
Embedding the node in the obj is more natural in the transition to VMAs
which will also have embedded nodes. This change also helps transition
away from put_block to remove node.

Though it's quite an uncommon occurrence, it's somewhat convenient to not
fail at bind time because we cannot allocate the node. Though in
practice there are other allocations (like the request structure) which
would probably make this point not terribly useful.

Quoting Daniel:
Note that the only difference between put_block and remove_node is
that the former fills up the preallocation cache. Which we don't need
anyway and hence is just wasted space.

v2: Clean up the stolen preallocation code.
Rebased on the reserve_node patches
renames ggtt_ stuff to gtt_ stuff
WARN_ON if the object is already bound (which doesn't mean it's in the
bound list, tricky)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-08 22:04:36 +02:00
Ben Widawsky edd41a870f drm/i915: Kill obj->gtt_offset
With the getters in place from the previous patch this members serves no
purpose other than saving one spare pointer chase, which will be killed
in the next patch anyway.

Moving to VMAs, this members adds unnecessary confusion since an object
may exist at different offsets in different VMs.

v2: Properly preserve the stolen offset. This code is a bit hacky but it
all goes away when we embed the drm_mm_node and removes the need for the
incorrect patch I submitted previously: "Use gtt_space->start for stolen
reservation"

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-08 22:04:35 +02:00
Ben Widawsky f343c5f647 drm/i915: Getter/setter for object attributes
Soon we want to gut a lot of our existing assumptions how many address
spaces an object can live in, and in doing so, embed the drm_mm_node in
the object (and later the VMA).

It's possible in the future we'll want to add more getter/setter
methods, but for now this is enough to enable the VMAs.

v2: Reworked commit message (Ben)
Added comments to the main functions (Ben)
sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
(Daniel)

v3: Rebased on new reserve_node patch
Changed DRM_DEBUG_KMS to actually work (will need fixing later)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-08 22:04:34 +02:00
Chris Wilson d26e3af842 drm/i915: Refactor the wait_rendering completion into a common routine
Harmonise the completion logic between the non-blocking and normal
wait_rendering paths, and move that logic into a common function.

In the process, we note that the last_write_seqno is by definition the
earlier of the two read/write seqnos and so all successful waits will
have passed the last_write_seqno. Therefore we can unconditionally clear
the write seqno and its domains in the completion logic.

v2: Add the missing ring parameter, because sometimes it is good to have
things compile.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-01 11:15:01 +02:00
Chris Wilson daa13e1ca5 drm/i915: Only clear write-domains after a successful wait-seqno
In the introduction of the non-blocking wait, I cut'n'pasted the wait
completion code from normal locked path. Unfortunately, this neglected
that the normal path returned early if the wait returned early. The
result is that read-only waits may return whilst the GPU is still
writing to the bo.

Fixes regression from
commit 3236f57a01 [v3.7]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 24 09:35:09 2012 +0100

    drm/i915: Use a non-blocking wait for set-to-domain ioctl

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-01 11:15:00 +02:00
Jani Nikula 3765f30486 drm/i915: fix build warning on format specifier mismatch
drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_object_bind_to_gtt’:
drivers/gpu/drm/i915/i915_gem.c:3002:3: warning: format ‘%ld’ expects
argument of type ‘long int’, but argument 5 has type ‘size_t’ [-Wformat]

v2: Use %zu instead of %d. Two char patch, and 100% wrong. (Ville)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-01 11:14:43 +02:00
Konrad Rzeszutek Wilk 1625e7e549 drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.
Git commit 90797e6d1e
("drm/i915: create compact dma scatter lists for gem objects") makes
certain assumptions about the under laying DMA API that are not always
correct.

On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup
I see:

[drm:intel_pipe_set_base] *ERROR* pin & fence failed
[drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28

Bit of debugging traced it down to dma_map_sg failing (in
i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB).

That unfortunately are sizes that the SWIOTLB is incapable of handling -
the maximum it can handle is a an entry of 512KB of virtual contiguous
memory for its bounce buffer. (See IO_TLB_SEGSIZE).

Previous to the above mention git commit the SG entries were of 4KB, and
the code introduced by above git commit squashed the CPU contiguous PFNs
in one big virtual address provided to DMA API.

This patch is a simple semi-revert - were we emulate the old behavior
if we detect that SWIOTLB is online. If it is not online then we continue
on with the new compact scatter gather mechanism.

An alternative solution would be for the the '.get_pages' and the
i915_gem_gtt_prepare_object to retry with smaller max gap of the
amount of PFNs that can be combined together - but with this issue
discovered during rc7 that might be too risky.

Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-01 11:14:42 +02:00
Dave Airlie 28419261b0 Merge tag 'drm-intel-next-2013-06-18' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Last 3.11 feature pull. I have a few odds bits and pieces and fixes in my
queue, I'll sort them out later on to see what's for 3.11-fixes and what's
for 3.12. But nothing to hold this here up imo.

Highlights:
- more hangcheck work from Mika and Chris to prepare for arb robustness
- trickle feed fixes from Ville
- first parts of the shared pch pll rework, with some basic hw state
  readout and cross-checking (this shuts up the confused pch pll refcount
  WARN that Linus just recently forwarded)
- Haswell audio power well support from Wang Xingchao (alsa bits acked by
  Takashi)
- some cleanups and asserts sprinkling around the plane/gamma enabling
  sequence from Ville
- more gtt refactoring from Ben
- clear up the adjusted->mode vs. pixel clock vs. port clock confusion
- 30bpp support, this time for real hopefully

* tag 'drm-intel-next-2013-06-18' of git://people.freedesktop.org/~danvet/drm-intel: (97 commits)
  drm/i915: remove a superflous semi-colon
  drm/i915: Kill useless "Enable panel fitter" comments
  drm/i915: Remove extra "ring" from error message
  drm/i915: simplify the reduced clock handling for pch plls
  drm/i915: stop killing pfit on i9xx
  drm/i915: explicitly set up PIPECONF (and gamma table) on haswell
  drm/i915: set up PIPECONF explicitly for i9xx/vlv platforms
  drm/i915: set up PIPECONF explicitly on ilk-ivb
  drm/i915: find guilty batch buffer on ring resets
  drm/i915: store ring hangcheck action
  drm/i915: add batch bo to i915_add_request()
  drm/i915: change i915_add_request to macro
  drm/i915: add i915_gem_context_get_hang_stats()
  drm/i915: add struct i915_ctx_hang_stats
  drm/i915: Try harder to disable trickle feed on VLV
  drm/i915: fix up pch pll enabling for pixel multipliers
  drm/i915: hw state readout and cross-checking for shared dplls
  drm/i915: WARN on lack of shared dpll
  drm/i915: split up intel_modeset_check_state
  drm/i915: extract readout_hw_state from setup_hw_state
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_fb.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-28 09:50:34 +10:00
Dave Airlie 4300a0f8bd Linux 3.10-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
 ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
 DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
 tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
 HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
 pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
 =uUH5
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc7' into drm-next

Linux 3.10-rc7

The sdvo lvds fix in this -fixes pull

commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jun 10 09:47:58 2013 +0200

    drm/i915: prefer VBT modes for SVDO-LVDS over EDID

has a silent functional conflict with

commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri May 31 12:17:07 2013 +0000

    drm: Add probed modes in probe order

in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.

Conflicts:
	drivers/gpu/drm/drm_prime.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-27 20:40:44 +10:00
Konrad Rzeszutek Wilk 426729dcc7 drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.
Git commit 90797e6d1e
("drm/i915: create compact dma scatter lists for gem objects") makes
certain assumptions about the under laying DMA API that are not always
correct.

On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup
I see:

[drm:intel_pipe_set_base] *ERROR* pin & fence failed
[drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28

Bit of debugging traced it down to dma_map_sg failing (in
i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB).

That unfortunately are sizes that the SWIOTLB is incapable of handling -
the maximum it can handle is a an entry of 512KB of virtual contiguous
memory for its bounce buffer. (See IO_TLB_SEGSIZE).

Previous to the above mention git commit the SG entries were of 4KB, and
the code introduced by above git commit squashed the CPU contiguous PFNs
in one big virtual address provided to DMA API.

This patch is a simple semi-revert - were we emulate the old behavior
if we detect that SWIOTLB is online. If it is not online then we continue
on with the new compact scatter gather mechanism.

An alternative solution would be for the the '.get_pages' and the
i915_gem_gtt_prepare_object to retry with smaller max gap of the
amount of PFNs that can be combined together - but with this issue
discovered during rc7 that might be too risky.

Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-06-25 10:39:57 +10:00
Chris Wilson 19b2dbde57 drm/i915: Restore fences after resume and GPU resets
Stéphane Marchesin found that fences for pinned objects (i.e. the
scanout) were not being restored upon resume, leading to corruption on
the display and reference counting issues. This is due to a bug in

commit 312817a39f [2.6.38]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 22 11:50:11 2010 +0000

    drm/i915: Only save and restore fences for UMS

that zapped the pinned fences even though they were in use.
Fortuitously, whilst we forced a VT switch during suspend and resume,
no fences were ever pinned at the time. However, we now can do
switchless S3 transitions and so the old bug finally surfaces.

Reported-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-16 01:10:45 +02:00
Mika Kuoppala aa60c664e6 drm/i915: find guilty batch buffer on ring resets
After hang check timer has declared gpu to be hung,
rings are reset. In ring reset, when clearing
request list, do post mortem analysis to find out
the guilty batch buffer.

Select requests for further analysis by inspecting
the completed sequence number which has been updated
into the HWS page. If request was completed, it can't
be related to the hang.

For noncompleted requests mark the batch as guilty
if the ring was not waiting and the ring head was
stuck inside the buffer object or in the flush region
right after the batch. For everything else, mark
them as innocents.

v2: Fixed a typo in commit message (Ville Syrjälä)

v3: - more descriptive function parameters (Chris Wilson)
    - use masked head address when inspecting if request is in ring
    - s/hangcheck.last_action/hangcheck.action
    - added comment about unmasked head hitting batch_obj range

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:17 +02:00
Mika Kuoppala 7d736f4f0b drm/i915: add batch bo to i915_add_request()
In order to track down a batch buffer and context which
caused the ring to hang, store reference to bo into the request struct.
Request can also cause gpu to hang after the batch in the flush section
in the ring. To detect this add start of the flush portion offset into the
request.

v2: Included comment about request vs batch_obj lifetimes (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:16 +02:00
Mika Kuoppala 0025c0772d drm/i915: change i915_add_request to macro
Only execbuffer needed all the parameters on i915_add_request().
By putting __i915_add_request behind macro, all current callsites
become cleaner. Following patch will introduce a new parameter
for __i915_add_request. With this patch, only the relevant callsite
will reflect the change making commit smaller and easier to understand.

v2: _i915_add_request as function name (Chris Wilson)

v3: change name __i915_add_request and fix ordering of params (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:15 +02:00
Dave Airlie e6dfcc5303 Merge tag 'drm-intel-next-2013-06-01' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Another round of drm-intel-next for 3.11. Highlights:
- Haswell IPS support (Paulo Zanoni)
- VECS support on Haswell (Ben Widawsky, Xiang Haihao, ...)
- Haswell watermark fixes (Paulo Zanoni)
- "Make the gun bigger again" multithread fence fix from Chris.
- i915_error_state finnally no longer fails with -ENOMEM! Big thanks to
  Mika for tackling this.
- vlv sideband locking fixes from Jani
- Hangcheck prep work for arb_robustness support (Mika&Chris)
- edp vs cpu port confusion clean-up from Imre
- pile of smaller fixes and cleanups all over.

* tag 'drm-intel-next-2013-06-01' of git://people.freedesktop.org/~danvet/drm-intel: (70 commits)
  drm/i915: add i915_ips_status debugfs entry
  drm/i915: add enable_ips module option
  drm/i915: implement IPS feature
  drm/i915: fix up the edp power well check
  drm/i915: add I915_PARAM_HAS_VEBOX to i915_getparam
  drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer()
  drm/i915: add VEBOX into debugfs
  drm/i915: Enable vebox interrupts
  drm/i915: vebox interrupt get/put
  drm/i915: consolidate interrupt naming scheme
  drm/i915: Convert irq_refounct to struct
  drm/i915: make PM interrupt writes non-destructive
  drm/i915: Add PM regs to pre/post install
  drm/i915: Create an ivybridge_irq_preinstall
  drm/i915: Create a more generic pm handler for hsw+
  drm/i915: add support for 5/6 data buffer partitioning on Haswell
  drm/i915: properly set HSW WM_LP watermarks
  drm/i915: properly set HSW WM_PIPE registers
  drm/i915: fix pch_nop support
  drm/i915: Vebox ringbuffer init
  ...
2013-06-11 08:38:56 +10:00
Daniel Vetter 7abb690a0e drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus
Chris Wilson noticed that since

commit 1f83fee08d [v3.9]
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Nov 15 17:17:22 2012 +0100

    drm/i915: clear up wedged transitions

X can again get -EIO when it does not expect it. And even worse score
a SIGBUS when accessing gtt mmaps. The established ABI is that we
_only_ return an -EIO from execbuf - all other ioctls should just
work. And since the reset code moves all bos out of gpu domains and
clears out all the last_seqno/ring tracking there really shouldn't be
any reason for non-execbuf code to ever touch the hw and see an -EIO.

After some extensive discussions we've noticed that these spurios -EIO
are caused by i915_gem_wait_for_error:

http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html

That is easy to fix by returning 0 instead of -EIO, since grabbing the
dev->struct_mutex does not yet mean that we actually want to touch the
hw. And so there is no reason at all to fail with -EIO.

But that's not the entire since, since often (at least it's easily
googleable) dmesg indicates that the reset fails and we declare the
gpu wedged. Then, quite a bit later X wakes up with the "Timed out
waiting for the gpu reset to complete" DRM_ERROR message in
wait_for_errror and brings down the desktop with an -EIO/SIGBUS.

So clearly we're missing a wakeup somewhere, since the gpu reset just
doesn't take 10 seconds to complete. And indeed we're do handle the
terminally wedged state wrong.

Fix this all up.

References: https://bugs.freedesktop.org/show_bug.cgi?id=63921
References: https://bugs.freedesktop.org/show_bug.cgi?id=64073
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 14:35:18 +02:00
Ben Widawsky 35c20a60c7 drm/i915: Rename the gtt_list to global_list
Since it will be used for the global bound/unbound list with full PPGTT,
this helps clarify things for upcoming code rework.

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:51:14 +02:00
Ben Widawsky 401c29f607 drm/i915: unpin pages at unbind
If we properly keep track of the pages_pin_count, then when we later add
multiple address spaces, the put_pages doesn't need any special checks
to be able to perform it's job.

CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Rebased on top of the fix for stolen memory pinning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:50:22 +02:00
Ben Widawsky 1d64ae719b drm/i915: Unpin stolen pages
The way the stolen handling works is we take a pin on the backing pages,
but we never actually get a reference to the bo. On freeing objects
allocated with stolen memory, the final unref will end up freeing the
object with pinned pages count left. To enable an assertion to catch
bugs in this code path, this patch cleans up that remaining pin.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:49:08 +02:00
Ben Widawsky 9a8a2213a7 drm/i915: Vebox ringbuffer init
v2: Add set_seqno which didn't exist before rebase (Haihao)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:12 +02:00
Ben Widawsky 0a9ae0d7f8 drm/i915: pre-fixes for checkpatch
Since I'll need to modify i915_gem_object_bind_to_gtt(), fix the errors
now to get checkpatch to not complain.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Resolve conflict with Chris' improved debug output, and
bikeshed the new variable with s/max/gtt_max/ a bit while at it.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:53:56 +02:00
Dave Airlie e81f3d81e2 Merge tag 'drm-intel-next-2013-05-20-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Highlights (copy-pasted from my testing cycle mails):
- fbc support for Haswell (Rodrigo)
- streamlined workaround comments, including an igt tool to grep for
  them (Damien)
- sdvo and TV out cleanups, including a fixup for sdvo multifunction devices
- refactor our eDP mess a bit (Imre)
- don't register the hdmi connector on haswell when desktop eDP is present
- vlv support is no longer preliminary!
- more vlv fixes from Jesse for stolen and dpll handling
- more flexible power well checking infrastructure from Paulo
- a few gtt patches from Ben
- a bit of OCD cleanups for transcoder #defines and an assorted pile
  of smaller things.
- fixes for the gmch modeset sequence
- a bit of OCD around plane/pipe usage (Ville)
- vlv turbo support (Jesse)
- tons of vlv modeset fixes (Jesse et al.)
- vlv pte write fixes (Kenneth Graunke)
- hpd filtering to avoid costly probes on unaffected outputs (Egbert Eich)
- intel dev_info cleanups and refactorings (Damien)
- vlv rc6 support (Jesse)
- random pile of fixes around non-24bpp modes handling
- asle/opregion cleanups and locking fixes (Jani)
- dp dpll refactoring
- improvements for reduced_clock computation on g4x/ilk+
- pfit state refactored to use pipe_config (Jesse)
- lots more computed modeset state moved to pipe_config, including readout
  and cross-check support
- fdi auto-dithering for ivb B/C links, using the neat pipe_config
  improvements
- drm_rect helpers plus sprite clipping fixes (Ville)
- hw context refcounting (Mika + Ben)

* tag 'drm-intel-next-2013-05-20-merged' of git://people.freedesktop.org/~danvet/drm-intel: (155 commits)
  drm/i915: add support for dvo Chrontel 7010B
  drm/i915: Use pipe config state to control gmch pfit enable/disable
  drm/i915: Use pipe_config state to disable ilk+ pfit
  drm/i915: panel fitter hw state readout&check support
  drm/i915: implement WADPOClockGatingDisable for LPT
  drm/i915: Add missing platform tags to FBC workaround comments
  drm/i915: rip out an unused lvds_reg variable
  drm/i915: Compute WR PLL dividers dynamically
  drm/i915: HSW FBC WaFbcDisableDpfcClockGating
  drm/i915: HSW FBC WaFbcAsynchFlipDisableFbcQueue
  drm/i915: Enable FBC at Haswell.
  drm/i915: IVB FBC WaFbcDisableDpfcClockGating
  drm/i915: IVB FBC WaFbcAsynchFlipDisableFbcQueue
  drm/i915: Add support for FBC on Ivybridge.
  drm/i915: Organize VBT stuff inside drm_i915_private
  drm/i915: make SDVO TV-out work for multifunction devices
  drm/i915: rip out now unused is_foo tracking from crtc code
  drm/i915: rip out TV-out lore ...
  drm/i915: drop TVclock special casing on ilk+
  drm/i915: move sdvo TV clock computation to intel_sdvo.c
  ...
2013-05-31 12:56:05 +10:00
Chris Wilson 2dc8aae06d drm/i915: Workaround incoherence with fence updates on Valleyview
In commit 25ff1195f8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

we introduced an empirical workaround for memory corruption when using
fences from multiple CPUs. At the time, we did not have any results for
Valleyview, so the presumption was that it was limited to recent
generations using LLC. Now we have evidence that Valleyview also suffers
incoherence and requires a similar but different workaround. For
Valleyview, the wbinvd instruction is insufficient and we require the
serialising register write per-CPU. Conversely, that serialising
register write is not enough for SNB/IVB/HSW. To compromise and keep the
code relatively clean, employ both serialisation techniques in the same
workaround.

Reported-by: Jon Bloomfield <jon.bloomfield@intel.com>
Tested-by: Jon Bloomfield <jon.bloomfield@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-23 12:51:31 +02:00
Chris Wilson a36689cb77 drm/i915: Be more informative when reporting "too large for aperture" error
This should help debugging the truly unexpected cases where it occurs -
in particular to see which value is garbage.

References: https://bugzilla.kernel.org/show_bug.cgi?id=58511
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/%ld/%zd/ as spotted by Wu Fengguang's autobuilder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-23 12:51:29 +02:00
Imre Deak e054cc3937 drm/i915: avoid premature timeouts in __wait_seqno()
At the moment wait_event_timeout/wait_event_interruptible_timeout may
time out 1 jiffy too early, as the calculated expiry time is 1 less than
needed. Besides timing out too early this also means that the
calculation of the remaining time will be incorrect and we will pass a
non-zero remaining time to user space in case of a time out. This is one
reason for the following bugzilla report:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64270

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-22 13:51:23 +02:00
Daniel Vetter e1b73cba13 Linux 3.10-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRmpexAAoJEHm+PkMAQRiGrRIH/1uWFW38RvaCV/PXm/ia6Z+x
 BfBJfBIvPxGwb4n7aQNQlhU25xkfrPZ6szO4WiBH5/KPH3xYi2I2OZ1AzffkYqMF
 BWkPmsPK6EsTdp16zsi6JtH2aXArG4SpYA7ZamPvDkmfigHuiZg7GlL/9eHTRPNV
 P7Q8JToOrcnP8RoGgNj0uFiQeQbc62Kmoq7WuPtUhVlpQCCCknXgOJiYgz9w6Xe9
 /i79YFS8WRrzAquExT1NbIOh4ZMqB9MvuroaVWy8JDDLUyz7QUvOCe3tCDNguwgi
 FdWvU6nfkdQq5SLaWCWXDE9Rp/pL1MvfBn9vCOwFcp42aw0aQ0PgJVIXvsqufd0=
 =jgDI
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc2' into drm-intel-next-queued

Backmerge Linux 3.10-rc2 since the various (rather trivial) conflicts
grew a bit out of hand. intel_dp.c has the only real functional
conflict since the logic changed while dev_priv->edp.bpp was moved
around.

Also squash in a whitespace fixup from Ben Widawsky for
i915_gem_gtt.c, git seems to do something pretty strange in there
(which I don't fully understand tbh).

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_dp.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-21 09:52:16 +02:00
Mika Kuoppala 0e50e96bf2 drm/i915: add context into request struct
Storing context reference into request struct
allows us to inspect context and its associated
objects when requests are retired.

Both ppgtt and arb robustness work will need
this.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-06 11:21:51 +02:00
Chris Wilson 4f42f4ef0d drm/i915: Always normalize return timeout for wait_timeout_ioctl
As we recompute the remaining timeout after waiting, there is a
potential for that timeout to be less than zero and so need sanitizing.
The timeout is always returned to userspace and validated, so we should
always perform the sanitation.

v2 [vsyrjala]: Only normalize the timespec if it's invalid
v3: Add a comment to clarify the situation and remove the now
    useless WARN_ON() (ickle)

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-30 10:50:32 +02:00
Ville Syrjälä 42b5aeabe9 drm/i915: IVB/HSW have 32 fence register
Increase the number of fence registers to 32 on IVB/HSW. VLV however
only has 16 fence registers according to the docs.

Increasing the number of fences was attempted before [1], but there was
some uncertainty about the maximum CPU fence number for FBC. Since then
BSpec has been updated to state that there are in fact 32 fence registers,
and the CPU fence number field in the SNB_DPFC_CTL_SA register is 5 bits,
and the CPU fence number field in the ILK_DPFC_CONTROL register must be
zero. So now it all makes sense.

[1] http://lists.freedesktop.org/archives/intel-gfx/2011-October/012865.html

v2: Include some background information based on the previous attempt

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-18 09:43:21 +02:00
Ben Widawsky b7c36d2546 drm/i915: Allow PPGTT enable to fail
I'm really not happy that we have to support this, but this will be the
simplest way to handle cases where PPGTT init can fail, which I promise
will be coming in the future.

v2: Resolve conflicts due to patch series reordering.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-18 09:43:16 +02:00
Ben Widawsky 6197349bde drm/i915: Abstract PPGTT enabling
Since we've already set up a nice vtable to abstract other PPGTT
functions, also abstract the actual register programming to enable
things.

This function will probably need to change a bit as we implement real
processes.

v2: Resolve conflicts due to patch series reordering.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-18 09:43:15 +02:00
Chris Wilson 25ff1195f8 drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
manually flush writes to memory prior to writing the fence register in
conjunction with the memory barriers placed around the register write.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to ellide forgotten history.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Tested-by: Jon Bloomfield <jon.bloomfield@intel.com> (v2)
Cc: stable@vger.kernel.org
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-18 09:43:10 +02:00
Ben Widawsky 88a2b2a32d drm/i915: Don't wait for PCH on reset
BIOS should be setting this, but in case it doesn't...

v2: Define the bits we actually want to clear (Jesse)
Make it an RMW op (Jesse)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-04-08 20:53:05 +02:00
Imre Deak 2db76d7c3c lib/scatterlist: sg_page_iter: support sg lists w/o backing pages
The i915 driver uses sg lists for memory without backing 'struct page'
pages, similarly to other IO memory regions, setting only the DMA
address for these. It does this, so that it can program the HW MMU
tables in a uniform way both for sg lists with and without backing pages.

Without a valid page pointer we can't call nth_page to get the current
page in __sg_page_iter_next, so add a helper that relevant users can
call separately. Also add a helper to get the DMA address of the current
page (idea from Daniel).

Convert all places in i915, to use the new API.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-27 17:13:44 +01:00
Chris Wilson f9c513e9d6 drm/i915: Always call fence-lost prior to removing the fence
There is a minute window for a race between put-fence removing the fence
and for a new transaction by an external party on the GTT mmap. That is
we must zap the mmap prior to removing the fence and not afterwards.

Fixes regression from
commit 61050808bb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 17 15:31:31 2012 +0100

    drm/i915: Refactor put_fence() to use the common fence writing routine

v2: Remember the fence to remove with a local variable (gcc)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-26 20:16:18 +01:00
Imre Deak 90797e6d1e drm/i915: create compact dma scatter lists for gem objects
So far we created a sparse dma scatter list for gem objects, where each
scatter list entry represented only a single page. In the future we'll
have to handle compact scatter lists too where each entry can consist of
multiple pages, for example for objects imported through PRIME.

The previous patches have already fixed up all other places where the
i915 driver _walked_ these lists. Here we have the corresponding fix to
_create_ compact lists. It's not a performance or memory footprint
improvement, but it helps to better exercise the new logic.

Reference: http://www.spinics.net/lists/dri-devel/msg33917.html
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-23 12:17:09 +01:00
Imre Deak 67d5a50c04 drm/i915: handle walking compact dma scatter lists
So far the assumption was that each dma scatter list entry contains only
a single page. This might not hold in the future, when we'll introduce
compact scatter lists, so prepare for this everywhere in the i915 code
where we walk such a list.

We'll fix the place _creating_ these lists separately in the next patch
to help the reviewing/bisectability.

Reference: http://www.spinics.net/lists/dri-devel/msg33917.html
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-23 12:16:36 +01:00
Daniel Vetter 0d4a42f6bd Linux 3.9-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRRkrbAAoJEHm+PkMAQRiGy3oH/jrbHinYs0auurANgx4TdtWT
 /WNajstKBqLOJJ6cnTR7sOqwOVlptt65EbbTs+qGyZ2Z2W/Lg0BMenHvNHo4ER8C
 e7UbMdBCSLKBjAMKh1XCoZscGv4Exm8WRH3Vc5yP0Hafj3EzSAVLY1dta9WKKoQi
 bh7D1ErUlbU1zczA1w5YbPF0LqFKRvyZOwebMCCAKAxv5wWAxmbcPNxVR4sufkjg
 k6TkQ2ysgWivZAfy3tJYOcxiEu7ahpZVEuYdlZEJQXHRQUfoNljQlOp4BqKsYUai
 5A0kaf2VpKay/7pkhvTfBBcF/jFJ68pYP6gQ2ThNdr0b5kOiAfMWj030Xyngnhg=
 =iO9t
 -----END PGP SIGNATURE-----

Merge tag 'v3.9-rc3' into drm-intel-next-queued

Backmerge so that I can merge Imre Deak's coalesced sg entries fixes,
which depend upon the new for_each_sg_page introduce in

commit a321e91b6d
Author: Imre Deak <imre.deak@intel.com>
Date:   Wed Feb 27 17:02:56 2013 -0800

    lib/scatterlist: add simple page iterator

The merge itself is just two trivial conflicts:

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-19 09:47:30 +01:00
Jesse Barnes d62b4892f3 drm/i915: allow force wake at init time on VLV v2
We need to set the 'allow force wake' bit to enable forcewake handling
later on.

v2: split from clock gating patch (Jani)
    check for allowwakeack (Ville)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-19 09:38:32 +01:00
Ville Syrjälä 2bb4629add drm/i915: Add to_user_ptr()
to_user_ptr() simply casts a pointer passed as u64 from user space
to void __user * correctly. Using this lets us get rid of all the
tiresome casts.

The idea came from Chris Wilson <chris@chris-wilson.co.uk>.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-03-03 19:49:11 +01:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Daniel Vetter eb32e4584d drm/i915: Use HAS_L3_GPU_CACHE in i915_gem_l3_remap
Yet another remnant ... this might explain why l3 remapping didn't
really work on HSW.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441
Spotted-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-02-20 00:21:47 +01:00
Imre Deak 769ce4643b drm/i915: don't clflush gem objects in stolen memory
As explained by Chris Wilson gem objects in stolen memory are always
coherent with the GPU so we don't need to ever flush the CPU caches for
these.

This fixes a breakage - at least with the compact sg patches applied -
during the resume/restore gtt mappings path, when we tried to clflush an
FB object in stolen memory, but since stolen objects don't have backing
pages we passed an invalid page pointer to drm_clflush_page().

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-02-20 00:21:43 +01:00
Ben Widawsky 4fc7c971c3 drm/i915: Extract ring init from hw_init
The ring initialization will differ a bit in upcoming generations, and
this split will prepare the code for what's needed.

This patch also fixes a bug introduced in:
commit 9943393195
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Jan 22 14:12:17 2013 +0200

    drm/i915: use gem_set_seqno() on hardware init

After doing the extraction, the bad error handling became obvious.  I
acknowledge that this should be two patches, but it's a pretty
small/trivial patch. If requested, I can certainly do the fix as a
distinct patch.

v2: Should be cleanup blt, not init blt on failure (Chris)

v3: Forgot to git add on v2

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-02-15 10:30:39 +01:00
Dave Airlie cd17ef4114 Merge tag 'drm-intel-next-2013-02-01' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
"Probably the last feature pull for 3.9, there's some fixes outstanding
thought that I'd like to sneak in. And maybe 3.8 takes a bit longer ...
Anyway, highlights of this pull:
- Kill the horrible IS_DISPLAYREG hack to handle the mmio offset movements
  on vlv, big thanks to Ville.
- Dynamic power well support for Haswell, shaves away a bit when only
  using the eDP port on pipe A (Paulo). Plus unclaimed register fixes
  uncovered by this.
- Clarifications of the gpu hang/reset state transitions, hopefully fixing
  a few spurious -EIO deaths in userspace.
- Haswell ELD fixes.
- Some more (pp)gtt cleanups from Ben.
- A few smaller things all over.

Plus all the stuff from the previous rather small pull request:
- Broadcast RBG improvements and reduced color range fixes from Ville.
- Ben is on a "kill legacy gtt code for good" spree, first pile of patches
  included.
- No-relocs and bo lut improvements for faster execbuf from Chris.
- Some refactorings from Imre."

* tag 'drm-intel-next-2013-02-01' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
  GPU/i915: Fix acpi_bus_get_device() check in drivers/gpu/drm/i915/intel_opregion.c
  drm/i915: Set the SR01 "screen off" bit in i915_redisable_vga() too
  drm/i915: Kill IS_DISPLAYREG()
  drm/i915: Introduce i915_vgacntrl_reg()
  drm/i915: gen6_gmch_remove can be static
  drm/i915: dynamic Haswell display power well support
  drm/i915: check the power down well on assert_pipe()
  drm/i915: don't send DP "idle" pattern before "normal" on HSW PORT_A
  drm/i915: don't run hsw power well code on !hsw
  drm/i915: kill cargo-culted locking from power well code
  drm/i915: Only run idle processing from i915_gem_retire_requests_worker
  drm/i915: Fix CAGF for HSW
  drm/i915: Reclaim GTT space for failed PPGTT
  drm/i915: remove intel_gtt structure
  drm/i915: Add probe and remove to the gtt ops
  drm/i915: extract hw ppgtt setup/cleanup code
  drm/i915: pte_encode is gen6+
  drm/i915: vfuncs for ppgtt
  drm/i915: vfuncs for gtt_clear_range/insert_entries
  drm/i915: Error state should print /sys/kernel/debug
  ...
2013-02-08 11:08:10 +10:00
Chris Wilson 725a5b5402 drm/i915: Only run idle processing from i915_gem_retire_requests_worker
When adding the fb idle detection to mark-inactive, it was forgotten
that userspace can drive the processing of retire-requests. We assumed
that it would be principally driven by the retire requests worker,
running once every second whilst active and so we would get the deferred
timer for free. Instead we spend too many CPU cycles reclocking the LVDS
preventing real work from being done.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Alexander Lam <lambchop468@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58843
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-31 11:50:09 +01:00
Mika Kuoppala 9943393195 drm/i915: use gem_set_seqno() on hardware init
When machine was rebooted or module was reloaded,
gem_hw_init() set last_seqno to be identical to next_seqno.
This lead to situation that waits for first ever request
always passed immediately regardless if it was actually
executed.

Use gem_set_seqno() to be consistent how hw is
initialized on init, wrap and on resume.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-22 13:52:26 +01:00
Daniel Vetter f69061bedd drm/i915: create a race-free reset detection
With the previous patch the state transition handling of the reset
code itself is now (hopefully) race free and solid. But that still
leaves out everyone else - with the various lock-free wait paths
we have there's the possibility that the reset happens between the
point where we read the seqno we should wait on and the actual wait.

And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
happily wait for a seqno which will in all likelyhood never signal.

In practice this is not a big problem since the X server gets
constantly interrupted, and can then submit more work (hopefully) to
unblock everyone else: As soon as a new seqno write lands, all waiters
will unblock. But running the i-g-t reset testcase ZZ_hangman can
expose this race, especially on slower hw with fewer cpu cores.

Now looking forward to ARB_robustness and friends that's not the best
possible behaviour, hence this patch adds a reset_counter to be able
to detect any reset, even if a given thread never observed the
in-progress state.

The important part is to correctly order things:
- The write side needs to increment the counter after any seqno gets
  reset.  Hence we need to do that at the end of the reset work, and
  again wake everyone up. We also need to place a barrier in between
  any possible seqno changes and the counter increment, since any
  unlock operations only guarantee that nothing leaks out, but not
  that at later load operation gets moved ahead.
- On the read side we need to ensure that no reset can sneak in and
  invalidate the seqno. In all cases we can use the one-sided barrier
  that unlock operations guarantee (of the lock protecting the
  respective seqno/ring pair) to ensure correct ordering. Hence it is
  sufficient to place the atomic read before the mutex/spin_unlock and
  no additional barriers are required.

The end-result of all this is that we need to wake up everyone twice
in a reset operation:
- First, before the reset starts, to get any lockholders of the locks,
  so that the reset can proceed.
- Second, after the reset is completed, to allow waiters to properly
  and reliably detect the reset condition and bail out.

I admit that this entire reset_counter thing smells a bit like
overkill, but I think it's justified since it makes it really explicit
what the bail-out condition is. And we need a reset counter anyway to
implement ARB_robustness, and imo with finer-grained locking on the
horizont this is the most resilient scheme I could think of.

v2: Drop spurious change in the wait_for_error EXIT_COND - we only
need to wait until we leave the reset-in-progress wedged state.

v3: Don't play tricks with barriers in the throttle ioctl, the
spin_unlock is barrier enough.

I've also considered using a little helper to grab the current
reset_counter, but then decided that hiding the atomic_read isn't a
great idea, since having it explicitly show up in the code is a nice
remainder to reviews to check the memory barriers.

v4: Add a comment to explain why we need to fall through in
__wait_seqno in the end variable assignments.

v5: Review from Damien:
- s/smb/smp/ in a comment
- don't increment the reset counter after we've set it to WEDGED. Now
  we (again) properly wedge the gpu when the reset fails.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-21 19:53:54 +01:00
Dave Airlie 735dc0d1e2 Merge branch 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
The aim of this locking rework is that ioctls which a compositor should be
might call for every frame (set_cursor, page_flip, addfb, rmfb and
getfb/create_handle) should not be able to block on kms background
activities like output detection. And since each EDID read takes about
25ms (in the best case), that always means we'll drop at least one frame.

The solution is to add per-crtc locking for these ioctls, and restrict
background activities to only use the global lock. Change-the-world type
of events (modeset, dpms, ...) need to grab all locks.

Two tricky parts arose in the conversion:
- A lot of current code assumes that a kms fb object can't disappear while
  holding the global lock, since the current code serializes fb
  destruction with it. Hence proper lifetime management using the already
  created refcounting for fbs need to be instantiated for all ioctls and
  interfaces/users.

- The rmfb ioctl removes the to-be-deleted fb from all active users. But
  unconditionally taking the global kms lock to do so introduces an
  unacceptable potential stall point. And obviously changing the userspace
  abi isn't on the table, either. Hence this conversion opportunistically
  checks whether the rmfb ioctl holds the very last reference, which
  guarantees that the fb isn't in active use on any crtc or plane (thanks
  to the conversion to the new lifetime rules using proper refcounting).
  Only if this is not the case will the code go through the slowpath and
  grab all modeset locks. Sane compositors will never hit this path and so
  avoid the stall, but userspace relying on these semantics will also not
  break.

All these cases are exercised by the newly added subtests for the i-g-t
kms_flip, tested on a machine where a full detect cycle takes around 100
ms.  It works, and no frames are dropped any more with these patches
applied.  kms_flip also contains a special case to exercise the
above-describe rmfb slowpath.

* 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel: (335 commits)
  drm/fb_helper: check whether fbcon is bound
  drm/doc: updates for new framebuffer lifetime rules
  drm: don't hold crtc mutexes for connector ->detect callbacks
  drm: only grab the crtc lock for pageflips
  drm: optimize drm_framebuffer_remove
  drm/vmwgfx: add proper framebuffer refcounting
  drm/i915: dump refcount into framebuffer debugfs file
  drm: refcounting for crtc framebuffers
  drm: refcounting for sprite framebuffers
  drm: fb refcounting for dirtyfb_ioctl
  drm: don't take modeset locks in getfb ioctl
  drm: push modeset_lock_all into ->fb_create driver callbacks
  drm: nest modeset locks within fpriv->fbs_lock
  drm: reference framebuffers which are on the idr
  drm: revamp framebuffer cleanup interfaces
  drm: create drm_framebuffer_lookup
  drm: revamp locking around fb creation/destruction
  drm: only take the crtc lock for ->cursor_move
  drm: only take the crtc lock for ->cursor_set
  drm: add per-crtc locks
  ...
2013-01-21 07:44:58 +10:00
Chris Wilson 97c809fd9c drm/i915: Only apply the mb() when flushing the GTT domain during a finish
Now that we seem to have brought order to the GTT barriers, the last one
to review is the terminal barrier before we unbind the buffer from the
GTT. This needs to only be performed if the buffer still resides in the
GTT domain, and so we can skip some needless barriers otherwise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:17 +01:00
Chris Wilson d0a57789d5 drm/i915: Only insert the mb() before updating the fence parameter
With a fence, we only need to insert a memory barrier around the actual
fence alteration for CPU accesses through the GTT. Performing the
barrier in flush-fence was inserting unnecessary and expensive barriers
for never fenced objects.

Note removing the barriers from flush-fence, which was effectively a
barrier before every direct access through the GTT, revealed that we
where missing a barrier before the first access through the GTT. Lack of
that barrier was sufficient to cause GPU hangs.

v2: Add a couple more comments to explain the new barriers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:16 +01:00
Daniel Vetter 1f83fee08d drm/i915: clear up wedged transitions
We have two important transitions of the wedged state in the current
code:

- 0 -> 1: This means a hang has been detected, and signals to everyone
  that they please get of any locks, so that the reset work item can
  do its job.

- 1 -> 0: The reset handler has completed.

Now the last transition mixes up two states: "Reset completed and
successful" and "Reset failed". To distinguish these two we do some
tricks with the reset completion, but I simply could not convince
myself that this doesn't race under odd circumstances.

Hence split this up, and add a new terminal state indicating that the
hw is gone for good.

Also add explicit #defines for both states, update comments.

v2: Split out the reset handling bugfix for the throttle ioctl.

v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase
error which prevented this patch from actually compiling.

v4: To unify the wedged state with the reset counter, keep the
reset-in-progress state just as a flag. The terminally-wedged state is
now denoted with a big number.

v5: Add a comment to the reset_counter special values explaining that
WEDGED & RESET_IN_PROGRESS needs to be true for the code to be
correct.

v6: Fixup logic errors introduced with the wedged+reset_counter
unification. Since WEDGED implies reset-in-progress (in a way we're
terminally stuck in the dead-but-reset-not-completed state), we need
ensure that we check for this everywhere. The specific bug was in
wait_for_error, which would simply have timed out.

v7: Extract an inline i915_reset_in_progress helper to make the code
more readable. Also annote the reset-in-progress case with an
unlikely, to help the compiler optimize the fastpath. Do the same for
the terminally wedged case with i915_terminally_wedged.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:16 +01:00
Daniel Vetter 308887aad1 drm/i915: fix reset handling in the throttle ioctl
While auditing the code I've noticed one place (the throttle ioctl)
which does not yet wait for the reset handler to complete and doesn't
properly decode the wedge state into -EAGAIN/-EIO. Fix this up by
calling the right helpers. This might explain the oddball "my
compositor just died in a successfull gpu reset" reports. Or maybe not, since
current mesa doesn't use this ioctl to throttle command submission.

The throttle ioctl doesn't take the struct_mutex, so to avoid busy-looping
with -EAGAIN while a reset is in process, check for errors first and wait
for the handler to complete if a reset is pending by calling
i915_gem_wait_for_error.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:15 +01:00
Daniel Vetter 33196dedda drm/i915: move wedged to the other gpu error handling stuff
And to make Ben Widawsky happier, use the gpu_error instead of
the entire device as the argument in some functions.

Drop the outdated comment on ->wedged for now, a follow-up patch will
change the semantics and add a proper comment again.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:15 +01:00
Daniel Vetter 99584db33b drm/i915: extract hangcheck/reset/error_state state into substruct
This has been sprinkled all over the place in dev_priv. I think
it'd be good to also move all the code into a separate file like
i915_gem_error.c, but that's for another patch.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:11:14 +01:00
Ben Widawsky 93d187993b drm/i915: Remove use of gtt_mappable_entries
Mappable_end, ie. size is almost always what you want as opposed to the
number of entries. Since we already have that information, we can scrap
the number of entries and only calculate it when needed.

If gtt_start is !0, this will have slightly different behavior. This
difference can only occur in DRI1, and exists when we try to kick out
the firmware fb. The new code seems like a bugfix to me.

The other case where we've changed the behavior is during init we check
the mappable region against our current known upper and lower limits
(64MB, and 512MB). This now matches the comment, and makes things more
convenient after removing gtt_mappable_entries.

Also worth noting is the setting of mappable_end is taken out of setup
because we do it earlier now in the DRI2 case and therefore need to add
that tiny hunk to support the DRI1 IOCTL.

v2: Move up mappable end to before legacy AGP init

v3: Add the dev_priv inclusion here from previous rebase error in patch
5

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: squash in fix for a printk format flag mismatch warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-20 13:09:20 +01:00
Ben Widawsky 5d4545aef5 drm/i915: Create a gtt structure
The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it help us finish the
isolation from the AGP connection).

The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm

The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:33:56 +01:00
Chris Wilson 43e28f092b drm/i915: Bail if we attempt to allocate pages for a purged object
Move the existing checking inside bind_to_gtt() to the more appropriate
layer in order to prevent recreation of the pages after they have been
explicitly truncated.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:07:59 +01:00
Chris Wilson dd624afd53 drm/i915: Add a debug interface to forcibly evict and shrink our object caches
As a means to investigate some bad system behaviour related to the
purging of the active, inactive and unbound lists, it is useful to be
able to manually control when those lists should be cleared.

v2: use _safe list iterators as we kick objects from the list as we
walk.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment explaining why we don't need to check and
wait for gpu resets, acked by Chris on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:07:57 +01:00
Imre Deak 0fa8779651 drm/i915: use gtt_get_size() instead of open coding it
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:07:56 +01:00
Imre Deak 56c844e539 drm/i915: merge {i965, sandybridge}_write_fence_reg()
The two functions are rather similar, so merge them.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:07:55 +01:00
Imre Deak d865110cc2 drm/i915: merge get_gtt_alignment/get_unfenced_gtt_alignment()
The two functions are rather similar, so merge them.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-17 22:07:54 +01:00
Dave Airlie b5cc6c0387 Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
  Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
  real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces

* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
  drm/i915: Return the real error code from intel_set_mode()
  drm/i915: Make GSM void
  drm/i915: Move GSM mapping into dev_priv
  drm/i915: Move even more gtt code to i915_gem_gtt
  drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
  drm/i915: Introduce i915_gem_set_seqno()
  drm/i915: Always clear semaphore mboxes on seqno wrap
  drm/i915: Initialize hardware semaphore state on ring init
  drm/i915: Introduce ring set_seqno
  drm/i915: Missed conversion to gtt_pte_t
  drm/i915: Bug on unsupported swizzled platforms
  drm/i915: BUG() if fences are used on unsupported platform
  drm/i915: fixup overlay stolen memory leak
  drm/i915: clean up PIPECONF bpc #defines
  drm/i915: add intel_dp_set_signal_levels
  drm/i915: remove leftover display.update_wm assignment
  drm/i915: check for the PCH when setting pch_transcoder
  drm/i915: Clear the stolen fb before enabling
  drm/i915: Access to snooped system memory through the GTT is incoherent
  drm/i915: Remove stale comment about intel_dp_detect()
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-01-17 20:34:08 +10:00
Daniel Vetter 93927ca52a drm/i915: Revert shrinker changes from "Track unbound pages"
This partially reverts

commit 6c085a728c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 20 11:40:46 2012 +0200

    drm/i915: Track unbound pages

Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
  inactive list, the new code looks at all bounds objects (including
  pinned ones). That is obviously in addition to the new unbound list.
- The shrinker cound is no longer scaled with
  sysctl_vfs_cache_pressure. Note though that with the default tuning
  value of vfs_cache_pressue = 100 this doesn't affect the shrinker
  behaviour.
- When actually shrinking objects, the old code first dropped
  purgeable objects, then normal (inactive) objects. Only then did it,
  in a last-ditch effort idle the gpu and evict everything. The new
  code omits the intermediate step of evicting normal inactive
  objects.

Safe for the first change, which seems benign, and the shrinker count
scaling, which is a bit a different story, the endresult of all these
changes is that the shrinker is _much_ more likely to fall back to the
last-ditch resort of idling the gpu and evicting everything.  The old
code could only do that if something else evicted lots of objects
meanwhile (since without any other changes the nr_to_scan will be
smaller than the object count).

Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only
dentry/inode object caches should scale their shrinker counts with
vfs_cache_pressure. Originally I've had that change reverted, too. But
Chris Wilson insisted that it's too bogus and shouldn't again see the
light of day.

Hence revert all these other changes and restore the old shrinker
behaviour, with the minor adjustment that we now first scan the
unbound list, then the inactive list for each object category
(purgeable or normal).

A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7, which some people bisected to
the "drm/i915: Track unbound pages" commit. But just disabling the
unbound logic alone didn't change things at all.

Note that this patch doesn't fix the referenced bugs, it only hides
the underlying bug(s) well enough to restore pre-3.7 behaviour. The
key to achieve that is to massively reduce the likelyhood of going
into a full gpu stall and evicting everything.

v2: Reword commit message a bit, taking Chris Wilson's comment into
account.

v3: On Chris Wilson's insistency, do not reinstate the rather bogus
vfs_cache_pressure change.

Tested-by: Greg KH <gregkh@linuxfoundation.org>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-10 18:02:44 +01:00
Chris Wilson 93be8788e6 drm/i915; Only increment the user-pin-count after successfully pinning the bo
As along the error path we do not correct the user pin-count for the
failure, we may end up with userspace believing that it has a pinned
object at offset 0 (when interrupted by a signal for example).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-07 10:30:53 +01:00
Dave Airlie 8be0e5c427 Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Some fixes for 3.8:
- Watermark fixups from Chris Wilson (4 pieces).
- 2 snb workarounds, seem to be recently added to our internal DB.
- workaround for the infamous i830/i845 hang, seems now finally solid!
  Based on Chris' fix for SNA, now also for UXA/mesa&old SNA.
- Some more fixlets for shrinker-pulls-the-rug issues (Chris&me).
- Fix dma-buf flags when exporting (you).
- Disable the VGA plane if it's enabled on lid open - similar fix in
  spirit to the one I've sent you last weeek, BIOS' really like to mess
  with the display when closing the lid (awesome debug work from Krzysztof
  Mazur).

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: disable shrinker lock stealing for create_mmap_offset
  drm/i915: optionally disable shrinker lock stealing
  drm/i915: fix flags in dma buf exporting
  i915: ensure that VGA plane is disabled
  drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager
  drm: Export routines for inserting preallocated nodes into the mm manager
  drm/i915: don't disable disconnected outputs
  drm/i915: Implement workaround for broken CS tlb on i830/845
  drm/i915: Implement WaSetupGtModeTdRowDispatch
  drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
  drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computations
  drm/i915: Clear self-refresh watermarks when disabled
  drm/i915: Double the cursor self-refresh latency on Valleyview
  drm/i915: Fixup cursor latency used for IVB lp3 watermarks
2012-12-30 13:54:12 +10:00
Ben Widawsky d7e5008f7c drm/i915: Move even more gtt code to i915_gem_gtt
This really should have been part of the kill agp series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-20 16:27:35 +01:00
Daniel Vetter da494d7ca5 drm/i915: disable shrinker lock stealing for create_mmap_offset
The mmap offset structure is not part of the drm/i915 code, but
provided by gem helpers. To avoid leaky abstractions (by either
depending upon implementation details of said helper wrt to
preallocations, or reimplementing it in our code and so fuzzing
around in internal details of that helpr) simply disable
the shrinker lock stealing accross calls into the helper functions.

This should fix igt/gem_tiled_swapping.

v2: Fix cleanup path confusion bemoaned by Chris Wilson.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-20 14:57:35 +01:00
Daniel Vetter 677feac291 drm/i915: optionally disable shrinker lock stealing
commit 5774506f15
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 21 13:04:04 2012 +0000

    drm/i915: Borrow our struct_mutex for the direct reclaim

added a nice trick to steal the struct_mutex lock in the shrinker if
it's the current task holding it. But this also caused the requirement
that every place which allocates memory needs to be careful about the
gem state of objects, since the shrinker could have pulled the rug out
from under it. We've usually solved this by carefully preallocating
things or ensure that buffers are pinned already.

But the shrinker also reaps mmap offset, so allocating those needs to
be careful, too. Now that code has been factored out into some common
helpers, so either we have fragile code depending upon the common
helper not doing something we don't want it to do. Or we need to
reimplement the mmap offset creation and so also leak implementation
details into our code.

Since this all results in leaky abstraction, cop out by disabling the
lock borrowing trick while calling down into the helpers. That way our
craziness is nicely confined to files in drm/i915.

v2: Split out the change to create_mmap_offset as request by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-20 14:56:04 +01:00
Mika Kuoppala fca26bb453 drm/i915: Introduce i915_gem_set_seqno()
This function can be used to set the driver's next_seqno
to arbitrary value.

i915_gem_set_seqno() will idle the gpu, retire outstanding
requests, clear the semaphore mailboxes and set the hardware
status page's seqno index.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:25:10 +01:00
Mika Kuoppala ba1a7067c0 drm/i915: Always clear semaphore mboxes on seqno wrap
In preparation for setting the seqno to arbitrary value on init or
through debugfs. We need to always clear the semaphores and set the
hws page seqno index by calling intel_ring_init_seqno().

v2: rewrote the commit message as suggested by Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:17:41 +01:00
Mika Kuoppala f7e98ad4d4 drm/i915: Initialize hardware semaphore state on ring init
Hardware status page needs to have proper seqno set
as our initial seqno can be arbitrary. If initial seqno is close
to wrap boundary on init and i915_seqno_passed() (31bit space)
refers to hw status page which contains zero, errorneous result
will be returned.

v2: clear mboxes and set hws page directly instead of going
through rings. Suggested by Chris Wilson.

v3: hws needs to be updated for all gens. Noticed by Chris
Wilson.

References: https://bugs.freedesktop.org/show_bug.cgi?id=58230
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:17:01 +01:00
Ben Widawsky 8782e26c0c drm/i915: Bug on unsupported swizzled platforms
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-18 22:31:23 +01:00
Ben Widawsky 7dbf9d6e0f drm/i915: BUG() if fences are used on unsupported platform
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-18 22:29:55 +01:00
Chris Wilson dc9dd7a20f drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager
As we may reap neighbouring objects in order to free up pages for
allocations, we need to be careful not to allocate in the middle of the
drm_mm manager. To accomplish this, we can simply allocate the
drm_mm_node up front and then use the combined search & insert
drm_mm routines, reducing our code footprint in the process.

Fixes (partially) i-g-t/gem_tiled_swapping

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Again fixup atomic bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-18 22:02:29 +01:00
Linus Torvalds 3c2e81ef34 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull DRM updates from Dave Airlie:
 "This is the one and only next pull for 3.8, we had a regression we
  found last week, so I was waiting for that to resolve itself, and I
  ended up with some Intel fixes on top as well.

  Highlights:
   - new driver: nvidia tegra 20/30/hdmi support
   - radeon: add support for previously unused DMA engines, more HDMI
     regs, eviction speeds ups and fixes
   - i915: HSW support enable, agp removal on GEN6, seqno wrapping
   - exynos: IPP subsystem support (image post proc), HDMI
   - nouveau: display class reworking, nv20->40 z compression
   - ttm: start of locking fixes, rcu usage for lookups,
   - core: documentation updates, docbook integration, monotonic clock
     usage, move from connector to object properties"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (590 commits)
  drm/exynos: add gsc ipp driver
  drm/exynos: add rotator ipp driver
  drm/exynos: add fimc ipp driver
  drm/exynos: add iommu support for ipp
  drm/exynos: add ipp subsystem
  drm/exynos: support device tree for fimd
  radeon: fix regression with eviction since evict caching changes
  drm/radeon: add more pedantic checks in the CP DMA checker
  drm/radeon: bump version for CS ioctl support for async DMA
  drm/radeon: enable the async DMA rings in the CS ioctl
  drm/radeon: add VM CS parser support for async DMA on cayman/TN/SI
  drm/radeon/kms: add evergreen/cayman CS parser for async DMA (v2)
  drm/radeon/kms: add 6xx/7xx CS parser for async DMA (v2)
  drm/radeon: fix htile buffer size computation for command stream checker
  drm/radeon: fix fence locking in the pageflip callback
  drm/radeon: make indirect register access concurrency-safe
  drm/radeon: add W|RREG32_IDX for MM_INDEX|DATA based mmio accesss
  drm/exynos: support extended screen coordinate of fimd
  drm/exynos: fix x, y coordinates for right bottom pixel
  drm/exynos: fix fb offset calculation for plane
  ...
2012-12-17 08:26:17 -08:00
Chris Wilson eb119bd612 drm/i915: Access to snooped system memory through the GTT is incoherent
We ignore all the user requests to handle flushing to the GTT domain if
the user requests such on a snoopable bo, and as such access through the
GTT to such pages remains incoherent. The specs even warn that such
behaviour is undefined - a strong reason never to do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 12:28:23 +01:00
Mika Kuoppala 9e8e36879f drm/i915: Set initial seqno value close to wrap boundary
To gain confidence in the wrap handling, make it happen quite
soon after the boot.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-11 14:07:22 +01:00
Chris Wilson 107f27a5df drm/i915: Open-code i915_gpu_idle() for handling seqno wrapping
The complication is that during seqno wrapping we must be extremely
careful not to write to any ring as that will require a new seqno, and
so would recurse back into the seqno wrap handler. So we cannot call
i915_gpu_idle() as that does additional work beyond simply retiring the
current set of requests, and instead must do the minimal work ourselves
during seqno wrapping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-11 14:07:03 +01:00
Mika Kuoppala f72b3435c1 drm/i915: Don't emit semaphore wait if wrap happened
If wrap just happened we need to prevent emitting waits for
pre wrap values. Detect this and emit no-ops instead.

v2: Use olr > seqno to detect wrap instead of *seqno == 0
as suggested by Chris Wilson.

v3: Use last used seqno to detect the wraparound. From
Chris Wilson

v4: Fixed unnecessary last_seqno assigment

References: https://bugs.freedesktop.org/show_bug.cgi?id=57967
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-11 13:32:26 +01:00
Linus Torvalds caf491916b Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
This reverts commits a50915394f and
d7c3b937bd.

This is a revert of a revert of a revert.  In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.

It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all.  We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.

When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation.  That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.

So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake.  Let's hope we never revisit
this mistake again - and certainly not this many times ;)

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-10 11:03:05 -08:00
Chris Wilson e9b73c6739 drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages
On a machine with bit17 swizzling, we need to store the bit17 of the
physical page address in put-pages. This requires a memory allocation,
on average less than a page, which may be difficult to satisfy is the
request to put-pages is on behalf of the shrinker. We could allow that
allocation to pull from the reserved memory pools, but it seems much
safer to preallocate the array for tiled objects on affected machines.

v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-07 01:16:15 +01:00
Mika Kuoppala 498d2ac15c drm/i915: Add intel_ring_handle_seqno wrap
If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.

v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-06 13:14:34 +01:00
Daniel Vetter 1a240d4de2 drm/i915: fixup sparse warnings
- __iomem where there is none (I love how we mix these things up).
- Use gfp_t instead of an other plain type.
- Unconfuse one place about enum pipe vs enum transcoder - for the pch
  transcoder we actually use the pipe enum. Fixup the other cases
  where we assign the pipe to the cpu transcoder with explicit casts.
- Declare the mch_lock properly in a header.

There is still a decent mess in intel_bios.c about __iomem, but heck,
this is x86 and we're allowed to do that.

Makes-sparse-happy: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use a space after the cast consistently and fix up the
newly-added cast in i915_irq.c to properly use __iomem.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 22:31:04 +01:00
Damien Lespiau 4239ca779d drm/i915: Fix dieing -> dying typo
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 18:25:02 +01:00
Chris Wilson a2165e3123 drm/i915: Decouple the object from the unbound list before freeing pages
As we may actually allocate in order to save the physical swizzling bits
during the free, we have to be careful not to trigger the shrinker on
the same object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added a small comment in the code to really drive the
scariness of this patch home.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 17:22:16 +01:00
Chris Wilson 42dcedd4f2 drm/i915: Use a slab for object allocation
The primary purpose of this was to debug some use-after-free memory
corruption that was causing an OOPS inside drm/i915. As it turned out
the corruption was being caused elsewhere and i915.ko as a major user of
many objects was being hit hardest.

Indeed as we do frequent the generic kmalloc caches, dedicating one to
ourselves (or at least naming one for us depending upon the core) aids
debugging our own slab usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-30 23:44:05 +01:00
Chris Wilson 0104fdbb84 drm/i915: Introduce i915_gem_object_create_stolen()
Allow for the creation of GEM objects backed by stolen memory. As these
are not backed by ordinary pages, we create a fake dma mapping and store
the address in the scatterlist rather than obj->pages.

v2: Mark _i915_gem_object_create_stolen() as static, as noticed by Jesse
Barnes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-30 23:34:16 +01:00
Daniel Vetter 8dcf015eb9 drm/i915: optimize the shmem_pwrite slowpath handling
Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).

But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clfushing
already.

v2: i915_gem_clflush_object already checks whether the object is
coherent and if so, drops the flushing. Hence we don't need to check
that ourselves, simplifying the condition.

v3: Reorder the checks for better clarity (and adjust the comment
accordingly), suggested by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 13:49:08 +01:00
Daniel Vetter a39a68054f drm/i915: simplify shmem pwrite/pread slowpath handling
The shmem paths for pwrite/pread used a clever trick to hold onto a
single page when dropping the big dev->struct_mutex for the slowpath.
But this ran the risk of reinstating (or not completely purging) the
backing storage when dropping purgeable objects.

Hence the code needed to keep track of whether it ever dropped the
lock, and if it did, manually check whether it needs to re-purge the
backing storage. But thanks to the pages pin count introduced in

commit a5570178c0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:54 2012 +0100

    drm/i915: Pin backing pages whilst exporting through a dmabuf vmap

which allowed us to pin the backing storage and remove that page
reference trick from shmem_pwrite/read in

commit f60d7f0c1d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:56 2012 +0100

    drm/i915: Pin backing pages for pread

and

commit 755d22184f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:55 2012 +0100

    drm/i915: Pin backing pages for pwrite

we can now abolish this check. The slowpath cleanup completely
disappears from pread, and for pwrite we're only left with the domain
fixup in case someone moved the object out of the cpu domain from
under us. A follow-on patch will optimize that a notch more.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 13:48:34 +01:00
Mika Kuoppala 7b01e260a6 drm/i915: Set sync_seqno properly after seqno wrap
i915_gem_handle_seqno_wrap() will zero all sync_seqnos but as the
wrap can happen inside ring->sync_to(), pre wrap seqno was
carried over and overwrote the zeroed sync_seqno.

When wrap is handled, all outstanding requests will be retired and
objects moved to inactive queue, causing their last_read_seqno to be zero.
Use this to update the sync_seqno correctly.

RING_SYNC registers after wrap will contain pre wrap values which
are >= seqno. So injecting the semaphore wait into ring completes
immediately.

Original idea for using last_read_seqno from Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:54 +01:00
Chris Wilson 3e9605018a drm/i915: Rearrange code to only have a single method for waiting upon the ring
Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:53 +01:00
Chris Wilson b662a06632 drm/i915: Simplify flushing activity on the ring
As we now always preallocate the seqno before writing to the ring, we
can trivially test if we have any pending activity on the ring by
inspecting the olr. This makes it then possible to flush operations that
are not normally associated with a request, like power-management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:53 +01:00
Chris Wilson 9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00
Chris Wilson b5d177946a drm/i915: Wait upon the last request seqno, rather than a future seqno
In commit 69c2fc8913
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 20 12:41:03 2012 +0100

    drm/i915: Remove the per-ring write list

the explicit flush was removed from i915_ring_idle(). However, we
continued to wait upon the next seqno which now did not correspond to
any request (except for the unusual condition of a failure to queue a
request after execbuffer) and so would wait indefinitely.

This has an important side-effect that i915_gpu_idle() does not cause
the seqno to be incremented. This is vital if we are to be able to idle
the GPU to handle seqno wraparound, as in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:51 +01:00
Chris Wilson 5774506f15 drm/i915: Borrow our struct_mutex for the direct reclaim
If we have hit oom whilst holding our struct_mutex, then currently we
cannot reap our own GPU buffers which likely pin most of memory, making
an outright OOM more likely. So if we are running in direct reclaim and
already hold the mutex, attempt to free buffers knowing that the
original function can not continue until we return.

v2: Add a note explaining that the mutex may be stolen due to
pre-emption, and that is bad.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-21 17:47:14 +01:00
Chris Wilson 8742267af4 drm/i915: Defer assignment of obj->gtt_space until after all possible mallocs
As we may invoke the shrinker whilst trying to allocate memory to hold
the gtt_space for this object, we need to be careful not to mark the
drm_mm_node as activated (by assigning it to this object) before we
have finished our sequence of allocations.

Note: We also need to move the binding of the object into the actual
pagetables down a bit. The best way seems to be to move it out into
the callsites.

Reported-by: Imre Deak <imre.deak@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added small note to commit message to summarize review
discussion.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-21 17:47:13 +01:00
Chris Wilson c9839303d1 drm/i915: Pin the object whilst faulting it in
In order to prevent reaping of the object whilst setting it up to
handle the pagefault, we need to mark it as pinned. This has the nice
side-effect of eliminating some special cases from the pagefault handler
as well!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-21 17:45:04 +01:00
Chris Wilson fbdda6fb5e drm/i915: Guard pages being reaped by OOM whilst binding-to-GTT
In the circumstances that the shrinker is allowed to steal the mutex
in order to reap pages, we need to be careful to prevent it operating on
the current object and shooting ourselves in the foot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-21 17:45:04 +01:00
Dave Airlie 9fabd4eede Merge branch 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Highlights of this -next round:
- ivb fdi B/C fixes
- hsw sprite/plane offset fixes from Damien
- unified dp/hdmi encoder for hsw, finally external dp support on hsw
  (Paulo)
- kill-agp and some other prep work in the gtt code from Ben
- some fb handling fixes from Ville
- massive pile of patches to align hsw VGA with the spec and make it
  actually work (Paulo)
- pile of workarounds from Jesse, mostly for vlv, but also some other
  related platforms
- start of a dev_priv reorg, that thing grew out of bounds and chaotic
- small bits&pieces all over the place, down to better error handling for
  load-detect on gen2 (Chris, Jani, Mika, Zhenyu, ...)

On top of the previous pile (just copypasta):
- tons of hsw dp prep patches form Paulo
- round scheduled work items and timers to nearest second (Chris)
- some hw workarounds (Jesse&Damien)
- vlv dp support and related fixups (Vijay et al.)
- basic haswell dp support, not yet wired up for external ports (Paulo)
- edp support (Paulo)
- tons of refactorings to prepare for the above (Paulo)
- panel rework, unifiying code between lvds and edp panels (Jani)
- panel fitter scaling modes (Jani + Yuly Novikov)
- panel power improvements, should now work without the BIOS setting it up
- extracting some dp helpers from radeon/i915 and move them to
  drm_dp_helper.c
- randome pile of workarounds (Damien, Ben, ...)
- some cleanups for the register restore code for suspend/resume
- secure batchbuffer support, should enable tear-free blits on gen6+
  Chris)
- random smaller fixlets and cleanups.

* 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (231 commits)
  drm/i915: Restore physical HWS_PGA after resume
  drm/i915: Report amount of usable graphics memory in MiB
  drm/i915/i2c: Track users of GMBUS force-bit
  drm/i915: Allocate the proper size for contexts.
  drm/i915: Update load-detect failure paths for modeset-rework
  drm/i915: Clear unused fields of mode for framebuffer creation
  drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer
  drm/i915: Fix sparse warnings in from AGP kill code
  drm/i915: Missed lock change with rps lock
  drm/i915: Move the remaining gtt code
  drm/i915: flush system agent TLBs on SNB
  drm/i915: Kill off now unused gen6+ AGP code
  drm/i915: Calculate correct stolen size for GEN7+
  drm/i915: Stop using AGP layer for GEN6+
  drm/i915: drop the double-OP_STOREDW usage in blt_ring_flush
  drm/i915: don't rewrite the GTT on resume v4
  drm/i915: protect RPS/RC6 related accesses (including PCU) with a new mutex
  drm/i915: put ring frequency and turbo setup into a work queue v5
  drm/i915: don't block resume on fb console resume v2
  drm/i915: extract l3_parity substruct from dev_priv
  ...
2012-11-20 09:22:35 +10:00
Ben Widawsky 26b1ff35c8 drm/i915: Move the remaining gtt code
It's pretty much all consolidated now that we've killed AGP. We can move
the one outlier, and defines too.

(Kill some unused defines in the process)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:44 +01:00
Ben Widawsky e76e9aebcd drm/i915: Stop using AGP layer for GEN6+
As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.

This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.

The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.

v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback

v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch

v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted

v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:42 +01:00
Daniel Vetter a4da4fa4e5 drm/i915: extract l3_parity substruct from dev_priv
Pretty astonishing how far apart these two members landed ... Especially since
I've already removed almost 200 lines in between.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:40 +01:00
Daniel Vetter c2fb791692 Linux 3.7-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
 aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
 NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
 tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
 NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
 =+861
 -----END PGP SIGNATURE-----

Merge tag 'v3.7-rc2' into drm-intel-next-queued

Linux 3.7-rc2

Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
  also fix a few other things in the code. Now we know how to make it
  work on snb+, but to avoid losing the other fixes do the backmerge
  first before re-enabling wc gtt ptes on snb+.

And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_dp.c
	drivers/gpu/drm/i915/intel_modes.c
	include/drm/i915_drm.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-22 14:34:51 +02:00
Dave Airlie 64acba6a7a Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Daniel writes:
The big thing is the disabling of the hsw support by default, cc: stable.
We've aimed for basic hsw support in 3.6, but due to a few bad
happenstances we've screwed up and only 3.8 will have better modeset
support than vesa. To avoid yet another round of fallout from such a
gaffle on for the next platform we've added a module option to disable
early hw support by default. That should also give us more flexibility in
bring-up.

 Otherwise just small fixes:
 - 3 fixes from Egbert for sdvo corner cases
 - invert-brightness quirk entry from Egbert
 - revert a dp link training change, it regresses some setups
 - and shut up a spurious WARN in our gem fault handler.
 - regression fix for an oops on bit17 swizzling machines, introduce in 3.7
 - another no-lvds quirk

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
  drm/i915: Insert i915_preliminary_hw_support variable.
  drm/i915: shut up spurious WARN in the gtt fault handler
  Revert "drm/i915: Try harder to complete DP training pattern 1"
  DRM/i915: Restore sdvo_flags after dtd->mode->dtd Roundrtrip.
  DRM/i915: Don't clone SDVO LVDS with analog.
  DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines.
  DRM/i915: Don't delete DPLL Multiplier during DAC init.
2012-10-22 09:55:48 +10:00
Chris Wilson 74ce6b6c63 drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
If we leave obj->pages set to NULL before attempting to deswizzle them,
then an OOPS is well deserved.

Fixes regression introduced in commit 9da3da660d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 1 15:20:22 2012 +0100

    drm/i915: Replace the array of pages with a scatterlist

Reported-and-tested-by: Krzysztof Kolasa
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-19 21:52:52 +02:00
Daniel Vetter a7c2e1aad6 drm/i915: shut up spurious WARN in the gtt fault handler
-ENOSPC can happen if userspace is being simplistic and tries to map a
too big object. To aid further spurious WARN debugging, also print out
the error code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56017
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-17 11:56:40 +02:00
Dave Airlie 3459f62047 Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Daniel writes:
"- some register magic to fix hsw crw (Paulo&Ben)
- fix backlight destruction for cpu edp (Jani)
- fix gen ch7xxx dvo ->get_hw_state
- fixup the plane->pipe fixup code, the broken version massively angers
  the modeset sanity checks
- kill pipe A quirk for i855gm, otherwise I get a black screen with the
  above patch
- fixup for gem_get_page helper (Chris)
- fixup guardband clipping w/a (Ken), without this mesa master can erronously
  drop vertices on snb, mesa 9.0 has the optimization reverted
- another pageflip vs. modeset fix
- kill bogus BUG_ON which broke ums+gem from Willy Tarreau (gasp, people
  are still using this!)"

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: fix non-DP-D eDP backlight cleanup and module reload
  drm/i915: HSW CRW stability magic
  drm/i915/dvo-ch7xxx: fix get_hw_state
  drm/i915: fixup the plane->pipe fixup code
  drm/i915: rip out the pipe A quirk for i855gm
  drm/i915: disable wc gtt pte mappings on gen2
  drm/i915: fixup i915_gem_object_get_page inline helper
  drm/i915: Disallow preallocation of requests
  drm/i915: Set guardband clipping workaround bit in the right register.
  drm/i915: paper over a pipe-enable vs pageflip race
  drm/i915: remove useless BUG_ON which caused a regression in 3.5.
2012-10-16 10:11:59 +10:00
Rodrigo Vivi eda2d7f582 drm/i915: HSW CRW stability magic
This magic brings stability to HSW CRW machines.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-12 10:59:11 +02:00
Chris Wilson acb868d3d7 drm/i915: Disallow preallocation of requests
The intention was to allow the caller to avoid a failure to queue a
request having already written commands to the ring. However, this is a
moot point as the i915_add_request() can fail for other reasons than a
mere allocation failure and those failure cases are more likely than
ENOMEM. So the overlay code already had to handle i915_add_request()
failures, and due to

commit 3bb73aba1e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 20 12:40:59 2012 +0100

    drm/i915: Allow late allocation of request for i915_add_request()

the error handling code in intel_overlay.c was subject to causing
double-frees, as found by coverity.

Rather than further complicate i915_add_request() and callers, realise
the battle is lost and adapt intel_overlay.c to take advantage of the
late allocation of requests.

v2: Handle callers passing in a NULL seqno.
v3: Ditto. This time for sure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-12 10:59:09 +02:00
Chris Wilson bcb4508616 drm/i915: Align the retire_requests worker to the nearest second
By using round_jiffies() we can align the wakeup of our worker to the
nearest second in order to batch wakeups and reduce system load, which
is useful for unimportant coarse tasks like our retire_requests.

v2: round_jiffies_relative() already returns the relative timeout value,
so no need to incorrectly perform the subtraction twice. The timer
interface still leaves the possibility for the value of jiffies to
change be we program the timer.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-08 18:45:21 +02:00
Chris Wilson cecc21fea9 drm/i915: Align the hangcheck wakeup to the nearest second
round_jiffies() aligns the wakeup time to the nearest second in order to
batch wakeups and reduce system load, which is useful for unimportant
coarse timers like our hangcheck.

v2: round_jiffies_relative() returns the relative jiffie value, whereas
we need the absolute value for the timer.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-08 18:44:36 +02:00
Willy Tarreau c77d7162a7 drm/i915: remove useless BUG_ON which caused a regression in 3.5.
starting an old X server causes a kernel BUG since commit 1b50247a8d:

------------[ cut here ]------------
kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3661!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo
+videobuf2_core videodev videobuf2_vmalloc videobuf2_memops uhci_hcd ath9k mac80211 snd_hda_codec_realtek ath9k_common microcode
+ath9k_hw psmouse serio_raw sg ath cfg80211 atl1c lpc_ich mfd_core ehci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rtc_cmos
+snd_timer snd evdev eeepc_laptop snd_page_alloc sparse_keymap

Pid: 2866, comm: X Not tainted 3.5.6-rc1-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA
EIP: 0060:[<c12dc291>] EFLAGS: 00013297 CPU: 0
EIP is at i915_gem_entervt_ioctl+0xf1/0x110
EAX: f5941df4 EBX: f5940000 ECX: 00000000 EDX: 00020000
ESI: f5835400 EDI: 00000000 EBP: f51d7e38 ESP: f51d7e20
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b760e0a0 CR3: 351b6000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process X (pid: 2866, ti=f51d6000 task=f61af8d0 task.ti=f51d6000)
Stack:
 00000001 00000000 f5835414 f51d7e84 f5835400 f54f85c0 f51d7f10 c12b530b
 00000001 c151b139 c14751b6 c152e030 00000b32 00006459 00000059 0000e200
 00000001 00000000 00006459 c159ddd0 c12dc1a0 ffffffea 00000000 00000000
Call Trace:
 [<c12b530b>] drm_ioctl+0x2eb/0x440
 [<c12dc1a0>] ? i915_gem_init+0xe0/0xe0
 [<c1052b2b>] ? enqueue_hrtimer+0x1b/0x50
 [<c1053321>] ? __hrtimer_start_range_ns+0x161/0x330
 [<c10530b3>] ? lock_hrtimer_base+0x23/0x50
 [<c1053163>] ? hrtimer_try_to_cancel+0x33/0x70
 [<c12b5020>] ? drm_version+0x90/0x90
 [<c10ca171>] vfs_ioctl+0x31/0x50
 [<c10ca2e4>] do_vfs_ioctl+0x64/0x510
 [<c10535de>] ? hrtimer_nanosleep+0x8e/0x100
 [<c1052c20>] ? update_rmtp+0x80/0x80
 [<c10ca7c9>] sys_ioctl+0x39/0x60
 [<c1433949>] syscall_call+0x7/0xb
Code: 83 c4 0c 5b 5e 5f 5d c3 c7 44 24 04 2c 05 53 c1 c7 04 24 6f ef 47 c1 e8 6e e0 fd ff c7 83 38 1e 00 00 00 00 00 00 e9 3f ff ff
+ff <0f> 0b eb fe 0f 0b eb fe 8d b4 26 00 00 00 00 0f 0b eb fe 8d b6
EIP: [<c12dc291>] i915_gem_entervt_ioctl+0xf1/0x110 SS:ESP 0068:f51d7e20
---[ end trace dd332ec083cbd513 ]---

The crash happens here in i915_gem_entervt_ioctl() :

    3659          BUG_ON(!list_empty(&dev_priv->mm.active_list));
    3660          BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
 -> 3661          BUG_ON(!list_empty(&dev_priv->mm.inactive_list));
    3662          mutex_unlock(&dev->struct_mutex);

Quoting Chris :
  "That BUG_ON there is silly and can simply be removed. The check is to
   verify that no batches were submitted to the kernel whilst the UMS/GEM
   client was suspended - to which the BUG_ONs are a crude approximation.
   Furthermore, the checks are too late, since it means we attempted to
   program the hardware whilst it was in an invalid state, the BUG_ONs are
   the least of your concerns at that point."

Note that this regression has been introduced in

commit 1b50247a8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 24 15:47:30 2012 +0100

    drm/i915: Remove the list of pinned inactive objects

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
[danvet: Added note about the regressing commit and cc: stable.]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-07 22:57:25 +02:00
Dave Airlie 1f31c69dac Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:

Bigger -fixes pile, mostly because I've included Ajax' DP dongle stuff,
as discussed on irc. Otherwise just small things:
- regression fix to finally make 6bpc auto-dither on dp work (Jani)
- reinstate an snb ctx w/a that accidentally got lost in a rework (Chris)
- fixup the DP train sequence, logic-goof-up uncovered by Coverty (Chris)
- fix set_caching locking (Ben)
- fix spurious segfault on con-current gtt mmap faulting (Dimitry and Mika)
- some pageflip correctness fixes (still hunting down some issues, but
  these are the worst offenders of confused code that we've tracked down
  thus far) from Chris and me
- fixup swizzling settings on vlv (Jesse)
- gt_mode w/a from Ben added, fixes snb gt1 rc6+hw ctx hangs.

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Fix GT_MODE default value
  drm/i915: don't frob the vblank ts in finish_page_flip
  drm/i915: call drm_handle_vblank before finish_page_flip
  drm/i915: print warning if vmi915_gem_fault error is not handled
  drm/i915: EBUSY status handling added to i915_gem_fault().
  drm/i915: Try harder to complete DP training pattern 1
  drm/i915: set swizzling to none on VLV
  drm/dp: Make sink count DP 1.2 aware
  drm/dp: Document DP spec versions for various DPCD registers
  drm/i915/dp: Be smarter about connection sense for branch devices
  drm/i915/dp: Fetch downstream port info if needed during DPCD fetch
  drm/dp: Update DPCD defines
  drm: Export drm_probe_ddc()
  drm/i915: Flush the pending flips on the CRTC before modification
  drm/i915: Actually invalidate the TLB for the SandyBridge HW contexts w/a
  drm/i915: Fix set_caching locking
  drm/i915: use adjusted_mode instead of mode for checking the 6bpc force flag
2012-10-07 21:13:54 +10:00
Mika Kuoppala 4d0f817e74 drm/i915: print warning if vmi915_gem_fault error is not handled
Falling into default case in vmi915_gem_fault is a bug. Be more
verbose about it.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-04 10:33:42 +02:00
Dmitry Rogozhkin e79e0fe380 drm/i915: EBUSY status handling added to i915_gem_fault().
Subsequent threads returning EBUSY from vm_insert_pfn() was not handled
correctly. As a result concurrent access from new threads to
mmapped data caused SIGBUS.

Note that this fixes i-g-t/tests/gem_threaded_tiled_access.

Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-04 10:33:42 +02:00
Linus Torvalds 612a9aab56 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm merge (part 1) from Dave Airlie:
 "So first of all my tree and uapi stuff has a conflict mess, its my
  fault as the nouveau stuff didn't hit -next as were trying to rebase
  regressions out of it before we merged.

  Highlights:
   - SH mobile modesetting driver and associated helpers
   - some DRM core documentation
   - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
     combined pte writing, ilk rc6 support,
   - nouveau: major driver rework into a hw core driver, makes features
     like SLI a lot saner to implement,
   - psb: add eDP/DP support for Cedarview
   - radeon: 2 layer page tables, async VM pte updates, better PLL
     selection for > 2 screens, better ACPI interactions

  The rest is general grab bag of fixes.

  So why part 1? well I have the exynos pull req which came in a bit
  late but was waiting for me to do something they shouldn't have and it
  looks fairly safe, and David Howells has some more header cleanups
  he'd like me to pull, that seem like a good idea, but I'd like to get
  this merge out of the way so -next dosen't get blocked."

Tons of conflicts mostly due to silly include line changes, but mostly
mindless.  A few other small semantic conflicts too, noted from Dave's
pre-merged branch.

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits)
  drm/nv98/crypt: fix fuc build with latest envyas
  drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering
  drm/nv41/vm: fix and enable use of "real" pciegart
  drm/nv44/vm: fix and enable use of "real" pciegart
  drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie
  drm/nouveau: store supported dma mask in vmmgr
  drm/nvc0/ibus: initial implementation of subdev
  drm/nouveau/therm: add support for fan-control modes
  drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules
  drm/nouveau/therm: calculate the pwm divisor on nv50+
  drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster
  drm/nouveau/therm: move thermal-related functions to the therm subdev
  drm/nouveau/bios: parse the pwm divisor from the perf table
  drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices
  drm/nouveau/therm: rework thermal table parsing
  drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table
  drm/nouveau: fix pm initialization order
  drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it
  drm/nouveau: log channel debug/error messages from client object rather than drm client
  drm/nouveau: have drm debugging macros build on top of core macros
  ...
2012-10-03 23:29:23 -07:00
David Howells 760285e7e7 UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/
Convert #include "..." to #include <path/...> in drivers/gpu/.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-02 18:01:07 +01:00
David Howells 4126d5d61f UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/.
Remove redundant DRM UAPI header #inclusions from drivers/gpu/.

Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h).  They are now #included via drmP.h and drm_crtc.h via a preceding
patch.

Without this patch and the patch to make include the UAPI headers from the core
headers, after the UAPI split, the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..."  work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-02 18:01:05 +01:00
Ben Widawsky 3bc2913e2c drm/i915: Fix set_caching locking
On the EINVAL case we don't release struct_mutex. It should be safe to
grab the lock after checking the parameters, which also resolves the
issues.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-27 08:45:11 +02:00
Ben Widawsky 199adf40ae drm/i915: s/cacheing/caching/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-26 09:24:36 +02:00
Daniel Vetter 398b7a1b88 Linux 3.6-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s
 LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI
 u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT
 xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU
 i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF
 GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk=
 =0v2l
 -----END PGP SIGNATURE-----

Merge tag 'v3.6-rc7' into drm-intel-next-queued

Manual backmerge of -rc7 to resolve a silent conflict leading to
compile failure in drivers/gpu/drm/i915/intel_hdmi.c.

This is due to the bugfix in -rc7:

commit b98b601672
Author: Wang Xingchao <xingchao.wang@intel.com>
Date:   Thu Sep 13 07:43:22 2012 +0800

    drm/i915: HDMI - Clear Audio Enable bit for Hot Plug

Since this code moved around a lot in -next git put that snippet at
the wrong spot. I've tried to fix this by making the conflict explicit
by merging a version for next with:

commit 3cce574f01
Author: Wang Xingchao <xingchao.wang@intel.com>
Date:   Thu Sep 13 11:19:00 2012 +0800

    drm/i915: HDMI - Clear Audio Enable bit for Hot Plug unconditionally

But that failed to solve the entire problem. To avoid pushing out
further -nightly branch to our QA where this is broken, do the
backmerge and manually add the stuff git adds to -next from the patch
in -fixes.

Note that this doesn't show up in git's merge diff (and hence is also
not handled by git rerere), which adds to the reasons why I'd like to
fix this with a verbose backmerge. The git merge diff only shows a
bunch of trivial conflicts of the "code changed in lines next to each
another" kind.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-24 18:17:12 +02:00
Chris Wilson 2f745ad3d3 drm/i915: Convert the dmabuf object to use the new i915_gem_object_ops
By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:23:10 +02:00
Chris Wilson 9da3da660d drm/i915: Replace the array of pages with a scatterlist
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:57 +02:00
Chris Wilson f60d7f0c1d drm/i915: Pin backing pages for pread
By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
beneath us, and so we can simplify the code for iterating over the pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:57 +02:00
Chris Wilson 755d22184f drm/i915: Pin backing pages for pwrite
By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
jeneath us, and so we can simplify the code for iterating over the pages.

Note: The old code had such complicated page refcounting since it used
obj->pages as a micro-optimization if it's there, but that could
(before this patch) disappear when we drop the dev->struct_mutex.
Hence some manual page refcounting was required for the slow path,
complicated by the fact that pages returned by shmem_read_mapping_page
already have a pageref, which needs to be dropped again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to explain the question Ben raised in review.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:56 +02:00
Chris Wilson a5570178c0 drm/i915: Pin backing pages whilst exporting through a dmabuf vmap
We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:56 +02:00
Chris Wilson 37e680a15f drm/i915: Introduce drm_i915_gem_object_ops
In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.

For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire of the object.

v2: Bonus comments.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-20 14:22:55 +02:00
Chris Wilson 7e81a42e34 drm/i915: Reduce a pin-leak BUG into a WARN
Pin-leaks persist and we get the perennial bug reports of machine
lockups to the BUG_ON(pin_count==MAX). If we instead loudly report that
the object cannot be pinned at that time it should prevent the driver from
locking up, and hopefully restore a semblance of working whilst still
leaving us a OOPS to debug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-09-17 10:12:57 +02:00
Dave Airlie 65983bd605 Merge branch 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
"New stuff for -next. Highlights:
- prep patches for the modeset rework. Note that one of those patches
  touches the fb helper in the common drm code.
- hasw hdmi audio support (Wang Xingchao)
- improved instdone dumping for gen7 (Ben)
- unbound tracking and a few follow-up patches from Chris
- dma_buf->begin/end_cpu_access plus fix for drm/udl (Dave)
- improve mmio error reporting for hsw
- prep patch for WQ_NON_REENTRANT removal (Tejun Heo)
"

* 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (41 commits)
  drm/i915: Remove __GFP_NO_KSWAPD
  drm/i915: disable rc6 on ilk when vt-d is enabled
  drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer
  drm/i915: Use new INSTDONE registers (Gen7+)
  drm/i915: Add new INSTDONE registers
  drm/i915: Extract reading INSTDONE
  drm/i915: Use a non-blocking wait for set-to-domain ioctl
  drm/i915: Juggle code order to ease flow of the next patch
  drm/i915: Use cpu relocations if the object is in the GTT but not mappable
  drm/i915: Extract general object init routine
  drm/i915: Protect private gem objects from truncate (such as imported dmabuf)
  drm/i915: Only pwrite through the GTT if there is space in the aperture
  i915: use alloc_ordered_workqueue() instead of explicit UNBOUND w/ max_active = 1
  drm/i915: Find unclaimed MMIO writes.
  drm/i915: Add ERR_INT to gen7 error state
  drm/i915: Cantiga+ cannot handle a hsync front porch of 0
  drm/i915: fix reassignment of variable "intel_dp->DP"
  drm/i915: Try harder to allocate an mmap_offset
  drm/i915: Show pin count in debugfs
  drm/i915: Show (count, size) of purgeable objects in i915_gem_objects
  ...
2012-09-03 12:05:01 +10:00
Sedat Dilek d7c3b937bd drm/i915: Remove __GFP_NO_KSWAPD
When I pulled-in today's drm-intel-next into linux-next (next-20120824)
I saw this build-breakage:

drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt':
drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in

This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD")
and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in
linux-next (next-20120824).

Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver.

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-27 17:11:38 +02:00
Dave Airlie 93bb70e0c0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
There was some merge conflicts in -next and they weren't so pretty, so
backmerge now to avoid them.

Conflicts:
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/intel_modes.c
2012-08-27 16:22:20 +10:00
Chris Wilson 3236f57a01 drm/i915: Use a non-blocking wait for set-to-domain ioctl
The principal use for set-to-domain is for userspace to serialise
operations with a particular buffer, for example to maintain coherency
with a CPU map or to ratelimit its rendering by waiting on all previous
operations before continuing. As such we tend to hold the struct_mutex
for long periods during the synchronisation and so cause contention
issues with other users of the graphics device, even for independent
operations as memory management. An example is the contention between
compiz and X which causes jitter in the display and a drop in peak
throughput.

The ultimate solution would be a set of fine grained locks and lockless
operations, but an intermediate step is to first attempt the
synchronisation for set-to-domain without holding the mutex. This
introduces a number of race conditions, so we limit it use to the ioctl
periphery where we have no dependent state and can safely complete with
a locked synchronisation afterwards.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 11:12:56 +02:00
Chris Wilson b361237bcc drm/i915: Juggle code order to ease flow of the next patch
Move the wait-for-rendering logic around in the file so that we can
group it together with the subsequent variations. The general goal is to
have the lower level routines clustered together and then the higher
level logic building upon those low level routines that came before.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 11:12:53 +02:00
Chris Wilson 0327d6ba99 drm/i915: Extract general object init routine
As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 02:04:38 +02:00
Chris Wilson 4d6294bf77 drm/i915: Protect private gem objects from truncate (such as imported dmabuf)
If the object has no backing shmemfs filp, then we obviously cannot
perform a truncation operation upon it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 02:04:31 +02:00
Chris Wilson 86a1ee26bb drm/i915: Only pwrite through the GTT if there is space in the aperture
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-24 02:03:33 +02:00
Chris Wilson d8cb508669 drm/i915: Try harder to allocate an mmap_offset
Given the persistence of an offset for the lifetime of an object, itis
easy to contemplate how the mmap space becomes badly fragmented to the
point that further allocations fail with ENOSPC. Our only recourse at
this point is to try to purge the objects to release some space and
reattempt the allocation.

References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-21 14:34:36 +02:00
Chris Wilson c4670ad080 drm/i915: Add some sanity checks to unbound tracking
A pair of universally true checks that just need to be put in the right
place depending on where in the patch sequence you go. Note that
i915_gem_object_put_pages_gtt() already gains the
BUG_ON(obj->gtt_space), but on reflection that needed to migrate to
put_pages().

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-21 14:34:20 +02:00
Chris Wilson 6c085a728c drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.

To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.

As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.

Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).

Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.

v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.

v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-21 14:34:11 +02:00
Daniel Vetter 225067eedf drm/i915: move functions around
Prep work to make Chris Wilson's unbound tracking patch a bit easier
to read. Alas, I'd have preferred that moving the page allocation
retry loop from bind to get_pages would have been a separate patch,
too. But that looks like real work ;-)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 10:59:41 +02:00
Ben Widawsky b6c7488df6 drm/i915/contexts: fix list corruption
After reset we unconditionally reinitialize lists. If the context switch
hasn't yet completed before the suspend, the default context object will
end up on lists that are going to go away when we resume.

The patch forces the context switch to be synchronous before suspend
assuring that the active/inactive tracking is correct at the time of
resume.

References: https://bugs.freedesktop.org/show_bug.cgi?id=52429
Tested-by: Guang A Yang <guang.a.yang@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-17 09:21:34 +02:00
Chris Wilson b2eadbc85b drm/i915: Lazily apply the SNB+ seqno w/a
Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirment process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-10 11:11:32 +02:00
Chris Wilson e6994aeedc drm/i915: Export ability of changing cache levels to userspace
By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to able to mix CPU and GPU rendering/activity
efficiently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific cacheing control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-26 12:56:25 +02:00
Chris Wilson 42d6ab4839 drm/i915: Segregate memory domains in the GTT using coloring
Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.

v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-26 12:56:25 +02:00
Chris Wilson f047e395dd drm/i915: Avoid concurrent access when marking the device as idle/busy
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.

v2: Complete overhaul and removal of the racy timers and workers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:56 +02:00
Chris Wilson a7b9761d0a drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson 26b9c4a57f drm/i915: Remove the explicit flush of the GPU write domain
Rely instead on the insertion of the implicit flush before the seqno
breadcrumb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:54 +02:00
Chris Wilson 86d5bc3782 drm/i915: Remove explicit flush from i915_gem_object_flush_fence()
As the flush is either performed explictly immediately after the
execbuffer dispatch, or before the serialisation of last_fenced_seqno we
can forgo the explict i915_gem_flush_ring().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Chris Wilson 69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Chris Wilson 65ce302741 drm/i915: Remove the defunct flushing list
As we guarantee to emit a flush before emitting the breadcrumb or
the next batchbuffer, there is no further need for the flushing list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:52 +02:00
Chris Wilson 0201f1ecf4 drm/i915: Replace the pending_gpu_write flag with an explicit seqno
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:52 +02:00
Chris Wilson e5f1d962a8 drm/i915: Remove assertion over write domain after i915_gem_object_sync()
As we move to lazily clearing the GPU write domain only when the buffer
becomes inactive, this leaves a window of opportunity for
i915_gem_object_pin_to_display_plane() to detect a seemingly
inconsistent value. This function is special as it tries to pipeline the
operation to avoid the stall and so may not retires the buffer and we
may not get the opportunity to clear the write domain. However, we know
all is good, so drop the assertion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:51 +02:00
Chris Wilson 3bb73aba1e drm/i915: Allow late allocation of request for i915_add_request()
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:51 +02:00
Chris Wilson e9808edd98 drm/i915: Return a mask of the active rings in the high word of busy_ioctl
The intention is to help select which engine to use for copies with
interoperating clients - such as a GL client making a request to the X
server to perform a SwapBuffers, which may require copying from the
active GL back buffer to the X front buffer.

We choose to report a mask of the active rings to future proof the
interface against any changes which may allow for the object to reside
upon multiple rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: bikeshed away the write ring mask and add the explanation
Chris sent in a follow-up mail why we decided to use masks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:50 +02:00
Chris Wilson eeef9b3874 drm/i915: Add -EIO to the list of known errors for __wait_seqno
This prevents a WARN introduced with

  commit de2b998552
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Wed Jul 4 22:52:50 2012 +0200

      drm/i915: don't return a spurious -EIO from intel_ring_begin

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 10:39:57 +02:00
Chris Wilson 67b1b57182 drm/i915: Disable the BLT on pre-production SNB hardware
It never quite worked despite the numerous workarounds, yet I still see
people trying to use this hardware and filing bug reports. As we no
longer even try to implement the workarounds, since 6a233c7887
(drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring.

v2: Add a message to inform the user about the limited capabilities of
their pre-production hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-20 12:21:37 +02:00
Chris Wilson 6b9d89b436 drm: Add colouring to the range allocator
In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
avoid the prefetcher on the GPU from crossing memory domains and so
prevent allocation of a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.

This will be used by i915 to segregate memory domains within the GTT.

v2: Now with more drm_mm helpers and less driver interference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2012-07-16 05:59:37 +10:00
Daniel Vetter a9340ccab5 drm/i915: properly SIGBUS on I/O errors
... instead of looping endless with no hope of ever serving that
page-fault. We only need to break out of this loop when the gpu died,
to run the reset work (and hopefully resurrect it).

To clarify questions Chris raised on irc: This is about handling I/O
errors not from our own code, but e.g. when the disk died when trying
to swap in a gem bo. So this patch remidies the issue that the current
handling only handles gpu-death-induced cases of -EIO. Admittedly,
dying disks are much rarer than hanging gpus ...To clarify questions
Chris raised on irc: This is about handling I/O errors not from our
own code, but e.g. when the disk died when trying to swap in a gem bo.
So this patch remidies the issue that the current handling only
handles gpu-death-induced cases of -EIO. Admittedly, dying disks are
much rarer than hanging gpus ...

This seems to have been lost in:

commit d9bc7e9f32
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 7 13:09:31 2011 +0000

    drm/i915: Fix infinite loop regression from 21dd3734

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-05 10:03:01 +02:00
Daniel Vetter 0a6759c6ba drm/i915: don't hang userspace when the gpu reset is stuck
With the gpu reset no longer using a trylock we've increased the
chances of userspace getting stuck quite a bit. To make that
(hopefully) rare case more paletable time out when waiting for the gpu
reset code to complete and signal this little issue to the caller by
returning -EIO.

This should help userspace to somewhat gracefully fall back and
hopefully allow the user to grab some logs and reboot the machine
(instead of staring at a frozen X screen in agony).

Suggested by Chris Wilson because I've been stubborn about allowing
the gpu reset code no to fail, ever (by removing the trylock).

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-05 10:02:24 +02:00
Daniel Vetter d6b2c790a4 drm/i915: non-interruptible sleeps can't handle -EAGAIN
So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly.  Which works everywhere but when the gpu dies,
which this patch fixes.

So essentially interruptible == false means 'wait for the gpu or die
trying'.'

This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.

To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.

v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in

commit b4aca0106c
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Apr 25 20:50:12 2012 -0700

    drm/i915: extract some common olr+wedge code

although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.

But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.

v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.

v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.

v5: Fix EAGAIN mispell noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-05 10:01:14 +02:00
Daniel Vetter cc889e0f6c drm/i915: disable flushing_list/gpu_write_list
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-20 13:54:28 +02:00
Ben Widawsky f2ef6eb145 drm/i915: switch to default context on idle
To keep things as sane as possible, switch to the default context before
idling. This should help free context objects, as well as put things in
a more well defined state before suspending.

v2: remove seqno from context switch call (daniel)
return error on failed context switch instead of WARN+continue (daniel)

v3: move idling to i915_gpu idle (from i915_gem_idle) (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:20 +02:00
Ben Widawsky 254f965c39 drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.

Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores.  With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5).  Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.

In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state.  The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.

All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.

There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.

As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.

v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:16 +02:00
Daniel Vetter 8ecd1a6615 drm/i915: call intel_enable_gtt
When drm/i915 is in control of the gtt, we need to call
the enable function at all the relevant places ourselves.

Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-12 22:21:07 +02:00
Daniel Vetter dd2757f8b5 drm/i915: stop using dev->agp->base
For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.

Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-12 22:18:06 +02:00
Ben Widawsky eac1f14fd1 drm/i915: Inifite timeout for wait ioctl
Change the ns_timeout parameter of the wait ioctl to a signed value.
Doing this allows the kernel to provide an infinite wait when a timeout
of less than 0 is provided. This mimics select/poll.

Initially the parameter was meant to match up with the GL spec 1:1, but
after being made aware of how much 2^64 - 1 nanoseconds actually is, I
do not think anyone will ever notice the loss of 1 bit.

The infinite timeout on waiting is similar to the existing i915
userspace interface with the exception that struct_mutex is dropped
while doing the wait in this ioctl.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-06 12:25:46 +02:00
Daniel Vetter 30dfebf34b drm/i915: extract object active state flushing code
Both busy_ioctl and the new wait_ioct need to do the same dance (or at
least should). Some slight changes:
- busy_ioctl now unconditionally checks for olr. Before emitting a
  require flush would have prevent the olr check and hence required a
  second call to the busy ioctl to really emit the request.
- the timeout wait now also retires request. Not really required for
  abi-reasons, but makes a notch more sense imo.

I've tested this by pimping the i-g-t test some more and also checking
the polling behviour of the wait_rendering_timeout ioctl versus what
busy_ioctl returns.

v2: Too many people complained about unplug, new color is
flush_active.

v3: Kill the comment about the unplug moniker.

v4: s/un-active/inactive/

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-02 20:51:03 +02:00
Daniel Vetter e269f90f3d Merge remote-tracking branch 'airlied/drm-prime-vmap' into drm-intel-next-queued
We need the latest dma-buf code from Dave Airlie so that we can pimp
the backing storage handling code in drm/i915 with Chris Wilson's
unbound tracking and stolen mem backed gem object code.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-01 10:52:54 +02:00
Ben Widawsky b9524a1e1c drm/i915: remap l3 on hw init
If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.

v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-31 12:11:29 +02:00
Dave Airlie a21f976094 Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: tune down the noise of the RP irq limit fail
  drm/i915: Remove the error message for unbinding pinned buffers
  drm/i915: Limit page allocations to lowmem (dma32) for i965
  drm/i915: always use RPNSWREQ for turbo change requests
  drm/i915: reject doubleclocked cea modes on dp
  drm/i915: Adding TV Out Missing modes.
  drm/i915: wait for a vblank to pass after tv detect
  drm/i915: no lvds quirk for HP t5740e Thin Client
  drm/i915: enable vdd when switching off the eDP panel
  drm/i915: Fix PCH PLL assertions to not assume CRTC:PLL relationship
  drm/i915: Always update RPS interrupts thresholds along with frequency
  drm/i915: properly handle interlaced bit for sdvo dtd conversion
  drm/i915: fix module unload since error_state rework
  drm/i915: be more careful when returning -ENXIO in gmbus transfer
2012-05-29 11:09:06 +01:00
Ben Widawsky 199b2bc25b drm/i915: s/i915_wait_request/i915_wait_seqno/g
Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.

Of course we must update the callers to use the new name as well.

This leaves room within our namespace for a *real* wait request function
at some point.

Note to maintainer: this patch is optional.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-25 14:18:42 +02:00
Ben Widawsky 23ba4fd0a4 drm/i915: wait render timeout ioctl
This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.

The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.

The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.

v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)

v3: Drop lock after getting seqno to avoid ugly dance (Chris)

v4: check for 0 timeout after olr check to allow polling (Chris)

v5: Updated the comment. (Chris)

v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)

v7: Use DRM_AUTH for the ioctl. (Eugeni)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-25 14:15:46 +02:00
Chris Wilson 31d8d651eb drm/i915: Remove the error message for unbinding pinned buffers
This is now used intentionally to prevent proliferation of is-pinned
checks upon the inactive list following:

commit 1b50247a8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 24 15:47:30 2012 +0100

    drm/i915: Remove the list of pinned inactive objects

Reported-and-tested-by: guang.a.yang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50075
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-25 10:10:40 +02:00
Chris Wilson bed1ea95a3 drm/i915: Limit page allocations to lowmem (dma32) for i965
Broadwater and Crestline share a limitation that prevent it from
relocating general surface state above 4GiB. The only recourse we have
since any buffer object may be used as a relocation target is then to
limit all object allocations on 965g[m] to DMA32.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-25 10:07:06 +02:00
Ben Widawsky 5c81fe85da drm/i915: timeout parameter for seqno wait
Insert a wait parameter in the code so we can possibly timeout on a
seqno wait if need be. The code should be functionally the same as
before because all the callers will continue to retry if an arbitrary
timeout elapses.

We'd like to have nanosecond granularity, but the only way to do this is
with hrtimer, and that doesn't fit well with the needs of this code.

v2: Fix rebase error (Chris)
Return proper time even in wedged + signal case (Chris + Ben)
Use timespec constructs (Ben)
Didn't take Daniel's advice regarding the Frankenstein-ness of the
  function. I did try his advice, but in the end I liked the way the
  original code looked, better.

v3: Make wakeups far less frequent for infinite waits (Chris)

v4: Remove dummy_wait variable (Daniel)
Use raw monotonic time instead of jiffies (made the code a bit cleaner) (Ben)
Added a couple of warnings (Ben)

v5: Remove warnings (Daniel)
Use more accurate time diff for default case (Daniel)
Bikeshed for setting the return timespec in timeout case (Daniel)
s/jiffies/time in one of the comments

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-25 09:55:08 +02:00
Daniel Vetter 1286ff7397 i915: add dmabuf/prime buffer sharing support.
This adds handle->fd and fd->handle support to i915, this is to allow
for offloading of rendering in one direction and outputs in the other.

v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.

Note that we have squat i-g-t testcoverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.

Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.

Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).

Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.

v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
disdinction. We should be able to merge these 2 paths, but that's
material for another patch.

v4: fix the error reporting bugs pointed out by ickle.

v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-23 10:47:10 +01:00
Chris Wilson b4519513e8 drm/i915: Introduce for_each_ring() macro
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.

Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.

Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.

v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-19 22:39:53 +02:00
Daniel Vetter 5e13a0c5ec Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued
Backmerge of drm-next to resolve a few ugly conflicts and to get a few
fixes from 3.4-rc6 (which drm-next has already merged). Note that this
merge also restricts the stencil cache lra evict policy workaround to
snb (as it should) - I had to frob the code anyway because the
CM0_MASK_SHIFT define died in the masked bit cleanups.

We need the backmerge to get Paulo Zanoni's infoframe regression fix
for gm45 - further bugfixes from him touch the same area and would
needlessly conflict.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-08 13:39:59 +02:00
Daniel Vetter dc257cf154 Linux 3.4-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S
 5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp
 6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3
 KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm
 JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa
 1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU=
 =HkMd
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc6' into drm-intel-next

Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.

The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:

$ git diff 14415745b2..1fa611065

is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas

$git diff --minimal  14415745b2..1fa611065

is exactly what we want.

Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-07 14:02:14 +02:00
Ben Widawsky b4aca0106c drm/i915: extract some common olr+wedge code
The new wait_rendering ioctl also needs to check for an oustanding
lazy request, and we already duplicate that logic at three places. So
extract it.

While at it, also extract the code to check the gpu wedging state to
improve code flow.

v2: Don't use seqno as an outparam (Chris)

v3 by danvet: Kill stale comment and pimp commit message

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:32 +02:00
Daniel Vetter 53ca26cab8 drm/i915 disallow physical batchbuffers for KMS
Even the horrible gen3 XvMC code has learned to do this
right by the time xf86-video-intel releases learned to do
kernel modesetting. So we can just disallow this.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:25 +02:00
Daniel Vetter 8781342df7 drm/i915: create dev_priv->dri1 dragon dungeon^W^W sub-struct
... and shove allow_batchbuffer in there. More dragons will
follow suit.

There's the curious case that we allow this for KMS ...

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:25 +02:00
Ben Widawsky 3b88cc0dd7 drm/i915: use __wait_seqno for ring throttle
It turns out throttle had an almost identical  bit of code to do the
wait. Now we can call the new helper directly.  This is just a bonus,
and not needed for the overall series.

v2: remove irq_get/put which is now in __wait_seqno (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:23 +02:00
Ben Widawsky 4146b08d76 drm/i915: remove polled wait from throttle
It's about to go away anyway. Just here to help bisection.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:22 +02:00
Ben Widawsky 604dd3ec75 drm/i915: extract __wait_seqno from i915_wait_request
i915_wait_request is actually a fairly large function encapsulating
quite a few different operations. Because being able to wait on seqnos
in various conditions is useful, extracting that bit of code to a helper
function seems useful

v2: pull the irq_get/put as well (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:22 +02:00
Ben Widawsky c58cf4f108 drm/i915: drop polled waits from i915_wait_request
The only time irq_get should fail is during unload or suspend. Both of
these points should try to quiesce the GPU before disabling interrupts
and so the atomic polling should never occur.

This was recommended by Chris Wilson as a way of reducing added
complexity to the polled wait which I introduced in an RFC patch.

09:57 < ickle_> it's only there as a fudge for waiting after irqs
after uninstalled during s&r, we aren't actually meant to hit it
09:57 < ickle_> so maybe we should just kill the code there and fix the breakage

v2: return -ENODEV instead of -EBUSY when irq_get fails

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:22 +02:00
Ben Widawsky 9574b3fe29 drm/i915: kill waiting_seqno
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.

v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:21 +02:00
Ben Widawsky be998e2e39 drm/i915: move vbetool invoked ier stuff
This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.

Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)

v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:21 +02:00
Ben Widawsky b2da9fe5d5 drm/i915: remove do_retire from i915_wait_request
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:20 +02:00
Daniel Vetter 507432986c drm/i915: use the new masked bit macro some more
I've missed this one.

v2: Chris Wilson noticed another register.
v3: Color choice improvements.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:20 +02:00
Daniel Vetter 63ed2cb2d1 drm/i915: rip out GEM drm feature checks
We always set it so there's no point in checking. We could
instead add a bit that tells us whether gem is actually
initialized (i.e. either kms or gem_init_ioctl called), but
that's imho not worth it.

So just rip it out.

There's a little change in the wait_ring timeout, but we've never
run with anything else than the 60 second timeout, even on dri1
userspace.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:14 +02:00
Daniel Vetter 7bb6fb8dd9 drm/i915: disallow gem ums init ioctl for kms
This ioctl used in a kms driver is only useful to create massive
havoc.

v2: Bail out with -ENODEV as suggested by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:13 +02:00
Chris Wilson 1070a42b6b drm/i915: Move GEM initialisation from i915_dma.c to i915_gem.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:12 +02:00
Chris Wilson 1488fc08c1 drm/i915: Remove the deferred-free list
The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterrutible.

Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiescent non-interruptible. But the existence of the deferred free
list casted some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptible.

We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:11 +02:00
Chris Wilson 1b50247a8d drm/i915: Remove the list of pinned inactive objects
Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:11 +02:00
Chris Wilson a39d7efc62 drm/i915: Remove i915_gem_evict_inactive()
This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.

In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:10 +02:00
Chris Wilson 8325a09dd0 drm/i915: Bump the inactive LRU on set-to-GTT-domain
Currently, we only bump the inactive LRU of an object when we bind
into the GTT for a page-fault. As the object may be used many times
before its mapping is zapped, we do not mark it as active as
frequently as we should. Userspace should be calling set-to-GTT-domain
before each pointer deference (for synchronous access) and so is a
good place to mark the buffer as active.

Marking the buffer as recently used places it at the end of the
inactive eviction queue, though still before anything with outstanding
rendering. This reduces the likelihood of evicting a buffer that is
going to be used again by the CPU in the near future. This way we can
hopefully avoid to kick out upload buffers right before we use them on
the gpu.

Note that we need to check that the object is not active or pinned,
for otherwise we create havoc on the active/pinned lists, which also
use obj->mm_list.

The active lists are sorted by and evicted in last GPU rendering
order, access by the CPU to a still active buffer therefore does not
affect its eviction ordering. Pinned objects are currently excluded
from eviction, therefore the only list that we need to bump for GTT
access by the CPU is the inactive list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added further explanations to the commit message as discussed
on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:10 +02:00
Daniel Vetter 6b26c86d61 drm/i915: create macros to handle masked bits
... and put them to so good use.

Note that there's functional change in vlv clock gating code, we now
no longer spuriously read back the current value of the bit. According
to Bspec the high bits should always read zero, so ORing this in
should have no effect.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:08 +02:00
Chris Wilson 5d82e3e642 drm/i915: Clarify the semantics of tiling_changed
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.

Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.

v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:06 +02:00
Ben Widawsky 4f0c7cfbb4 drm/i915: [sparse] __iomem fixes for gem
As with one of the earlier patches in the series, we're forced to cast
for copy_[to|from]_user. Again because of the nature of the GEN x86
exclusivity, this should be safe.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
[danvet: Added some bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:01 +02:00
Linus Torvalds 6be5ceb02e VM: add "vm_mmap()" helper function
This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.

This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c.  But that way we don't have
to export our internal do_mmap_pgoff() function.

Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead.  We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-20 17:29:13 -07:00
Chris Wilson 14415745b2 drm/i915: Refactor get_fence() to use the common fence writing routine
We can also take advantage of the new 'no retire' mode for seqno waiting
to avoid having to take a reference on the old fence object whilst
flushing an existing fence.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:40:51 +02:00
Chris Wilson ada726c734 drm/i915: Refactor fence clearing to use the common fence writing routine
Now that we have a routine that is able to clear the fences as well as
setup up the register for a tiled object, remove the surplus routines to
clear the fences.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:34:53 +02:00
Chris Wilson 61050808bb drm/i915: Refactor put_fence() to use the common fence writing routine
One clarification that we make is to the existing semantics of
obj->tiling_changed to only mean that we need to update an associated
fence register (including the NO_FENCE when executing an untiled but
fenced GPU command). If we do not have a fence register or pending
fenced GPU access for the object (after put_fence() for example), then
we can clear the tiling_changed flag as any fence will necessarily be
rewritten upon acquisition.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:34:30 +02:00
Chris Wilson 9ce079e481 drm/i915: Prepare to consolidate fence writing
Update the existing architecture specific fence writing routines to
either update the fence to point to a tiled object or to clear them in
preparation to remove the other fence writing routes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:24:32 +02:00
Chris Wilson 1899184547 drm/i915: Remove the unsightly "optimisation" from flush_fence()
As i915_wait_request() will first check for an already passed seqno,
doing it also in the caller is a waste of space for a cold path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:23:17 +02:00
Chris Wilson 8fe301add5 drm/i915: Simplify fence finding
As the fences are stored in LRU order, we can simply reuse the oldest if
we do not have an unused register.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:20:35 +02:00
Chris Wilson 1c293ea3b1 drm/i915: Discard the unused obj->last_fenced_ring
As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:19:51 +02:00
Chris Wilson 69963e7c76 drm/i915: Remove unused ring->setup_seqno
As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:18:52 +02:00
Chris Wilson a360bb1a83 drm/i915: Remove fence pipelining
Step 2 is then to replace the pipelined parameter with NULL and perform
constant folding to remove dead code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:18:25 +02:00
Chris Wilson 06d9813157 drm/i915: Remove the pipelined parameter from get_fence()
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:15:43 +02:00
Daniel Vetter 48ecfa1090 drm/i915: properly set ppgtt cacheability on snb
For some reason snb has 2 fields to set ppgtt cacheability. This one
here does not exist on gen7.

This might explain why ppgtt wasn't a win on snb like on ivb - not
enough pte caching.

v2: Fixup rebase fail.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:19:59 +02:00
Daniel Vetter be901a5a1b drm/i915: set w/a bit for snb pagefaults
Bspec says that we need to set this: vol1c.3 "Blitter Command
Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register".

We don't really rely on pagefaults, but who knows what this all
affects.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:19:56 +02:00
Daniel Vetter 767878908e Linux 3.4-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEbBAABAgAGBQJPi3XOAAoJEHm+PkMAQRiGnsUH9RjHwH4YFVyuP/DKtKa6zs74
 wqkpT15yITQ5WWMog4JaJFFg5rJCUd8QZr7AS/HSn0ijDyZX5VU7Rcs9cMudDzNR
 H/5K/AscS4fjb0HwWVqoltTWHRb9QGSwVN3+E3VCDLt9P89YJ0o3QztkkuEX5dkZ
 jc7reVXTfRnCcILEa9jleOzrn+OLM3j/jAjQ2hGunl8EDLzD4b17HHPoli4jEZ/5
 5ibpSVsPD+AqzN+glbXvYjVItl12D0IQos/JdOwfuZriCVWLxysSSwHZTbPCyvBZ
 LHH4HR5T+XLSXbjJeNkUFHLzqU+d5gVRadIoWtJCxqxFjKbOs2YtzJ5Ai0nDiw==
 =kTkC
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc3' into drm-intel-next-queued

Backmerge Linux 3.4-rc3 into drm-intel-next to resolve a few things
that conflict/depend upon patches in -rc3:
- Second part of the Sandybridge workaround series - it changes some
  of the same registers.
- Preparation for Chris Wilson's fencing cleanup - we need the fix
  from -rc3 merged before we can move around all that code.
- Resolve the gmbus conflict - gmbus has been disabled in 3.4 again,
  but should be enabled on all generations in 3.5.

Conflicts:
	drivers/gpu/drm/i915/intel_i2c.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-17 11:16:20 +02:00
Daniel Vetter c07496fa61 drm/i915: don't pwrite tiled objects through the gtt
... we will botch up the bit17 swizzling. Furthermore tiled pwrite is
a (now) unused slowpath, so no one really cares.

This fixes the last swizzling issues I have with i-g-t on my bit17
swizzling i915G. No regression, it's been broken since the dawn of
gem, but it's nice for regression tracking when really _all_ i-g-t
tests work.

Actually this is not true, Chris Wilson noticed while reviewing this
patch that the commit

commit d9e86c0ee6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 10 16:40:20 2010 +0000

    drm/i915: Pipelined fencing [infrastructure]

contained a functional change that broke things.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-15 19:37:42 +02:00
Ben Widawsky 1500f7ea06 drm/i915: hide (seqno-1) in ringbuffer code
Waiting for seqno-1 in our object synchronization code is an
implementation detail given how we've decided to do the waits within the
rest of our code.

Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:14 +02:00
Ben Widawsky e3a5a2250a drm/i915: fix for when semaphore updates fail
This fixes a long standing issue where emitting the semaphore updates
may have failed, but we've already updated our internal data structure.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:13 +02:00
Ben Widawsky 5816d648d5 drm/i915: i915_gem_object_sync must handle NULL
When I extracted the synchronization code for implementing semaphorified
pageflips (74f5f6e0), I neglected the non pipelined case which also
calls this code. The modesetting code wants to make sure the object has
finished rendering to the frame before configuring the scanout (ie.
non-pipelined case).

As a result of a follow on discussion on IRC, I've decided to add a
comment about the function itself which received much inspiration from
Chris as well. So really, this patch was ghost-written by Chris :).

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:13 +02:00
Chris Wilson f84131905b drm/i915: Allow concurrent read access between CPU and GPU domain
Similar to allowing a buffer to be simultaneously read by the GPU and
through the GTT, we wish to allow readback of the pages through the CPU
domain whilst they are also being read by the GPU. Domain coherency
is managed by allowing multiple readers, but only a single writer.

This is used by mesa for its program cache which it may search for every
new program every frame and then renews should it need to add. During
renewal, mesa copies the program bo currently executing through a CPU
mapping onto the new bo. This patch allows the search and that copy to
proceed without causing a stall on the current batch.

Testcase: i-g-t/tests/gem_cpu_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:10 +02:00
Ben Widawsky 2911a35b2e drm/i915: use semaphores for the display plane
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:05 +02:00
Chris Wilson 9a5a53b392 drm/i915: Reorganise rules for get_fence/put_fence
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:04 +02:00
Dave Airlie effbc4fd8e Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
 ids are not yet added, and there's still quite a few patches to merge
 (mostly modesetting). To make QA easier I've decided to merge this stuff
 in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
 is a real frankenstein chip, but also hsw is stretching our driver's
 code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
 there are more patches needed (and some already queued up), but I wanted
 to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
 a few months already and a lot of i-g-t tests have been created for it.
 Now it's finally ready to be merged.  Note that one patch in this series
 touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
 Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
 driver we actually need to support by bailing out of unsupported case.
 The driver now refuses to load without kms on gen6+ and disallows a few
 ioctls that userspace never used in certain cases. More of this will
 definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
 in this way).
- Small fixlets and adjustements and some minor things to help debugging.

Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."

* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
  drm/i915: VCS is not the last ring
  drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
  drm/i915: make quirks more verbose
  drm/i915: dump the DMA fetch addr register on pre-gen6
  drm/i915/sdvo: Include YRPB as an additional TV output type
  drm/i915: disallow gem init ioctl on ilk
  drm/i915: refuse to load on gen6+ without kms
  drm/i915: extract gt interrupt handler
  drm/i915: use render gen to switch ring irq functions
  drm/i915: rip out old HWSTAM missed irq WA for vlv
  drm/i915: open code gen6+ ring irqs
  drm/i915: ring irq cleanups
  drm/i915: add SFUSE_STRAP registers for digital port detection
  drm/i915: add WM_LINETIME registers
  drm/i915: add WRPLL clocks
  drm/i915: add LCPLL control registers
  drm/i915: add SSC offsets for SBI access
  drm/i915: add port clock selection support for HSW
  drm/i915: add S PLL control
  drm/i915: add PIXCLK_GATE register
  ...

Conflicts:
	drivers/char/agp/intel-agp.h
	drivers/char/agp/intel-gtt.c
	drivers/gpu/drm/i915/i915_debugfs.c
2012-04-12 10:27:01 +01:00
Daniel Vetter 15a13bbdff drm/i915: clear fencing tracking state when retiring requests
This fixes a resume regression introduced in

commit 7dd4906586
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 21 10:48:18 2012 +0000

    drm/i915: Mark untiled BLT commands as fenced on gen2/3

which fixed fencing tracking for untiled blt commands.

A side effect of that patch was that now also untiled objects have a
non-zero obj->last_fenced_seqno to track when a fence can be set up
after a pipelined tiling change. Unfortunately this was only cleared
by the fence setup and teardown code, resulting in tons of untiled but
inactive objects with non-zero last_fenced_seqno.

Now after resume we completely reset the seqno tracking, both on the
driver side (by setting dev_priv->next_seqno = 1) and on the hw side
(by allocating a new hws page, which contains the seqnos). Hilarity
and indefinite waits ensued from the stale seqnos in
obj->last_fenced_seqno from before the suspend.

The fix is to properly clear the fencing tracking state like we
already do for the normal gpu rendering while moving objects off the
active list.

Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 09:02:37 +02:00
Daniel Vetter f534bc0b22 drm/i915: disallow gem init ioctl on ilk
Ums is already disabled, but on ilk we can additionally disable gem
initialization when using user mode setting. Upstream never support
ilk without kernel modesetting and not even the RHEL ilk ums backport
needs gem - that driver is based on xf86-video-intel version 2.2,
which is pre-gem.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-09 18:04:08 +02:00
Chris Wilson 7dd4906586 drm/i915: Mark untiled BLT commands as fenced on gen2/3
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:26:05 +02:00
Daniel Vetter 55a254ac63 drm/i915: properly restore the ppgtt page directory on resume
The ppgtt page directory lives in a snatched part of the gtt pte
range. Which naturally gets cleared on hibernate when we pull the
power. Suspend to ram (which is what I've tested) works because
despite the fact that this is a mmio region, it is actually back by
system ram.

Fix this by moving the page directory setup code to the ppgtt init
code (which gets called on resume).

This fixes hibernate on my ivb and snb.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:25:29 +02:00
Jesse Barnes 23e3f9b37e drm/i915: check for disabled interrupts on ValleyView
Haven't seen this yet, but it doesn't hurt.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-29 00:11:46 +02:00
Daniel Vetter e7e58eb5c0 drm/i915: mark pwrite/pread slowpaths with unlikely
Beside helping the compiler untangle this maze they double-up as
documentation for which parts of the code aren't performance-critical
but just around to keep old (but already dead-slow) userspace from
breaking.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:41:41 +02:00
Daniel Vetter 23c18c71da drm/i915: fixup in-line clflushing on bit17 swizzled bos
The issue is that with inline clflushing the clflushing isn't properly
swizzled. Fix this by
- always clflushing entire 128 byte chunks and
- unconditionally flush before writes when swizzling a given page.
  We could be clever and check whether we pwrite a partial 128 byte
  chunk instead of a partial cacheline, but I've figured that's not
  worth it.

Now the usual approach is to fold this into the original patch series, but
I've opted against this because
- this fixes a corner case only very old userspace relies on and
- I'd like to not invalidate all the testing the pwrite rewrite has gotten.

This fixes the regression notice by tests/gem_tiled_partial_prite_pread
from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to
tiled buffers on bit17 swizzling machines. But that is also broken without
the pwrite patches, so likely a different issue (or a problem with the
testcase).

v2: Simplify the patch by dropping the overly clever partial write
logic for swizzled pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:40:57 +02:00
Daniel Vetter f56f821feb mm: extend prefault helpers to fault in more than PAGE_SIZE
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:36:30 +02:00
Daniel Vetter d174bd6472 drm/i915: extract copy helpers from shmem_pread|pwrite
While moving around things, this two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.

v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:30:33 +02:00
Daniel Vetter 117babcdd5 drm/i915: use uncached writes in pwrite
It's around 20% faster.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:38 +02:00
Daniel Vetter ffc62976d2 drm/i915: fall back to shmem pwrite when the buffer is not accessible
It's too expensive to move it around just for that pwrite, especially
when we're trashing on the mappable gtt part like crazy.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:08 +02:00
Daniel Vetter 586428852a drm/i915: implement inline clflush for pwrite
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.

Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% percent because we only clflush back the part of the buffer
we're actually writing.

v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.

v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
  uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
  differentiate it from needs_clflush_before.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:45 +02:00
Daniel Vetter 96d79b5270 drm/i915: don't clobber userspace memory before commiting to the pread
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet commited to to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
  deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
  outsanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:32 +02:00
Daniel Vetter 935aaa692e drm/i915: drop gtt slowpath
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:21 +02:00
Daniel Vetter 692a576b9d drm/i915: don't call shmem_read_mapping unnecessarily
This speeds up pwrite and pread from ~120 µs ro ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.

v3: Unconditionaly grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:03 +02:00
Daniel Vetter 3ae5378330 drm/i915: don't use gtt_pwrite on LLC cached objects
~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:45 +02:00
Daniel Vetter a0356fc373 drm/i915: kill ranged cpu read domain support
No longer needed.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:32 +02:00
Daniel Vetter 8489731c9b drm/i915: move clflushing into shmem_pread
This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be usefull for e.g. the libva encoder - when I finally get
around to fix that up.

v2: Properly sync with the gpu on LLC machines.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:20:01 +02:00
Daniel Vetter dbf7bff074 drm/i915: merge shmem_pread slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:19:11 +02:00
Daniel Vetter e244a443bf drm/i915: merge shmem_pwrite slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:18:58 +02:00
Chris Wilson dabdfe021a drm/i915: Avoid using mappable space for relocation processing through the CPU
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:16:17 +02:00
Daniel Vetter 644ec02b5d drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt
... because this is what it actually doesn now that we have the global
gtt vs. ppgtt split.

Also move it to the other global gtt functions in i915_gem_gtt.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:14:59 +02:00
Chris Wilson a14917eeb2 drm/i915: Release the mmap offset when purging a buffer
If we discard a buffer due to memory pressure, also release its alloted
mmap address space. As it may be sometime before userspace wakes up
and notices that it has buffers to purge from its cache, we may waste
valuable address space on unusable objects for a period of time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-23 11:04:35 +01:00
Ben Widawsky eb2c0c818a drm/i915: [dinq] shut up two instances -Wunitialized
Introduced in commit 8461d226 and 8c59967c

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-22 17:44:09 +01:00
Daniel Vetter 0ebb982993 drm/i915: enable lazy global-gtt binding
Now that everything is in place, only bind to the global gtt
when actually required. Patch split-up suggested by Chris Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:55:16 +01:00
Daniel Vetter 74898d7edc drm/i915: bind objects to the global gtt only when needed
And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no gloabl gtt binding (which needs uc writes for the
ptes).

This patch just puts the required tracking in place.

v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.

v3: Adapted to Chris' latest llc-reloc patches.

v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:52:01 +01:00
Daniel Vetter 741639079c drm/i915: split out dma mapping from global gtt bind/unbind functions
Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-20 21:51:41 +01:00
Chris Wilson c501ae7f33 drm/i915: Only clear the GPU domains upon a successful finish
By clearing the GPU read domains before waiting upon the buffer, we run
the risk of the wait being interrupted and the domains prematurely
cleared. The next time we attempt to wait upon the buffer (after
userspace handles the signal), we believe that the buffer is idle and so
skip the wait.

There are a number of bugs across all generations which show signs of an
overly haste reuse of active buffers.

Such as:

  https://bugs.freedesktop.org/show_bug.cgi?id=29046
  https://bugs.freedesktop.org/show_bug.cgi?id=35863
  https://bugs.freedesktop.org/show_bug.cgi?id=38952
  https://bugs.freedesktop.org/show_bug.cgi?id=40282
  https://bugs.freedesktop.org/show_bug.cgi?id=41098
  https://bugs.freedesktop.org/show_bug.cgi?id=41102
  https://bugs.freedesktop.org/show_bug.cgi?id=41284
  https://bugs.freedesktop.org/show_bug.cgi?id=42141

A couple of those pre-date i915_gem_object_finish_gpu(), so may be
unrelated (such as a wild write from a userspace command buffer), but
this does look like a convincing cause for most of those bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-01 21:36:13 +01:00
Chris Wilson eadb29a9c5 drm/i915: Silence the error message from i915_wait_request()
This error message has since been superseded by the hangcheck, and does
not add any salient information beyond that already printed by hangcheck
discovering the GPU hang that lead to i915_wait_request() bombing out in
the first place.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-27 18:08:22 +01:00
Chris Wilson a71d8d9452 drm/i915: Record the tail at each request and use it to estimate the head
By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-15 14:26:03 +01:00
Daniel Vetter 53d227f282 drm/i915: fixup seqno allocation logic for lazy_request
Currently we reserve seqnos only when we emit the request to the ring
(by bumping dev_priv->next_seqno), but start using it much earlier for
ring->oustanding_lazy_request. When 2 threads compete for the gpu and
run on two different rings (e.g. ddx on blitter vs. compositor)
hilarity ensued, especially when we get constantly interrupted while
reserving buffers.

Breakage seems to have been introduced in

commit 6f392d5486
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 7 11:01:22 2010 +0100

    drm/i915: Use a common seqno for all rings.

This patch fixes up the seqno reservation logic by moving it into
i915_gem_next_request_seqno. The ring->add_request functions now
superflously still return the new seqno through a pointer, that will
be refactored in the next patch.

Note that with this change we now unconditionally allocate a seqno,
even when ->add_request might fail because the rings are full and the
gpu died. But this does not open up a new can of worms because we can
already leave behind an outstanding_request_seqno if e.g. the caller
gets interrupted with a signal while stalling for the gpu in the
eviciton paths. And with the bugfix we only ever have one seqno
allocated per ring (and only that ring), so there are no ordering
issues with multiple outstanding seqnos on the same ring.

v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
clear that we only have one seqno counter for all rings. Suggested by
Chris Wilson.

v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
instead of ring->oustanding_lazy_request to make the follow-up
refactoring more clearly correct. Also improve the commit message
with issues discussed on irc.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-13 10:55:57 +01:00
Daniel Vetter 5391d0cffe drm/i915: outstanding_lazy_request is a u32
So don't assign it false, that's just confusing ... No functional
change here.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-13 10:55:48 +01:00
Daniel Vetter e21af88d39 drm/i915: enable ppgtt
We want to unconditionally enable ppgtt for two reasons:
- Windows uses this on snb and later.
- We need the basic hw support to work before we can think about real
  per-process address spaces and other cool features we want.

But Chris Wilson was complaining all over irc and intel-gfx that this
will blow up if we don't have a module option to disable it. Hence add
one, to prevent this.

ppgtt support seems to slightly change the timings and make crashy
things slightly more or less crashy. Now in my testing and the testing
this got on troublesome snb machines, it seems to have improved things
only. But on ivb it makes quite a few crashes happen much more often,
see

https://bugs.freedesktop.org/show_bug.cgi?id=41353

Luckily Eugeni Dodonov seems to have a set of workarounds that fix
this issue.

v2: Don't try to enable ppgtt on pre-snb.

v3: Pimp commit message and make Chris Wilson less grumpy by adding a
module option.

v4: New try at making Chris Wilson happy.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 21:49:30 +01:00
Daniel Vetter 7bddb01fb9 drm/i915: ppgtt binding/unbinding support
This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-09 21:25:23 +01:00
Daniel Vetter 11782b0233 drm/i915: consolidate swizzling control bit frobbing
On gen5 we also need to correctly set up swizzling in the display
scanout engine, but only there. Consolidate this into the same
function.

This has a small effect on ums setups - the kernel now also sets this
bit in addition to userspace setting it. Given that this code only
runs when userspace either can't (resume, gpu reset) or explicitly
won't(gem_init) touch the hw this shouldn't have an adverse effect.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-08 23:18:27 +01:00
Daniel Vetter f691e2f4ce drm/i915: swizzling support for snb/ivb
We have to do this manually. Somebody had a Great Idea.

I've measured speed-ups just a few percent above the noise level
(below 5% for the best case), but no slowdows. Chris Wilson measured
quite a bit more (10-20% above the usual snb variance) on a more
recent and better tuned version of sna, but also recorded a few
slow-downs on benchmarks know for uglier amounts of snb-induced
variance.

v2: Incorporate Ben Widawsky's preliminary review comments and
elaborate a bit about the performance impact in the changelog.

v3: Add a comment as to why we don't need to check the 3rd memory
channel.

v4: Fixup whitespace.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-08 23:16:24 +01:00
Daniel Vetter 8461d22677 drm/i915: rewrite shmem_pread_slow to use copy_to_user
Like for shmem_pwrite_slow. The only difference is that because we
read data, we can leave the fetched cachelines in the cpu: In the case
that the object isn't in the cpu read domain anymore, the clflush for
the next cpu read domain invalidation will simply drop these
cachelines.

slow_shmem_bit17_copy is now ununsed, so kill it.

With this patch tests/gem_mmap_gtt now actually works.

v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.

v3: Fixup the swizzling logic, it swizzled the wrong pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:34 +01:00
Daniel Vetter 8c59967c44 drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.

To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:

- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
  already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
  pretty much undefined, now it just got a bit worse. set_tiling is
  only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
  read/write access, e.g. for synchronization), we might leave unflushed
  data in the cpu caches. The clflush_object at the end of pwrite_slow
  takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
  reinstate backing storage for truncated objects. The check at the end
  of pwrite_slow takes care of this.

v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.

v3: Fixup bit17 swizzling, it swizzled the wrong pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:21 +01:00
Daniel Vetter 5c0480f21f drm/i915: fall through pwrite_gtt_slow to the shmem slow path
The gtt_pwrite slowpath grabs the userspace memory with
get_user_pages. This will not work for non-page backed memory, like a
gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit
-EFAULT in the gtt paths.

Now the shmem paths have exactly the same problem, but this way we
only need to rearrange the code in one write path.

v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed.

v3: Make the codeflow around phys_pwrite cleara as suggested by Chris
Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 23:34:07 +01:00
Chris Wilson 068c6ff1cb drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain
The original intention of comparing the bo against the mappable GTT
limits was to prevent a subsequent faulting of the bo into the GTT from
clearing the entire GTT in vain. However, that was clearly a cut'n'paste
mistake as a CPU mapping never binds the bo into the aperture. Whilst
there may be some merit to limiting the maximum size of the bo to
something that can be utilized by the GPU, that limit itself does not
belong as a safeguard to mmapping the bo, so remove the check entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-30 17:54:35 +01:00
Daniel Vetter 39965b3766 drm/i915: don't trash the gtt when running out of fences
With the fence accounting fixed up in the previous commit not finding
enough fences is a fatal error and userspace bug. Trashing the entire
gtt is not gonna turn up that missing fence, so don't to this by
returning another error thatn ENOSPC.

This has the added benefit that it's easier to distinguish fence
accounting errors from gtt space accounting issues.

TTM serves as precendence for the EDEADLK error code - it returns it
when the reservation code needs resources already blocked by the
current reservation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:24:10 +01:00
Chris Wilson 1690e1eb7a drm/i915: Separate fence pin counting from normal bind pin counting
In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 18:23:37 +01:00
Ben Widawsky b93f9cf14e drm/i915: argument to control retiring behavior
Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.

Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-26 11:19:19 +01:00
Eugeni Dodonov 3d29b842e5 drm/i915: add a LLC feature flag in device description
LLC is not SNB/IVB-specific, so we should check for it in a more generic
way.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-17 20:01:45 +01:00
Eric Anholt e959b5db4a drm/i915: Make the fallback IRQ wait not sleep.
The waits we do here are generally so short that sleeping is a bad
idea unless we have an IRQ to wake us up.  Improves regression test
performance from 18 minutes to 3.5 minutes on gen7, which is now
consistent with the previous generation.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:31:16 -08:00
Eric Anholt 7ea29b13e5 drm/i915: Do the fallback non-IRQ wait in ring throttle, too.
As a workaround for IRQ synchronization issues in the gen7 BLT ring,
we want to turn the two wait functions into polling loops.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-01-03 09:31:14 -08:00
Linus Torvalds ed4a51842a Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a"
This reverts commit eb1711bb94.

It blows up the i915 seqno tracking, resulting in the

	BUG_ON(seqno == 0);

in i915_wait_request() triggering, which will cause lock-ups.

See for example
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/903010
  https://lkml.org/lkml/2011/12/14/395

Reported-requested-and-tested-by: Dirk Hohndel <dirk@hohndel.org>
Reported-by: Richard Eames <Richard.Eames@flinders.edu.au>
Reported-by: Rocko Requin <rockorequin@hotmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-16 12:58:39 -08:00
Daniel Vetter eb1711bb94 drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a
The recursion loop goes retire_requests->unbind->gpu_idle->retire_reqeusts.

Every time we go through this we need a
- active object that can be retired
- and there are no other references to that object than the one from
  the active list, so that it gets unbound and freed immediately.
Otherwise the recursion stops. So the recursion is only limited by the
number of objects that fit these requirements sitting in the active list
any time retire_request is called.

Issue exercised by tests/gem_unref_active_buffers from i-g-t.

There's been a decent bikeshed discussion whether it wouldn't be
better to pass around a flag, but imo this is o.k. for such a limited
case that only supports a w/a.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42180

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson>
[ickle- we built better bikesheds, but this keeps the rain off for now]
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-07 10:44:40 +00:00
Rakib Mullick 457eafce61 drm, i915: Fix memory leak in i915_gem_busy_ioctl().
A call to i915_add_request() has been made in function i915_gem_busy_ioctl(). i915_add_request can fail,
so in it's exit path previously allocated memory needs to be freed.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-17 12:57:45 -08:00
Jesse Barnes 680da876f4 drm/i915: enable cacheable objects on Ivybridge
IVB supports these bits as well.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-03 16:17:57 -07:00
Daniel Vetter 4b9de737fa drm/i915: add constants to size fence arrays and fields
In preparation of to support 32 fences on Ivybdrigde.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-03 09:20:37 -07:00
Eric Anholt ff56b0bc84 drm/i915: Fix object refcount leak on mmappable size limit error path.
I've been seeing memory leaks on my system in the form of large
(300-400MB) GEM objects created by now-dead processes laying around
clogging up memory.  I usually notice when it gets to about 1.2GB of
them.  Hopefully this clears up the issue, but I just found this bug
by inspection.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-11-01 09:15:17 -07:00
Ben Widawsky f372b85463 drm/i915: Remove early exit on i915_gpu_idle
[Description from: Daniel Vetter]
I've just discussed this quickly with Chris on irc and it's probably
best to just kill the list_empty early bailout. gpu_idle isn't a
fastpath, so who cares. One candidate where we emit commands to the ring
without adding anything onto these lists is e.g. pageflip. There are
probably more.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 15:26:38 -07:00
Daniel Vetter 130c2561de drm/i915: drop KM_USER0 argument to k(un)map_atomic
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 15:26:37 -07:00
Chris Wilson 8ffc024681 drm/i915: Defend against userspace creating a gem object with size==0
We currently only round up the userspace size to the next page. We
assume that userspace hasn't made a mistake and requested a zero-length
gem object and all through our internal code we then presume that every
object is backed by at least a single page. Fix that oversight and
report EINVAL back to userspace if they try to create a zero length
object.

[danvet: This fixes tests/gem_bad_length]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 14:11:19 -07:00
Daniel Vetter 6dacfd2faa drm/i915: simplify swapin/out swizzle checking a bit
Use the helper function already employed by the pwrite/pread
functions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-10-20 14:11:18 -07:00
Dave Airlie 88ef4e3f4f Merge branch 'drm-intel-next' of git://people.freedesktop.org/~keithp/linux into drm-next
* 'drm-intel-next' of git://people.freedesktop.org/~keithp/linux:
  Drivers: i915: Fix all space related issues.
2011-09-20 09:36:22 +01:00
Akshay Joshi 0206e353a0 Drivers: i915: Fix all space related issues.
Various issues involved with the space character were generating
warnings in the checkpatch.pl file. This patch removes most of those
warnings.

Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-19 18:01:47 -07:00
Rob Clark b464e9a25c drm/i915: use common functions for mmap offset creation
Signed-off-by: Rob Clark <rob@ti.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-08-30 11:07:00 +01:00
Keith Packard df7976797f Merge branch 'drm-intel-fixes' into drm-intel-next 2011-07-22 13:40:42 -07:00
Keith Packard f0b69efc29 drm/i915: Skip GPU wait for scanout pin while wedged
Failing to pin a scanout buffer will most likely lead to a black
screen, so if the GPU is wedged, then just let the pin happen and hope
that things work out OK.

v2: Just ignore any error from i915_gem_object_wait_rendering, as
suggested by Chris Wilson

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-21 20:18:31 -07:00
Chris Wilson e28f871165 drm/i915: Fix unfenced alignment on pre-G33 hardware
Align unfenced buffers on older hardware to the power-of-two object
size.  The docs suggest that it should be possible to align only to a
power-of-two tile height, but using the already computed fence size is
easier and always correct. We also have to make sure that we unbind
misaligned buffers upon tiling changes.

In order to prevent a repetition of this bug, we change the interface
to the alignment computation routines to force the caller to provide
the requested alignment and size of the GTT binding rather than assume
the current values on the object.

Reported-and-tested-by: Sitosfe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-18 14:02:06 -07:00
Keith Packard 8eb2c0ee67 Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-29 10:34:54 -07:00
Ben Widawsky 3e0dc6b01f drm/i915: hangcheck disable parameter
Provide a parameter to disable hanghcheck. This is useful mostly for
developers trying to debug known problems, and probably should not be
touched by normal users.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-29 10:32:08 -07:00
Linus Torvalds 0d72c6fcb5 Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
  drm/i915: Use chipset-specific irq installers
  drm/i915: forcewake fix after reset
  drm/i915: add Ivy Bridge page flip support
  drm/i915: split page flip queueing into per-chipset functions
2011-06-28 11:15:57 -07:00
Keith Packard 6ae77e6b6a Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-28 10:29:47 -07:00
Chris Wilson f01c22fd59 drm/i915: Use chipset-specific irq installers
Konstantin Belousov pointed out that 4697995b98 replaced the generic
i915_driver_irq_*install() functions with chipset specific routines
accessible only through driver->irq_*install(). So update the sanity
check in i915_request_wait() to match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-28 10:20:06 -07:00
Hugh Dickins e2377fe0b6 drm/i915: use shmem_truncate_range
The interface to ->truncate_range is changing very slightly: once "tmpfs:
take control of its truncate_range" has been applied, this can be applied.
 For now there is only a slight inefficiency while this remains unapplied,
but it will soon become essential for managing shmem's use of swap.

Change i915_gem_object_truncate() to use shmem_truncate_range() directly:
which should also spare i915 later change if we switch from
inode_operations->truncate_range to file_operations->fallocate.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:14 -07:00
Hugh Dickins 5949eac4d9 drm/i915: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.

Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in
the one place it's needed; elsewhere use shmem_read_mapping_page(), with
the mapping's gfp_mask properly initialized.

Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.

Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:13 -07:00
Keith Packard b97c3d9c16 drm/i915: i915_gem_object_finish_gtt must always release gtt mmap
Even if the object is no longer in the GTT domain, there may still be
a user space mapping which needs to be released.

Without this fix, render-based text (mostly in firefox) would
occasionally get corrupted when the system was under load.

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-24 21:02:59 -07:00
Keith Packard 2cd1176bd9 Merge branch 'drm-intel-fixes' into drm-intel-next 2011-06-21 12:02:57 -07:00
Eric Anholt e92d03bff9 Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
This reverts commit 4a684a4117.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.

Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-21 11:11:02 -07:00
Jesper Juhl b65552f06c drm/i915: Don't leak in i915_gem_shmem_pread_slow()
It seems to me that we are leaking 'user_pages' in
drivers/gpu/drm/i915/i915_gem.c::i915_gem_shmem_pread_slow() if
read_cache_page_gfp() fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-06-14 11:00:54 +10:00
Eric Anholt a187111207 drm/i915: Use the LLC mode on gen6 for everything but display.
Improves full-screen openarena on my laptop 20.3% +/- 4.0% (n=3)
Improves 800x600 nexuiz on my laptop 12.3% +/- 0.1% (n=3)

We have more room to improve with doing LLC caching for display using
GFDT, and in doing LLC+MLC caching, but this was an easy performance
win and incremental improvement toward those two.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:22 -07:00
Eric Anholt a7ef0640d9 drm/i915: Use the uncached domain for the display planes
The simplest and common method for ensuring scanout coherency on all
chipsets is to mark the scanout buffers as uncached (and for
userspace to remember to flush the render cache every so often).

We can improve upon this for later generations by marking scanout
objects as GFDT and only flush those cachelines when required. However,
we start simple.

[v2: Move the set to uncached above the clflush.  Otherwise, we'd skip
the clflush and try to scan out data that was still sitting in the
cache.]

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:20 -07:00
Chris Wilson 2da3b9b940 drm/i915: Combine pinning with setting to the display plane
We need to perform a few operations in order to move the object into the
display plane (where it can be accessed coherently by the display
engine) that are important for future safety to forbid whilst pinned. As a
result, we want to need to perform some of the operations before pinning,
but some are required once we have been bound into the GTT. So combine
the pinning performed by all the callers with set_to_display_plane(), so
this complication is contained within the single function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:19 -07:00
Chris Wilson e4ffd173a1 drm/i915: Add an interface to dynamically change the cache level
[anholt v2: Don't forget that when going from cached to uncached, we
haven't been tracking the write domain from the CPU perspective, since
we haven't needed it for GPU coherency.]

[ickle v3: We also need to make sure we relinquish any fences on older
chipsets and clear the GTT for sane domain tracking.]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:16 -07:00
Chris Wilson b5ffc9bc38 drm/i915: Introduce i915_gem_object_finish_gtt()
Like its siblings finish_gpu(), this function clears the object from the
GTT domain forcing it to be trigger a domain invalidation should we ever
need to use via the GTT again.

Note that the most important side-effect of finishing the GTT domain
(aside from clearing the tracking read/write domains) is that it imposes
an memory barrier so that all accesses are complete before it returns,
which is important if you intend to be modifying translation tables
shortly afterwards. The second most important side-effect is that it
tears down the GTT mappings forcing a page-fault and invalidation on
next user access to the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 21:51:14 -07:00
Chris Wilson a8198eea15 drm/i915: Introduce i915_gem_object_finish_gpu()
... reincarnated from i915_gem_object_flush_gpu(). The semantic
difference is that after calling finish_gpu() the object no longer
resides in any GPU domain, and so will cause the GPU caches to be
invalidated if it is ever used again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-06-09 11:43:47 -07:00
Daniel Vetter c8ebc2b076 drm/915: fix relaxed tiling on gen2: tile height
A tile on gen2 has a size of 2kb, stride of 128 bytes and 16 rows.

Userspace was broken and assumed 8 rows. Chris Wilson noted that the
kernel unfortunately can't reliable check that because libdrm rounds
up the size to the next bucket.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-04 10:41:12 -07:00
Chris Wilson c8cbbb8ba9 drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/
Convert our open coded offset_in_page() to the common macro.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-06-04 10:40:42 -07:00
Ying Han 1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Eric Anholt 25aebfc30b drm/i915: Add support for fence registers on Ivybridge.
The registers are the same as on Sandybridge.  Fixes scrambled display
in X when it does software drawing to the GTT, and scans the results
out as tiled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-13 18:12:51 -07:00
Eric Anholt 10ed13e4a5 drm/i915: Use existing function instead of open-coding fence reg clear.
This is once less place to miss a new INTEL_INFO(dev)->gen update now.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-13 18:12:50 -07:00
Chris Wilson 9c23f7fc4c drm/i915: Do not clflush snooped objects
Rely on the GPU snooping into the CPU cache for appropriately bound
objects on MI_FLUSH. Or perhaps one day we will have a cache-coherent
CPU/GPU package...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-10 13:56:44 -07:00
Chris Wilson 93dfb40cd8 drm/i915: Rename agp_type to cache_level
... to clarify just how we use it inside the driver and remove the
confusion of the poorly matching agp_type names. We still need to
translate through agp_type for interface into the fake AGP driver.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-05-10 13:56:43 -07:00
Chris Wilson f6e47884e7 drm/i915: Avoid unmapping pages from a NULL address space
Found by gem_stress.

As we perform retirement from a workqueue, it is possible for us to free
and unbind objects after the last close on the device, and so after the
address space has been torn down and reset to NULL:

BUG: unable to handle kernel NULL pointer dereference at 00000054
IP: [<c1295a20>] mutex_lock+0xf/0x27
*pde = 00000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/vt/parameters/default_utf8

Pid: 5, comm: kworker/u:0 Not tainted 2.6.38+ #214
EIP: 0060:[<c1295a20>] EFLAGS: 00010206 CPU: 1
EIP is at mutex_lock+0xf/0x27
EAX: 00000054 EBX: 00000054 ECX: 00000000 EDX: 00012fff
ESI: 00000028 EDI: 00000000 EBP: f706fe20 ESP: f706fe18
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kworker/u:0 (pid: 5, ti=f706e000 task=f7060d00 task.ti=f706e000)
Stack:
 f5aa3c60 00000000 f706fe74 c107e7df 00000246 dea55380 00000054 f5aa3c60
 f706fe44 00000061 f70b4000 c13fff84 00000008 f706fe54 00000000 00000000
 00012f00 00012fff 00000028 c109e575 f6b36700 00100000 00000000 f706fe90
Call Trace:
 [<c107e7df>] unmap_mapping_range+0x7d/0x1e6
 [<c109e575>] ? mntput_no_expire+0x52/0xb6
 [<c11c12f6>] i915_gem_release_mmap+0x49/0x58
 [<c11c3449>] i915_gem_object_unbind+0x4c/0x125
 [<c11c353f>] i915_gem_free_object_tail+0x1d/0xdb
 [<c11c55a2>] i915_gem_free_object+0x3d/0x41
 [<c11a6be2>] ? drm_gem_object_free+0x0/0x27
 [<c11a6c07>] drm_gem_object_free+0x25/0x27
 [<c113c3ca>] kref_put+0x39/0x42
 [<c11c0a59>] drm_gem_object_unreference+0x16/0x18
 [<c11c0b15>] i915_gem_object_move_to_inactive+0xba/0xbe
 [<c11c0c87>] i915_gem_retire_requests_ring+0x16e/0x1a5
 [<c11c3645>] i915_gem_retire_requests+0x48/0x63
 [<c11c36ac>] i915_gem_retire_work_handler+0x4c/0x117
 [<c10385d1>] process_one_work+0x140/0x21b
 [<c103734c>] ? __need_more_worker+0x13/0x2a
 [<c10373b1>] ? need_to_create_worker+0x1c/0x35
 [<c11c3660>] ? i915_gem_retire_work_handler+0x0/0x117
 [<c1038faf>] worker_thread+0xd4/0x14b
 [<c1038edb>] ? worker_thread+0x0/0x14b
 [<c103be1b>] kthread+0x68/0x6d
 [<c103bdb3>] ? kthread+0x0/0x6d
 [<c12970f6>] kernel_thread_helper+0x6/0x10
Code: 00 e8 98 fe ff ff 5d c3 55 89 e5 3e 8d 74 26 00 ba 01 00 00 00 e8
84 fe ff ff 5d c3 55 89 e5 53 8d 64 24 fc 3e 8d 74 26 00 89 c3 <f0> ff
08 79 05 e8 ab ff ff ff 89 e0 25 00 e0 ff ff 89 43 10 58
EIP: [<c1295a20>] mutex_lock+0xf/0x27 SS:ESP 0068:f706fe18
CR2: 0000000000000054

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
2011-03-23 09:17:03 +00:00
Chris Wilson 26e12f8943 drm/i915: Fix use after free within tracepoint
Detected by scripts/coccinelle/free/kfree.cocci.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
2011-03-23 09:17:02 +00:00
Chris Wilson 36d527dead drm/i915: Restore missing command flush before interrupt on BLT ring
We always skipped flushing the BLT ring if the request flush did not
include the RENDER domain. However, this neglects that we try to flush
the COMMAND domain after every batch and before the breadcrumb interrupt
(to make sure the batch is indeed completed prior to the interrupt
firing and so insuring CPU coherency). As a result of the missing flush,
incoherency did indeed creep in, most notable when using lots of command
buffers and so potentially rewritting an active command buffer (i.e.
the GPU was still executing from it even though the following interrupt
had already fired and the request/buffer retired).

As all ring->flush routines now have the same preconditions, de-duplicate
and move those checks up into i915_gem_flush_ring().

Fixes gem_linear_blit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: mengmeng.meng@intel.com
2011-03-23 09:17:01 +00:00
Chris Wilson ed0291fd16 drm/i915: Fix computation of pitch for dumb bo creator
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-23 09:17:00 +00:00
Chris Wilson 29c5a58728 drm/i915: Fix tiling corruption from pipelined fencing
... even though it was disabled. A mistake in the handling of fence reuse
caused us to skip the vital delay of waiting for the object to finish
rendering before changing the register. This resulted in us changing the
fence register whilst the bo was active and so causing the blits to
complete using the wrong stride or even the wrong tiling. (Visually the
effect is that small blocks of the screen look like they have been
interlaced). The fix is to wait for the GPU to finish using the memory
region pointed to by the fence before changing it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Dave Airlie <airlied@linux.ie>
2011-03-23 09:12:24 +00:00
Herton Ronaldo Krzesinski 09bfa51773 drm/i915: Prevent racy removal of request from client list
When i915_gem_retire_requests_ring calls i915_gem_request_remove_from_client,
the client_list for that request may already be removed in i915_gem_release.
So we may call twice list_del(&request->client_list), resulting in an
oops like this report:

[126167.230394] BUG: unable to handle kernel paging request at 00100104
[126167.230699] IP: [<f8c2ce44>] i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.231042] *pdpt = 00000000314c1001 *pde = 0000000000000000
[126167.231314] Oops: 0002 [#1] SMP
[126167.231471] last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT1/current_now
[126167.231901] Modules linked in: snd_seq_dummy nls_utf8 isofs btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs cryptd aes_i586 aes_generic binfmt_misc vboxnetadp vboxnetflt vboxdrv parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo videodev snd_timer snd_seq_device joydev iwlagn iwlcore mac80211 snd cfg80211 soundcore i915 drm_kms_helper snd_page_alloc psmouse drm serio_raw i2c_algo_bit video lp parport usbhid hid sky2 sdhci_pci ahci sdhci libahci
[126167.232018]
[126167.232018] Pid: 1101, comm: Xorg Not tainted 2.6.38-6-generic-pae #34-Ubuntu Gateway                          MC7833U /
[126167.232018] EIP: 0060:[<f8c2ce44>] EFLAGS: 00213246 CPU: 0
[126167.232018] EIP is at i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.232018] EAX: 00200200 EBX: f1ac25b0 ECX: 00000040 EDX: 00100100
[126167.232018] ESI: f1a2801c EDI: e87fc060 EBP: ef4d7dd8 ESP: ef4d7db0
[126167.232018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[126167.232018] Process Xorg (pid: 1101, ti=ef4d6000 task=f1ba6500 task.ti=ef4d6000)
[126167.232018] Stack:
[126167.232018]  f1a28000 f1a2809c f1a28094 0058bd97 f1aa2400 f1a2801c 0058bd7b 0058bd85
[126167.232018]  f1a2801c f1a28000 ef4d7e38 f8c2e995 ef4d7e30 ef4d7e60 c14d1ebc f6b3a040
[126167.232018]  f1522cc0 000000db 00000000 f1ba6500 ffffffa1 00000000 00000001 f1a29214
[126167.232018] Call Trace:

Unfortunately the call trace reported was cut, but looking at debug
symbols the crash is at __list_del, when probably list_del is called
twice on the same request->client_list, as the dereferenced value is
LIST_POISON1 + 4, and by looking more at the debug symbols before
list_del call it should have being called by
i915_gem_request_remove_from_client

And as I can see in the code, it seems we indeed have the possibility
to remove a request->client_list twice, which would cause the above,
because we do list_del(&request->client_list) on both
i915_gem_request_remove_from_client and i915_gem_release

As Chris Wilson pointed out, it's indeed the case:
"(...) I had thought that the actual insertion/deletion was serialised
under the struct mutex and the intention of the spinlock was to protect
the unlocked list traversal during throttling. However, I missed that
i915_gem_release() is also called without struct mutex and so we do need
the double check for i915_gem_request_remove_from_client()."

This change does the required check to avoid the duplicate remove of
request->client_list.

Bugzilla: http://bugs.launchpad.net/bugs/733780
Cc: stable@kernel.org # 2.6.38
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-23 06:41:12 +00:00
Dave Airlie 34db18abd3 Merge remote branch 'intel/drm-intel-next' of ../drm-next into drm-core-next
* 'intel/drm-intel-next' of ../drm-next: (755 commits)
  drm/i915: Only wait on a pending flip if we intend to write to the buffer
  drm/i915/dp: Sanity check eDP existence
  drm/i915: Rebind the buffer if its alignment constraints changes with tiling
  drm/i915: Disable GPU semaphores by default
  drm/i915: Do not overflow the MMADDR write FIFO
  Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"
  drm/i915: Don't save/restore hardware status page address register
  drm/i915: don't store the reg value for HWS_PGA
  drm/i915: fix memory corruption with GM965 and >4GB RAM
  Linux 2.6.38-rc7
  Revert "TPM: Long default timeout fix"
  drm/i915: Re-enable GPU semaphores for SandyBridge mobile
  drm/i915: Replace vblank PM QoS with "Interrupt-Based AGPBUSY#"
  Revert "drm/i915: Use PM QoS to prevent C-State starvation of gen3 GPU"
  drm/i915: Allow relocation deltas outside of target bo
  drm/i915: Silence an innocuous compiler warning for an unused variable
  fs/block_dev.c: fix new kernel-doc warning
  ACPI: Fix build for CONFIG_NET unset
  mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
  x86: Use u32 instead of long to set reset vector back to 0
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_gem.c
2011-03-14 14:15:13 +10:00
Chris Wilson 47ae63e0c2 Merge branch 'drm-intel-fixes' into drm-intel-next
Apply the trivial conflicting regression fixes, but keep GPU semaphores
enabled.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
2011-03-07 12:35:15 +00:00
Chris Wilson 467cffba85 drm/i915: Rebind the buffer if its alignment constraints changes with tiling
Early gen3 and gen2 chipset do not have the relaxed per-surface tiling
constraints of the later chipsets, so we need to check that the GTT
alignment is correct for the new tiling. If it is not, we need to
rebind.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-07 11:02:16 +00:00
Chris Wilson ce453d81cb drm/i915: Use a device flag for non-interruptible phases
The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-22 15:56:25 +00:00
Chris Wilson c872522663 drm/i915: Protect against drm_gem_object not being the first member
Dave Airlie spotted that we had a potential bug should we ever rearrange
the drm_i915_gem_object so not the base drm_gem_object was not its first
member. He noticed that we often convert the return of
drm_gem_object_lookup() immediately into drm_i915_gem_object and then
check the result for nullity. This is only valid when the base object is
the first member and so the superobject has the same address. Play safe
instead and use the compiler to convert back to the original return
address for sanity testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-22 15:55:57 +00:00
Chris Wilson bed636abea drm/i915: i915_mutex_interruptible() returns -EINTR
... so we handle that for i915_gem_fault() in the same manner as
ERESTARTSYS, or we send a SIGBUS to the faulting application.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-11 20:32:44 +00:00
Chris Wilson 8d7e3de1e0 drm/i915: Skip the no-op domain changes when already in CPU|GTT domains
Removes some superfluous fluff from tracing...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 15:24:03 +00:00
Chris Wilson db53a30261 drm/i915: Refine tracepoints
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 14:59:18 +00:00
Chris Wilson d9bc7e9f32 drm/i915: Fix infinite loop regression from 21dd3734
By returning EAGAIN upon a wedged GPU before attempting to wait, we
would hit an infinite loop of repeating operation without ever
progressing. Instead this needs to be EIO so that userspace knows that
the GPU is truly wedged and not in the process of error recovery.

Similarly, we need to handle the error recovery during i915_gem_fault.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 14:33:55 +00:00
Dave Airlie ff72145bad drm: dumb scanout create/mmap for intel/radeon (v3)
This is just an idea that might or might not be a good idea,
it basically adds two ioctls to create a dumb and map a dumb buffer
suitable for scanout. The handle can be passed to the KMS ioctls to create
a framebuffer.

It looks to me like it would be useful in the following cases:
a) in development drivers - we can always provide a shadowfb fallback.
b) libkms users - we can clean up libkms a lot and avoid linking
to libdrm_*.
c) plymouth via libkms is a lot easier.

Userspace bits would be just calls + mmaps. We could probably
mark these handles somehow as not being suitable for acceleartion
so as top stop people who are dumber than dumb.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-02-07 12:16:14 +10:00
Chris Wilson 21dd373486 drm/i915: Defer reporting EIO until we try to use the GPU
Instead of reporting EIO upfront in the entrance of an ioctl that may or
may not attempt to use the GPU, defer the actual detection of an invalid
ioctl to when we issue a GPU instruction. This allows us to continue to
use bo in video memory (via pread/pwrite and mmap) after the GPU has hung.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-27 11:06:07 +00:00
Chris Wilson e110e8d672 drm/i915: Check wedged status before throttling
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-27 11:05:51 +00:00
Chris Wilson 29ee399131 drm/i915: Silence a few -Wunused-but-set-variable
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-25 10:33:11 +00:00
Chris Wilson bee4a186c1 drm/i915,agp/intel: Do not clear stolen entries
We can only utilize the stolen portion of the GTT if we are in sole
charge of the hardware. This is only true if using GEM and KMS,
otherwise VESA continues to access stolen memory.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-24 18:26:25 +00:00
Chris Wilson 076e2c0eb8 drm/i915: Fix use of invalid array size for ring->sync_seqno
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-23 12:52:11 +00:00
Chris Wilson 809b63349c drm/i915: If we hit OOM when allocating GTT pages, clear the aperture
Rather than evicting an object at random, which is unlikely to alleviate
the memory pressure sufficient to allow us to continue, zap the entire
aperture. That should give the system long enough to recover and reap
some pages from the evicted objects, forestalling the allocation error
for the new object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 22:55:48 +00:00
Chris Wilson 0a58705b2f drm/i915: Periodically flush the active lists and requests
In order to retire active buffers whilst no client is active, we need to
insert our own flush requests onto the ring.

This is useful for servers that queue up some rendering and then go to
sleep as it allows us to the complete processing of those requests,
potentially making that memory available again much earlier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 22:15:30 +00:00
Chris Wilson 882417851a drm/i915: Propagate error from flushing the ring
... in order to avoid a BUG() and potential unbounded waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:44:50 +00:00
Chris Wilson b72f3acb71 drm/i915: Handle ringbuffer stalls when flushing
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:55 +00:00
Chris Wilson 63256ec534 drm/i915: Enforce write ordering through the GTT
We need to ensure that writes through the GTT land before any
modification to the MMIO registers and so must impose a mandatory write
barrier when flushing the GTT domain. This was revealed by relaxing the
write ordering by experimentally mapping the registers and the GATT as
write-combining.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:42:53 +00:00
Chris Wilson 72bfa19c8d drm/i915: Allow the application to choose the constant addressing mode
The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-20 09:41:36 +00:00
Chris Wilson b5ba177d8d drm/i915: Poll for seqno completion if IRQ is disabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32288
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-14 12:19:25 +00:00
Chris Wilson b13c2b96bf drm/i915/ringbuffer: Make IRQ refcnting atomic
In order to enforce the correct memory barriers for irq get/put, we need
to perform the actual counting using atomic operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-14 11:34:46 +00:00
Chris Wilson 1a1c69762a Merge branch 'drm-intel-fixes' into drm-intel-next
Conflicts:
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/intel_dp.c
2010-12-07 23:02:08 +00:00
Chris Wilson 7a1948768c drm/i915: Emit a request to clear a flushed and idle ring for unbusy bo
In order for bos to retire eventually, a request must be sent down the
ring. This is expected, for example, by occlusion queries for which mesa
will wait upon (whilst running glean) before issuing more batches and so
the normal activity upon the ring is suspended and we need to emit a
request to clear the idle ring.

Reported-by: Jinjin, Wang <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-07 10:59:14 +00:00
Chris Wilson 0be732841f drm/i915: Wait for the bo if a display flip is pipelined on the other ring
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 14:37:27 +00:00
Chris Wilson 0ac74c6b33 drm/i915: Only emit a flush if there is an outstanding gpu write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-06 14:36:02 +00:00