Commit Graph

1151 Commits

Author SHA1 Message Date
Chris Wilson 8d3afd7d0e drm/i915: Use spinlocks for checking when to waitboost
In commit 1854d5ca0d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients

we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Vallyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
such as tranform feedback) we introduced contention between the client
and the RPS worker. We can take advantage of the RPS worker, by
switching the wait boost decision to use spin locks and defer the
actual reclocking to the worker.

Fixes a regression of up to 45% on Baytrail and Baswell!

v2 (Daniel):
- Use max_freq_softlimit instead of the not-yet-merged boost
  frequency.
- Don't inject a fake irq into the boost work, instead treat
  client_boost as just another legit waker.

v3: Drop the now unused mask (Chris).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-26 19:16:12 +02:00
Chris Wilson d0bc54f2f0 drm/i915: Introduce DRM_I915_THROTTLE_JIFFIES
As Daniel commented on

commit b7ffe1362c5f468b853223acc9268804aa92afc8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 13:41:24 2015 +0100

    drm/i915: Free RPS boosts for all laggards

it is better to be explicit when sharing hardcoded values such as
throttle/boost timeouts. Make it so!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-22 08:59:52 +02:00
Chris Wilson 9a0c1e2770 drm/i915: Use the correct destructor for freeing requests on error
After allocating from the slab cache, we then need to free the request
back into the slab cache upon error (and not call kfree as that leads
to eventual memory corruption).

Fixes regression from
commit efab6d8dd1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:57 2015 +0100

    drm/i915: Use a separate slab for requests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-22 08:53:48 +02:00
Chris Wilson e61b995841 drm/i915: Free RPS boosts for all laggards
If the client stalls on a congested request, chosen to be 20ms old to
match throttling, allow the client a free RPS boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
[danvet: s/0/NULL/ reported by 0-day build]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 22:50:05 +02:00
Chris Wilson 2e1b873072 drm/i915: Convert RPS tracking to a intel_rps_client struct
Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:44 +02:00
Chris Wilson a6f766f397 drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts
Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:43 +02:00
Chris Wilson b47161858b drm/i915: Implement inter-engine read-read optimisations
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:42 +02:00
Daniel Vetter eed29a5b21 drm/i915: s/\<rq\>/req/g
The merged seqno->request conversion from John called request
variables req, but some (not all) of Chris' recent patches changed
those to just rq. We've had a lenghty (and inconclusive) discussion on
irc which is the more meaningful name with maybe at most a slight bias
towards req.

Given that the "don't change names without good reason to avoid
conflicts" rule applies, so lets go back to a req everywhere for
consistency. I'll sed any patches for which this will cause conflicts
before applying.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
[danvet: s/origina/merged/ as pointed out by Chris - the first
mass-conversion patch was from Chris, the merged one from John.]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-05-21 15:10:48 +02:00
Chris Wilson 2e2f351dbf drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu()
We no longer interpolate domains in the same manner, and even if we did,
we should trust setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.

Note that this goes back to

commit a8198eea15
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 22:04:09 2011 +0100

    drm/i915: Introduce i915_gem_object_finish_gpu()

and gpu domain tracking died in

commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which is more than 1 year older.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add notes with information dug out of git history.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-20 11:25:45 +02:00
Daniel Vetter 5e024f31be drm/i915: Remove unused variable from i915_gem_mmap_gtt
Lost in

commit c5ad54cf7d
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Wed May 6 14:36:09 2015 +0300

    drm/i915: Use partial view in mmap fault handler

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-05-20 11:25:38 +02:00
Joonas Lahtinen e7ded2d746 drm/i915: Reject huge tiled objects
We do not yet support tiled objects bigger than the mappable
aperture size so reject them.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Rework the check a bit to avoid warnings.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 17:25:29 +02:00
Joonas Lahtinen c5ad54cf7d drm/i915: Use partial view in mmap fault handler
Use partial view for huge BOs (bigger than half the mappable aperture)
in fault handler so that they can be accessed withough trying to make
room for them by evicting other objects.

v2:
- Only use partial views in the case where early rejection was
  previously done.
- Account variable type changes from previous reroll.
v3:
- Add a comment about overwriting existing page entries.
  (Tvrtko Ursulin)
- Whitespace fixes.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:04:18 +02:00
Joonas Lahtinen a6631ae1fd drm/i915: Consider object pinned if any VMA is pinned
Do not skip special GGTT views when considering whether an object
is pinned or not.

Wrong behaviour was introduced in;

commit ec7adb6ee7
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Mon Mar 16 14:11:13 2015 +0200

    drm/i915: Do not use ggtt_view with (aliasing) PPGTT

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:04:17 +02:00
Joonas Lahtinen 91e6711e30 drm/i915: Do not make assumptions on GGTT VMA sizes
GGTT VMA sizes might be smaller than the whole object size due to
different GGTT views.

v2:
- Separate GGTT view constraint calculations from normal view
  constraint calculations (Chris Wilson)
v3:
- Do not bother with debug wording. (Tvrtko Ursulin)
v4:
- Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop BUG_ON, it's redudant.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:04:17 +02:00
Chris Wilson 432be69d9d drm/i915: Remove locking for get-caching query
Reading a single value from the object, the locking only provides futile
protection against userspace races. The locking is useless so remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:04:12 +02:00
Mika Kuoppala 5e562f1ddd drm/i915: Clear vma->bound on unbinding
Unbinding doesn't always lead to unconditional destruction
of vma. This destruction avoidance happens if vma is part of
execbuffer relocation list or if vma is being considered for
eviction in i915_gem_evict_something().

For those other users, mark the vma unbound so that
the correct state of this vma is preserved.

Reported-by: Chris Wilson <chris@chris-wilson.co.ok>
Cc: Chris Wilson <chris@chris-wilson.co.ok>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:03:29 +02:00
Dave Airlie e1dee1973c Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over

* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
  drm/i915/gtt: Allocate va range only if vma is not bound
  drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
  drm/i915: fix intel_prepare_ddi
  drm/i915: factor out ddi_get_encoder_port
  drm/i915/hdmi: check port in ibx_infoframe_enabled
  drm/i915/hdmi: fix vlv infoframe port check
  drm/i915: Silence compiler warning in dvo
  drm/i915: Update DRIVER_DATE to 20150423
  drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
  rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
  drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
  drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
  drm/i915: Move ppgtt_bind/unbind around
  drm/i915: move i915_gem_restore_gtt_mappings around
  drm/i915: Fix up the vma aliasing ppgtt binding
  drm/i915: Remove misleading comment around bind_to_vm
  drm/i915: Don't use atomics for pg_dirty_rings
  drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
  drm/i915/skl: Support Y tiling in MMIO flips
  drm/i915: Fixup kerneldoc for struct intel_context
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_drv.c
2015-05-08 20:51:06 +10:00
Michel Thierry 53292cdb06 drm/i915: Workaround to avoid lite restore with HEAD==TAIL
WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the requests tail pointing before the MI_NOOPs. We may not need to
executed them, and this is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-23 23:56:52 +03:00
Daniel Vetter 0875546c53 drm/i915: Fix up the vma aliasing ppgtt binding
Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
  to vma->bound in there to avoid duplicating the conversion code all
  over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
  around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.

Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
  double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
  track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
  lower-level ->bind_vma functions hence unconditionally set
  GLOBAL_BIND when writing the ggtt ptes.

There's still a bit room for cleanup, but that's for follow-up
patches.

v2: Fixup fumbles.

v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-23 21:06:39 +02:00
Daniel Vetter cd102a687b drm/i915: Remove misleading comment around bind_to_vm
It's true that we might need to context switch, but both the signalling
and implementation of the same are a few source files away. Remove it.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-23 21:06:03 +02:00
Daniel Vetter 777dc5bb26 drm/i915: Move vma vfuns to adddress_space
They change with the address space and not with each vma, so move them
into the right pile of vfuncs. Save 2 pointers per vma and clarifies
the code.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-20 08:54:29 -07:00
Tvrtko Ursulin 8a0c39b162 drm/i915: Simplify and fix object to display tracking
Purpose of this tracking is to know when to flush the cache between
the CPU and the non-coherent display engine. Prior to:

   commit 121920faf2
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Mon Mar 23 11:10:37 2015 +0000

       drm/i915/skl: Query display address through a wrapper

This worked by a mix of direct flag manipulation and checking for
existence of a pinned GGTT VMA.

With the introduction of rotated display mappings this approach is
no longer correct.

New simpler approach is to just keep this count over calls which pin
and unpin objects to and from display, at the slight cost of extra
space in every bo.

(Inspired and extracted code from a larger rework by Chris Wilson.)

v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä)
v3: Commit message corrections. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-20 08:51:45 -07:00
Tvrtko Ursulin 5678ad7367 drm/i915: Fix view type in warning message
One month passed between posting a patch and it getting merged, and
unfortunately even though it still applies, it needs fixing to account
for changes in function parameters since:

   commit d385612e15b8b6eb3db328d83f1872ef8a381788
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Tue Mar 17 14:45:29 2015 +0000

       drm/i915: Log view type when printing warnings

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-16 11:20:07 +02:00
Chris Wilson 30154650b8 drm/i915: Remove obj->pin_mappable
The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mistreated with rotated GGTT views. For debug
purposes, it suffices to mark objects with pin_display as being of note.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-13 14:25:36 +02:00
Chris Wilson 2def4ad99b drm/i915: Optimistically spin for the request completion
This provides a nice boost to mesa in swap bound scenarios (as mesa
throttles itself to the previous frame and given the scenario that will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.

v2: Account for user timeouts
v3: Limit the spinning to a single jiffie (~1us) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-13 14:24:36 +02:00
Mika Kuoppala 1d335d1b62 drm/i915: Move vm page allocation in proper place
Move to i915_vma_bind as it is part of the binding.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 16:18:02 +02:00
Chris Wilson d7b9ca2f7a drm/i915: Remove request->uniq
We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.

Fixes regression from
commit b3a38998f0
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:41:11 +02:00
Chris Wilson 423795cbac drm/i915: Prefer to check for idleness in worker rather than sync-flush
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:37:31 +02:00
Chris Wilson e20d2ab741 drm/i915: Use a separate slab for vmas
vma are more frequently allocated than objects and so should equally
benefit from having a dedicated slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:17:13 +02:00
Chris Wilson efab6d8dd1 drm/i915: Use a separate slab for requests
requests are even more frequently allocated than objects and equally
benefit from having a dedicated slab.

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:17:07 +02:00
Chris Wilson 4bb1bedb28 drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:14 +02:00
Chris Wilson 8d9d5744c6 drm/i915: Split batch pool into size buckets
Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.

v2: Add a comment about bucket sizes

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:05 +02:00
Chris Wilson 35c94185c5 drm/i915: Free batch pool when idle
At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:05 +02:00
Chris Wilson 06fbca713e drm/i915: Split the batch pool by engine
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
Chris Wilson 7c27f52509 drm/i915: Re-enable RPS wait-boosting for all engines
This reverts commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting is
patently false with a trip through igt. The question that remains is
what exactly is going wrong with the media workload that prompted this?
Hopefully that would be fixed by the missing agressive downclocking, in
addition to the extra restrictions imposed on how frequent a process is
allowed to boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll>
Acked-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:03 +02:00
Chris Wilson 1854d5ca0d drm/i915: Deminish contribution of wait-boosting from clients
With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:02 +02:00
Chris Wilson ee286370d6 drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improve from the extra layer of caching.

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:00 +02:00
John Harrison 6689cb2b62 drm/i915: Move common request allocation code into a common function
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:30 +02:00
John Harrison f3dc74c0e1 drm/i915: Rename 'do_execbuf' to 'execbuf_submit'
The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.

This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:53:41 +02:00
Chris Wilson 41037f9f0e drm/i915: Add i915_gem_request_unreference__unlocked
We were missing a convenience stub to aquire the right mutex whilst
dropping the request, so add it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-30 16:39:30 +02:00
Daniel Vetter 6e0aa8018f Linux 4.0-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVGHwjAAoJEHm+PkMAQRiG8rcIAJ6cEJ6mbqLpyz5XrGf4yNp0
 +wG/QlEpT8rgrxe9wSjB3lfW3kR2Pe69b9fVVCdiklygdkmva5vfmDrVGGzYfe3M
 QrFSSlMVBplvh6IiM/L1mVMtr3DSmCO23YZZ9R5b7FoEYatNHRpNWBCBpuXpd4aD
 sLuIvO3L/S7LqeOAFkkYWv6AuL9umicmjR8u+nsmCSRJom7At/aJ6R66WIp9vxho
 Rn7r6wcUk6B2Q/gYNjdSE8SIwdyKhuBGyvqQ9U9s6Btg9DQfM/b0vG5kw9hqeAq/
 9445jqVDP1whA2vz6GjnvltidxrqRvuDPBwzOnFmY5U+KZz4lS3x2mnWAAJ3xWs=
 =TqVJ
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc6' into drm-intel-next

Backmerge Linux 4.0-rc6 because conflicts are (again) getting out of
hand. To make sure we don't lose any bugfixes from the 4.0-rc5-rc6
flurry of patches we've applied them all to -next too.

Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Always take the version from -next, we've already handled all
conflicts with explicit cherrypicking.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-03-30 16:37:08 +02:00
Joonas Lahtinen 9abc464854 drm/i915: Compare GGTT view structs instead of types
To allow for views where the view type is not defined by the view type only,
like it is in stereo or rotated 90 degree view, change the semantic to require
the whole view structure for comparison when we match a GGTT view.

This allows including parameters like offset to be included in the view which
is useful for eg. partial views.

v3:
- Rely on ggtt_view type being 0 for non-GGTT vma's, which equals to
  I915_GGTT_VIEW_NORMAL. (Daniel Vetter)
- Do not use potentially slower comparison when we only want to know if
  something is or is not a normal view.
- Rebase on top of rotated view patches. Add rotated view singleton.
- If one view is missing in comparison they're equal only if both are missing.

v4:
- Use comparison helper in obj_to_ggtt_view too. (Tvrtko Ursulin)
- Do WARN_ON if one view is NULL. (Tvrtko Ursulin)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-27 15:05:22 +01:00
Michel Thierry 72744cb13c drm/i915: Add dynamic page trace events
Traces for page directories and tables allocation and map.

v2: Removed references to teardown.
v3: bitmap_scnprintf has been deprecated.
v4: Replace bitmap_scnprintf with scnprintf correctly, and get right
range lengths. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-27 09:25:44 +01:00
Chris Wilson 832a3aad1e drm/i915: Keep ring->active_list and ring->requests_list consistent
If we retire requests last, we may use a later seqno and so clear
the requests lists without clearing the active list, leading to
confusion. Hence we should retire requests first for consistency with
the early return. The order used to be important as the lifecycle for
the object on the active list was determined by request->seqno. However,
the requests themselves are now reference counted removing the
constraint from the order of retirement.

Fixes regression from

commit 1b5a433a4d
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:42 2014 +0000

    drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
'

and a

	WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
	WARN_ON(!list_empty(&vm->active_list))

Identified by updating WATCH_LISTS:

	[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
	WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
	WARN_ON(i915_verify_lists(ring->dev))

Note that this is only a problem in evict_vm where the following happens
after a retire_request has cleaned out all requests, but not all active
bo:
- intel_ring_idle called from i915_gpu_idle notices that no requests are
  outstanding and immediately returns.
- i915_gem_retire_requests_ring called from i915_gem_retire_requests also
  immediately returns when there's no request, still leaving the bo on the
  active list.
- evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting
  all active objects that there's still stuff left that shouldn't be
  there.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-26 11:05:54 +02:00
Tvrtko Ursulin 50470bb011 drm/i915/skl: Support secondary (rotated) frame buffer mapping
90/270 rotated scanout needs a rotated GTT view of the framebuffer.

This is put in a separate VMA with a dedicated ggtt view and wired such that
it is created when a framebuffer is pinned to a 90/270 rotated plane.

Rotation is only possible with Yb/Yf buffers and error is propagated to
user space in case of a mismatch.

Special rotated page view is constructed at the VMA creation time by
borrowing the DMA addresses from obj->pages.

v2:
    * Do not bother with pages for rotated sg list, just populate the DMA
      addresses. (Daniel Vetter)
    * Checkpatch cleanup.

v3:
    * Rebased on top of new plane handling (create rotated mapping when
      setting the rotation property).
    * Unpin rotated VMA on unpinning from display plane.
    * Simplify rotation check using bitwise AND. (Chris Wilson)

v4:
    * Fix unpinning of optional rotated mapping so it is really considered
      to be optional.

v5:
   * Rebased for fb modifier changes.
   * Rebased for atomic commit.
   * Only pin needed view for display. (Ville Syrjälä, Daniel Vetter)

v6:
   * Rebased after preparatory work has been extracted out. (Daniel Vetter)

v7:
   * Slightly simplified tiling geometry calculation.
   * Moved rotated GGTT view implementation into i915_gem_gtt.c (Daniel Vetter)

v8:
   * Do not use i915_gem_obj_size to get object size since that actually
     returns the size of an VMA which may not exist.
   * Rebased for ggtt view changes.

v9:
   * Rebased after code review changes on the preceding patches.
   * Tidy function definitions. (Joonas Lahtinen)

For: VIZ-4726
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> (v4)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-23 15:06:31 +01:00
Tvrtko Ursulin e661733092 drm/i915: Use GGTT view when (un)pinning objects to planes
To support frame buffer rotation we need to be able to pass on the information
on what kind of GGTT view is required for display.

This patch just adds the parameter and makes all the callers default to the
normal view.

v2: Rebased for ggtt view changes.
v3: Don't limit PIN_MAPPABLE to normal views just yet. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v3)
[danvet: s/BUG/WARN/ in the patch hunk because. At least where the
BUG_ON isn't fatal right away.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-23 14:56:56 +01:00
Ben Widawsky 563222a745 drm/i915: Track page table reload need
This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
let's the user know.

It's not just for gen8. If the current context has mappings change, we
need a context reload to switch

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
neither ctx->ppgtt and aliasing_ppgtt exist.

v5: Removed references to teardown_va_range.

v6: Updated needs_pd_load_pre/post.

v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
comment about updated PDEs to object_pin/bind (Mika).

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-20 11:48:18 +01:00
Ben Widawsky 678d96fbb3 drm/i915: Track GEN6 page table usage
Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.page_directory/pdp.page_directories
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
 - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
 - Removed unnecessary checks in gen6_alloc_va_range.
 - Changed map/unmap_px_single macros to use dma functions directly and
   be part of a static inline function instead.
 - Moved drm_device plumbing through page tables operation to its own
   patch.
 - Moved allocate/teardown_va_range calls until they are fully
   implemented (in subsequent patch).
 - Merged pt and scratch_pt unmap_and_free path.
 - Moved scratch page allocator helper to the patch that will use it.

v6: Reduce complexity by not tearing down pagetables dynamically, the
same can be achieved while freeing empty vms. (Daniel)

v7: s/i915_dma_map_px_single/i915_dma_map_single
s/gen6_write_pdes/gen6_write_pde
Prevent a NULL case when only GGTT is available. (Mika)

v8: Rebased after s/page_tables/page_table/.

v9: Reworked i915_pte_index and i915_pte_count.
Also exercise bitmap allocation here (gen6_alloc_va_range) and fix
incorrect write_page_range in i915_gem_restore_gtt_mappings (Mika).

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-20 11:48:18 +01:00
Daniel Vetter be6a037695 drm/i915: Extract i915_gem_shrinker.c
Two code changes:
- Extract i915_gem_shrinker_init.
- Inline i915_gem_object_is_purgeable since we open-code it everywhere
  else too.

This already has the benefit of pulling all the shrinker code
together, next patch adds a bit of kerneldoc.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-20 11:48:15 +01:00
Tvrtko Ursulin 6fafab76d5 drm/i915: Turn on PIN_GLOBAL in i915_gem_object_ggtt_pin
This makes the interface consistent to old i915_gem_obj_ggtt_pin.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-20 11:48:12 +01:00