Commit Graph

177 Commits

Author SHA1 Message Date
Chris Wilson c8725f3dc0 drm/i915: Do not call retire_requests from wait_for_rendering
A common issue we have is that retiring requests causes recursion
through GTT manipulation or page table manipulation which we can only
handle at very specific points. However, to maintain internal
consistency (enforced through our sanity checks on write_domain at
various points in the GEM object lifecycle) we do need to retire the
object prior to marking it with a new write_domain, and also clear the
write_domain for the implicit flush following a batch.

Note that this then allows the unbound objects to still be on the active
lists, and so care must be taken when removing objects from unbound lists
(similar to the caveats we face processing the bound lists).

v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
by refactoring it to call into __i915_gem_shrink().

v3: Missed an object-retire prior to changing cache domains in
i915_gem_object_set_cache_leve()

v4: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:09:15 +02:00
Zhao Yakui a8ebba75b3 drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3
The BDW GT3 has two independent BSD rings, which can be used to process the
video commands. To be simpler, it is transparent to user-space driver/middle.
Instead the kernel driver will decide which ring is to dispatch the BSD video
command.

As every BSD ring is powerful, it is enough to dispatch the BSD video command
based on the drm fd. In such case it can play back video stream while encoding
another video stream. The coarse ping-pong mechanism is used to determine
which BSD ring is used to dispatch the BSD video command.

V1->V2: Follow Daniel's comment and use the simple ping-pong mechanism.
This is only to add the support of dual BSD rings on BDW GT3 machine.
The further optimization will be considered in another patch set.

V2->V3: Follow Daniel's comment to use the struct_mutext instead of
atomic_t during determining which ring can be used to dispatch Video command.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:49 +02:00
Zhao Yakui b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Daniel Vetter 9cb346648d drm/i915: Catch dirt in unused execbuffer fields
We need to make sure that userspace keeps on following the contract,
otherwise we won't be able to use the reserved fields at all.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/*-dirt
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:43 +02:00
Daniel Vetter c0f5b82cd1 drm/i915: Catch abuse of I915_EXEC_CONSTANTS_*
A bit tricky since 0 is also a valid constant ...

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/rel-constants-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:42 +02:00
Daniel Vetter 9d662da8b6 drm/i915: Catch abuse of I915_EXEC_GEN7_SOL_RESET
Currently we catch it, but silently succeed. Our userspace is
better than this.

v2: Add DRM_DEBUG (Chris)

Testcase: igt/gem_exec_params/sol-reset-*
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:41 +02:00
Dave Airlie 885ac04ab3 Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2014-04-16:
- vlv infoframe fixes from Jesse
- dsi/mipi fixes from Shobhit
- gen8 pageflip fixes for LRI/SRM from Damien
- cmd parser fixes from Brad Volkin
- some prep patches for CHV, DRRS, ...
- and tons of little things all over
drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

drm-intel-next-2014-04-04:
- cmd parser for gen7 but only in enforcing and not yet granting mode - the
  batch copying stuff is still missing. Also performance is a bit ... rough
  (Brad Volkin + OACONTROL fix from Ken).
- deprecate UMS harder (i.e. CONFIG_BROKEN)
- interrupt rework from Paulo Zanoni
- runtime PM support for bdw and snb, again from Paulo
- a pile of refactorings from various people all over the place to prep for new
  stuff (irq reworks, power domain polish, ...)

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
2014-05-01 09:11:37 +10:00
Chris Wilson 691e6415c8 drm/i915: Always use kref tracking for all contexts.
If we always initialize kref for the context, even if we are using fake
contexts for hangstats when there is no hw support, we can forgo the
dance to dereference the ctx->obj and inspect whether we are permitted
to use kref inside i915_gem_context_reference() and _unreference().

My ulterior motive here is to improve the debugging of a use-after-free
of ctx->obj. This patch avoids the dereference here and instead forces
the assertion checks associated with kref.

v2: Refactor the fake contexts to being even more like the real
contexts, so that there is much less duplicated and special case code.

v3: Tweaks.
v4: Tweaks, minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[Jani: tiny change to backport to drm-intel-fixes.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-04-11 13:29:51 +03:00
Ben Widawsky 935f38d694 drm/i915: Unref context on failed eb_create
I opted to do this instead of grabbing the context reference after
eb_create since eb_create can potentially call the shrinker, and that
makes things very complicated. This simple patch balances the ref count
without requiring a great deal of review to make sure the shrinker path
is safe.

Theoretically (by design) the shrinker can end up destroying a context,
which enforces the reasoning for doing the fix this way instead of
moving the reference to later in the function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-09 14:37:11 +02:00
Jani Nikula 50227e1cae drm/i915: prefer struct drm_i915_private to drm_i915_private_t
Remove the rest of the references to drm_i915_private_t. No functional
changes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop hunk in i915_cmd_parser.c]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-31 15:34:21 +02:00
Brad Volkin 351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
Daniel Vetter 8ea99c9287 drm/i915: Only bind each object rather than for every execbuffer
One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.

Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.

v2: Split out prep work and unrelated changes.

v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.

v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:18:38 +01:00
Daniel Vetter bf3d149b25 drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE
With abitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".

Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:17:27 +01:00
Daniel Vetter 1ec9e26dda drm/i915: Consolidate binding parameters into flags
Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.

Split out from Chris' vma-binding rework patch.

v2: Undo the behaviour change in object_pin that Chris spotted.

v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.

v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.

v5: Reorder the PIN_ flags for more natural patch splitup.

v6: Pull out the PIN_GLOBAL split-up again.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-14 14:16:58 +01:00
Jani Nikula d330a9530c drm/i915: move module parameters into a struct, in a new file
With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.

Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".

The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:

$ git grep "i915\." -- drivers/gpu/drm/i915

v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 17:16:45 +01:00
Rodrigo Vivi a25eebb0af drm: dp helper: Add DP test sink CRC definition.
This address will be used to verify panel CRC for test and
validation purposes.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Fix whitespace fail.]
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-27 09:55:23 +01:00
Daniel Vetter 0e5539b923 Merge branch 'topic/ppgtt' into drm-intel-next-queued
Because whatever.*

* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-25 21:14:57 +01:00
Ben Widawsky 8b78f0e588 drm/i915: Clarify relocation errnos
While trying to find a random -EINVAL from a failing test, I noticed we
had a few hard to follow return values.

The first two hunks in this patch replace completely useless
initialization of ret. The last several hunks help to distinguish
between altering 'return ret' and 'return <ERROR>'

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-22 09:58:25 +01:00
Daniel Vetter 0d9d349d87 Merge commit origin/master into drm-intel-next
Conflicts are getting out of hand, and now we have to shuffle even
more in -next which was also shuffled in -fixes (the call for
drm_mode_config_reset needs to move yet again).

So do a proper backmerge. I wanted to wait with this for the 3.13
relaese, but alas let's just do this now.

Conflicts:
	drivers/gpu/drm/i915/i915_reg.h
	drivers/gpu/drm/i915/intel_ddi.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_pm.c

Besides the conflict around the forcewake get/put (where we chaged the
called function in -fixes and added a new parameter in -next) code all
the current conflicts are of the adjacent lines changed type.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-16 22:06:30 +01:00
Ben Widawsky 72ad5c45f0 drm/i915/ppgtt: Fix ioctl errno for "no such context"
Without this fix the ioctls silently succeeded (but actually did
nothing).

It makes all the code which calls into this function way too confusing.

v2: Fix destroy IOCTL as well

v3: Clarify the other two callers of i915_gem_context_get() to never
check for NULL. (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72903
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Testcase: igt/gem_ctx_exec/basic
[danvet: Fix up the commit message and actually bother to mention the
testcase this fixes.]
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-07 08:50:11 +01:00
Daniel Vetter a7c1d426ef drm/i915: Don't check for NEEDS_GTT when deciding the address space
This means something different and is only relevant for gen6 and the
reason why we cant use anything else than aliasing ppgtt there.

Note that the currently implemented logic for secure batches is
broken: Userspace wants the buffer both in ppgtt (for self-referencing
relocations) and in ggtt (for priveledge operations).

This is the same issue the command parser is also facing.
Unfortunately our coverage for corner-cases of self-referencing
batches is spotty.

Note that this will break vsync'ed Xv and DRI2 copies.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:40 +01:00
Daniel Vetter 2c9f8d56a1 drm/i915: Reject NEEDS_GTT relocations with full ppgtt
Doesn't make sense. Spotted while fixing an issue Chris
noticed in the same area.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:39 +01:00
Daniel Vetter 7c9c4b8f5d drm/i915: Reject non-default contexts on non-render again
This reverts the abi-change from

commit 67e3d2979b
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Dec 6 14:11:01 2013 -0800

    drm/i915: Permit contexts on all rings

We don't actually need this, only the internal changes to allow
contexts on all rings for the purpose of ppgtt switching are required.
And I'm not sure whether this is the right thing to do given some of
the hw features in the pipeline.

Also, new abi needs userspace patches as a proof-of-need, which is
completely lacking here.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 17:50:38 +01:00
Ben Widawsky 7e0d96bc03 drm/i915: Use multiple VMs -- the point of no return
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:24:52 +01:00
Daniel Vetter 3d7f0f9dcc Merge commit drm-intel-fixes into topic/ppgtt
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.

Conflicts:
	drivers/gpu/drm/i915/i915_gem_context.c
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
	drivers/gpu/drm/i915/intel_display.c

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 16:23:37 +01:00
Ben Widawsky 41bde5535a drm/i915: Get context early in execbuf
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.

This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.

Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex

Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).

Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:

CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.

v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Oct 30 15:44:16 2013 +0200

    drm/i915: add i915_get_reset_stats_ioctl

v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Tue Nov 12 11:57:30 2013 +0200

Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Tue Nov 26 16:14:33 2013 +0200

    drm/i915: check context reset stats before relocations

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:52:42 +01:00
Ben Widawsky 67e3d2979b drm/i915: Permit contexts on all rings
If we want to use contexts in more abstract terms (specifically with
PPGTT in mind), we need to allow them to be specified for any ring.

Since the upcoming patches will bring about the use of multiple address
spaces, and each ring needs to have an address space programmed (which
we intend to do at context switch time), we can no longer only use RCS.

With multiple rings having a last context, we must now unreference these
contexts.

NOTE: This commit requires an update to intel-gpu-tools to make it not
fail.

v2: Rebased with some logical conflicts.
Squashed in the context fini refcount patch

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:53 +01:00
Ben Widawsky ca01b12b40 drm/i915: Simplify ring handling in execbuf
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:52 +01:00
Ben Widawsky 3e7a032295 drm/i915: Remove vm arg from relocate entry
The only place we were using it was for GEN6, which won't have PPGTT
support anyway (ie. the VM is always the same). To clear things up,
(it only added confusion for me since it doesn't allow us to assert
vma->vm is what we always want, when just looking at the code).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky 6f65e29aca drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.

What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.

drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage

drm/i915: Add bind/unbind object functions to VMA

As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.

Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.

v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding.  This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)

v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.

v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)

v5: Update the comment to not suck (Chris)

v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use the new vm [un]bind functions

Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.

Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.

v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.

v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)

At this point the original mailing list thread diverges. ie.

v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: reduce vm->insert_entries() usage

FKA: drm/i915: eliminate vm->insert_entries()

With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.

v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:50 +01:00
Ben Widawsky d7f46fc4e7 drm/i915: Make pin count per VMA
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-18 15:27:49 +01:00
Chris Wilson 9ae9ab5220 drm/i915: Prevent double unref following alloc failure during execbuffer
Whilst looking up the objects required for an execbuffer, an untimely
allocation failure in creating the vma results in the object being
unreferenced from two lists. The ownership during the lookup is meant to
be moved from the list of objects being looked to the vma, and this
double unreference upon error results in a use-after-free.

Fixes regression from
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

Based on the fix by Ben Widawsky.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: stable@vger.kernel.org
[danvet: Bikeshed the crucial comment above the ownership transfer as
discussed on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-12 10:44:57 +01:00
Paulo Zanoni f65c916898 drm/i915: add runtime put/get calls at the basic places
If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.

v2: - Simplify i915_gem_fault changes.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-10 22:47:33 +01:00
Mika Kuoppala d299cce76e drm/i915: check context reset stats before relocations
Doing it early prevents moving and relocating objects in vain
for contexts that won't get any GPU time.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-04 13:20:31 +01:00
Chris Wilson a415d35564 drm/i915: Pin relocations for the duration of constructing the execbuffer
As the execbuffer dispatch grows ever more complex and involves multiple
stages of moving objects into the aperture, we need to take greater care
that we do not evict our execbuffer objects prior to dispatch. This is
relatively simple as we can just keep the objects pinned for not just
the relocation but until we are finished.

One such example is the possibility of the context switch causing an
eviction or hitting the shrinker in order to fit its object into the
aperture.

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-November/036166.html
Reported-by: "Siluvery, Arun" <arun.siluvery@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add the additional explanations from Chris to the commit
message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-27 09:04:36 +01:00
Ben Widawsky 5ce097254e drm/i915: Missed dropped VMA conversion
This belonged in
commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

But it was somehow missed along the way.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:12:32 +01:00
Ben Widawsky 17601cbc93 drm/i915: Removed unused vm args
i915_gem_execbuffer_relocate became defunct in:
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

eb_create: never used?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: The lingering vm parameter to eb_create might have been back
from the days where we didn't yet keep both vmas and obj lists in the
eb struct. But I didn't check really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-26 10:11:21 +01:00
Ben Widawsky 28cf541543 drm/i915/bdw: unleash PPGTT
v2: Squash in fix from Ben: Set PPGTT batches as necessary

This fixes the regression in the last couple of days when we enabled
PPGTT.

v3: Squash in fixup to still use GTT for secure batches from Ville:

BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:48 +01:00
Ben Widawsky 3c94ceeee2 drm/i915/bdw: Support 64b relocations
We don't actually return any to userspace yet, however we can pretend
like we do now so userspace will support it when it happens.

This is just to please Chris as the code itself isn't ready for > 64b
relocations.

v2: Rebase on top of the refactored relocate_entry_gtt|cpu functions.

v3: Squash in fixup from Rafal Barbalho for 64 byte relocs using cpu
relocs and those crossing a page boundary.

v4: Squash in a fixup for the fixup from Rafael.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Barbalho, Rafael <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-08 18:09:41 +01:00
Ben Widawsky e2d05a8b1e drm/i915: Convert active API to VMA
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:21 +02:00
Daniel Vetter b205ca5721 drm/i915: Use unsigned for overflow checks in execbuf
There's actually no real risk since we already check for stricter
constraints earlier (using UINT_MAX / sizeof (struct
drm_i915_gem_exec_object2) as the limit). But in eb_create we use
signed integers, which steals a factor of 2. Luckily struct
drm_i915_gem_exec_object2 for this to not matter.

Still, be consistent and use unsigned integers.

Similar use unsinged integers when checking for overflows in the
relocation entry processing.

I've also added a new subtests to igt/gem_reloc_overflow to also
test for overflowing args->buffer_count values.

v2: Give the variables again tighter scope to make it clear that the
computation is purely local and doesn't leak out to the 2nd block.
Requested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:03 +02:00
Daniel Vetter a1e2265332 drm/i915: Use kcalloc more
No buffer overflows here, but better safe than sorry.

v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc alreads memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
  (Chris). I've opted to leave all other conversions as-is since they
  aren't in a fastpath and dealing with cleared memory instead of
  random garbage is just generally nicer.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-01 07:45:01 +02:00
Ben Widawsky 68c8c17f52 drm/i915: evict VM instead of everything
When reserving objects during execbuf, it is possible to come across an
object which will not fit given the current fragmentation of the address
space. We do not have any defragment in drm_mm, so the strategy is to
instead evict everything, and reallocate objects.

With the upcoming addition of multiple VMs, there is no point to evict
everything since doing so is overkill for the specific case mentioned
above.

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: One additional s/evict_everything/evict_vm/ to update a
comment in the code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-12 21:58:22 +02:00
Mika Kuoppala be62acb4cc drm/i915: ban badly behaving contexts
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become danling before hang is declared.

v3: Use return value for banned status instead of stashing state
    into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
    - add define for ban period
    - rename commit and improve commit msg

v5: - rely context banning instead of wedging the gpu
    - beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:55:50 +02:00
Chris Wilson 2cc86b8260 drm/i915: Always prefer CPU relocations with LLC
A follow-on to the update of the LLC coherency logic is that we can rely
on the LLC being coherent with the CS for rewriting batchbuffers
irrespective of their cache domain. (This should have no effect
currently as all the batch buffers are expected to be I915_CACHE_LLC and
so using the cpu relocation path anyway.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:43 +02:00
Daniel Vetter e656a6cba0 drm/i915: inline vma_create into lookup_or_create_vma
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Ben Widawsky 27173f1f95 drm/i915: Convert execbuf code to use vmas
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:41 +02:00
Daniel Vetter d4d36014ca drm/i915: fix up the relocate_entry refactoring
Somehow we've lost the error handling in the patch split-up between
the internal and external patch. This regression has been introduced
in

commit 5032d871f7
Author: Rafael Barbalho <rafael.barbalho@intel.com>
Date:   Wed Aug 21 17:10:51 2013 +0100

    drm/i915: Cleaning up the relocate entry function

This bug is exercised by igt/gem_reloc_vs_gpu/interruptible.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:18:01 +02:00
Rafael Barbalho 5032d871f7 drm/i915: Cleaning up the relocate entry function
As the relocate entry function was getting a bit too big I've moved
the code that used to use either the cpu or the gtt to for the
relocation into two separate functions.

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:31 +02:00
Chris Wilson 000433b67e drm/i915: Only do a chipset flush after a clflush
Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:34 +02:00