Commit Graph

551 Commits

Author SHA1 Message Date
John Harrison ccd98fe499 drm/i915: Add *_ring_begin() to request allocation
Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:30 +02:00
John Harrison 4d616a293a drm/i915: Update intel_logical_ring_begin() to take a request structure
Now that everything above has been converted to use requests,
intel_logical_ring_begin() can be updated to take a request instead of a
ringbuf/context pair. This also means that it no longer needs to lazily allocate
a request if no-one happens to have done it earlier.

Note that this change makes the execlist signature the same as the legacy
version. Thus the two functions could be merged into a ring->begin() wrapper if
required.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:30 +02:00
John Harrison be795fc17b drm/i915: Update ring->emit_bb_start() to take a request structure
Updated the ring->emit_bb_start() implementation to take a request instead of a
ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:26 +02:00
John Harrison c4e766389e drm/i915: Update ring->emit_request() to take a request structure
Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison 7deb4d3980 drm/i915: Update ring->emit_flush() to take a request structure
Updated the various ring->emit_flush() implementations to take a request instead
of a ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:23 +02:00
John Harrison 4866d729ab drm/i915: Update flush_all_caches() to take request structures
Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:20 +02:00
John Harrison e2be4faf30 drm/i915: Update workarounds_emit() to take request structures
Updated the *_ring_workarounds_emit() functions to take requests instead of
ring/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:19 +02:00
John Harrison 2f20055d36 drm/i915: Update a bunch of execbuffer helpers to take request structures
Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:18 +02:00
John Harrison b2af037693 drm/i915: Update [vma|object]_move_to_active() to take request structures
Now that everything above has been converted to use request structures, it is
possible to update the lower level move_to_active() functions to be request
based as well.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:16 +02:00
John Harrison 75289874e4 drm/i915: Update add_request() to take a request structure
Now that all callers of i915_add_request() have a request pointer to hand, it is
possible to update the add request function to take a request pointer rather
than pulling it out of the OLR.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:15 +02:00
John Harrison 91af127fd7 drm/i915: Update i915_gem_object_sync() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.

v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created until
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).

The call chains above _sync() now support passing a request through which most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).

The exeception is intel_crtc_page_flip() which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.

v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.

v4: Updated comments from review feedback (Tomas Elf)

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:13 +02:00
John Harrison be01363f0a drm/i915: Update render_state_init() to take a request structure
Updated the two render_state_init() functions to take a request pointer instead
of a ring. This removes their reliance on the OLR.

v2: Rebased to newer tree.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison 8753181e10 drm/i915: Update init_context() to take a request structure
Now that everything above has been converted to use requests, it is possible to
update init_context() to take a request pointer instead of a ring/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison 76c3916887 drm/i915: Update deferred context creation to do explicit request management
In execlist mode, context initialisation is deferred until first use of the
given context. This is because execlist mode has per ring context state and thus
many more context storage objects than legacy mode and many are never actually
used. Previously, the initialisation commands were written to the ring and
tagged with some random request structure via the OLR. This seemed to be causing
a null pointer deference bug under certain circumstances (BZ:88865).

This patch adds explicit request creation and submission to the deferred
initialisation code path. Thus removing any reliance on or randomness caused by
the OLR.

Note that it should be possible to move the deferred context creation until even
later - when the context is actually switched to rather than when it is merely
validated. This would allow the initialisation to be done within the request of
the work that is wanting to use the context. Hence, the extra request that is
created, used and retired just for the context init could be removed completely.
However, this is left for a follow up patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:11 +02:00
John Harrison dc4be6071a drm/i915: Add explicit request management to i915_gem_init_hw()
Now that a single per ring loop is being done for all the different
intialisation steps in i915_gem_init_hw(), it is possible to add proper request
management as well. The last remaining issue is that the context enable call
eventually ends up within *_render_state_init() and this does its own private
_i915_add_request() call.

This patch adds explicit request creation and submission to the top level loop
and removes the add_request() from deep within the sub-functions.

v2: Updated for removal of batch_obj from add_request call in previous patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:08 +02:00
John Harrison a3fbe05a61 drm/i915: Don't tag kernel batches as user batches
The render state initialisation code does an explicit i915_add_request() call to
commit the init commands. It was passing in the initialisation batch buffer to
add_request() as the batch object parameter. However, the batch object entry in
the request structure (which is all that parameter is used for) is meant for
keeping track of user generated batch buffers for blame tagging during GPU
hangs.

This patch clears the batch object parameter so that kernel generated batch
buffers are not tagged as being user generated.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:07 +02:00
John Harrison 5b4a60c276 drm/i915: Add flag to i915_add_request() to skip the cache flush
In order to explcitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.

This patch adds a flag to _add_request() to specify whether the flush is
required or not.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:04 +02:00
John Harrison 8a8edb5917 drm/i915: Update execbuffer_move_to_active() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the
execbuffer_move_to_active() code path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 535fbe8233 drm/i915: Update move_to_gpu() to take a request structure
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the move_to_gpu() code paths.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:03 +02:00
John Harrison 95c24161cd drm/i915: Update the dispatch tracepoint to use params->request
Updated a couple of trace points to use the now cached request pointer rather
than extracting it from the ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:02 +02:00
John Harrison 217e46b576 drm/i915: Update alloc_request to return the allocated request
The alloc_request() function does not actually return the newly allocated
request. Instead, it must be pulled from ring->outstanding_lazy_request. This
patch fixes this so that code can create a request and start using it knowing
exactly which request it actually owns.

v2: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:00 +02:00
John Harrison adeca76d8e drm/i915: Simplify i915_gem_execbuffer_retire_commands() parameters
Shrunk the parameter list of i915_gem_execbuffer_retire_commands() to a single
structure as everything it requires is available in the execbuff_params object.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:00 +02:00
John Harrison 5f19e2bffa drm/i915: Merged the many do_execbuf() parameters into a structure
The do_execbuf() function takes quite a few parameters. The actual set of
parameters is going to change with the conversion to passing requests around.
Further, it is due to grow massively with the arrival of the GPU scheduler.

This patch simplifies the prototype by passing a parameter structure instead.
Changing the parameter set in the future is then simply a matter of
adding/removing items to the structure.

Note that the structure does not contain absolutely everything that is passed
in. This is because the intention is to use this structure more extensively
later in this patch series and more especially in the GPU scheduler that is
coming soon. The latter requires hanging on to the structure as the final
hardware submission can be delayed until long after the execbuf IOCTL has
returned to user land. Thus it is unsafe to put anything in the structure that
is local to the IOCTL call itself - such as the 'args' parameter. All entries
must be copies of data or pointers to structures that are reference counted in
some way and guaranteed to exist for the duration of the batch buffer's life.

v2: Rebased to newer tree and updated for changes to the command parser.
Specifically, a code shuffle has required saving the batch start address in the
params structure.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:59 +02:00
John Harrison 40e895ceca drm/i915: Set context in request from creation even in legacy mode
In execlist mode, the context object pointer is written in to the request
structure (and reference counted) at the point of request creation. In legacy
mode, this only happens inside i915_add_request().

This patch updates the legacy code path to match the execlist version. This
allows all the intermediate code between request creation and request submission
to get at the context object given only a request structure. Thus negating the
need to pass context pointers here, there and everywhere.

v2: Moved the context reference so it does not need to be undone if the
get_seqno() fails.

v3: Fixed execlist mode always hitting a warning about invalid last_contexts
(which don't exist in execlist mode).

v4: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:58 +02:00
John Harrison bf7dc5b709 drm/i915: i915_add_request must not fail
The i915_add_request() function is called to keep track of work that has been
written to the ring buffer. It adds epilogue commands to track progress (seqno
updates and such), moves the request structure onto the right list and other
such house keeping tasks. However, the work itself has already been written to
the ring and will get executed whether or not the add request call succeeds. So
no matter what goes wrong, there isn't a whole lot of point in failing the call.

At the moment, this is fine(ish). If the add request does bail early on and not
do the housekeeping, the request will still float around in the
ring->outstanding_lazy_request field and be picked up next time. It means
multiple pieces of work will be tagged as the same request and driver can't
actually wait for the first piece of work until something else has been
submitted. But it all sort of hangs together.

This patch series is all about removing the OLR and guaranteeing that each piece
of work gets its own personal request. That means that there is no more
'hoovering up of forgotten requests'. If the request does not get tracked then
it will be leaked. Thus the add request call _must_ not fail. The previous patch
should have already ensured that it _will_ not fail by removing the potential
for running out of ring space. This patch enforces the rule by actually removing
the early exit paths and the return code.

Note that if something does manage to fail and the epilogue commands don't get
written to the ring, the driver will still hang together. The request will be
added to the tracking lists. And as in the old case, any subsequent work will
generate a new seqno which will suffice for marking the old one as complete.

v2: Improved WARNings (Tomas Elf review request).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:57 +02:00
John Harrison 29b1b415fc drm/i915: Reserve ring buffer space for i915_add_request() commands
It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:56 +02:00
Arun Siluvery c82435bbe5 drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset bit that invalidates coherent lines,
update WA to include programming restrictions and exclude CHV as
it is not required (Ville)

v3: Avoid unnecessary read when it can be done by reading register once (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:41 +02:00
Arun Siluvery 7ad00d1ac1 drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:41 +02:00
Arun Siluvery c4db759919 drm/i915/gen8: Re-order init pipe_control in lrc mode
Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

v2: Check for valid scratch page before initializing WA as some of them
perform writes to it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:40 +02:00
Arun Siluvery 17ee950df3 drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:39 +02:00
Arun Siluvery 2e5356da37 drm/i915: Initialize HWS page address after GPU reset
After GPU reset, HW is losing the address of HWS page in the register.
The page itself is valid except that HW is not aware of its location.

[   64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000

This patch reloads this value into the register during ring init.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-06-04 11:10:21 +03:00
Michel Thierry d63f820f39 drm/i915: Remove unnecessary null check in execlists_context_unqueue
commit 53292cdb06 ("drm/i915: Workaround
to avoid lite restore with HEAD==TAIL") added a check for req0 != null
which is unnecessary.

The only way req0 could be null is if the list was empty, and this is
already addressed at the beginning of execlists_context_unqueue().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-27 13:20:51 +02:00
Chris Wilson 03ade51185 drm/i915: Inline check required for object syncing prior to execbuf
This trims a little overhead from the common case of not needing to
synchronize between rings.

v2: execlists is special and likes to duplicate code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:43 +02:00
Chris Wilson b47161858b drm/i915: Implement inter-engine read-read optimisations
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:42 +02:00
Peter Antoine 779949f4b1 drm/i915: Warn when execlists changes context without IRQs
If an batch ends while the IRQs are not turned on the notification can
go missing and the GPU can hang. So generate a warning in this case.

Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-20 11:25:44 +02:00
Dan Carpenter 3126a660f3 drm/i915: checking IS_ERR() instead of NULL
We switched from calling i915_gem_alloc_context_obj() to calling
i915_gem_alloc_object() so the error handling needs to be updated to
check for NULL instead of IS_ERR().

Fixes: 149c86e74f ('drm/i915: Allocate context objects from stolen')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-08 13:03:23 +02:00
Dave Airlie e1dee1973c Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over

* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
  drm/i915/gtt: Allocate va range only if vma is not bound
  drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
  drm/i915: fix intel_prepare_ddi
  drm/i915: factor out ddi_get_encoder_port
  drm/i915/hdmi: check port in ibx_infoframe_enabled
  drm/i915/hdmi: fix vlv infoframe port check
  drm/i915: Silence compiler warning in dvo
  drm/i915: Update DRIVER_DATE to 20150423
  drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
  rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
  drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
  drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
  drm/i915: Move ppgtt_bind/unbind around
  drm/i915: move i915_gem_restore_gtt_mappings around
  drm/i915: Fix up the vma aliasing ppgtt binding
  drm/i915: Remove misleading comment around bind_to_vm
  drm/i915: Don't use atomics for pg_dirty_rings
  drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
  drm/i915/skl: Support Y tiling in MMIO flips
  drm/i915: Fixup kerneldoc for struct intel_context
  ...

Conflicts:
	drivers/gpu/drm/i915/i915_drv.c
2015-05-08 20:51:06 +10:00
Michel Thierry 53292cdb06 drm/i915: Workaround to avoid lite restore with HEAD==TAIL
WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the requests tail pointing before the MI_NOOPs. We may not need to
executed them, and this is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-04-23 23:56:52 +03:00
Daniel Vetter c5fe557dde Merge branch 'topic/bxt-stage1' into drm-intel-next-queued
Separate topic branch for bxt didn't work out since we needed to
refactor the gmbus code a bit to make it look decent. So backmerge.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-14 14:00:56 +02:00
Imre Deak 9647ff36ae drm/i915/gen9: fix PIPE_CONTROL flush for VS_INVALIDATE
On GEN9+ per specification a NULL PIPE_CONTROL needs to be emitted
before any PIPE_CONTROL command with the VS_INVALIDATE flag set.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-14 13:55:20 +02:00
Michel Thierry a6631bc8d6 drm/i915: Remove unused variable from execlists_context_queue
After commit d7b9ca2f7a
("drm/i915: Remove request->uniq")

dev_priv is no longer needed.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 16:34:44 +02:00
Chris Wilson 149c86e74f drm/i915: Allocate context objects from stolen
As we never expose context objects directly to userspace, we can forgo
allocating a first-class GEM object for them and prefer to use the
limited resource of reserved/stolen memory for them. Note this means
that their initial contents are undefined.

However, a downside of using stolen objects for execlists is that we
cannot access the physical address directly (thanks MCH!) which prevents
their use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:41:24 +02:00
Chris Wilson d7b9ca2f7a drm/i915: Remove request->uniq
We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.

Fixes regression from
commit b3a38998f0
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:41:11 +02:00
Chris Wilson a6111f7b66 drm/i915: Reduce locking in execlist command submission
This eliminates six needless spin lock/unlock pairs when writing out
ELSP.

v2: Respin with my preferred colour.
v3: Mostly back to the original colour

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v1]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 10:31:44 +02:00
Daniel Vetter 19ee66af15 drm/i915: Remove unused variable in intel_lrc.c
Already tagged this one and 0-day builder is failing me.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-04-10 10:18:47 +02:00
Chris Wilson 595e1eeb26 drm/i915: Remove vestigal DRI1 ring quiescing code
After the removal of DRI1, all access to the rings are through requests
and so we can always be sure that there is a request to wait upon to
free up available space. The fallback code only existed so that we could
quiesce the GPU following unmediated access by DRI1.

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:15 +02:00
Chris Wilson 4bb1bedb28 drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists
When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:14 +02:00
Chris Wilson b5eba37283 drm/i915: Use simpler form of spin_lock_irq(execlist_lock)
We can use the simpler spinlock form to disable interrupts as we are
always outside of an irq/softirq handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:14 +02:00
Michel Thierry d7b2633dba drm/i915/gen8: Dynamic page table allocations
This finishes off the dynamic page tables allocations, in the legacy 3
level style that already exists. Most everything has already been setup
to this point, the patch finishes off the enabling by setting the
appropriate function pointers.

In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.

Before submission, update the PDPs in the logic ring context as PDPs
have been allocated.

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.

v6: Update err_out case in gen8_alloc_va_range (missed from lastest
rebase).

v7: Rebase after s/page_tables/page_table/.

v8: Updated scratch_pt check after scratch flag was removed in previous
patch.

v9: Note that lrc mode needs to be updated to support init state without
any PDP.

v10: Unmap correct page_table in gen8_alloc_va_range's error case,  clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.

v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.

v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse of inline functions. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:13 +02:00
Michel Thierry e5815a2e05 drm/i915/gen8: Split out mappings
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, which should
have no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.

v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.
v4: Rebase after s/page_tables/page_table/.
v5: No longer allocate all PDPs in GEN8+ systems with less than 4GB of
memory, and update populate_lr_context to handle this new case (proper
tracking will be added later in the patch series).
v6: Assign lrc page directory pointer addresses using a macro. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:12 +02:00
Chris Wilson 06fbca713e drm/i915: Split the batch pool by engine
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
Arun Siluvery 51847fb99f drm/i915: Do not set L3-LLC Coherency bit in ctx descriptor
According to Spec this is a reserved bit for Gen9+ and should not be set.

Change-Id: I0215fb7057b94139b7a2f90ecc7a0201c0c93ad4
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:55:58 +02:00
John Harrison dbe4646d6e drm/i915: Fix for ringbuf space wait in LRC mode
The legacy and LRC code paths have an almost identical procedure for waiting for
space in the ring buffer. They both search for a request in the free list that
will advance the tail to a point where sufficient space is available. They then
wait for that request, retire it and recalculate the free space value.

Unfortunately, a bug in the LRC side meant that the resulting free space might
not be as large as expected and indeed, might not be sufficient. This is because
it was testing against the value of request->tail not request->postfix. Whereas,
when a request is retired, ringbuf->tail is updated to req->postfix not
req->tail.

Another significant difference between the two is that the LRC one did not trust
the wait for request to work! It redid the is there enough space available test
and would fail the call if insufficient. Whereas, the legacy version just said
'return 0' - it assumed the preceeding code works. This difference meant that
the LRC version still worked even with the bug - it just fell back to the
polling wait path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:43 +02:00
John Harrison 6689cb2b62 drm/i915: Move common request allocation code into a common function
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:30 +02:00
John Harrison bc0dce3fd0 drm/i915: Make intel_logical_ring_begin() static
The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be
made static. To avoid a forward declaration at the top of the file, it and bunch
of other functions have been shuffled upwards.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:17 +02:00
Dave Airlie a8c6ecb3be Linux 4.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
 mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
 kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
 fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
 V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
 qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
 =Xkue
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc3' into drm-next

Linux 4.0-rc3 backmerge to fix two i915 conflicts, and get
some mainline bug fixes needed for my testing box

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/intel_display.c
2015-03-09 19:58:30 +10:00
Dave Airlie 8dd0eb3566 Merge tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel into drm-next
- Y tiling support for scanout from Tvrtko&Damien
- Remove more UMS support
- some small prep patches for OLR removal from John Harrison
- first few patches for dynamic pagetable allocation from Ben Widawsky, rebased
  by tons of other people
- DRRS support patches (Sonika&Vandana)
- fbc patches from Paulo
- make sure our vblank callbacks aren't called when the pipes are off
- various patches all over

* tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel: (61 commits)
  drm/i915: Update DRIVER_DATE to 20150227
  drm/i915: Clarify obj->map_and_fenceable
  drm/i915/skl: Allow Y (and Yf) frame buffer creation
  drm/i915/skl: Update watermarks for Y tiling
  drm/i915/skl: Updated watermark programming
  drm/i915/skl: Adjust get_plane_config() to support Yb/Yf tiling
  drm/i915/skl: Teach pin_and_fence_fb_obj() about Y tiling constraints
  drm/i915/skl: Adjust intel_fb_align_height() for Yb/Yf tiling
  drm/i915/skl: Allow scanning out Y and Yf fbs
  drm/i915/skl: Add new displayable tiling formats
  drm/i915: Remove DRIVER_MODESET checks from modeset code
  drm/i915: Remove regfile code&data for UMS suspend/resume
  drm/i915: Remove DRIVER_MODESET checks from gem code
  drm/i915: Remove DRIVER_MODESET checks in the gpu reset code
  drm/i915: Remove DRIVER_MODESET checks from suspend/resume code
  drm/i915: Remove DRIVER_MODESET checks in load/unload/close code
  drm/i915: fix a printk format
  drm/i915: Add media rc6 residency file to sysfs
  drm/i915: Add missing description to parameter in alloc_pt_range
  drm/i915: Removed the read of RP_STATE_CAP from sysfs/debugfs functions
  ...
2015-03-09 19:41:15 +10:00
Dave Airlie 7547af9186 Merge tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel into drm-next
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
  rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over

* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
  drm/i915: Update DRIVER_DATE to 20150214
  drm/i915: Remove references to previously removed UMS config option
  drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
  drm/i915/skl: Fix always true comparison in a revision id check
  drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
  drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
  drm/i915: Add process identifier to requests
  drm/i915/skl: Implement WaBarrierPerformanceFixDisable
  drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
  drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
  drm/i915/skl: Implement WaDisableHDCInvalidation
  drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
  drm/i915/skl: Implement WaDisablePartialResolveInVc
  drm/i915/skl: Introduce a SKL specific init_workarounds()
  drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
  drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
  drm/i915/skl: Make the init clock gating function skylake specific
  drm/i915/skl: Provide a gen9 specific init_render_ring()
  drm/i915/skl: Document the WM read latency W/A with its name
  drm/i915/skl: Also detect eDRAM on SKL
  ...
2015-03-05 09:41:09 +10:00
John Harrison 98e1bd4ae6 drm/i915: Cache ringbuf pointer in request structure
In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
execlist mode or not. This is messy.

Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.

Caching the pointer means the calculation is done once at request creation time
and all further code and simply read it directly from the request structure.

OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:53:10 +01:00
John Harrison 5e4be7bda1 drm/i915: Add missing trace point to LRC execbuff code path
There is a trace point in the legacy execbuffer execution path that is missing
from the execlist path. Trace points are extremely useful for debugging and are
used by various automated validation tests. Hence, this patch adds the missing
trace point back in.

OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:48:21 +01:00
John Harrison 8e004efc16 drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.

This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.

OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:43:29 +01:00
Ben Widawsky 06fda602db drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

v5: Added additional safety checks in gen8 clear/free/unmap.

v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).

v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 16:53:43 +01:00
Ben Widawsky 7324cc0491 drm/i915: Complete page table structures
Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desire. The reasoning
is the same as that patch. I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
v4: Rebased after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 16:53:07 +01:00
Nick Hoath b3a38998f0 drm/i915: Fix a use after free, and unbalanced refcounting
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.

drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_reference will reduce the context reference count when
the request is freed.

Problem introduced in
commit 6d3d8274bc
Author:     Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000

     drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.

v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-02-24 15:18:37 +02:00
Thomas Daniel 3e5b6f05a2 drm/i915: Reset logical ring contexts' head and tail during GPU reset
Work was getting left behind in LRC contexts during reset.  This causes a hang
if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and
tail don't get reset and retiring a request doesn't alter them, so the ring
still appears full.

Added a function intel_lr_context_reset() to reset head and tail on a LRC and
its ringbuffer.

Call intel_lr_context_reset() for each context in i915_gem_context_reset() when
in execlists mode.

Testcase: igt/pm_rps --run-subtest reset #bdw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Flatten control flow in the lrc reset code a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-24 00:19:37 +01:00
Jeff McGee 0cea6502bf drm/i915: Request full SSEU enablement on Gen9
On Gen9 the render power gating can leave slice/subslice/EU in
a partially enabled state. We must make an explicit request for
full SSEU enablement through the Render Power Clock State
register when resuming render work. This register is save/
restored in the logical ring context image for execlist
submission mode. Initialize its value in each LRC image to
request full enablement according to the device SSEU config.

Thanks to Sharma Ankitprasad and Akash Goel for highlighting the
issue and proposing the initial fix on which this patch is based.

v2: Adjusted the names of the power gating support flags to fit
    update of an earlier patch.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-23 23:57:13 +01:00
Damien Lespiau 82ef822e65 drm/i915/skl: Provide a gen9 specific init_render_ring()
WaDisableAsyncFlipPerfMode isn't listed for SKL and
INSTPM_FORCE_ORDERING is MBZ so let's make a gen9 specific render init
function.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:32 +01:00
Damien Lespiau 183c990673 drm/i915: Make intel_logical_ring_advance_and_submit() static
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:28 +01:00
Damien Lespiau cef437ad22 drm/i915: Make intel_lr_context_render_state_init() static
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:28 +01:00
Zhi Wang 5baa22c59f drm/i915: Introduce bit definitions of CTXT_SR_CTRL register.
This patch introduces 2 bit definitions of context save/restore
control register.

Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Suggested-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:22 +01:00
Nick Hoath 203a571b21 drm/i915: gen 9 h/w w/a (WaEnableForceRestoreInCtxtDescForVCS)
Add:
WaEnableForceRestoreInCtxtDescForVCS

v2: Add stepping check.

v3: Fixed stepping check direction. Cleaned up indentation.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:13 +01:00
Chris Wilson f0a1fb10e5 drm/i915: Insert a command barrier on BLT/BSD cache flushes
This looked like an odd regression from

commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.

v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).

Also fix the invalidate_domains mask in gen8_emit_flush() for ring !=
VCS.

Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-02-09 20:03:15 +02:00
Nick Hoath f82107950e drm/i915: Fix a use-after-free in intel_execlists_retire_requests
Remove request from list before unreferencing it, in case it's actually
the only reference. (Found by Tvrtko Ursulin)

This issue has been most likely introduced in

commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Jan 15 13:10:39 2015 +0000

    drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-30 19:38:13 +01:00
Mika Kuoppala a7cbedec83 drm/i915: Rename unpin_count to pin_count
We increase it when we pin, so for the casual reader
rename it to cause less confusion.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:51:06 +01:00
Mika Kuoppala 1197b4f230 drm/i915: Balance context pinning on reset cleanup
We pin when we submit to execlist queue. Balance
the pinning when the submitted queue is cleaned on reset.

Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:51:06 +01:00
Mika Kuoppala 59bad94718 drm/i915: Rename the forcewake get/put functions
We have multiple forcewake domains now on recent gens. Change the
function naming to reflect this.

v2: More verbose names (Chris)
v3: Rebase
v4: Rebase
v5: Add documentation for forcewake_get/put

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:57 +01:00
Chris Wilson 6daccb0b2a drm/i915: Assert that runtime pm is active on user fw access
On user forcewake access, assert that runtime pm reference is held.
Fix and cleanup the callsites accordingly.

v2: Remove intel_runtime_pm_get() rebasehap (Deepak)

v3: use drivers own runtime state tracking as pm_runtime_active()
    will return wrong results when we are in resume callchain (Mika)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:54 +01:00
Nick Hoath 6d3d8274bc drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
Move all remaining elements that were unique to execlists queue items
in to the associated request.

Issue: VIZ-4274

v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:53 +01:00
Nick Hoath 21076372af drm/i915: Remove FIXME_lrc_ctx backpointer
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.

v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:53 +01:00
Nick Hoath 72f95afa5f drm/i915: Removed duplicate members from submit_request
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.

Issue: VIZ-4274

v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:52 +01:00
Nick Hoath 2d12955a3e drm/i915: execlist request keeps ptr/ref to gem_request
Add a reference and pointer from the execlist queue item to the associated
gem request. For execlist requests that don't have a request, create one
as a placeholder.

Issue: VIZ-4274
v1: Rebase after upstream of "Replace seqno values with request structures" patchset.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:52 +01:00
Thomas Daniel c0a03a2e4c drm/i915: Reset CSB read pointer in ring init
A previous commit enabled execlists by default:

       commit 27401d126b ("drm/i915/bdw: Enable execlists by default where supported")

This allowed routine testing of execlists which exposed a regression when
resuming from suspend.  The cause was tracked down the to recent changes to the
ring init sequence:

       commit 35a57ffbb1 ("drm/i915: Only init engines once")

During a suspend/resume cycle the hardware Context Status Buffer write pointer
is reset.  However since the recent changes to the init sequence the software CSB
read pointer is no longer reset.  This means that context status events are not
handled correctly and new contexts are not written to the ELSP, resulting in an
apparent GPU hang.

Pending further changes to the ring init code, just move the
ring->next_context_status_buffer initialization into gen8_init_common_ring to
fix this regression.

v2: Moved init into gen8_init_common_ring rather than context_enable after
feedback from Daniel Vetter.  Updated commit msg to reflect this and also cite
commits related to the regression.  Fixed bz link to correct bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13 00:11:52 +01:00
Michel Thierry e6c1abb739 drm/i915: Warn about missing context state workarounds only once
Otherwise, new platforms without workarounds will hit this warning for
every new context created.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-16 10:39:12 +01:00
Thomas Daniel 27401d126b drm/i915/bdw: Enable execlists by default where supported
Execlist support in the i915 driver is now considered good enough for the
feature to be enabled by default on Gen8 and later and routinely tested.
Adjusted i915 parameters structure initialization to reflect this and updated
the comment in intel_sanitize_enable_execlists().

There's still work to do before we can let the wider massive onto it,
but there's still time left before the 3.20 cutoff.

v2: Update the MODULE_PARM_DESC too.

Issue: VIZ-2020
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Add note that there's still some work left to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-15 11:25:28 +01:00
Daniel Vetter 3f7531c3b3 drm/i915: Name the lrc irq handler correctly
We consistently use the _irq_handler postfix for functions called in
hardirq context. Especially when it's a non-static function hardirq is
a crazy enough calling context to warrant this level of ocd. So rename
it.

Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-12-15 09:54:05 +01:00
John Harrison 67e2937bf4 drm/i915: Add unique id to the request structure for debugging
For debugging purposes, it is useful to be able to uniquely identify a given
request structure as it works its way through the system. This becomes
especially tricky once the seqno value is lazily allocated as then the request
has nothing but its pointer to identify it for much of its life.

Change-Id: Ie76b2268b940467f4cdf5a4ba6f5a54cbb96445d
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-06 01:46:27 +01:00
John Harrison aaeb1ba041 drm/i915: Zero fill the request structure
There is a general theory that kzmalloc is better/safer than kmalloc, especially
for interesting data structures. This change updates the request structure
allocation to be zero filled.

This also fixes crashes in the reset code. Quoting Mika's patch:

"Clean the request structure on alloc. Otherwise we might end up
referencing uninitialized fields.  This is apparent when we try to
cleanup the preallocated request on ring reset, before any request has
been submitted to the ring.  The request->ctx is foobar and we end up
freeing the foobarness."

Note that this fixes a regression introduced in

commit 9eba5d4a1d
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:23 2014 +0000

    drm/i915: Ensure OLS & PLR are always in sync

References: https://bugs.freedesktop.org/show_bug.cgi?id=86959
References: https://bugs.freedesktop.org/show_bug.cgi?id=86962
References: https://bugs.freedesktop.org/show_bug.cgi?id=86992
Change-Id: I68715ef758025fab8db763941ef63bf60d7031e2
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-06 01:46:26 +01:00
Ville Syrjälä 8edfbb8bfc drm/i915: s/MI_STORE_DWORD_IMM_GEN8/MI_STORE_DWORD_IMM_GEN4/
MI_STORE_DWORD_IMM length has been the same ever since gen4. Rename
the define to avoid potential confusion if someone tries to use this
on pre-gen8.

Also correct the comment on MI_MEM_VIRTUAL bit. It's present on 945,g33
and 965 only.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add USE_GGTT define for g4x+ too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-05 15:16:35 +01:00
Thomas Daniel e7778be1ea drm/i915: Fix startup failure in LRC mode after recent init changes
A previous commit introduced engine init changes:

    commit 372ee59699d9 ("drm/i915: Only init engines once")

This broke execlists as intel_lr_context_render_state_init was trying to emit
commands to the RCS for the default context before the ring->init_hw was called.

Made a new gen8_init_rcs_context function and assign in to render ring
init_context.  Moved call to intel_logical_ring_workarounds_emit into
gen8_init_rcs_context to maintain previous functionality.

Moved call to render_state_init from lr_context_deferred_create into
gen8_init_rcs_context, and modified deferred_create to call ring->init_context
for non-default contexts.

Modified i915_gem_context_enable to call ring->init_context for the default
context.

So init_context will now always be called when the hw is ready - in
i915_gem_context_enable for the default context and in lr_context_deferred_create
for other contexts.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:30 +01:00
Daniel Vetter bfc882b4e3 drm/i915: Flatten engine init control flow
Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.

Execlist code could benefit quite a bit more still, but that's for
another patch.

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-12-03 09:35:29 +01:00
Daniel Vetter 35a57ffbb1 drm/i915: Only init engines once
We can do this.

And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:28 +01:00
Daniel Vetter 99be1dfe06 drm/i915: Move intel_init_pipe_control out of engine->init_hw
With this all the ->init_hw hooks really only set up hw state needed
to start the ring, all the software state setup and memory/buffer
allocations happen beforehand.

v2: We need to call intel_init_pipe_control after the ring init since
otherwise engine->dev is NULL and it falls over. Currently that's
now after the hw ring is enabled but a) we'll be fine as long as no
one submits a batch b) this will change soon.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:27 +01:00
Daniel Vetter ecfe00d802 drm/i915: s/init()/init_hw()/ in intel_engine_cs
This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.

Make this clear by giving it a better name.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:27 +01:00
Dave Gordon ebd0fd4bef drm/i915: Consolidate ring freespace calculations
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).

This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:26 +01:00
John Harrison ff79e85702 drm/i915: Connect requests to rings at creation not submission
It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:22 +01:00
John Harrison 9400ae5c82 drm/i915: Remove obsolete seqno parameter from 'i915_add_request'
There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:19 +01:00
Daniel Vetter a4b3a5713d drm/i915: Convert i915_wait_seqno to i915_wait_request
Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:17 +01:00
John Harrison 6259cead57 drm/i915: Remove 'outstanding_lazy_seqno'
The OLS value is now obsolete. Exactly the same value is guarateed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:16 +01:00
John Harrison abfe262ae7 drm/i915: Add reference count to request structure
The plan is to use request structures everywhere that seqno values were
previously used. This means saving pointers to structures in places that used to
be simple integers. In turn, that means that the target structure now needs much
more stringent lifetime tracking. That is, it must not be freed while some other
random object still holds a pointer to it.

To achieve this tracking, a reference count needs to be added. Whenever a
pointer to the structure is saved away, the count must be incremented and the
free must only occur when all references have been released.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:13 +01:00
John Harrison 9eba5d4a1d drm/i915: Ensure OLS & PLR are always in sync
The aim is to replace seqno values with request structures. A step along the way
is to switch to using the PLR in preference to the OLS. That requires the PLR to
only be valid when and only when the OLS is also valid. I.e., the two must be
kept in lock step. Then, code which was using the OLS can be safely switched
over to using the PLR instead.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:13 +01:00
Dave Gordon d65621c496 drm/i915: Don't read 'HEAD' MMIO register in LRC mode
The logical ring code was updating the software ring 'head' value
by reading the hardware 'HEAD' register. In LRC mode, this is not
valid as the hardware is not necessarily executing the same context
that is being processed by the software. Thus reading the h/w HEAD
could put an unrelated (undefined, effectively random) value into
the s/w 'head' -- A Bad Thing for the free space calculations.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:11 +01:00
Dave Gordon 57e215135f drm/i915: Check for matching ringbuffer in logical_ring_wait_request()
The request queue is per-engine, and may therefore contain requests
from several different contexts/ringbuffers. In determining which
request to wait for, this function should only consider requests
from the ringbuffer that it's checking for space, and ignore any
that it finds that belong to other contexts.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:10 +01:00
Thomas Daniel 7ba717cf36 drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.

v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffers objects regardless
(makes error capture a lot easier).

v3: Rebased. Take a pin reference of the ringbuffer for each item
in the execlist request queue because the hardware may still be using
the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
is executed.  The ringbuffer must remain pinned until the context save
is complete.  No longer pin and unpin ringbuffer in
populate_lr_context() - this transient address is meaningless and the
pinning can cause a sleep while atomic.

v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
Downgraded pinning check BUG_ONs to WARN_ONs.

v5: Reinstated WARN_ONs for unexpected execlist states.  Removed unused
variable.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:56:44 +01:00
Oscar Mateo dcb4c12a68 drm/i915/bdw: Pin the context backing objects to GGTT on-demand
Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).

v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(

v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.

v4: Break out pin and unpin into functions.  Fix style problems reported
by checkpatch

v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked.  Add WARN_ONs to make sure this is the case in future.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:32:58 +01:00
Thomas Daniel c86ee3a9f8 drm/i915/bdw: Clean up execlist queue items in retire_work
No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.

v2: Fix legacy ring path broken during overzealous cleanup

v3: Update idle detection to take execlists queue into account

v4: Grab execlist lock when checking queue state

v5: Fix leaking requests by freeing in execlists_retire_requests.

Issue: VIZ-4274
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:16:45 +01:00
Daniel Vetter 70b0ea8656 drm/i915: Drop return value from lrc_setup_hardware_status_page
kmap never fails.

Spotted-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-11-18 09:09:32 +01:00
Damien Lespiau 70ee45e10b drm/i915/skl: Don't allow disabling ppgtt and execlists on gen9+
Running the driver without execlists and hence PPGTT (either aliasing or
full) isn't a supported configuration on gen9+.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 18:17:31 +01:00
Tvrtko Ursulin 6e7cc470bc drm/i915/skl: Use correct use counters for force wakes
Write and reads following the block changed use engine specific use counters
and unless that is matched here force wake use counting goes bad. Same
force wake is attempted to be taken twice which leads to at least time outs.

NOTE: Depending on feedback from hardware designers it may not be necessary
to grab force wakes on Gen9 here. But for Gen8 it is needed due to a race
between RC6 and ELSP writes.

v2: Added blitter force wake engine and made more future proof.
    Added commit note.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 11:29:29 +01:00
Michael H. Nguyen 468c6816b5 drm/i915/skl: Add Gen9 LRC size
The LRC increased in size on gen9. Make sure we return the right
size in get_lr_context_size()

v2. Corrected the size, should be 22 pages. I unintentionally mailed out
a test patch w/ size equaling 23 pages.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Michael H. Nguyen <michael.h.nguyen@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 11:29:05 +01:00
Michel Thierry 771b9a5324 drm/i915: Initialize workarounds in logical ring mode too
Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.

v2: update to use the new workaround macros and cleanup unused code.
This takes care of both bdw and chv workarounds.

v2.1: Add missing call to init_context() during deferred context creation.

v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf
from the context (Mika/Daniel).

v4: Merge init_context interfaces back, the legacy mode only needs the ring,
but the lrc mode needs the ring and context (Mika).

Issue: VIZ-4092
Issue: GMIN-3475
Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c
Cc: Deepak S <deepak.s@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Align function paramater lists properly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 10:29:25 +01:00
Daniel Vetter eb84f976c8 Merge remote-tracking branch 'airlied/drm-next' into HEAD
Backmerge drm-next so that I can keep merging patches. Specifically I
want:
- atomic stuff, yay!
- eld parsing patch from Jani.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-11-10 10:55:35 +01:00
Thomas Daniel 1df06b75f0 drm/i915/bdw: Setup global hardware status page in execlists mode
Write HWS_PGA address even in execlists mode as the global hardware status
page is still required.  This address was previously uninitialized and
HWSP writes would clobber whatever buffer happened to reside at GGTT
address 0.

v2: Break out hardware status page setup into a separate function.

Issue: VIZ-2020
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-07 18:41:50 +01:00
Dave Airlie 1f9e14baa9 Merge tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel into drm-next
Just various stuff all over from a bunch of people. Shortlog gives a beter
overview, it's really all misc drm patches.

* tag 'topic/core-stuff-2014-11-05' of git://anongit.freedesktop.org/drm-intel:
  drm/edid: add #defines and helpers for ELD
  drm/dp: Add counters in the drm_dp_aux struct for I2C NACKs and DEFERs
  drm: Remove compiler BUG_ON() test
  drm: Fix DRM_FORCE_ON_DIGITAL use
  drm/gma500: Don't destroy DRM properties in the driver
  drm/i915: Don't destroy DRM properties in the driver
  drm: Add a note to drm_property_create() about property lifetime
  gpu: drm: Fix warning caused by a parameter description in drm_crtc.c
  drm/dp-helper: Move the legacy helpers to gma500
  drm/crtc: Remove duplicated ioctl code
  drm/crtc: Fix two typos
  gpu:drm: Fix typo in Documentation/DocBook/drm.xml
  gpu: drm: drm_dp_mst_topology.c: Fix improper use of strncat
  drm: drm_err: Remove unnecessary __func__ argument
  drm: Implement O_NONBLOCK support on /dev/dri/cardN
2014-11-07 10:58:46 +10:00
Dave Gordon cd0707cb1d drm/i915: Remove redundant return value and WARN_ON
execlists_submit_context() always returns 0, which is redundant.
And its name is inaccurate, since it actually submits (up to)
TWO contextS. So we rename it, change it to "void", and remove
the WARN_ON() testing its return value.

Change-Id: Ie225b0eca7754c6093c8b8bd15550b251b6feb82
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-04 23:22:15 +01:00
John Harrison 6402c330a6 drm/i915: Fix null pointer dereference in ring cleanup code
If a ring failed to initialise for any reason then the error path would try to
clean up all rings including those that had not yet been allocated. The ring
clean up code did a check that the ring was valid before starting its work.
Unfortunately, that was after it had already dereferenced the ring to obtain a
dev_private pointer.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-04 23:22:14 +01:00
Masanari Iida 32197aab04 gpu:drm: Fix typo in Documentation/DocBook/drm.xml
This patch fix spelling typos found in drm.xml.
It is because the file is generated from comments in
source codes, I have to fix the typos within source files.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-10-21 10:55:33 +02:00
Daniel Vetter 7cd512f152 drm/i915: Fix irq checks in ring->irq_get/put functions
Yet another place that wasn't properly transformed when implementing
SOix. While at it convert the checks to WARN_ON on gen5+ (since we
don't have UMS potentially doing stupid things on those platforms).
And also add the corresponding checks to the put functions (again with
a WARN_ON) for gen5+.

v2: Drop the WARNINGS in the irq_put functions (including the existing
one for vebox), Chris convinced me that they're not that terribly
useful.

v3: Don't forget about execlist code.

Cc: Imre Deak <imre.deak@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-09-19 14:43:13 +02:00
Deepak S a01b0e946f drm/i915: add cherryview specfic forcewake in execlists_elsp_write
In chv, we have two power wells Render & Media. We need to use
corresponsing forcewake count. If we dont follow this we are getting
error "*ERROR*: Timed out waiting for forcewake old ack to clear" due to
multiple entry into __vlv_force_wake_get.

Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Requested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-19 14:41:17 +02:00
Oscar Mateo 564ddb2fae drm/i915/bdw: Render state init for Execlists
The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
outside accesible, and namespace accordingly. This mess is so that all
LR, LRC and Execlists functionality can go together in intel_lrc.c: we
can fix all of this later on, once the interfaces are clear.

v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v3: Setup ring status page in lr_context_deferred_create when the
default context is being created. This means that the render state
init for the default context is no longer a special case.  Execute
deferred creation of the default context at the end of
logical_ring_init to allow the render state commands to be submitted.
Fix style errors reported by checkpatch. Rebased.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:04:52 +02:00
Thomas Daniel 2d96553613 drm/i915/bdw: Populate lrc with aliasing ppgtt if required
A previous commit broke aliasing PPGTT for lrc, resulting in a kernel oops
on boot. Add a check so that is full PPGTT is not in use the context is
populated with the aliasing PPGTT.

Issue: VIZ-4278
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:03:56 +02:00
Oscar Mateo 73e4d07f8a drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.

v2: Add notes on logical ring context creation.

v3: Use kerneldoc.

v4: Integrate it in the DocBook template.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop hunk about render ring init function since that's not
yet merged.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-20 17:17:51 +02:00
Oscar Mateo 4ba70e448b drm/i915/bdw: Display execlists info in debugfs
v2: Warn and return if LRCs are not enabled.

v3: Grab the Execlists spinlock (noticed by Daniel Vetter).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v4: Lock the struct mutex for atomic state capture

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-20 17:17:49 +02:00
Oscar Mateo f1ad5a1fd4 drm/i915/bdw: Help out the ctx switch interrupt handler
If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements in interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:44:04 +02:00
Oscar Mateo e1fee72c2e drm/i915/bdw: Avoid non-lite-restore preemptions
In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (this is: the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified an scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU start executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

The race is avoided by keeping track of how many times a context has been
submitted to the hardware and by better discriminating the received context
switch interrupts: in the example, when we have submitted B twice, we won´t
submit C and D as soon as we receive the notification that B is completed
because we were expecting to get a LITE_RESTORE and we didn´t, so we know a
second completion will be received shortly.

Without this explicit checking, somehow, the batch buffer execution order
gets messed with. This can be verified with the IGT test I sent together with
the series. I don´t know the exact mechanism by which the pre-emption messes
with the execution order but, since other people is working on the Scheduler
+ Preemption on Execlists, I didn´t try to fix it. In these series, only Lite
Restores are supported (other kind of preemptions WARN).

v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.

v3: Clarify how the race is avoided, as requested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Align function parameters ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:43:58 +02:00
Thomas Daniel e981e7b17f drm/i915/bdw: Handle context switch events
Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).

We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt inmediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:43:47 +02:00
Michel Thierry acdd884a2e drm/i915/bdw: Two-stage execlist submit process
Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To assure this, we place context switch requests in a queue and those
request are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:10:59 +02:00
Oscar Mateo ae1250b9da drm/i915/bdw: Write the tail pointer, LRC style
Each logical ring context has the tail pointer in the context object,
so update it before submission.

v2: New namespace.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:03:09 +02:00
Ben Widawsky 84b790f80e drm/i915/bdw: Implement context switching (somewhat)
A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two, since the ELSP
has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP porting read (it was dropped somewhere during the
rebases).

v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
  says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
  ELSP writes are in progress") as noted by Thomas Daniel
  (thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
  sure that it was properly aligned. Spotted by Alistair Mcaulay
  <alistair.mcaulay@intel.com>.

v5:
- Improved source code comments as suggested by Chris Wilson.
- No need to abstract submit_ctx away, as pointed by Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Sigh.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:03:03 +02:00
Oscar Mateo 48e29f5535 drm/i915/bdw: Emission of requests with logical rings
On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and asbtracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
  used by i915_gpu_idle, which in turn is used in several places
  inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
  over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'll end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
Thomas Daniel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:02:55 +02:00
Oscar Mateo 582d67f0b1 drm/i915: Add temporary ring->ctx backpointer
The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 18:42:59 +02:00
Daniel Vetter ae6c480692 drm/i915: Only track real ppgtt for a context
There's a bit a confusion since we track the global gtt,
the aliasing and real ppgtt in the ctx->vm pointer. And not
all callers really bother to check for the different cases and just
presume that it points to a real ppgtt.

Now looking closely we don't actually need ->vm to always point at an
address space - the only place that cares actually has fixup code
already to decide whether to look at the per-proces or the global
address space.

So switch to just tracking the ppgtt directly and ditch all the
extraneous code.

v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
Also drop the early exit - without aliasing ppgtt we want to dump all
the ppgtts of the contexts if we have full ppgtt.

v3: Actually git add the compile fix.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
OTC-Jira: VIZ-3724
[danvet: Resolve conflicts with execlist patches while applying.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-13 14:23:33 +02:00
Oscar Mateo 14bf993e83 drm/i915/bdw: Always use MMIO flips with Execlists
The normal flip function places things in the ring in the legacy
way, so we either fix that or force MMIO flips always as we do in
this patch.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Fucking again.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:25:49 +02:00
Oscar Mateo ba8b7ccb19 drm/i915/bdw: Workload submission mechanism for Execlists
This is what i915_gem_do_execbuffer calls when it wants to execute some
worload in an Execlists world.

v2: Check arguments before doing stuff in intel_execlists_submission. Also,
get rel_constants parsing right.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the chipset flush, that's pre-gen6. And appease
checkpatch a bit .... again!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:18:38 +02:00
Oscar Mateo 1564858526 drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Ditch the check for aliasing ppgtt. It'll break soon and
execlists requires full ppgtt anyway.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:12:34 +02:00
Oscar Mateo 73d477f6bb drm/i915/bdw: Interrupts with logical rings
We need to attend context switch interrupts from all rings. Also, fixed writing
IMR/IER and added HWSTAM at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are due to no users waiting
on a sequence number. Therefore, this commit adds a bitmask of interrupts to
be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:06:58 +02:00
Oscar Mateo 9832b9dae8 drm/i915/bdw: Ring idle and stop with logical rings
This is a hard one, since there is no direct hardware ring to
control when in Execlists.

We reuse intel_ring_idle here, but it should be fine as long
as i915_add_request does the ring thing.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:57:38 +02:00
Oscar Mateo 4712274c36 drm/i915/bdw: GEN-specific logical ring emit flush
Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.

v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:44:37 +02:00
Oscar Mateo 4da46e1e5b drm/i915/bdw: GEN-specific logical ring emit request
Very similar to the legacy add_request, only modified to account for
logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:49 +02:00
Oscar Mateo 82e104cc26 drm/i915/bdw: New logical ring submission mechanism
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.

v2: Pass the ringbuffer only (whenever possible).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:36 +02:00
Oscar Mateo e94e37ad19 drm/i915/bdw: GEN-specific logical ring set/get seqno
No mistery here: the seqno is still retrieved from the engine's
HW status page (the one in the default context. For the moment,
I see no reason to worry about other context's HWS page).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 17:05:17 +02:00
Oscar Mateo 9b1136d505 drm/i915/bdw: GEN-specific logical ring init
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 17:03:28 +02:00
Oscar Mateo 48d823878d drm/i915/bdw: Generic logical ring init and cleanup
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:55:17 +02:00
Oscar Mateo 454afebde8 drm/i915/bdw: Skeleton for the new logical rings submission path
Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous version of these series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:40:57 +02:00
Oscar Mateo 8670d6f97d drm/i915/bdw: Populate LR contexts (somewhat)
For the most part, logical ring context objects are similar to hardware
contexts in that the backing object is meant to be opaque. There are
some exceptions where we need to poke certain offsets of the object for
initialization, updating the tail pointer or updating the PDPs.

For our basic execlist implementation we'll only need our PPGTT PDs,
and ringbuffer addresses in order to set up the context. With previous
patches, we have both, so start prepping the context to be load.

Before running a context for the first time you must populate some
fields in the context object. These fields begin 1 PAGE + LRCA, ie. the
first page (in 0 based counting) of the context  image. These same
fields will be read and written to as contexts are saved and restored
once the system is up and running.

Many of these fields are completely reused from previous global
registers: ringbuffer head/tail/control, context control matches some
previous MI_SET_CONTEXT flags, and page directories. There are other
fields which we don't touch which we may want in the future.

v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
for other engines.

v3: Several rebases and general changes to the code.

v4: Squash with "Extract LR context object populating"
Also, Damien's review comments:
- Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
- Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
- Add a clarifying comment to the context population code.

v5: Damien's review comments:
- The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
- Remove dead code.

v6: Add a note about the (presumed) differences between BDW and CHV state
contexts. Also, Brad's review comments:
- Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
- Be less magical about how we set the ring size in the context.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:21:53 +02:00
Daniel Vetter 0c7dd53b84 drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.

v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:18:17 +02:00
Oscar Mateo 84c2377fce drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:10:58 +02:00
Oscar Mateo 8c8579176a drm/i915/bdw: A bit more advanced LR context alloc/free
Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.

Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.

v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).

v3:
- Go back to one single global default context, this time with multiple
  backing objects inside.
- Use different context sizes for non-render engines, as suggested by
  Damien (still hardcoded, since the information about the context size
  registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
  that we don't waste memory in rings that might not get initialized).

v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ preffix.

v5: Several rebases to account for the changes in the previous patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:08:18 +02:00
Oscar Mateo ede7d42bae drm/i915/bdw: Initialization for Logical Ring Contexts
For the moment this is just a placeholder, but it shows one of the
main differences between the good ol' HW contexts and the shiny
new Logical Ring Contexts: LR contexts allocate  and free their
own backing objects. Another difference is that the allocation is
deferred (as the create function name suggests), but that does not
happen in this patch yet, because for the moment we are only dealing
with the default context.

Early in the series we had our own gen8_gem_context_init/fini
functions, but the truth is they now look almost the same as the
legacy hw context init/fini functions. We can always split them
later if this ceases to be the case.

Also, we do not fall back to legacy ringbuffers when logical ring
context initialization fails (not very likely to happen and, even
if it does, hw contexts would probably fail as well).

v2: Daniel says "explain, do not showcase".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:04:11 +02:00
Daniel Vetter bd84b1e995 drm/i915: WARN if module opt sanitization goes out of order
Depending upon one module option to be sanitized (through USES_PPGTT)
for the other is a bit too fragile for my taste. At least WARN about
this.

Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:00:34 +02:00
Oscar Mateo 127f100369 drm/i915/bdw: Macro for LRCs and module option for Execlists
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have things in place to hope to
work.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlist are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.

v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:00:27 +02:00
Oscar Mateo b20385f1f8 drm/i915/bdw: New source and header file for LRs, LRCs and Execlists
Some legacy HW context code assumptions don't make sense for this new
submission method, so we will place this stuff in a separate file.

Note for reviewers: I've carefully considered the best name for this file
and this was my best option (other possibilities were intel_lr_context.c
or intel_execlist.c). I am open to a certain bikeshedding on this matter,
anyway.

And some point in time, it would be a good idea to split intel_lrc.c/.h
even further, but for the moment just shove everything together.

v2: Change to intel_lrc.c

v3: Squash together with the header file addition

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:00:07 +02:00