In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.
v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).
Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]
v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
It's useful to look at the last seqno submitted on a particular engine
and compare it against the HWS value to check for irregularities.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-1-git-send-email-chris@chris-wilson.co.uk
dev_priv is what the macro works hard to extract, pass it directly.
> sed 's/\([A-Z].*(dev_priv\)->dev)/\1)/g'
v2:
- Include all wrapper macros too (Chris)
v3:
- Include sed cmdline (Chris)
v4:
- Break long line
- Rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460016485-8089-1-git-send-email-joonas.lahtinen@linux.intel.com
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Equivalent to the existing for_each_engine() macro, this will replace
the latter wherever the third argument *is* actually wanted (in most
places, it is not used). The third argument is renamed to emphasise
that it is an engine id (type enum intel_engine_id). All the callers of
the macro that actually need the third argument are updated to use this
version, and the argument (generally 'i') is also updated to be 'id'.
Other callers (where the third argument is unused) are untouched for
now; they will be updated in the next patch.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for engine reset, the wedged argument of i915_handle_error()
is extended to reflect as a mask of engines that are hung. This is further
passed down to error state capture functions which are also updated.
Engine reset recovery mechanism uses this mask and schedules recovery work
for those particular engines.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458331676-567-3-git-send-email-arun.siluvery@linux.intel.com
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With full-ppgtt, it takes the GPU an eon to traverse the entire 256PiB
address space, causing a loop to be detected. Under the current scheme,
if ACTHD walks off the end of a batch buffer and into an empty
address space, we "never" detect the hang. If we always increment the
score as the ACTHD is progressing then we will eventually timeout (after
~46.5s (31 * 1.5s) without advancing onto a new batch). To counter act
this, increase the amount we reduce the score for good batches, so that
only a series of almost-bad batches trigger a full reset. DoS detection
suffers slightly but series of long running shader tests will benefit.
Based on a patch from Chris Wilson.
Testcase: igt/drv_hangman/hangcheck-unterminated
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456930109-21532-1-git-send-email-mika.kuoppala@intel.com
execute during context save/restore, good to have them in error state.
v2: use wa_ctx->size and print only size values (Mika)
v3: simplify conditions when recording and freeing object (Chris)
v4: resolve checkpatch errors (Tvrtko)
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456831476-10782-1-git-send-email-arun.siluvery@linux.intel.com
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).
s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Revision id along with device id is useful in better identification of the HW
and its limitations so include this detail in error state.
v2: make it clear that it is PCI revision and We might as well dump PCI
subsystem details while we update this (Ville, Chris).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1454001521-7701-1-git-send-email-arun.siluvery@linux.intel.com
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.
All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.
From an idea by Chris Wilson.
v2: transform an extra instance of ring->default_context introduced by
42f1cae8c drm/i915: Restore inhibiting the load of the default context
That patch's commentary includes:
v2: Mark the global default context as uninitialized on GPU reset so
that the context-local workarounds are reloaded upon re-enabling
The code implementing that now also benefits from the replacement of
the multiple (per-ring) pointers to the default context with a single
pointer to the unique kernel context.
v4: Rebased, remove underused local (Nick Hoath)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
We have had one case where buggy csr/dmc firmware version influenced
gt side and caused a hang. Add dmc firmware loading state and
version to error state.
v2: - Rebased on top of Damien's patches
- included fw load state
v3: include dmc info only if platform supports it (Chris)
v4: move *csr to branch scope (Chris)
v5: remove dependency to csr_state
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446124879-22240-1-git-send-email-mika.kuoppala@intel.com
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Change FORCEWAKE & co. reads for the error state to use I915_READ_FW().
Reading a FORCEWAKE register using a function that can frob forcewake
just seems wrong.
There is a check to skip grabbing the forcewake for accessing FORCEWAKE
in intel_uncore.c, but there's no such check for FORCEWAKE_MT. So no
idea what is currently happening with FORCEWAKE_MT reads. FORCEWAKE_VLV
is fortunately outside the forcewake range anyway, so no actual issue
with that one.
So let's just make the rule that you can't access FORCEWAKE registers with
the normal I915_READ() stuff, and we can drop the extra FORCEWAKE check
from NEEDS_FORCEWAKE(). While at it use NEEDS_FORCEWAKE() on BDW, where
it was skipped for whatever bikeshed reason that I've already forgotten.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445517300-28173-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Since we're not synchronizing the ring request list during error state capture
the request list state might change between the time the corresponding error
request list was allocated and dimensioned to the time when the ring request
list is actually captured into the error state. If this happens then do an
early exit and be aware that the captured error state might not be fully
reliable.
* v2:
- Chris Wilson: Removed WARN_ON from size check since having the error state
request list and the live driver request list diverge like this is a
legitimate behaviour.
- Tomas Elf: Removed update of num_request field since this made no sense. Just
exit and move on.
* v3:
- Chris Wilson: Removed error message at the point of early exit. The user is
not interested in any state changes happening during the error state capture,
only in the state that we're trying to capture at the point of the error.
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 5105672341.
I somehow managed to combine a patch from Tomas Elf with a totally
unrelated commit message from Chris Wilson. Let's revert this and
reapply properly.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
v2: Fix i915_gem_evict_range() (now evict_for_vma) to handle ordinary
and fixed objects within the same batch
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Daniel, Thomas" <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This register was added on GEN4, by the name INSTDONE_1 whereas the GEN6
specification calls it INSTDONE_2. Keep the original name with a
platform prefix to make it clearer which INSTDONE register instance this
is. Also add a comment about the SNB alternative name.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have a bunch of INSTDONE registers for different platforms and
purposes and it's not immediately clear which instance they are just by
looking at the register name. This one was added on GEN2, where it was
the only INSTDONE register, so mark it as such.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We use 3 different names to refer to the same render ring INSTDONE
register. This can be confusing when comparing two parts of the code
accessing the register via different names. Although the GEN4 version's
layout is different, we treat it the same way as the GEN7+ version, in
that we simply read it out during error capture. So remove the
duplicates and leave a comment about the GEN4 difference.
Note that there is also a GEN2 version of this register, but that's on a
different address so not handled in this patch.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Hide the 945 vs. rest of gen2/3 difference in the macro
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Looks like this was introduced in:
commit d1675198ed
Author: Alex Dai <yu.dai@intel.com>
Date: Wed Aug 12 15:43:43 2015 +0100
drm/i915: Integrate GuC-based command submission
This patch assumed LRC contexts and HWS layout, which is incorrect on
platforms without execlists. This can lead to a crash in GPU error
state readout on those platforms.
I don't see a bug filed for this, but there may be one that I haven't
found.
v2: fixup offset handling for error capture fix (Dave)
Cc: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a common function to return "yes" or "no" string based on the
argument, and drop the local versions of it.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC-based submission is mostly the same as execlist mode, up to
intel_logical_ring_advance_and_submit(), where the context being
dispatched would be added to the execlist queue; at this point
we submit the context to the GuC backend instead.
There are, however, a few other changes also required, notably:
1. Contexts must be pinned at GGTT addresses accessible by the GuC
i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
2. The GuC's TLB must be invalidated after a context is pinned at
a new GGTT address.
3. GuC firmware uses the one page before Ring Context as shared data.
Therefore, whenever driver wants to get base address of LRC, we
will offset one page for it. LRC_PPHWSP_PN is defined as the page
number of LRCA.
4. In the work queue used to pass requests to the GuC, the GuC
firmware requires the ring-tail-offset to be represented as an
11-bit value, expressed in QWords. Therefore, the ringbuffer
size must be reduced to the representable range (4 pages).
v2:
Defer adding #defines until needed [Chris Wilson]
Rationalise type declarations [Chris Wilson]
v4:
Squashed kerneldoc patch into here [Daniel Vetter]
v5:
Update request->tail in code common to both GuC and execlist modes.
Add a private version of lr_context_update(), as sharing the
execlist version leads to race conditions when the CPU and
the GuC both update TAIL in the context image.
Conversion of error-captured HWS page to string must account
for offset from start of object to actual HWS (LRC_PPHWSP_PN).
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: For semaphore errors, object is mapped to GGTT and offset will not
be > 4GB, print only lower 32-bits (Akash)
v3: Print gtt_offset in groups of 32-bit (Chris)
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The IOMMU for Intel graphics has historically had many issues resulting
in random GPU hangs. Lets include its status when capturing the GPU hang
error state for post-mortem analysis.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).
v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.
Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k: 275.794µs
After: Time to read-read 1024k: 123.260µs
hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k: 230.433µs
After: Time to read-read 1024k: 124.593µs
bdw-u (w/o semaphores): Before After
Time to read-read 1x1: 26.274µs 10.350µs
Time to read-read 128x128: 40.097µs 21.366µs
Time to read-read 256x256: 77.087µs 42.608µs
Time to read-read 512x512: 281.999µs 181.155µs
Time to read-read 1024x1024: 1196.141µs 1118.223µs
Time to read-read 2048x2048: 5639.072µs 5225.837µs
Time to read-read 4096x4096: 22401.662µs 21137.067µs
Time to read-read 8192x8192: 89617.735µs 85637.681µs
Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
wa_batchbuffer is part of some error states. Make sure it
is freed.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is mostly useful for execlists where the rings switch between
contexts (and so checking that the ring's start register matches the
context is important).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The faulting virtual address is >32bits and has been moved
to different registers. Add to error state and output upper
register first, in the same line for easy reconstruction of
the fault address.
v2: correct gen masking (Michel)
v3: s/TBL/TLB (Ville)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We use the pid of the process which opened our device when
we track which was the culprit of the gpu hang. But as that
file descriptor might get inherited, we might blame the
wrong process when we record the error state.
Track process identifiers in requests to always find
the correct offender.
v2: Track only user processes (Chris)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: drop NULL check before put_pid as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.
Issue: VIZ-4274
v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's be optimistic that for future platforms this will remain the same
and reorg a bit.
This reorg in if blocks instead of switch make life easier for future
platform support addition.
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's be optimistic that for future platforms this will remain the same
and reorg a bit.
This reorg in if blocks instead of switch make life easier for future
platform support addition.
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's be optimistic that for future platforms this will remain the same
and reorg a bit.
This reorg in if blocks instead of switch make life easier for future
platform support addition.
v2: Jani pointed out I was missing reg_830 for some gen3 platforms. So let's make
this platforms subcases of Gen checks.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
to map objects into the same address space multiple times.
Added a GGTT view concept and linked it with the VMA to distinguish between
multiple instances per address space.
New objects and GEM functions which do not take this new view as a parameter
assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
previous behaviour.
This now means that objects can have multiple VMA entries so the code which
assumed there will only be one also had to be modified.
Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
which is DMA mapped on first VMA instantiation and unmapped on the last one
going away.
v2:
* Removed per view special casing in i915_gem_ggtt_prepare /
finish_object in favour of creating and destroying DMA mappings
on first VMA instantiation and last VMA destruction. (Daniel Vetter)
* Simplified i915_vma_unbind which does not need to count the GGTT views.
(Daniel Vetter)
* Also moved obj->map_and_fenceable reset under the same check.
* Checkpatch cleanups.
v3:
* Only retire objects once the last VMA is unbound.
v4:
* Keep scatter-gather table for alternative views persistent for the
lifetime of the VMA.
* Propagate binding errors to callers and handle appropriately.
v5:
* Explicitly look for normal GGTT view in i915_gem_obj_bound to align
usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
* Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
* Removed stray semi-colon in i915_gem_object_set_cache_level.
For: VIZ-4544
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Drop hunk from i915_gem_shrink since it's just prettification
but upsets a __must_check warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ring member of the object structure was always updated with the
last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is
now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant
and potentially misleading (especially as there was no comment to explain its
purpose).
This checkin removes the redundant field. Many uses were simply testing for
non-null to see if the object is active on the GPU. Some of these have been
converted to check 'obj->active' instead. Others (where the last_read_req is
about to be used anyway) have been changed to check obj->last_read_req. The rest
simply pull the ring out from the request structure and proceed as before.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The object structure contains the last read, write and fenced seqno values for
use in syncrhonisation operations. These have now been replaced with their
request structure counterparts.
Note that to ensure that objects do not end up with dangling pointers, the
assignments of last_*_req include reference count updates. Thus a request cannot
be freed if an object is still hanging on to it for any reason.
v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did
not get updated when 'last_rendering_seqno' became 'last_read|write_seqno'
several millenia ago.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Spotted while reading and trying to understand how our error capture
code deals with full ppgtt.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This goes back to
commit 362b8af7ad
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Thu Jan 30 00:19:38 2014 -0800
drm/i915: Move per ring error state to ring_error
Spotted while reading error states.
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If these flags are on the object level it will be more difficult to allow
for multiple VMAs per object.
v2: Simplification and cleanup after code review comments (Chris Wilson).
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>