We currently presume that the engine reset is successful, cancelling the
expired preemption timer in the process. However, engine resets can
fail, leaving the timeout still pending and we will then respond to the
timeout again next time the tasklet fires. What we want is for the
failed engine reset to be promoted to a full device reset, which is
kicked by the heartbeat once the engine stops processing events.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk
Before reseting the engine, we suspend the execution of the guilty
request, so that we can continue execution with a new context while we
slowly compress the captured error state for the guilty context. However,
if the reset fails, we will promptly attempt to reset the same request
again, and discover the ongoing capture. Ignore the second attempt to
suspend and capture the same request.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 32ff621fd7 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk
drm/i915 features for v5.11:
Highlights:
- Enable big joiner to join two pipes to one port to overcome pipe restrictions
(Manasi, Ville, Maarten)
Display:
- More DG1 enabling (Lucas, Aditya)
- Fixes to cases without display (Lucas, José, Jani)
- Initial PSR state improvements (José)
- JSL eDP vswing updates (Tejas)
- Handle EDID declared max 16 bpc (Ville)
- Display refactoring (Ville)
Other:
- GVT features
- Backmerge
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87czzzkk1s.fsf@intel.com
We treat idling the GT (intel_rps_park) as a downclock event, and reduce
the frequency we intend to restart the GT with. Since the two workloads
are likely related (e.g. a compositor rendering every 16ms), we want to
carry the frequency and load information from across the idling.
However, we do also need to update the frequencies so that workloads
that run for less than 1ms are autotuned by RPS (otherwise we leave
compositors running at max clocks, draining excess power). Conversely,
if we try to run too slowly, the next workload has to run longer. Since
there is a hysteresis in the power graph, below a certain frequency
running a short workload for longer consumes more energy than running it
slightly higher for less time. The exact balance point is unknown
beforehand, but measurements with 30fps media playback indicate that RPe
is a better choice.
Reported-by: Edward Baker <edward.baker@intel.com>
Tested-by: Edward Baker <edward.baker@intel.com>
Fixes: 043cd2d14e ("drm/i915/gt: Leave rps->cur_freq on unpark")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk
(cherry picked from commit f7ed83cc19)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As we use a shmemfs file to hold the context state, when not in use it
may be swapped out, such as across suspend. Since we wrote into the
shmemfs without marking the pages as dirty, the contents may be dropped
instead of being written back to swap. On re-using the shmemfs file,
such as creating a new context after resume, the contents of that file
were likely garbage and so the new context could then hang the GPU.
Simply mark the page as being written when copying into the shmemfs
file, and it the new contents will be retained across swapout.
Fixes: be1cb55a07 ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.8+
Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com
(cherry picked from commit a9d71f76cc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.
This is required for a later patch to handle locking around virtual
requests in the signaler, as those requests may want to move between
engines and be destroyed while we are holding b->irq_lock on a physical
engine.
v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we
don't need to reset the debug code, at the loss of having the mutex
debug code spot us attempting to destroy a locked mutex.
v3: As the intended use will remain strongly referenced counted, with
very little inflight access across reuse, drop the ctor.
v4: Drop the unrequired change to remove the temporary reference around
dropping the active context, and add back some more missing ctor
operations.
v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
later] maybe accessed under RCU and so needs special care not to be
reinitialised.
v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk
(cherry picked from commit 14d1eaf088)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Ville noticed that the last mocs entry is used unconditionally by the HW
when it performs cache evictions, and noted that while the value is not
meant to be writable by the driver, we should program it to a reasonable
value nevertheless.
As it turns out, we can change the value of mocs:63 and the value we
were programming into it would cause hard hangs in conjunction with
atomic operations.
v2: Add details from bspec about how it is used by HW
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707
Fixes: 3bbaba0cea ("drm/i915: Added Programming of the MOCS")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140841.1982-1-chris@chris-wilson.co.uk
(cherry picked from commit 977933b5da)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We treat idling the GT (intel_rps_park) as a downclock event, and reduce
the frequency we intend to restart the GT with. Since the two workloads
are likely related (e.g. a compositor rendering every 16ms), we want to
carry the frequency and load information from across the idling.
However, we do also need to update the frequencies so that workloads
that run for less than 1ms are autotuned by RPS (otherwise we leave
compositors running at max clocks, draining excess power). Conversely,
if we try to run too slowly, the next workload has to run longer. Since
there is a hysteresis in the power graph, below a certain frequency
running a short workload for longer consumes more energy than running it
slightly higher for less time. The exact balance point is unknown
beforehand, but measurements with 30fps media playback indicate that RPe
is a better choice.
Reported-by: Edward Baker <edward.baker@intel.com>
Tested-by: Edward Baker <edward.baker@intel.com>
Fixes: 043cd2d14e ("drm/i915/gt: Leave rps->cur_freq on unpark")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk
As we use a shmemfs file to hold the context state, when not in use it
may be swapped out, such as across suspend. Since we wrote into the
shmemfs without marking the pages as dirty, the contents may be dropped
instead of being written back to swap. On re-using the shmemfs file,
such as creating a new context after resume, the contents of that file
were likely garbage and so the new context could then hang the GPU.
Simply mark the page as being written when copying into the shmemfs
file, and it the new contents will be retained across swapout.
Fixes: be1cb55a07 ("drm/i915/gt: Keep a no-frills swappable copy of the default context state")
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.8+
Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com
We checked the table size against a hardcoded number of entries, and
that number was excluding the special mocs registers at the end.
Fixes: 977933b5da ("drm/i915/gt: Program mocs:63 for cache eviction on gen9")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201127102540.13117-1-chris@chris-wilson.co.uk
If while we are cancelling the breadcrumb signaling, we find that the
request is already completed, move it to the irq signaler and let it be
signaled.
v2: Tweak reference counting so that we only acquire a new reference on
adding to a signal list, as opposed to a hidden i915_request_put of the
caller's reference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-5-chris@chris-wilson.co.uk
Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.
This is required for a later patch to handle locking around virtual
requests in the signaler, as those requests may want to move between
engines and be destroyed while we are holding b->irq_lock on a physical
engine.
v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we
don't need to reset the debug code, at the loss of having the mutex
debug code spot us attempting to destroy a locked mutex.
v3: As the intended use will remain strongly referenced counted, with
very little inflight access across reuse, drop the ctor.
v4: Drop the unrequired change to remove the temporary reference around
dropping the active context, and add back some more missing ctor
operations.
v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
later] maybe accessed under RCU and so needs special care not to be
reinitialised.
v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk
Pull the repeated check for the last active request being completed to a
single spot, when deciding whether or not execlist preemption is
required.
In doing so, we remove the tasklet kick, introduced with the completion
checks in commit 35f3fd8182 ("drm/i915/execlists: Workaround switching
back to a completed context"), if we find the request was completed but
have not yet seen the corresponding CS event. This was devolving into a
busy spin of the tasklet while we waited for the event as the delivery
was not as instantaneous as expected. Under load this is sufficient to
exhaust the tasklet softirq timeslice, and force ksoftirqd. Quite
noticeable overhead for no apparent improvement in latency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-2-chris@chris-wilson.co.uk
Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.request list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. However, if the request is completed we currently leave it
on the list [until retirement], but we could just as simply remove it
and stop treating it as active. We would only have to then traverse it
once while unwinding in quick succession.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk
Ville noticed that the last mocs entry is used unconditionally by the HW
when it performs cache evictions, and noted that while the value is not
meant to be writable by the driver, we should program it to a reasonable
value nevertheless.
As it turns out, we can change the value of mocs:63 and the value we
were programming into it would cause hard hangs in conjunction with
atomic operations.
v2: Add details from bspec about how it is used by HW
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707
Fixes: 3bbaba0cea ("drm/i915: Added Programming of the MOCS")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140841.1982-1-chris@chris-wilson.co.uk
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.
v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk
(cherry picked from commit 46eecfccb4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We currently want to keep the interrupt enabled until the interrupt after
which we have no more work to do. This heuristic was broken by us
kicking the irq-work on adding a completed request without attaching a
signaler -- hence it appearing to the irq-worker that an interrupt had
fired when we were idle.
Fixes: 2854d86632 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-3-chris@chris-wilson.co.uk
(cherry picked from commit 3aef910d26)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).
By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.
v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: e23005604b ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk
(cherry picked from commit 9d5612ca16)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
CT event handler is called under the gt->irq_lock from the interrupt
handling paths so make it the same from the init path. I don't think this
mismatch caused any functional issue but we need to wean the code of the
global i915->irq_lock.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201120095636.1987395-2-tvrtko.ursulin@linux.intel.com
Guc->mmio_msg is set under the guc->irq_lock in guc_get_mmio_msg so it
should be consumed under the same lock from guc_handle_mmio_msg.
I am not sure if the overall flow here makes complete sense but at least
the correct lock is now used.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201120095636.1987395-1-tvrtko.ursulin@linux.intel.com
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.
v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk
We currently want to keep the interrupt enabled until the interrupt after
which we have no more work to do. This heuristic was broken by us
kicking the irq-work on adding a completed request without attaching a
signaler -- hence it appearing to the irq-worker that an interrupt had
fired when we were idle.
Fixes: 2854d86632 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-3-chris@chris-wilson.co.uk
Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).
By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.
v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: e23005604b ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk
The old IPS interface did not match the RPS interface that we tried to
plug it into (bool vs int return). Once repaired, our minimal
selftesting is finally happy!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201121190352.15996-1-chris@chris-wilson.co.uk
If we run out of ring space, or exceed the desired runtime, we wish to
stop the subtest. Put these checks together, so that we always keep the
requests flushed on completion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201120140314.24749-3-chris@chris-wilson.co.uk
Include the active timelines for debugfs/i915_engine_info, so that we
can see which have unready requests inflight which are not shown
otherwise.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-4-chris@chris-wilson.co.uk
We plan to expand upon the number of available statuses for when we
pretty-print the requests along the timelines, and so need a new set of
flags. We have settled upon:
Unready [U]
- initial status after being submitted, the request is not
ready for execution as it is waiting for external fences
Ready [R]
- all fences the request was waiting on have been signaled,
and the request is now ready for execution and will be
in a backend queue
- a ready request may still need to wait on semaphores
[internal fences]
Ready/virtual [V]
- same as ready, but queued over multiple backends
Executing [E]
- the request has been transferred from the backend queue and
submitted for execution on HW
- a completed request may still be regarded as executing, its
status may not be updated until it is retired and removed
from the lists
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk
Extract i915_request_show for reuse in other request chain pretty
printers.
For a bonus point, quietly change the seqno format from %llx to %lld to
match everywhere else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
(cherry picked from commit a04ac82736)
Fixes: 849c0fe9e8 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Updated Fixes tag]
Since we allocate some breadcrumbs for the virtual engine, and the
virtual engine has a custom destructor, we also need to free the
breadcrumbs after use.
Fixes: b3786b2937 ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118133839.1783-1-chris@chris-wilson.co.uk
(cherry picked from commit 45e50f48b7)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since we allocate some breadcrumbs for the virtual engine, and the
virtual engine has a custom destructor, we also need to free the
breadcrumbs after use.
Fixes: b3786b2937 ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118133839.1783-1-chris@chris-wilson.co.uk
The presumption was that some time would always elapse between recording
the start and the finish of a context switch. This turns out to be a
regular occurrence and emitting a debug statement superfluous.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117113103.21480-4-chris@chris-wilson.co.uk
Just like for rkl and tgl, this should be permanent as well for dg1
instead just for A0. The commit making it permanent for those platforms
ended up "racing" with the commit adding the DG1 WAs, so now fix that up.
v2: Add "tgl,dg1" to WA comment (Matt)
Cc: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201027043228.696518-3-lucas.demarchi@intel.com
I forgot to free the old list when growing past 16 entries.
Luckily, as much as I checked, none of the current platforms has more than
16 workarounds on a single list.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 452420d22d ("drm/i915: Fuse per-context workaround handling with the common framework")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201113132510.2298483-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 77c296966e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Some media power gates are disabled by default. commit 5d86923060
("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
tried to enable it, but it duplicated an existent register.
So, the main PG setup sequences ended up overwriting it.
So, let's now merge this to the main PG setup sequence.
v2: (Chris): s/BIT/REG_BIT, remove useless comment,
remove useless =0, use the right gt,
remove rc6 sequence doubt from commit message.
Fixes: 5d86923060 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org#v5.5+
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201111072859.1186070-1-rodrigo.vivi@intel.com
(cherry picked from commit 695dc55b57)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
I forgot to free the old list when growing past 16 entries.
Luckily, as much as I checked, none of the current platforms has more than
16 workarounds on a single list.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 452420d22d ("drm/i915: Fuse per-context workaround handling with the common framework")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201113132510.2298483-1-tvrtko.ursulin@linux.intel.com
Cross-subsystem Changes:
- DMA mapped scatterlist fixes in i915 to unblock merging of
https://lkml.org/lkml/2020/9/27/70 (Tvrtko, Tom)
Driver Changes:
- Fix for user reported issue #2381 (Graphical output stops with "switching to inteldrmfb from simple"):
Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init (Ville, Chris)
- Fix for Tigerlake (and earlier) to avoid spurious empty CSB events leading to hang (Chris, Bruce)
- Delay execlist processing for Tigerlake to avoid hang (Chris)
- Fix for Tigerlake RCS engine health check through heartbeat (Chris)
- Fix for Tigerlake reserved MOCS entries (Ayaz, Chris)
- Fix Media power gate sequence on Tigerlake (Rodrigo)
- Enable eLLC caching of display buffers for SKL+ (Ville)
- Support parsing of oversize batches on Gen9 (Matt, Chris)
- Exclude low pages (128KiB) of stolen from use to avoid thrashing during reset (Chris)
- Flush engines before Tigerlake breadcrumbs (Chris)
- Use the local HWSP offset during submission (Chris)
- Flush coherency domains on first set-domain-ioctl (Chris, Zbigniew)
- Use the active reference on the vma while capturing to avoid use-after-free (Chris)
- Fix MOCS PTE setting for gen9+ (Ville)
- Avoid NULL dereference on IPS driver callback while unbinding i915 (Chris)
- Avoid NULL dereference from PT/PD stash allocation error (Matt)
- Hold request reference for canceling an active context (Chris)
- Avoid infinite loop on x86-32 when mapping a lot of objects (Chris)
- Disallow WC mappings when processor doesn't support them (Chris)
- Return correct error in i915_gem_object_copy_blt() error path (Dan)
- Return correct error in intel_context_create_request() error path (Maarten)
- Tune down GuC communication enabled/disabled messages to debug (Jani)
- Fix rebased commit "Remove i915_request.lock requirement for execution callbacks" (Chris)
- Cancel outstanding work after disabling heartbeats on an engine (Chris)
- Signal cancelled requests (Chris)
- Retire cancelled requests on unload (Chris)
- Scrub HW state on driver remove (Chris)
- Undo forced context restores after trivial preemptions (Chris)
- Handle PCI unbind in PMU code (Tvrtko)
- Fix CPU hotplug with multiple GPUs in PMU code (Trtkko)
- Correctly set SFC capability for video engines (Venkata)
- Update GuC code to use firmware v49.0.1 (John, Matthew B., Daniele, Oscar, Michel, Rodrigo, Michal)
- Improve GuC warnings on loading failure (John)
- Avoid ownership race in buffer pool by clearing age (Chris)
- Use MMIO to read CSB in case of failure (Chris, Mika)
- Show engine properties in engine state dump to indicate changes (Chris, Joonas)
- Break up error capture compression loops with cond_resched() (Chris)
- Reduce GPU error capture mutex hold time to avoid khungtaskd (Chris)
- Serialise debugfs i915_gem_objects with ctx->mutex (Chris)
- Always test execution status on closing the context and close if not persistent (Chris)
- Avoid mixing integer types during batch copies (Chris, Jared)
- Skip over MI_NOOP when parsing to avoid overhead (Chris)
- Hold onto an explicit ref to i915_vma_work.pinned (Chris)
- Perform all asynchronous waits prior to marking payload start (Chris)
- Pull phys pread/pwrite implementations to the backend (Matt)
- Improve record of hung engines in error state (Tvrtko)
- Allow backends to override pread implementation (Matt)
- Reinforce LRC poisoning checks to confirm context survives execution (Chris)
- Fix memory region max size calculation (Matt)
- Fix order when adding blocks to memory region (Matt)
- Eliminate unused intel_virtual_engine_get_sibling func (Chris)
- Cleanup kasan warning for on-stack (unsigned long) casting (Chris)
- Onion unwind for scratch page allocation failure (Chris)
- Poison stolen pages before use (Chris)
- Selftest improvements (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201112163407.GA20320@jlahtine-mobl.ger.corp.intel.com
Some media power gates are disabled by default. commit 5d86923060
("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
tried to enable it, but it duplicated an existent register.
So, the main PG setup sequences ended up overwriting it.
So, let's now merge this to the main PG setup sequence.
v2: (Chris): s/BIT/REG_BIT, remove useless comment,
remove useless =0, use the right gt,
remove rc6 sequence doubt from commit message.
Fixes: 5d86923060 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org#v5.5+
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201111072859.1186070-1-rodrigo.vivi@intel.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl+oiOgeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGKBQIAJw6oad/FA7j9OO2
dMoaXb8UaBehGWgW2rdfWrFPV5v0DBnp/GkdRpLoZIjV3W4mBfnog7bIa4Eswlxo
Y8sZxo5/3JlgJQUkHvzR1TYk5z61lHkUw9Kj/cCyx6YdbjSl19AfFsnhQVVMuyp9
TXL2c7hxkHlw8eBGrymVu0Ip7Zq0x8O2g+8nQpmRcvaR6SBuSHdikDF/iWCtU1YW
wpk5eWEVaAO67keZOz6b+aCFHqjFX+1dUBBuPnslucYLR73Qi16hfaU9pebe97Gb
lX/MJ1bR9BeRp314cF0PYbm4WhKjRLudHOFJH8x3dj/BiYNrFK3SJGLiiTwsrAZ8
kytU0Xs=
=Ke/D
-----END PGP SIGNATURE-----
Merge v5.10-rc3 into drm-next
We need commit f8f6ae5d07 ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") to be able to merge Jason's cleanup patch.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.
Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.
v2:
* Stub prototype. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
SFC capability of video engines is not set correctly because i915
is testing for incorrect bits.
Fixes: c5d3e39caa ("drm/i915: Engine discovery query")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106011842.36203-1-daniele.ceraolospurio@intel.com
In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.
Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
(cherry picked from commit 09212e81e5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add another lower level to emit_ggtt_write so that the GGTT nature of
the write is not hardcoded into the emitter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk
(cherry picked from commit 2739d8cfc5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.
Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.
Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
(cherry picked from commit c10f6019d0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In a simple test case that writes to scratch and then busy-waits for the
batch to be signaled, we observe that the signal is before the write is
posted. That is bad news.
Splitting the flush + write_dword into two separate flush_dw prevents
the issue from being reproduced, we can presume the post-sync op is not
so post-sync.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216
Testcase: igt/gem_exec_fence/parallel
Testcase: igt/i915_selftest/live/gt_timelines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
Clear out some pointers when objects have been de-allocated. This
makes it much easier to track down use-after-free type issues.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-4-John.C.Harrison@Intel.com
Rather than just saying 'GuC failed to load: -110', actually print out
the GuC status register and break it down into the individual fields.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-3-John.C.Harrison@Intel.com
The latest GuC firmware includes a number of interface changes that
require driver updates to match.
* Starting from Gen11, the ID to be provided to GuC needs to contain
the engine class in bits [0..2] and the instance in bits [3..6].
NOTE: this patch breaks pointer dereferences in some existing GuC
functions that use the guc_id to dereference arrays but these functions
are not used for now as we have GuC submission disabled and we will
update these functions in follow up patch which requires new IDs.
* The new GuC requires the additional data structure (ADS) and associated
'private_data' pointer to be setup. This is basically a scratch area
of memory that the GuC owns. The size is read from the CSS header.
* There is now a physical to logical engine mapping table in the ADS
which needs to be configured in order for the firmware to load. For
now, the table is initialised with a 1 to 1 mapping.
* GUC_CTL_CTXINFO has been removed from the initialization params.
* reg_state_buffer is maintained internally by the GuC as part of
the private data.
* The ADS layout has changed significantly. This patch updates the
shared structure and also adds better documentation of the layout.
* While i915 does not use GuC doorbells, the firmware now requires
that some initialisation is done.
* The number of engine classes and instances supported in the ADS has
been increased.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John.C.Harrison@Intel.com
fbcon/fonts:
- Two patches to prevent OOB access
ttm:
- fix for evicition value range check
amdgpu:
- Sienna Cichlid fixes
- MST manager resource leak fix
- GPU reset fix
amdkfd:
- Luxmark fix for Navi1x
i915:
- Tweak initial DPCD backlight.enabled value (Sean)
- Initialize reserved MOCS indices (Ayaz)
- Mark initial fb obj as WT on eLLC machines to avoid rcu lockup (Ville)
- Support parsing of oversize batches (Chris)
- Delay execlists processing for TGL (Chris)
- Use the active reference on the vma during error capture (Chris)
- Widen CSB pointer (Chris)
- Wait for CSB entries on TGL (Chris)
- Fix unwind for scratch page allocation (Chris)
- Exclude low patches of stolen memory (Chris)
- Force VT'd workarounds when running as a guest OS (Chris)
- Drop runtime-pm assert from vpgu io accessors (Chris)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJfkhyvAAoJEAx081l5xIa+3qIQAKjvgIvuoLix+bl/gkOHwymv
QHTXVA3Gi4Rup778L3k1GNppQX4/LE42Of9wnsZIELxG0vxKl59JGzlWJVld/rtJ
ZXfoT5kh7Hh7kkiXnC2s11aqQSsgr25lfsY8gOWSPIGfwOn07JSMTkqPWlbwrOz2
il6qlOlgfSNfXwn2NxSTzGxZMrgOyUvKNZczRXU9gSuuoTsEo4bAvS7/vEN4hazX
DFQAZfd82PfcdAIkVzk/gOoaCQ6a9YgjOzg1RQ4gKhrj8UaWu4gUyJwWPjSMnLrh
uP2RM3gU54MhVF3jHt+D0Trv2ti2zStD9wc4AEJwOVQZtcDSsgGduOdUs5Xc/1l5
dBOvpumBmNxsYbVvvThijNeSx6Y5ybzI3iUp8SLDftiRZtwsXds2aRaskuVgMqj4
MSBdOiZqoJudLjwCBHWKe328+r00X+f14Vi30Y0cy4VW59NxLG5D7qjc5BGqJw1q
J3FQ/9uDh0lWbeKT/grapjP43IWcLApykZa3Rn6p2w0mW2+8Wht/WbrFYyNYGlzf
aNS9RnknTaMYWpvZUZLVG83dJpn6Y9ooHa9L/blMzfCxpF6ftEYf+Iq2x8s0gprz
tIq0xsGvBacsnQIOWRuHjuF87zibVbDp9ba+x78F/woyUqEhip+lBXaPofp132gQ
HtIdQewvG9KcLZcoluO4
=rAUM
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2020-10-23' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"This should be the last round of things for rc1, a bunch of i915
fixes, some amdgpu, more font OOB fixes and one ttm fix just found
reading code:
fbcon/fonts:
- Two patches to prevent OOB access
ttm:
- fix for evicition value range check
amdgpu:
- Sienna Cichlid fixes
- MST manager resource leak fix
- GPU reset fix
amdkfd:
- Luxmark fix for Navi1x
i915:
- Tweak initial DPCD backlight.enabled value (Sean)
- Initialize reserved MOCS indices (Ayaz)
- Mark initial fb obj as WT on eLLC machines to avoid rcu lockup (Ville)
- Support parsing of oversize batches (Chris)
- Delay execlists processing for TGL (Chris)
- Use the active reference on the vma during error capture (Chris)
- Widen CSB pointer (Chris)
- Wait for CSB entries on TGL (Chris)
- Fix unwind for scratch page allocation (Chris)
- Exclude low patches of stolen memory (Chris)
- Force VT'd workarounds when running as a guest OS (Chris)
- Drop runtime-pm assert from vpgu io accessors (Chris)"
* tag 'drm-next-2020-10-23' of git://anongit.freedesktop.org/drm/drm: (31 commits)
drm/amdgpu: correct the cu and rb info for sienna cichlid
drm/amd/pm: remove the average clock value in sysfs
drm/amd/pm: fix pp_dpm_fclk
Revert drm/amdgpu: disable sienna chichlid UMC RAS
drm/amd/pm: fix pcie information for sienna cichlid
drm/amdkfd: Use same SQ prefetch setting as amdgpu
drm/amd/swsmu: correct wrong feature bit mapping
drm/amd/psp: Fix sysfs: cannot create duplicate filename
drm/amd/display: Avoid MST manager resource leak.
drm/amd/display: Revert "drm/amd/display: Fix a list corruption"
drm/amdgpu: update golden setting for sienna_cichlid
drm/amd/swsmu: add missing feature map for sienna_cichlid
drm/amdgpu: correct the gpu reset handling for job != NULL case
drm/amdgpu: add rlc iram and dram firmware support
drm/amdgpu: add function to program pbb mode for sienna cichlid
drm/i915: Drop runtime-pm assert from vgpu io accessors
drm/i915: Force VT'd workarounds when running as a guest OS
drm/i915: Exclude low pages (128KiB) of stolen from use
drm/i915/gt: Onion unwind for scratch page allocation failure
drm/ttm: fix eviction valuable range check.
...
intel_timeline_read_hwsp() is used to support semaphore waits between
engines, that may themselves be deferred for arbitrary periods -- that
is the read of the target request's HWSP is at an indeterminant point in
the future. To support this, we need to prevent overwriting a HWSP that
is being watched across a seqno wrap (otherwise the next request will
write its value into the old HWSP preventing the watcher from making
progress, ad infinitum.) To simulate the observer across a wrap, let's
create a request that reads from the HWSP and dispatch it at different
points around a wrap to see if the value is lost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021220411.5777-2-chris@chris-wilson.co.uk
We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.
Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.
Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
Since Ironlake uses intel_ips.ko for its dynamic frequency adjustment,
we do not have direct control over the frequency management so such
tests are defunct. Similarly, we can't check the gen6+ RPS registers on
Ironlake.
Hopefully this catches all the invalid tests now that Ironlake has
rejoined the dynamic GPU frequency club. There is an opportunity for the
reader to add tests to exercise MEMINTRSTS and co.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022210814.23004-1-chris@chris-wilson.co.uk
Let's unmask the PCU event irq _after_ we've set up the
hardware and software to deal with the fallout. We can
also drop the PCU event bit from DEIER except when we
need it for rps.
And on the disable side we replace the hand rolled (and
unlocked) DEIER/IIR/IMR frobbing with ilk_disable_display_irq().
Ocd does require me to reorder it to be symmetric with
the enable path however.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-5-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
There is no GEN6_RPSTAT1 on ILK. Instead of reading that let's
try to get the same information from MEMSTAT_ILK. At least it
seems to track MEMSWCTL frequency request perfectly on my ILK.
It needs the same invert trick as the request value.
We don't want to put the invert thing into intel_gpu_freq()
and intel_freq_opcode() because that would incorrectly invert
the min/max/etc frequencies also.
One day someone might want to reverse engineer the formula for
converting these numbers to Hz, but for now we'll just report
them raw.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to test how fast the heartbeat can respond, we measure with the
interval set to its minimum. Before we measure though, we want to be
sure we start with a fresh pulse, and so wait until any old one is
complete. During that wait though, we were continually flushing the
work, and so continually re-evaluating to see if the pulse was complete,
and each attempt would count as an unresponsive system. If the engine
did not complete the request in the couple of busy-spins, it would flag
an error. This is unfortunate, so let's not busy-spin waiting for the
old heartbeat, but terminate it and start afresh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
The GPU is trashing the low pages of its reserved memory upon reset. If
we are using this memory for ringbuffers, then we will dutiful resubmit
the trashed rings after the reset causing further resets, and worse. We
must exclude this range from our own use. The value of 128KiB was found
by empirical measurement (and verified now with a selftest) on gen9.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
(cherry picked from commit d3606757e6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In switching to using objects for our ppGTT scratch pages, care was not
taken to avoid trying to unref NULL objects on failure. And for gen6
ppGTT, it appears we forgot entirely to unwind after a partial allocation
failure.
Fixes: 89351925a4 ("drm/i915/gt: Switch to object allocations for page directories")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
(cherry picked from commit fa812ce96a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The GPU is trashing the low pages of its reserved memory upon reset. If
we are using this memory for ringbuffers, then we will dutiful resubmit
the trashed rings after the reset causing further resets, and worse. We
must exclude this range from our own use. The value of 128KiB was found
by empirical measurement (and verified now with a selftest) on gen9.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
On Tigerlake, we are seeing a repeat of commit d8f5053117 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.
However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:
513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
(cherry picked from commit 233c1ae3c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
(cherry picked from commit f24a44e52f)
(cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.
However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)
Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
(cherry picked from commit bb65548e3c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.
Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
(cherry picked from commit 6ca7217dff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In order to avoid functional breakage of mis-programmed applications that
have grown to depend on unused MOCS entries, we are programming
those entries to be equal to fully cached ("L3 + LLC") entry.
These reserved and unspecified entries should not be used as they may be
changed to less performant variants with better coherency in the future
if more entries are needed.
v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
with desired value.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
Cc: Spruit Neil R <neil.r.spruit@intel.com>
Cc: Zhou Cheng <cheng.zhou@intel.com>
Cc: Benemelis Mike G <mike.g.benemelis@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
Cc: stable@vger.kernel.org
(cherry picked from commit 4d8a5cfe3b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In switching to using objects for our ppGTT scratch pages, care was not
taken to avoid trying to unref NULL objects on failure. And for gen6
ppGTT, it appears we forgot entirely to unwind after a partial allocation
failure.
Fixes: 89351925a4 ("drm/i915/gt: Switch to object allocations for page directories")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and
manual pte setup. The only practical difference is that alloc_vm_area
prefeaults the vmalloc area PTEs, which doesn't seem to be required here
(and could be added to vmap using a flag if actually required). Switch to
use vmap, and use vfree to free both the vmalloc mapping and the page
array, as well as dropping the references to each page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Repeat our sanitychecks from before execution to after execution. One
expects that if we were to see these, the gpu would already be on fire,
but the timing may be informative.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.
However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)
Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.
Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
DG1 shares some workarounds with TGL and RKL and also has some
additional workarounds of its own.
v2: Correct location of Wa_1408615072 (JohnH).
v3: Apply WAs 1606700617, 18011464164 and 22010931296 to DG1 (José)
v4 (Anusha)
- Add Wa_22010271021
- s/Wa_14010096844/Wa_1409836686
v5:
- Extend Wa_14010919138 to all revs (Matt Atwood)
- Power gate media is global gen12 design. (Rodrigo)
- Rebase (Lucas)
v6: use REG_BIT() to fix checkpatch warning (Lucas)
BSpec: 53508
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201014191937.1266226-8-lucas.demarchi@intel.com
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached) use the follow-pte bit and force it out of
L3 cache.
Fixes: 4d8a5cfe3b ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
Since SKL the eLLC has been sitting on the far side of the system
agent, meaning the display engine can utilize it. Let's enable that.
I chose WB for the caching mode, because my numbers are indicating
that WT might actually be WB and WC might actually be UC. I'm not
100% sure that is indeed the case but at least my simple rendercopy
based benchmark didn't see any difference in performance.
Also if I configure things to do LLCeLLC+WT I still get cache dirt
on my screen, suggesting that is in fact operating in WB mode
anyway. This is also the reason I had to fix the MOCS target cache
to really say PTE rather than LLC+eLLC.
Since SKL the eLLC has been sitting on the far side of the system agent,
meaning the display engine can utilize it. Let's enable that.
Eero's earlier benchmarks numbers:
"* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily
variation on the tested SKL machines
* SKL GT4e (128MB eLLC) / Wayland / Weston:
+15-20% SynMark TexMem512 (512MB of textures)
+4-6% SynMark TerrainFly*, CSCloth, ShMapVsm
-5-10% SynMark TexMem128 (128MB of textures)
* SKL GT3e (64MB eLLC) / Xorg / Unity:
+4-8% GpuTest Triangle fullscreen (FullHD)
-5-10% GpuTest Triangle windowed (1/2 screen)
* SKL GT2 (no eLLC) / Xorg / Unity:
* Some of the higher FPS SynMark pixel and vertex shader tests
are few percent higher, more than daily variance
=> Do you see any reason why this machine would be impacted
although it doesn't eLLC?"
Caveats:
- Still haven't tested with a prime setup
- Still not entirely sure this a good idea, but I've been
using it on my cfl anyway :)
v2: Split the MOCS PTE change out
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
Recently we came across requirement to identify EHL and JSL
platform to program them differently. Thus Split the basic
platform definition, macros, and PCI IDs to differentiate
between EHL and JSL platforms. Also, IS_ELKHARTLAKE is replaced
with IS_JSL_EHL everywhere.
Changes since V1 :
- Rebased to avoid merge conflicts
- Added missed check for jasperlake in intel_uc_fw.c
Cc : Matt Roper <matthew.d.roper@intel.com>
Cc : Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201013192948.63470-1-tejaskumarx.surendrakumar.upadhyay@intel.com
In order to avoid functional breakage of mis-programmed applications that
have grown to depend on unused MOCS entries, we are programming
those entries to be equal to fully cached ("L3 + LLC") entry.
These reserved and unspecified entries should not be used as they may be
changed to less performant variants with better coherency in the future
if more entries are needed.
v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for
programming default value, setting I915_MOCS_PTE index of tgl_mocs_table
with desired value.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Mcguire Russell W <russell.w.mcguire@intel.com>
Cc: Spruit Neil R <neil.r.spruit@intel.com>
Cc: Zhou Cheng <cheng.zhou@intel.com>
Cc: Benemelis Mike G <mike.g.benemelis@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com
Cc: stable@vger.kernel.org
The updated bspec forcewake table also provides us with new multicast
ranges that should be reflected in our workaround code.
Note that there are different types of multicast registers with
different styles of replication and different steering registers. The
i915 MCR range lists we're updating here are only used to ensure we can
verify workarounds properly (i.e., if we can't steer register reads we
don't want to verify workarounds where an unsteered read might hit a
fused-off instance of the unit). Because of this, we don't need to
include any of the multicast ranges where all instances of the register
will always present and fusing doesn't play a role. Specifically, that
means that we are not including the MCR ranges designated as "SQIDI" in
the bspec.
Bspec: 66696
Cc: Caz Yokoyama <caz.yokoyama@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201009194442.3668677-4-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
The power well that we've been referring to as the 'blitter' well is
actually more of a general GT power well which contains a lot of things
other than the blitter engine registers. The FORCEWAKE_BLITTER name in
the code was used for historic reasons, but no longer matches how the
bspec describes this power well and just causes confusion for people not
familiar with this area of the code. Let's rename it to FORCEWAKE_GT to
more accurately describe the role of the power well and match how the
modern bspec refers to it.
v2:
- Add a comment noting that the GT power well includes the blitter
engine. (Jose)
Bspec: 66696, 66534, 67609
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201009194442.3668677-2-matthew.d.roper@intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
DG1 has a new MOCS table. We still use the old definition of the table,
but as for any dgfx card it doesn't contain the control_value values
(these values don't matter as we won't program them).
Bspec: 45101
v2: Reword the comment to state that the last few entries are reserved
instead of "the last two". DG1 reserves four instead of two from
previous platforms (from Matt Roper)
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007002210.3678024-3-lucas.demarchi@intel.com
Since we track the idle_pulse for flushing the barriers and avoid
re-emitting the pulse upon idling if no futher action is required, this
also impacts the heartbeat. Before emitting a fresh heartbeat, we look
at the engine idle status and assume that if the pulse was the last
request emitted along the heartbeat, the engine is idling and a
heartbeat pulse not required. This assumption fails, but we can reuse
the idle pulse as the heartbeat if we are yet to emit one, and so track
the status of that pulse for our engine health check.
This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for
the normal preemption detection mechanism is disabled by default.
Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
As the previous patch fixed the places where we walk the whole scatterlist
for DMA addresses, this patch fixes the random lookup functionality.
To achieve this we have to add a second lookup iterator and add a
i915_gem_object_get_sg_dma helper, to be used analoguous to existing
i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per
object and they are flushed at the same point for simplicity. (Strictly
speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages,
but today this conincides with unsetting of the pages in general.)
Partial VMA view is then fixed to use the new DMA lookup and properly
query sg length.
v2:
* Checkpatch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
When walking DMA mapped scatterlists sg_dma_len has to be used since it
can be different (coalesced) from the backing store entry.
This also means we have to end the walk when encountering a zero length
DMA entry and cannot rely on the normal sg list end marker.
Both issues were there in theory for some time but were hidden by the fact
Intel IOMMU driver was never coalescing entries. As there are ongoing
efforts to change this we need to start handling it.
v2:
* Use unsigned int for local storing sg_dma_len. (Logan)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: 85d1225ec0 ("drm/i915: Introduce & use new lightweight SGL iterators")
References: b31144c0da ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()")
Reported-by: Tom Murphy <murphyt7@tcd.ie>
Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter
Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
Currently we do a final scrub of the HW state upon release. However,
when rebinding the device, this is too late as the device may either
have been partially rebound or the device is no longer accessible. If
the device has been removed before release, the reset goes astray
leaving the device in an inconsistent state, unlikely to work without a
full PCI reset. Furthermore, if the device is partially rebound before
the HW scrubbing, there may be leftover HW state that should have been
scrubbed. Either way, we need to push the scrubbing earlier before the
removal, so into unregister. The danger is that on older machines,
resetting the GPU also impact the display engine and so the reset should
be after modesetting is disabled (and before reuse we need to recover
modesetting).
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508
Testcase: igt/core_hotunplug
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk
When using fake lmem for tests, we are overriding the setting in
device info for dgfx devices. Current users of IS_DGFX() except one are
correct. However, as we add support for DG1, we are going to use it in
additional places to trigger dgfx-only code path.
In future if we need we can use HAS_LMEM() instead of IS_DGFX() in the
places that make sense to also contemplate fake lmem use.
v2: update gen8_gmch_probe() to use HAS_LMEM(): we need to steal the
mappable aperture later(which is fine since it doesn't exist on "DGFX"),
and use it as a substitute for LMEMBAR. The !mappable aperture property
is also useful since it exercises some other parts of the code too.
(Matthew Auld)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201001063917.3133475-1-lucas.demarchi@intel.com
Currently, we check we can send a pulse prior to disabling the
heartbeat to verify that we can change the heartbeat, but since we may
re-evaluate execution upon changing the heartbeat interval we need another
pulse afterwards to refresh execution.
v2: Tvrtko asked if we could reduce the double pulse to a single, which
opened up a discussion of how we should handle the pulse-error after
attempting to change the property, and the desire to serialise
adjustment of the property with its validating pulse, and unwind upon
failure.
Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-2-chris@chris-wilson.co.uk
(cherry picked from commit 3dd66a94de)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.
Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a991cd3e3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In case backoff fails with an error, we return an undefined rq,
assign err to rq correctly.
Fixes: 8a929c9eb1 ("drm/i915: Use ww pinning for intel_context_create_request()")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200918111208.1392128-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 4316b19dee)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If we create a new node, it is possible for the slab allocator to return
us a recently freed node. If that node was just retired, it will retain
the current jiffy as its node->age. There is then a miniscule window,
where as that node is retired, it will appear on the free list with an
incorrect age and be eligible for reuse by one thread, and then by a
second thread as the correct node->age is written.
Fixes: 06b73c2d0b ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-3-chris@chris-wilson.co.uk
(cherry picked from commit 9bb34ff25c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e.
module unload during a stress test, we may cancel the requests but not
clean up. This leads to a very slow module unload as we wait for
something or other to trigger the retirement flushing, or timeout and
unload with a bunch of warnings. Instead if we explicitly cancel then
cleanup on an active unload, it should be instant and quiet.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
Currently, we check we can send a pulse prior to disabling the
heartbeat to verify that we can change the heartbeat, but since we may
re-evaluate execution upon changing the heartbeat interval we need another
pulse afterwards to refresh execution.
v2: Tvrtko asked if we could reduce the double pulse to a single, which
opened up a discussion of how we should handle the pulse-error after
attempting to change the property, and the desire to serialise
adjustment of the property with its validating pulse, and unwind upon
failure.
Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-2-chris@chris-wilson.co.uk
We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.
Fixes: 9a40bddd47 ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
The GuC communication enabled/disabled messages are too noisy in info
level. Convert them from info to debug level, and switch to device based
logging while at it.
Reported-by: Karol Herbst <kherbst@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200917165056.29766-1-jani.nikula@intel.com
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dev: More devm_drm convertions and removal of drm_dev_init
Driver Changes:
- i915: selftests improvements
- panfrost: support for Amlogic SoC
- vc4: one fix
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCX2jGxQAKCRDj7w1vZxhR
xR3DAQCiZOnaxVcY49iG4343Z1aHHaIEShbnB0bDdaWstn7kiQD/UXBXUoOSFoFQ
FkTsW31JsdXNnWP5e6/eJd2Lb6waVAA=
=VlsU
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-09-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.10:
UAPI Changes:
Cross-subsystem Changes:
- virtio: Merged a PR for patches that will affect drm/virtio
Core Changes:
- dev: More devm_drm convertions and removal of drm_dev_init
- atomic: Split out drm_atomic_helper_calc_timestamping_constants of
drm_atomic_helper_update_legacy_modeset_state
- ttm: More rework
Driver Changes:
- i915: selftests improvements
- panfrost: support for Amlogic SoC
- vc4: one fix
- tree-wide: conversions to devm_drm_dev_alloc,
- ast: simplifications of the atomic modesetting code
- panfrost: multiple fixes
- vc4: multiple fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200921152956.2gxnsdgxmwhvjyut@gilmour.lan
Just some prep work before we rework the lifetime handling, which
requires replacing all the drm_dev_put in selftests by something else.
v2: Don't go with a static inline, upsets the header tests and
separation.
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200918132505.2316382-2-daniel.vetter@ffwll.ch
As the last user was eliminated in commit e21fecdcde40 ("drm/i915/gt:
Distinguish the virtual breadcrumbs from the irq breadcrumbs"), we can
remove the function. One less implementation detail creeping beyond its
scope.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200826132811.17577-16-chris@chris-wilson.co.uk
When debugging the engine state, include the user properties that may,
or may not, have been adjusted by the user/test.
For example,
vecs0
...
Properties:
heartbeat_interval_ms: 2500 [default 2500]
max_busywait_duration_ns: 8000 [default 8000]
preempt_timeout_ms: 640 [default 640]
stop_timeout_ms: 100 [default 100]
timeslice_duration_ms: 1 [default 1]
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-1-chris@chris-wilson.co.uk
If we find the GPU didn't update the CSB within 50us, we currently fail
and eventually reset the GPU. Lets report the value from the mmio space
as a last resort, it may just stave off an unnecessary GPU reset.
References: HSDES#22011327657
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-4-chris@chris-wilson.co.uk
Since we expect to inline the csb_parse() routines, the w/a for the
stale CSB data on Tigerlake will be pulled into process_csb(), and so we
might as well simply reuse the logic for all, and so will hopefully
avoid any strange behaviour on Icelake that was not covered by our
previous w/a.
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-3-chris@chris-wilson.co.uk
On Tigerlake, we are seeing a repeat of commit d8f5053117 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.
However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:
513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
If we create a new node, it is possible for the slab allocator to return
us a recently freed node. If that node was just retired, it will retain
the current jiffy as its node->age. There is then a miniscule window,
where as that node is retired, it will appear on the free list with an
incorrect age and be eligible for reuse by one thread, and then by a
second thread as the correct node->age is written.
Fixes: 06b73c2d0b ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-3-chris@chris-wilson.co.uk
(Same content as drm-intel-gt-next-2020-09-04-3, S-o-b's added)
UAPI Changes:
(- Potential implicit changes from WW locking refactoring)
Cross-subsystem Changes:
(- WW locking changes should align the i915 locking more with others)
Driver Changes:
- MAJOR: Apply WW locking across the driver (Maarten)
- Reverts for 5 commits to make applying WW locking faster (Maarten)
- Disable preparser around invalidations on Tigerlake for non-RCS engines (Chris)
- Add missing dma_fence_put() for error case of syncobj timeline (Chris)
- Parse command buffer earlier in eb_relocate(slow) to facilitate backoff (Maarten)
- Pin engine before pinning all objects (Maarten)
- Rework intel_context pinning to do everything outside of pin_mutex (Maarten)
- Avoid tracking GEM context until registered (Cc: stable, Chris)
- Provide a fastpath for waiting on vma bindings (Chris)
- Fixes to preempt-to-busy mechanism (Chris)
- Distinguish the virtual breadcrumbs from the irq breadcrumbs (Chris)
- Switch to object allocations for page directories (Chris)
- Hold context/request reference while breadcrumbs are active (Chris)
- Make sure execbuffer always passes ww state to i915_vma_pin (Maarten)
- Code refactoring to facilitate use of WW locking (Maarten)
- Locking refactoring to use more granular locking (Maarten, Chris)
- Support for multiple pinned timelines per engine (Chris)
- Move complication of I915_GEM_THROTTLE to the ioctl from general code (Chris)
- Make active tracking/vma page-directory stash work preallocated (Chris)
- Avoid flushing submission tasklet too often (Chris)
- Reduce context termination list iteration guard to RCU (Chris)
- Reductions to locking contention (Chris)
- Fixes for issues found by CI (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <jlahtine@jlahtine-mobl.ger.corp.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200907130039.GA27766@jlahtine-mobl.ger.corp.intel.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9VerweHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhc4H/iHD6qLdB36gZB6K
oc2nJyrqyWitv4ti2Mnt5PA7o4wX4l6nnr1QvoaJ4BRs5Ja1czRvb2XDmdzqAoIA
xITGoafqaAeDfxQ91bWrJsVN0pCRKiGwddXlU7TWmqw/riAkfOqi6GYKvav4biJH
+n1mUPQb1M2IbRFsqkAS+ebKHq3CWaRvzKOEneS88nGlL5u31S9NAru8Ru/fkxRn
6CwGcs1XRaBPYaZAhdfIb0NuatUlpkhPC9yhNS9up6SqrWmK3m65vmFVng6H0eCF
fwn1jVztboY/XcNAi5sM9ExpQCql6WLQEEktVikqRDojC8fVtSx6W55tPt7qeaoO
Z6t4/DA=
=bcA4
-----END PGP SIGNATURE-----
Merge tag 'v5.9-rc4' into drm-next
Backmerge 5.9-rc4 as there is a nasty qxl conflict
that needs to be resolved.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The hwsp_gtt object is used for sub-allocation and could therefore
be shared by many contexts causing unnecessary contention during
concurrent context pinning.
However since we're currently locking it only for pinning, it remains
resident until we unpin it, and therefore it's safe to drop the
lock early, allowing for concurrent thread access.
Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To implement preempt-to-busy (and so efficient timeslicing and best utilization
of the hardware submission ports) we let the GPU run asynchronously in respect
to the ELSP submission queue. This created challenges in keeping and accessing
the driver state mirroring the asynchronous GPU execution.
The latest occurence of this was spotted by KCSAN:
[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
To complicate matters, we have to both avoid the read tearing of *active and
avoid any write tearing as perform the pending[] -> inflight[] promotion of the
execlists.
This is because we cannot rely on the memcpy doing u64 aligned copies on all
kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.
v2: When in doubt, write the same comment again.
v3: Expanded commit message.
Fixes: b55230e5e8 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Make sure vma_lock is not used as inner lock when kernel context is used,
and add ww handling where appropriate.
Ensure that execbuf selftests keep passing by using ww handling.
Changes since v2:
- Fix i915_gem_context finally.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-22-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This function does not use intel_context_create_request, so it has
to use the same locking order as normal code. This is required to
shut up lockdep in selftests.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-20-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some i915 selftests still use i915_vma_lock() as inner lock, and
intel_context_create_request() intel_timeline->mutex as outer lock.
Fortunately for selftests this is not an issue, they should be fixed
but we can move ahead and cleanify lockdep now.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-19-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Instead of using intel_context_create_request(), use intel_context_pin()
and i915_create_request directly.
Now all those calls are gone outside of selftests. :)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-17-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As a preparation step for full object locking and wait/wound handling
during pin and object mapping, ensure that we always pass the ww context
in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this
happens.
This also requires changing the order of eb_parse slightly, to ensure
we pass ww at a point where we could still handle -EDEADLK safely.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Instead of doing everything inside of pin_mutex, we move all pinning
outside. Because i915_active has its own reference counting and
pinning is also having the same issues vs mutexes, we make sure
everything is pinned first, so the pinning in i915_active only needs
to bump refcounts. This allows us to take pin refcounts correctly
all the time.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-14-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We want to start using ww locking in intel_context_pin, for this
we need to lock multiple objects, and the single i915_gem_object_lock
is not enough.
Convert to using ww-waiting, and make sure we always pin intel_context_state,
even if we don't have a renderstate object.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-10-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but lets start adding the definition
first.
To use it, we have to pass a non-NULL ww to gem_object_lock, and don't
unlock directly. It is done in i915_gem_ww_ctx_fini.
Changes since v1:
- Change ww_ctx and obj order in locking functions (Jonas Lahtinen)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we hold no actual reference to the request nor context while
they are attached to a breadcrumb. To avoid freeing the request/context
too early, we serialise with cancel-breadcrumbs by taking the irq
spinlock in i915_request_retire(). The alternative is to take a
reference for a new breadcrumb and release it upon signaling; removing
the more frequently hit contention point in i915_request_retire().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Move the __intel_breadcrumbs_arm_irq earlier, next to the disarm_irq, so
that we can make use of it in the following patch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
kmalloc uses power-of-two slab buckets for small allocations (up to a
few pages). Since i915_page_directory is a page of pointers, plus a
couple more, this is rounded up to 8K, and we waste nearly 50% of that
allocation. Long terms this leads to poor memory utilisation, bloating
the kernel footprint, but the problem is exacerbated by our conservative
preallocation scheme for binding VMA. As we are required to allocate all
levels for each vma just in case we need to insert them upon binding,
this leads to a large multiplication factor for a single page vma. By
halving the allocation we need for the page directory structure, we
halve the impact of that factor, bringing workloads that once fitted into
memory, hopefully back to fitting into memory.
We maintain the split between i915_page_directory and i915_page_table as
we only need half the allocation for the lowest, most populous, level.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and that we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.
In the process, we drop the WC mapping as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. Though in a future step, we need to drop the kmap_atomic
approach which suggests we need to pre-map all the pages and keep them
mapped.
v2: Verify our large scratch page is suitably DMA aligned; and manually
clear the scratch since we are allocating plain struct pages full of
prior content.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We need to make the DMA allocations used for page directories to be
performed up front so that we can include those allocations in our
memory reservation pass. The downside is that we have to assume the
worst case, even before we know the final layout, and always allocate
enough page directories for this object, even when there will be overlap.
This unfortunately can be quite expensive, especially as we have to
clear/reset the page directories and DMA pages, but it should only be
required during early phases of a workload when new objects are being
discovered, or after memory/eviction pressure when we need to rebind.
Once we reach steady state, the objects should not be moved and we no
longer need to preallocating the pages tables.
It should be noted that the lifetime for the page directories DMA is
more or less decoupled from individual fences as they will be shared
across objects across timelines.
v2: Only allocate enough PD space for the PTE we may use, we do not need
to allocate PD that will be left as scratch.
v3: Store the shift unto the first PD level to encapsulate the different
PTE counts for gen6/gen8.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling, active requests are passed to a
physical engine and associated breadcrumb interrupt handler. This causes
issues for us as we need to ensure that we do not actually try and
enable interrupts and the powermanagement required for them on the
virtual engine, as they will never be disabled. Instead, let's
specify the physical engine used for interrupt handler on a particular
breadcrumb.
v2: Drop b->irq_armed = true mocking for no interrupt HW
Fixes: 4fe6abb8f5 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
One more complication of preempt-to-busy with respect to the virtual
engine is that we may have retired the last request along the virtual
engine at the same time as preparing to submit the completed request to
a new engine. That submit will be shortcircuited, but not before we have
updated the context with the new register offsets and marked the virtual
engine as bound to the new engine (by calling swap on ve->siblings[]).
As we may have just retired the completed request, we may also be in the
middle of calling virtual_context_exit() to turn off the power management
associated with the virtual engine, and that in turn walks the
ve->siblings[]. If we happen to call swap() on the array as we walk, we
will call intel_engine_pm_put() twice on the same engine.
In this patch, we prevent this by only updating the bound engine after a
successful submission which weeds out the already completed requests.
Alternatively, we could walk a non-volatile array for the pm, such as
using the engine->mask. The small advantage to performing the update
after the submit is that we then only have to do a swap for active
requests.
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e86 ("drm/i915: Load balancing across a virtual engine"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must the
act of submitting a completed requests, we can then redirect those
completed requests onto a dedicated signaled_list at the time of
construction and so eliminate intel_engine_transfer_stale_breadcrumbs().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.
The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.
On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.
v2: Refactor active_lookup() so it can be called again before/after
locking to resolve contention. Since we protect the rbtree until we
idle, we can do a lockfree lookup, with the caveat that if another
thread performs a concurrent insertion, the rotations from the insert
may cause us to not find our target. A second pass holding the treelock
will find the target if it exists, or the place to perform our
insertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since we pass around encoded parameters to the kernel context
constructor using the ce->timeline pointer, we can no longer assert that
it should be zero for mock timeline construction.
Fixes: d1bf5dd8f6 ("drm/i915/gt: Support multiple pinned timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731102206.6793-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Updated Fixes: link after rebasing and reordering into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We need to ensure that the list is valid prior to marking the node as
retrievable, otherwise we may see two threads compete over the same node
in intel_gt_get_buffer_pool(). If the first thread acquires and releases
the node in the same jiffie, the second thread may then acquire it (as
the jiffie now again matches the expected value) and claim the node
before it is put back into the list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730134049.8822-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We may need to allocate more than one pinned context/timeline for each
engine which can utilise the per-engine HWSP, so we need to give each
a different offset within it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730183906.25422-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A last minute change, that unfortunately broke CI so badly it declared
SUCCESS, was to refactor the debug free all buffer pool code to reuse
the normal worker, inverted the termination condition so that it instead
of discarding the nodes, they were all declared young enough and
eligible for reuse.
Fixes: 06b73c2d0b ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729110756.2344-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Updating Fixes: link after rebasing and reordering into drm-intel-gt-next]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Before we peek at the barrier status for an assert, first serialise with
its callbacks so that we see a stable value.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728153325.28351-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some very low hanging fruit, but contention on the pool->lock is
noticeable between intel_gt_get_buffer_pool() and pool_retire(), with
the majority of the hold time due to the locked list iteration. If we
make the node itself RCU protected, we can perform the search for an
suitable node just under RCU, reserving taking the lock itself for
claiming the node and manipulating the list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729080245.8070-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Unlike rcs where we have conclusive evidence from our selftesting that
disabling the preparser before performing the TLB invalidate and
relocations does impact upon the GPU execution, the evidence for the
same requirement on xcs is much more circumstantial. Let's apply the
preparser disable between batches as we invalidate the TLB as a dose of
healthy paranoia, just in case.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152110.830-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
I915_GEM_THROTTLE dates back to the time before contexts where there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.
That is no more. We now have many timelines with a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those run independently and so make the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each to satisfy the criteria that the CPU is no more than 20ms
ahead of its oldest request.
It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amount of
contexts, so the loops over contexts/engines should be concise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
TGL made stepping a litte mess, workarounds refer to the stepping of
the IP(GT or Display) not of the GPU stepping so it would already
require the same solution as used in commit 96c5a15f9f
("drm/i915/kbl: Fix revision ID checks").
But to make things even more messy it have a different IP stepping
mapping between SKUs and the same stepping revision of GT do not match
the same HW between TGL U/Y and regular TGL.
So it was required to have 2 different macros to check GT WAs while
for Display we are able to use just one macro that uses the right
revids table.
All TGL workarounds checked and updated accordingly.
v2:
- removed TODO to check if WA 14010919138 applies to regular TGL.
- fixed display stepping in regular TGL (Anusha)
BSpec: 52890
BSpec: 55378
BSpec: 44455
Reviewed-by: Anusha Srivatsa <anusha.srivtsa@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Penne Lee <penne.y.lee@intel.com>
Cc: Guangyao Bai <guangyao.bai@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200827233943.400946-1-jose.souza@intel.com
- Introduce a mechanism to extend execbuf2 (Lionel)
- Add syncobj timeline support (Lionel)
Driver Changes:
- Limit stolen mem usage on the compressed frame buffer (Ville)
- Some clean-up around display's cdclk (Ville)
- Some DDI changes for better DP link training according
to spec (Imre)
- Provide the perf pmu.module (Chris)
- Remove dobious Valleyview PCI IDs (Alexei)
- Add new display power saving feature for gen12+ called
HOBL (Jose)
- Move SKL's clock gating w/a to skl_init_clock_gating() (Ville)
- Rocket Lake display additions (Matt)
- Selftest: temporarily downgrade on severity of frequency
scaling tests (Chris)
- Introduce a new display workaround for fixing FLR related
issues on new PCH. (Jose)
- Temporarily disable FBC on TGL. It was the culprit of random
underruns. (Uma).
- Copy default modparams to mock i915_device (Chris)
- Add compiler paranoia for checking HWSP values (Chris)
- Remove useless gen check before calling intel_rps_boost (Chris)
- Fix a null pointer dereference (Chris)
- Add a couple of missing i915_active_fini() (Chris)
- Update TGL display power's bw_buddy table according to
update spec (Matt)
- Fix couple wrong return values (Tianjia)
- Selftest: Avoid passing random 0 into ilog2 (George)
- Many Tiger Lake display fixes and improvements for Type-C and
DP compliance (Imre, Jose)
- Start the addition of PSR2 selective fetch (Jose)
- Update a few DMC and HuC firmware versions (Jose)
- Add gen11+ w/a to fix underuns (Matt)
- Fix cmd parser desc matching with mask (Mika)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAl9EBoQACgkQ+mJfZA7r
E8qNIgf9E3t12hq0z+8SidMyUPXCz6+BJzed+zjF6q6w3lVaxloQbJbQc/ujec6Y
DcnHKdZWN4/BjPtO9PYsOo7JRPlw9mounMfMqhmsgCNigpy8jdE6EQB2wDY/JtWG
I/OmVwaIDWF/srRJZNJlmdx1IT6pes3A/1HBJmJWFFPFFQxl6Y8vbaZGmMDwXRzS
6/LOy7otXVGvSHqYDFzNWBPNRstUYmQuPbE4/Iei3zbS8Di3uCkspa6LbocE+T5g
cokw9fxE1cJv9bIhIY65R611XyzqqHDzM+2s3x35r8a/ectItLE7kkU07/X3RXmc
lrqf4xxzmg+lvbKaLMGdI7YRFPcvbQ==
=cGnK
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2020-08-24-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:
- Introduce a mechanism to extend execbuf2 (Lionel)
- Add syncobj timeline support (Lionel)
Driver Changes:
- Limit stolen mem usage on the compressed frame buffer (Ville)
- Some clean-up around display's cdclk (Ville)
- Some DDI changes for better DP link training according
to spec (Imre)
- Provide the perf pmu.module (Chris)
- Remove dobious Valleyview PCI IDs (Alexei)
- Add new display power saving feature for gen12+ called
HOBL (Jose)
- Move SKL's clock gating w/a to skl_init_clock_gating() (Ville)
- Rocket Lake display additions (Matt)
- Selftest: temporarily downgrade on severity of frequency
scaling tests (Chris)
- Introduce a new display workaround for fixing FLR related
issues on new PCH. (Jose)
- Temporarily disable FBC on TGL. It was the culprit of random
underruns. (Uma).
- Copy default modparams to mock i915_device (Chris)
- Add compiler paranoia for checking HWSP values (Chris)
- Remove useless gen check before calling intel_rps_boost (Chris)
- Fix a null pointer dereference (Chris)
- Add a couple of missing i915_active_fini() (Chris)
- Update TGL display power's bw_buddy table according to
update spec (Matt)
- Fix couple wrong return values (Tianjia)
- Selftest: Avoid passing random 0 into ilog2 (George)
- Many Tiger Lake display fixes and improvements for Type-C and
DP compliance (Imre, Jose)
- Start the addition of PSR2 selective fetch (Jose)
- Update a few DMC and HuC firmware versions (Jose)
- Add gen11+ w/a to fix underuns (Matt)
- Fix cmd parser desc matching with mask (Mika)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200826232733.GA129053@intel.com
Enable HW Default flip for small PL.
bspec: 52890
bspec: 53508
bspec: 53273
v2: rebase to drm-tip
v3: move from ctx to gt workarounds. Remove whitelist.
v4: move to rcs WA init
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200826025724.20944-1-clinton.a.taylor@intel.com
We usually assume that increasing PCI device revision ID's translates to
newer steppings; macros like IS_KBL_REVID() that we use rely on this
behavior. Unfortunately this turns out to not be true on KBL; the
newer device 2 revision ID's sometimes go backward to older steppings.
The situation is further complicated by different GT and display
steppings associated with each revision ID.
Let's work around this by providing a table to map the revision ID to
specific GT and display steppings, and then perform our comparisons on
the mapped values.
v2:
- Move the kbl_revids[] array to intel_workarounds.c to avoid compiler
warnings about an unused variable in files that don't call the
macros (kernel test robot).
Bspec: 18329
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811032105.2819370-1-matthew.d.roper@intel.com
Reviewed-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We use i915_active_fini() as a debug check on the i915_active state
before freeing. If we forget to call it, we may end up angering the
debugobjects contained within.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since we want to read the values from the HWSP as written to by the GPU,
warn the compiler that the values are volatile.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152110.830-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
RKL and TGL share some general gen12 workarounds, but each platform also
has its own platform-specific workarounds.
v2:
- Add Wa_1604555607 for RKL. This makes RKL's ctx WA list identical to
TGL's, so we'll have both functions call the tgl_ function for now;
this workaround isn't listed for DG1 so we don't want to add it to
the general gen12_ function.
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716220551.2730644-3-matthew.d.roper@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
core:
- add user def flag to cmd line modes
- dma_fence_wait added might_sleep
- dma-fence lockdep annotations
- indefinite fences are bad documentation
- gem CMA functions used in more drivers
- struct mutex removal
- more drm_ debug macro usage
- set/drop master api fixes
- fix for drm/mm hole size comparison
- drm/mm remove invalid entry optimization
- optimise drm/mm hole handling
- VRR debugfs added
- uncompressed AFBC modifier support
- multiple display id blocks in EDID
- multiple driver sg handling fixes
- __drm_atomic_helper_crtc_reset in all drivers
- managed vram helpers
ttm:
- ttm_mem_reg handling cleanup
- remove bo offset field
- drop CMA memtype flag
- drop mappable flag
xilinx:
- New Xilinx ZynqMP DisplayPort Subsystem driver
nouveau:
- add CRC support
- start using NVIDIA published class header files
- convert all push buffer emission to new macros
- Proper push buffer space management for EVO/NVD channels.
- firmware loading fixes
- 2MiB system memory pages support on Pascal and newer
vkms:
- larget cursor support
i915:
- Rocketlake platform enablement
- Early DG1 enablement
- Numerous GEM refactorings
- DP MST fixes
- FBC, PSR, Cursor, Color, Gamma fixes
- TGL, RKL, EHL workaround updates
- TGL 8K display support fixes
- SDVO/HDMI/DVI fixes
amdgpu:
- Initial support for Sienna Cichlid GPU
- Initial support for Navy Flounder GPU
- SI UVD/VCE support
- expose rotation property
- Add support for unique id on Arcturus
- Enable runtime PM on vega10 boards that support BACO
- Skip BAR resizing if the bios already did id
- Major swSMU code cleanup
- Fixes for DCN bandwidth calculations
amdkfd:
- Track SDMA usage per process
- SMI events interface
radeon:
- Default to on chip GART for AGP boards on all arches
- Runtime PM reference count fixes
msm:
- headers regenerated causing churn
- a650/a640 display and GPU enablement
- dpu dither support for 6bpc panels
- dpu cursor fix
- dsi/mdp5 enablement for sdm630/sdm636/sdm66
tegra:
- video capture prep support
- reflection support
mediatek:
- convert mtk_dsi to bridge API
meson:
- FBC support
sun4i:
- iommu support
rockchip:
- register locking fix
- per-pixel alpha support PX30 VOP
-
mgag200:
- ported to simple and shmem helpers
- device init cleanups
- use managed pci functions
- dropped hw cursor support
ast:
- use managed pci functions
- use managed VRAM helpers
- rework cursor support
malidp:
- dev_groups support
hibmc:
- refactor hibmc_drv_vdac:
vc4:
- create TXP CRTC
imx:
- error path fixes and cleanups
etnaviv:
- clock handling and error handling cleanups
- use pin_user_pages
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJfK1atAAoJEAx081l5xIa+vDkQAJvl/mjbEA7fDy8Ysa0cgPLI
8nI4Bo/MaxkyRfUcP8+f/n3QQrRME37C0xa/Mn6SG1oFAdlovPwDqmDr5kjhkrMI
geo8oJb2Q+AsrJr+ejpuF+iq0FxWi64bLbwJFJ2nBet+lHTMzoPceWeq3gG1Vvfl
h6PV4B/9TjrnbhcKLIQSEmJ0kZp9uMkDBF/iynVn4+AKAkG1rQNjigdTH48IFPoz
28KuqG0B4NWu648zYXhjsN0kD3Dxjv3YOH+FsoWQpQa9icCTySYbySsQ7l0/XvA3
4BPtP3rWMhU46FHTBkWF72WQR4F0B4wm5DJJKMeG4vb1mXakOqAKcAq7JAbka+wL
PBIiU+AcAKRSiwHmYDuDWtDoSpvYncuo0p3IvNP5hhih+7hqCnLIULDWS+V8AUzW
39usS/DXsVKk/HGYIYC89cRwsqWYD4c7edzOBdPQxW4LNYCD2gXezLJ5TeeR2lih
y9JIVnPiluWleOovs4W3BoZNRuLc1rHBO6COToXjlme/48Z+sRHBAoge6UZurqRN
jr+e60cS7n/DOeJQuNf4UHZnK48Pc24+3kVfMHlX+OKn8VuKPGr+USkeHV/NYL/B
USiKCAxkkZM0dxerSb1/Ra9kGnchf0QBpA6Fsem8kV61Z4GVc+K6xJWg7KXB6n/3
7ZyalUKLwlOCz9sYsCCe
=Yvtd
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"New xilinx displayport driver, AMD support for two new GPUs (more
header files), i915 initial support for RocketLake and some work on
their DG1 (discrete chip).
The core also grew some lockdep annotations to try and constrain what
drivers do with dma-fences, and added some documentation on why the
idea of indefinite fences doesn't work.
The long list is below.
I do have some fixes trees outstanding, but I'll follow up with those
later.
core:
- add user def flag to cmd line modes
- dma_fence_wait added might_sleep
- dma-fence lockdep annotations
- indefinite fences are bad documentation
- gem CMA functions used in more drivers
- struct mutex removal
- more drm_ debug macro usage
- set/drop master api fixes
- fix for drm/mm hole size comparison
- drm/mm remove invalid entry optimization
- optimise drm/mm hole handling
- VRR debugfs added
- uncompressed AFBC modifier support
- multiple display id blocks in EDID
- multiple driver sg handling fixes
- __drm_atomic_helper_crtc_reset in all drivers
- managed vram helpers
ttm:
- ttm_mem_reg handling cleanup
- remove bo offset field
- drop CMA memtype flag
- drop mappable flag
xilinx:
- New Xilinx ZynqMP DisplayPort Subsystem driver
nouveau:
- add CRC support
- start using NVIDIA published class header files
- convert all push buffer emission to new macros
- Proper push buffer space management for EVO/NVD channels.
- firmware loading fixes
- 2MiB system memory pages support on Pascal and newer
vkms:
- larger cursor support
i915:
- Rocketlake platform enablement
- Early DG1 enablement
- Numerous GEM refactorings
- DP MST fixes
- FBC, PSR, Cursor, Color, Gamma fixes
- TGL, RKL, EHL workaround updates
- TGL 8K display support fixes
- SDVO/HDMI/DVI fixes
amdgpu:
- Initial support for Sienna Cichlid GPU
- Initial support for Navy Flounder GPU
- SI UVD/VCE support
- expose rotation property
- Add support for unique id on Arcturus
- Enable runtime PM on vega10 boards that support BACO
- Skip BAR resizing if the bios already did id
- Major swSMU code cleanup
- Fixes for DCN bandwidth calculations
amdkfd:
- Track SDMA usage per process
- SMI events interface
radeon:
- Default to on chip GART for AGP boards on all arches
- Runtime PM reference count fixes
msm:
- headers regenerated causing churn
- a650/a640 display and GPU enablement
- dpu dither support for 6bpc panels
- dpu cursor fix
- dsi/mdp5 enablement for sdm630/sdm636/sdm66
tegra:
- video capture prep support
- reflection support
mediatek:
- convert mtk_dsi to bridge API
meson:
- FBC support
sun4i:
- iommu support
rockchip:
- register locking fix
- per-pixel alpha support PX30 VOP
mgag200:
- ported to simple and shmem helpers
- device init cleanups
- use managed pci functions
- dropped hw cursor support
ast:
- use managed pci functions
- use managed VRAM helpers
- rework cursor support
malidp:
- dev_groups support
hibmc:
- refactor hibmc_drv_vdac:
vc4:
- create TXP CRTC
imx:
- error path fixes and cleanups
etnaviv:
- clock handling and error handling cleanups
- use pin_user_pages"
* tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm: (1747 commits)
drm/msm: use kthread_create_worker instead of kthread_run
drm/msm/mdp5: Add MDP5 configuration for SDM636/660
drm/msm/dsi: Add DSI configuration for SDM660
drm/msm/mdp5: Add MDP5 configuration for SDM630
drm/msm/dsi: Add phy configuration for SDM630/636/660
drm/msm/a6xx: add A640/A650 hwcg
drm/msm/a6xx: hwcg tables in gpulist
drm/msm/dpu: add SM8250 to hw catalog
drm/msm/dpu: add SM8150 to hw catalog
drm/msm/dpu: intf timing path for displayport
drm/msm/dpu: set missing flush bits for INTF_2 and INTF_3
drm/msm/dpu: don't use INTF_INPUT_CTRL feature on sdm845
drm/msm/dpu: move some sspp caps to dpu_caps
drm/msm/dpu: update UBWC config for sm8150 and sm8250
drm/msm/dpu: use right setup_blend_config for sm8150 and sm8250
drm/msm/a6xx: set ubwc config for A640 and A650
drm/msm/adreno: un-open-code some packets
drm/msm: sync generated headers
drm/msm/a6xx: add build_bw_table for A640/A650
drm/msm/a6xx: fix crashstate capture for A650
...
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.
In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:
git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'
drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.
No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.
[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
We need to ensure that the kernel context is using the permanently pinned
HWSP so that we can always submit a pm request from any context. By
construction, the engine->kernel_context should only be using the
engine->status_page.vma so let's assert that is still true when we have
to submit a request for parking the engine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200714114419.28713-1-chris@chris-wilson.co.uk
The danger in switching at random upon intel_context_pin is that the
context may still actually be inflight, as it will not be scheduled out
until a context switch after it is complete -- that may be a long time
after we do a final intel_context_unpin.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2118
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713160549.17344-1-chris@chris-wilson.co.uk
(cherry picked from commit 90a987205c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We do not use the virtual engines for interrupts (they have physical
components), but we do use them to decouple the fence signaling during
submission. Currently, when we submit a completed request, we try to
enable the interrupt handler for the virtual engine, but we never disarm
it. A quick fix is then to mark the irq as enabled, and it will then
remain enabled -- and this prevents us from waking the device and never
letting it sleep again.
Fixes: f8db4d051b ("drm/i915: Initialise breadcrumb lists on the virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711203236.12330-1-chris@chris-wilson.co.uk
(cherry picked from commit 4fe6abb8f5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Both cmp_u32 and cmp_u64 are comparing the pointers instead of the value
at those pointers. This will result in incorrect/unsorted list. Fix it
by deferencing the pointers before comparison.
Fixes: 4ba74e53ad ("drm/i915/selftests: Verify frequency scaling with RPS")
Fixes: 8757797ff9 ("drm/i915/selftests: Repeat the rps clock frequency measurement")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200709154931.23310-1-sudeep.holla@arm.com
(cherry picked from commit 2196dfea89)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Preempt-to-busy introduces various fascinating complications in that the
requests may complete as we are unsubmitting them from HW. As they may
then signal after unsubmission, we may find ourselves having to cleanup
the signaling request from within the signaling callback. This causes us
to recurse onto the same i915_request.lock.
However, if the request is already signaled (as it will be before we
enter the signal callbacks), we know we can skip the signaling of that
request during submission, neatly evading the spinlock recursion.
unsubmit(ve.rq0) # timeslice expiration or other preemption
-> virtual_submit_request(ve.rq0)
dma_fence_signal(ve.rq0) # request completed before preemption ack
-> submit_notify(ve.rq1)
-> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0
-> __i915_request_submit(ve.rq0)
[ 264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093
[ 264.210150] lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2
[ 264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G U
[ 264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019
[ 264.210160] Call Trace:
[ 264.210167] dump_stack+0x98/0xda
[ 264.210174] spin_dump.cold+0x24/0x3c
[ 264.210178] do_raw_spin_lock+0x9a/0xd0
[ 264.210184] _raw_spin_lock_nested+0x6a/0x70
[ 264.210314] __i915_request_submit+0x10a/0x3c0 [i915]
[ 264.210415] virtual_submit_request+0x9b/0x380 [i915]
[ 264.210516] submit_notify+0xaf/0x14c [i915]
[ 264.210602] __i915_sw_fence_complete+0x8a/0x230 [i915]
[ 264.210692] i915_sw_fence_complete+0x2d/0x40 [i915]
[ 264.210762] __dma_i915_sw_fence_wake+0x19/0x30 [i915]
[ 264.210767] dma_fence_signal_locked+0xb1/0x1c0
[ 264.210772] dma_fence_signal+0x29/0x50
[ 264.210871] i915_request_wait+0x5cb/0x830 [i915]
[ 264.210876] ? dma_resv_get_fences_rcu+0x294/0x5d0
[ 264.210974] i915_gem_object_wait_fence+0x2f/0x40 [i915]
[ 264.211084] i915_gem_object_wait+0xce/0x400 [i915]
[ 264.211178] i915_gem_wait_ioctl+0xff/0x290 [i915]
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
The danger in switching at random upon intel_context_pin is that the
context may still actually be inflight, as it will not be scheduled out
until a context switch after it is complete -- that may be a long time
after we do a final intel_context_unpin.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2118
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713160549.17344-1-chris@chris-wilson.co.uk
We do not use the virtual engines for interrupts (they have physical
components), but we do use them to decouple the fence signaling during
submission. Currently, when we submit a completed request, we try to
enable the interrupt handler for the virtual engine, but we never disarm
it. A quick fix is then to mark the irq as enabled, and it will then
remain enabled -- and this prevents us from waking the device and never
letting it sleep again.
Fixes: f8db4d051b ("drm/i915: Initialise breadcrumb lists on the virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711203236.12330-1-chris@chris-wilson.co.uk
If something has gone awry with the CSB processing, we need to pause,
unwind and restart the request submission and event processing. However,
currently we skip the engine reset if we raise an error but discover no
active context, in the mistaken belief that it was merely a glitch in
the matrix. The glitches are real enough, and we do need to unwind even
if the engine appears idle (as it has gone permanently idle!) The
simplest way to unwind and recover is simply do the engine reset, which
should be very fast and _safe_ as nothing is active.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200711091349.28865-1-chris@chris-wilson.co.uk
If the HW throws a curve ball and reports either en event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.
The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.
v2: Commentary and force the reset-on-error.
v3: Customised user facing message for forced resets from internal errors.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200710133125.30194-1-chris@chris-wilson.co.uk
Since the aliasing-ppgtt remains the default for gen6/gen7, it is worth
optimising the ppgtt allocation for it. In this case, we do not need to
flush the GGTT page directories entries as they are fixed during setup.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200709190111.5492-1-chris@chris-wilson.co.uk
Both cmp_u32 and cmp_u64 are comparing the pointers instead of the value
at those pointers. This will result in incorrect/unsorted list. Fix it
by deferencing the pointers before comparison.
Fixes: 4ba74e53ad ("drm/i915/selftests: Verify frequency scaling with RPS")
Fixes: 8757797ff9 ("drm/i915/selftests: Repeat the rps clock frequency measurement")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200709154931.23310-1-sudeep.holla@arm.com
As today those 2 WAs have different implementation between TGL and DG1
WA pages but checking the HSD it is clear that DG1 implementation
should be used for both, also to do so is easier as we just need to
extend WA 1407928979 to B* stepping.
Both WAs are need to fix some possible render corruptions.
DG1 initial patches were not merged yet, as soon it is this WAs should
be applied to DG1 as well.
BSpec: 53508
BSpec: 52890
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708212947.40178-1-jose.souza@intel.com
Some objects we map once during their construction, and then never
access their mappings again, even if they are kept around for the
duration of the driver. Keeping those pages mapped, often vmapped, is
therefore wasteful and we should release the maps as soon as we no
longer need them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-3-chris@chris-wilson.co.uk
As we have a pin_map interface, that knows how to flush the data to the
device, use it. The only downside is that we keep the kmap around, as
once acquired we keep the mapping cached until the object's backing
store is released.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-2-chris@chris-wilson.co.uk
We removed retiring requests from the shrinker in order to decouple the
mutexes from reclaim in preparation for unravelling the struct_mutex.
The impact of not retiring is that we are much less agressive in making
global objects available for shrinking, as such objects remain pinned
until they are flushed by a heartbeat pulse following the last retired
request along their timeline. In order to ensure that pulse occurs in
time for memory reclamation, we should kick it from kswapd.
The catch is that we have added some flush_work() into the retirement
phase (to ensure that we reach a global idle in a timely manner), but
these flush_work() are not eligible (i.e do not belong to WQ_MEM_RELCAIM)
for use from inside kswapd. To avoid flushing those workqueues, we teach
the retirer not to do so unless we are actually waiting, and only do the
plain retire from inside the shrinker.
Note that for execlists, we already retire completed contexts as they
are scheduled out, so it should not be keeping global state
unnecessarily pinned. The legacy ringbuffer however...
References: 9e9539800d ("drm/i915: Remove waiting & retiring from shrinker paths")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-1-chris@chris-wilson.co.uk
In line with what happened for other gt-related features, move the sseu
debugfs files under gt/.
The sseu_status debugfs has also been kept at the top level as we do
have tests that use it; it will be removed once we teach the tests to
look into the new path.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-10-daniele.ceraolospurio@intel.com
SSEUs are a GT capability, so track them under gt_info.
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-8-daniele.ceraolospurio@intel.com
Keep all the SSEU code in the relevant file. The code has also been
updated to use intel_gt instead of dev_priv.
Based on an original patch by Sandeep.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-7-daniele.ceraolospurio@intel.com
We already call 2 gt-related init_mmio functions in driver_mmio_probe
and a 3rd one will be added by a follow-up patch, so pre-emptively
introduce a gt_init_mmio function to group them.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-6-daniele.ceraolospurio@intel.com
Since the engines belong to the GT, move the runtime-updated list of
available engines to the intel_gt struct. The original mask has been
renamed to indicate it contains the maximum engine list that can be
found on a matching device.
In preparation for other info being moved to the gt in follow up patches
(sseu), introduce an intel_gt_info structure to group all gt-related
runtime info.
v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
All the info we read in intel_device_info_init_mmio are engine-related
and since we already have an engine_init_mmio function we can just
perform the operations from there.
v2: clarify comment about forcewake requirements and pruning (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-4-daniele.ceraolospurio@intel.com
A follow up patch will move the engine mask under the gt structure,
so get ready for that.
v2: switch the remaining gvt case using dev_priv->gt to gvt->gt (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-3-daniele.ceraolospurio@intel.com
Firmware "Selected" state is a transient state - we don't expect to see
it after finishing driver probe, we even have asserts sprinkled over
i915 to confirm whether that's the case.
Unfortunately - we don't handle the transition out of "Selected" in case
of GuC fetch error, leading those asserts to fire when calling
"intel_huc_is_used()".
v2: Add dbg print when moving HuC into error state (Daniele)
Reported-by: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Marcin Bernatowicz <marcin.bernatowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-2-michal@hardline.pl
It has been pointed out that information about HuC usage doesn't belong
in guc_info debugfs. Let's move "supported/used/wanted" matrix to a
separate debugfs file, keeping guc_info strictly about GuC.
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708100843.297655-1-michal@hardline.pl
On eviction, we acquire the vm->mutex and then wait on the vma->active.
Therefore when binding and pinning the vma, we must follow the same
sequence, lock/pin the vma then mark it active. Otherwise, we mark the
vma as active, then wait for the vm->mutex, and meanwhile the evictor
holding the mutex waits upon us to complete our activity.
Fixes: 8ccfc20a7d ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706170138.8993-1-chris@chris-wilson.co.uk
(cherry picked from commit 8567774e87)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
On eviction, we acquire the vm->mutex and then wait on the vma->active.
Therefore when binding and pinning the vma, we must follow the same
sequence, lock/pin the vma then mark it active. Otherwise, we mark the
vma as active, then wait for the vm->mutex, and meanwhile the evictor
holding the mutex waits upon us to complete our activity.
Fixes: 8ccfc20a7d ("drm/i915/gt: Mark ring->vma as active while pinned")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706170138.8993-1-chris@chris-wilson.co.uk
We can add taint from multiple places, printing the caller allows us to
have a better overview of what exactly caused us to do the tainting.
v2: Tweak format and print the device (Chris)
v3: Move things around (Chris)
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-2-michal@hardline.pl
Getting wedged device on driver init is pretty much unrecoverable.
Since we're running various scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).
v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish
WEDGED_ON_INIT from WEDGED_ON_FINI (Chris)
v3: Appease checkpatch, fixup search-replace logic expression mindbomb
in assert (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200706144107.204821-1-michal@hardline.pl
Reuse the ppgtt_bind_vma() for aliasing_ppgtt_bind_vma() so we can
reduce some code near-duplication. The catch is that we need to then
pass along the i915_address_space and not rely on vma->vm, as they
differ with the aliasing-ppgtt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200703102519.26539-1-chris@chris-wilson.co.uk
The information about platform/driver/user view of GuC firmware usage
currently requires user to either go through kernel log or parse the
combination of "enable_guc" modparam and various debugfs entries.
Let's keep things simple and add a "supported/used/wanted" matrix
(already used internally by i915) in guc_info debugfs.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lukasz Fiedorowicz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200701142752.419878-1-michal@hardline.pl
If the driver gets stuck holding the kernel timeline, we cannot issue a
heartbeat and so fail to discover that the driver is indeed stuck and do
not issue a GPU reset (which would hopefully unstick the driver!).
Switch to using a trylock so that we can query if the heartbeat's
timeline mutex is locked elsewhere, and then use the timer to probe if it
remains stuck at the same spot for consecutive heartbeats, indicating
that the mutex has not been released and the engine has not progressed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702095219.963-1-chris@chris-wilson.co.uk
Alexandre Oliva has recently removed these files from Linux Libre
with concerns that the sources weren't available.
The sources are available on IGT repository, and only open source
tools are used to generate the {ivb,hsw}_clear_kernel.c files.
However, the remaining concern from Alexandre Oliva was around
GPL license and the source not been present when distributing
the code.
So, it looks like 2 alternatives are possible, the use of
linux-firmware.git repository to store the blob or making sure
that the source is also present in our tree. Since the goal
is to limit the i915 firmware to only the micro-controller blobs
let's make sure that we do include the asm sources here in our tree.
Btw, I tried to have some diligence here and make sure that the
asms that these commits are adding are truly the source for
the mentioned files:
igt$ ./scripts/generate_clear_kernel.sh -g ivb \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "ivb-cb_assembled"
Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \
ivb_clear_kernel.c
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT
61c61
< };
> };
\ No newline at end of file
igt$ ./scripts/generate_clear_kernel.sh -g hsw \
-m ~/mesa/build/src/intel/tools/i965_asm
Output file not specified - using default file "hsw-cb_assembled"
Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c"
for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \
hsw_clear_kernel.c
5c5
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC
> * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT
61c61
< };
> };
\ No newline at end of file
Used IGT and Mesa master repositories from Fri Jun 5 2020)
IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging
the connector")
Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with
RADV_DEBUG=shaderstats")
Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \
-D gallium-drivers=iris -D prefix=/usr \
-D libdir=/usr/lib64/ -Dtools=intel \
-Dkulkan-drivers=intel && ninja -C build
v2: Header clean-up and include build instructions in a readme (Chris)
Modified commit message to respect check-patch
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html
Fixes: 47f8253d2b ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Alexandre Oliva <lxoliva@fsfla.org>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrigo.vivi@intel.com
(cherry picked from commit 5a7eeb8ba1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>