The OR routing logic in NVKM does not expect to receive supervisor
interrupts until the DD has provided consistent information on the
ORs it's using and the EVO/NVD assembly state to match.
The combination of changing window ownership + core channel update
during display init triggered a situation where we'd disconnect an
OR from the pad it was meant to still be driving on some systems.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For various complicated reasons, we need to avoid sending a core update
method during display init. Something, which we've been required to do
on GV100 and up because we've been assigning windows to heads there and
the HW is rather picky about when that's allowed.
This moves window assignment into the modesetting path at a point where
it's much safer to send our first update methods to NVDisplay.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On seqno rollover, we need to allocate ourselves a new cacheline. This
might incur grabbing a new page and pinning it into the GGTT, with some
rather unfortunate lockdep implications.
To avoid a mutex, and more specifically pinning in the GGTT from inside
the kernel context being used to flush the GGTT in emergencies, we will
likely need to lift the next-cacheline allocation to a pre-reservation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-3-chris@chris-wilson.co.uk
Inside the intel_timeline_get_seqno(), we currently track the retirement
of the old cachelines by listening to the new request. This requires
that the new request is ready to be used and so requires a minimum bit
of initialisation prior to getting the new seqno.
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk
Take a reference to the previous exclusive fence on the i915_active, as
we wish to add an await to it in the caller (and so must prevent it from
being freed until we have completed that task).
Fixes: e3793468b4 ("drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-1-chris@chris-wilson.co.uk
Now that intel_engine_apply_workarounds is called on all gens, we can
use the engine workaround lists for pre-gen8 workarounds as well to be
consistent in the way we handle and dump the WAs.
v2: Ignore the sanity check of MI_MODE on Broadwater, for whatever reason
it is not sticking.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200201194004.3622493-1-chris@chris-wilson.co.uk
A masked register does not need rmw to update, and it is best not to use
such a sequence.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131235035.3522102-1-chris@chris-wilson.co.uk
The workarounds are a common "feature" across gens and submission
mechanisms and we already call the other WA related functions from
common engine ones (<setup/cleanup>_common), so it makes sense to
do the same with WA application. Medium-term, This will help us
reduce the duplication once the GuC resume function is added, but short
term it will also allow us to use the workaround lists for pre-gen8
engine workarounds.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131075716.2212299-2-chris@chris-wilson.co.uk
We already have guc_is_running function, but it only reflects
firmware status, while to fully use GuC we need to know if we've
already established communication with it.
v2: also s/intel_guc_is_running/intel_guc_is_fw_running (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131153706.109528-1-michal.wajdeczko@intel.com
Commit fd67e9c6ed ("drm/tegra: Do not implement runtime PM") replaced
the generic runtime PM usage by a host1x bus-specific implementation in
order to work around some assumptions baked into runtime PM that are in
conflict with the requirements in the Tegra DRM driver.
Unfortunately the new runtime PM callbacks are not setup yet at the time
when the SOR driver first needs to resume the device to register the SOR
pad clock, and accesses to register will cause the system to hang.
Note that this only happens on Tegra124 and Tegra210 because those are
the only SoCs where the SOR pad clock is registered from the SOR driver.
Later generations use a SOR pad clock provided by the BPMP.
Fix this by moving the registration of the SOR pad clock after the
host1x client has been registered. That's somewhat suboptimal because
this could potentially, though it's very unlikely, cause the Tegra DRM
to be probed if the SOR happens to be the last subdevice to register,
only to be immediately removed again if the SOR pad output clock fails
to register. That's just a minor annoyance, though, and doesn't justify
implementing a workaround.
Fixes: fd67e9c6ed ("drm/tegra: Do not implement runtime PM")
Signed-off-by: Thierry Reding <treding@nvidia.com>
If the driver fails to probe, make sure to disable runtime PM again.
While at it, make the cleanup code in ->remove() symmetric.
Signed-off-by: Thierry Reding <treding@nvidia.com>
In the rare cases where we are using the global GGTT for execution in
the selftests, we have marked them with PIN_USER knowing that they will
be bound as PIN_GLOBAL as well. However, we need to catch the extra flag
in deciding to use the async worker for such binds as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200131081543.2251298-1-chris@chris-wilson.co.uk
To enable non-persistent contexts, we require a means of cancelling any
inflight work from that context. This is first done "gracefully" by
using preemption to kick the active context off the engine, and then
forcefully by resetting the engine if it is active. If we are unable to
reset the engine to remove hostile userspace, we should not allow
userspace to opt into using non-persistent contexts.
If the per-engine reset fails, we still do a full GPU reset, but that is
rare and usually indicative of much deeper issues. The damage is already
done. However, the goal of the interface to allow long running compute
jobs without causing collateral damage elsewhere, and if we are unable
to support that we should make that known by not providing the
interface (and falsely pretending we can).
Fixes: a0e047156c ("drm/i915/gem: Make context persistence optional")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130164553.1937718-1-chris@chris-wilson.co.uk
Let's add a copy of the active_pipes bitmask into the cdclk_state.
While this is duplicating a bit of information we may already
have elsewhere, I think it's worth it to decopule the cdclk stuff
from whatever else wants to use that bitmask. Also we want to get
rid of all the old ad-hoc global state which is what the current
bitmask is, so this removes one obstacle.
The one extra thing we have to remember is write locking the cdclk
state whenever the bitmask changes.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-19-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Let's convert cdclk_state to be a proper global state. That allows
us to use the regular atomic old vs. new state accessor, hopefully
making the code less confusing.
We do have to deal with a few more error cases in case the cdclk
state duplication fails. But so be it.
v2: Fix new plane min_cdclk vs. old crtc min_cdclk check
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200121140353.25997-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Extract a small helper to compute the active pipes bitmask
based on the old bitmask + the crtcs in the atomic state.
I want to decouple the cdclk state entirely from the current
global state so I want to track the active pipes also inside
the (to be introduced) full cdclk state.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-17-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Now that we have the more formal global state thing let's
use if for memory bandwidth tracking. No real difference
to the current private object usage since we already
tried to avoid taking the single serializing lock needlessly.
But since we're going to roll the global state out to more
things probably a good idea to unify the approaches a bit.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-16-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Our current global state handling is pretty ad-hoc. Let's try to
make it better by imitating the standard drm core private object
approach.
The reason why we don't want to directly use the private objects
is locking; Each private object has its own lock so if we
introduce any global private objects we get serialized by that
single lock across all pipes. The global state apporoach instead
uses a read/write lock type of approach where each individual
crtc lock counts as a read lock, and grabbing all the crtc locks
allows one write access.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-15-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Give the cdclk init/uninit functions a _hw suffix to make
it clear they are about initializing the actual hardware.
I'll be wanting to to add a intel_cdclk_init() which is
purely initializing software structures.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-12-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
intel_cdclk_needs_cd2x_update() is named rather confusingly.
We don't have to do a cd2x update, rather we are allowed to
do one (as opposed to a full PLL reprogramming with its heavy
handed modeset). So let's rename the function to
intel_cdclk_can_cd2x_update().
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-7-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Move the min_cdclk[] and min_voltage_level[] arrays under the
rest of the cdclk state. And while at it provide a simple
helper (intel_cdclk_clear_state()) to clear the state during
the ww_mutex backoff dance.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-6-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Move the initial setup of state->{cdclk,min_cdclk[],min_voltage_level[]}
into intel_modeset_calc_cdclk(), and we'll move the counterparts into
intel_cdclk_swap_state(). This encapsulates the cdclk state much better.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200120174728.21095-5-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>