Don't zero out the watermarks for the Y plane since we've already
computed them when computing the UV plane's watermarks (since the
UV plane always appears before ethe Y plane when iterating through
the planes).
This leads to allocating no DDB for the Y plane since .min_ddb_alloc
also gets zeroed. And that of course leads to underruns when scanning
out planar formats.
Cc: stable@vger.kernel.org
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Fixes: dbf71381d7 ("drm/i915: Nuke intel_atomic_crtc_state_for_each_plane_state() from skl+ wm code")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210327005945.4929-1-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit f99b805fb9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
On HSW/BDW with VT-d active the first tile row scanned out
after the first async flip of the frame often ends up corrupted.
Whether the corruption happens or not depends on the scanline
on which the async flip happens, but the behaviour seems very
consistent. Ie. the same set of scanlines (which are most scanlines)
always show the corruption. And another set of scanlines (far less
of them) never shows the corruption.
I discovered that disabling the fetch-stride stretching
feature cures the corruption. This is some kind of TLB related
prefetch thing AFAIK. We already disable it on SNB primary
planes due to a documented workaround. The hardware folks
indicated that disabling this should be fine, so let's go
with that.
And while we're here, let's document the relevant bits on all
pre-skl platforms.
Fixes: 2a636e240c ("drm/i915: Implement async flip for ivb/hsw")
Fixes: cda195f13a ("drm/i915: Implement async flips for bdw")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210220103303.3448-1-ville.syrjala@linux.intel.com
Reviewed-by: Karthik B S <karthik.b.s@intel.com>
(cherry picked from commit b7a7053ab2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As it now it is always required for GEN12+ the is_16gb_dimm name
do not make sense for GEN12+.
v2:
- Updated comment on top of "dram_info->wm_lv_0_adjust_needed =
!IS_GEN9_LP(i915);"
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210128164312.91160-3-jose.souza@intel.com
In order to make the dbuf state computation less fragile
let's make it stand on its own feet by not requiring someone
to peek into a crystall ball ahead of time to figure out
which pipes need to be added to the state under which potential
future conditions. Instead we compute each piece of the state
as we go along, and if any fallout occurs that affects more than
the current set of pipes we add the affected pipes to the state
naturally.
That requires that we track a few extra thigns in the global
dbuf state: dbuf slices for each pipe, and the weight each
pipe has when distributing the same set of slice(s) between
multiple pipes. Easy enough.
We do need to follow a somewhat careful sequence of computations
though as there are several steps involved in cooking up the dbuf
state. Thoguh we could avoid some of that by computing more things
on demand instead of relying on earlier step of the algorithm to
have filled it out. I think the end result is still reasonable
as the entire sequence is pretty much consolidated into a single
function instead of being spread around all over.
The rough sequence is this:
1. calculate active_pipes
2. calculate dbuf slices for every pipe
3. calculate total enabled slices
4. calculate new dbuf weights for any crtc in the state
5. calculate new ddb entry for every pipe based on the sets of
slices and weights, and add any affected crtc to the state
6. calculate new plane ddb entries for all crtcs in the state,
and add any affected plane to the state so that we'll perform
the requisite hw reprogramming
And as a nice bonus we get to throw dev_priv->wm.distrust_bios_wm
out the window.
v2: Keep crtc_state->wm.skl.ddb
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122205633.18492-8-ville.syrjala@linux.intel.com
Extract the code to calculate the weights used to chunk up the dbuf
between pipes. There's still extra stuff in there that shouldn't be
there and must be moved out, but that requires a bit more state to
be tracked in the dbuf state.
v2: Keep crtc_state->wm.skl.ddb
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122205633.18492-7-ville.syrjala@linux.intel.com
The dbuf state will be where we collect all the inter-pipe dbuf
allocation stuff. Start by adding the actual per-pipe ddb entries
there.
Originally the plan was to move them there outright, but that no longer
works as we're no longer guaranteed to have a dbuf state when it comes
time to sanity check the ddb overlaps in skl_commit_modeset_enables().
I think when I wrote this originally we did the watermark/ddb
calculation last, and so we couldn't have any crtcs in the state w/o
also having the dbuf state. But that has since changed and we do the
watermark/ddb calculation much earlier, and thus it is now possible
to commit crtcs w/o a dbuf state. So we keep another copy of the
information in the crtc state.
v2: Rebase
v3: Duplicate the entries instead of moving
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122205633.18492-6-ville.syrjala@linux.intel.com
Generalize icl_get_first_dbuf_slice_offset() into something that
just gives us the start+end of the dbuf chunk covered by the
specified slices as a standard ddb entry. Initial idea was to use
it during readout as well, but we shall see.
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122205633.18492-5-ville.syrjala@linux.intel.com
skl_ddb_get_pipe_allocation_limits() doesn't care how the weights
for distributing the ddb are caclculated for each pipe. Put that
calculation into a separate function so that such mundane details
are hidden from view.
v2: s/adjusted_mode/pipe_mode/
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210122205633.18492-2-ville.syrjala@linux.intel.com
DG1 is missing those two WA so instead of copy and paste it to the DG1
function, here calling the function that implements it.
While at it also renaming tgl_init_clock_gating to
gen12lp_init_clock_gating as it is also used by DG1, RKL and ADL-S.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210113133759.72055-1-jose.souza@intel.com
drm/i915 features for v5.11:
Highlights:
- Enable big joiner to join two pipes to one port to overcome pipe restrictions
(Manasi, Ville, Maarten)
Display:
- More DG1 enabling (Lucas, Aditya)
- Fixes to cases without display (Lucas, José, Jani)
- Initial PSR state improvements (José)
- JSL eDP vswing updates (Tejas)
- Handle EDID declared max 16 bpc (Ville)
- Display refactoring (Ville)
Other:
- GVT features
- Backmerge
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87czzzkk1s.fsf@intel.com
Arguably some of these should use intel_de_read() or intel_de_write(),
however not all. Prioritize I915_READ() and I915_WRITE() removal in
general over migrating to the pedantically correct replacements right
away.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201130111601.2817-7-jani.nikula@intel.com
Store the relative data rate for planes in the crtc state
so that we don't have to use
intel_atomic_crtc_state_for_each_plane_state() to compute
it even for the planes that are no part of the current state.
Should probably just nuke this stuff entirely an use the normal
plane data rate instead. The two are slightly different since this
relative data rate doesn't factor in the actual pixel clock, so
it's a bit odd thing to even call a "data rate". And since the
watermarks are computed based on the actual data rate anyway
I don't really see what the point of this relative data rate
is. But that's for the future...
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106173042.7534-6-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
In order to remove intel_atomic_crtc_state_for_each_plane_state()
from skl_crtc_can_enable_sagv() we can simply precompute whether
each wm level can tolerate the SAGV block time latency or not.
This has the nice side benefit that we remove the duplicated
wm level latency calculation. In fact the copy of that code
we had in skl_crtc_can_enable_sagv() didn't even handle
WaIncreaseLatencyIPCEnabled/Display WA #1141 whereas the copy
in skl_compute_plane_wm() did. So now we just have the one
copy which handles all the w/as.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106173042.7534-5-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
intel_atomic_crtc_state_for_each_plane_state() peeks at the
plane's current state without holding the plane's mutex, trusting
that the crtc's mutex will protect it. In practice that does work
since our planes can't move between pipes, but it sets a bad
example. intel_atomic_crtc_state_for_each_plane_state() also
relies on crtc_state.uapi.plane_mask which may be full of lies
when it comes to the bigjoiner stuff, so soon we can't use it as
is anyway. So best to just get rid of it entirely. Which we can
easily do by switching to the g4x/vlv "raw" watermark approach.
Later on we should even be able to move the "raw" watermark
computation into the normal .plane_check() code, leaving only
the merging/clamping of the final watermarks to the later
stages. But that will require adjusting the ilk+ wm code
similarly as well.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106173042.7534-3-ville.syrjala@linux.intel.com
Pass the whole intel_atomic_state to skl_build_pipe_wm()
and skl_allocate_pipe_ddb() so we can start to iterate
stuff containerd in the commit.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201106173042.7534-2-ville.syrjala@linux.intel.com
With bigjoiner, there will be 2 pipes driving 2 halves of 1 transcoder,
because of this, we need a pipe_mode for various calculations, including
for example watermarks, plane clipping, etc.
v10:
* remove redundant pipe_mode assignment (Ville)
v9:
* pipe_mode in state dump nd state check (Ville)
v8:
* Add pipe_mode in readout in verify_crtc_state (Ville)
v7:
* Remove redundant comment (Ville)
* Just keep mode instead of pipe_mode (Ville)
v6:
* renaming in separate function, only pipe_mode here (Ville)
* Add description (Maarten)
v5:
* Rebase (Manasi)
v4:
* Manual rebase (Manasi)
v3:
* Change state to crtc_state, fix rebase err (Manasi)
v2:
* Manual Rebase (Manasi)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
[vsyrjala:
* Fix state checker
* Fix state dump
* Use pipe_mode for linetime watermarks
* Make sure pipe_mode normal timings are correct since the
silly ddb code uses them
* Drop the redundant pipe_mode copies from intel_modeset_pipe_config()
and intel_crtc_copy_uapi_to_hw_state()
* Use drm_mode_copy() all over]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201112191718.16683-7-ville.syrjala@linux.intel.com
Cross-subsystem Changes:
- DMA mapped scatterlist fixes in i915 to unblock merging of
https://lkml.org/lkml/2020/9/27/70 (Tvrtko, Tom)
Driver Changes:
- Fix for user reported issue #2381 (Graphical output stops with "switching to inteldrmfb from simple"):
Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init (Ville, Chris)
- Fix for Tigerlake (and earlier) to avoid spurious empty CSB events leading to hang (Chris, Bruce)
- Delay execlist processing for Tigerlake to avoid hang (Chris)
- Fix for Tigerlake RCS engine health check through heartbeat (Chris)
- Fix for Tigerlake reserved MOCS entries (Ayaz, Chris)
- Fix Media power gate sequence on Tigerlake (Rodrigo)
- Enable eLLC caching of display buffers for SKL+ (Ville)
- Support parsing of oversize batches on Gen9 (Matt, Chris)
- Exclude low pages (128KiB) of stolen from use to avoid thrashing during reset (Chris)
- Flush engines before Tigerlake breadcrumbs (Chris)
- Use the local HWSP offset during submission (Chris)
- Flush coherency domains on first set-domain-ioctl (Chris, Zbigniew)
- Use the active reference on the vma while capturing to avoid use-after-free (Chris)
- Fix MOCS PTE setting for gen9+ (Ville)
- Avoid NULL dereference on IPS driver callback while unbinding i915 (Chris)
- Avoid NULL dereference from PT/PD stash allocation error (Matt)
- Hold request reference for canceling an active context (Chris)
- Avoid infinite loop on x86-32 when mapping a lot of objects (Chris)
- Disallow WC mappings when processor doesn't support them (Chris)
- Return correct error in i915_gem_object_copy_blt() error path (Dan)
- Return correct error in intel_context_create_request() error path (Maarten)
- Tune down GuC communication enabled/disabled messages to debug (Jani)
- Fix rebased commit "Remove i915_request.lock requirement for execution callbacks" (Chris)
- Cancel outstanding work after disabling heartbeats on an engine (Chris)
- Signal cancelled requests (Chris)
- Retire cancelled requests on unload (Chris)
- Scrub HW state on driver remove (Chris)
- Undo forced context restores after trivial preemptions (Chris)
- Handle PCI unbind in PMU code (Tvrtko)
- Fix CPU hotplug with multiple GPUs in PMU code (Trtkko)
- Correctly set SFC capability for video engines (Venkata)
- Update GuC code to use firmware v49.0.1 (John, Matthew B., Daniele, Oscar, Michel, Rodrigo, Michal)
- Improve GuC warnings on loading failure (John)
- Avoid ownership race in buffer pool by clearing age (Chris)
- Use MMIO to read CSB in case of failure (Chris, Mika)
- Show engine properties in engine state dump to indicate changes (Chris, Joonas)
- Break up error capture compression loops with cond_resched() (Chris)
- Reduce GPU error capture mutex hold time to avoid khungtaskd (Chris)
- Serialise debugfs i915_gem_objects with ctx->mutex (Chris)
- Always test execution status on closing the context and close if not persistent (Chris)
- Avoid mixing integer types during batch copies (Chris, Jared)
- Skip over MI_NOOP when parsing to avoid overhead (Chris)
- Hold onto an explicit ref to i915_vma_work.pinned (Chris)
- Perform all asynchronous waits prior to marking payload start (Chris)
- Pull phys pread/pwrite implementations to the backend (Matt)
- Improve record of hung engines in error state (Tvrtko)
- Allow backends to override pread implementation (Matt)
- Reinforce LRC poisoning checks to confirm context survives execution (Chris)
- Fix memory region max size calculation (Matt)
- Fix order when adding blocks to memory region (Matt)
- Eliminate unused intel_virtual_engine_get_sibling func (Chris)
- Cleanup kasan warning for on-stack (unsigned long) casting (Chris)
- Onion unwind for scratch page allocation failure (Chris)
- Poison stolen pages before use (Chris)
- Selftest improvements (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201112163407.GA20320@jlahtine-mobl.ger.corp.intel.com
Some media power gates are disabled by default. commit 5d86923060
("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
tried to enable it, but it duplicated an existent register.
So, the main PG setup sequences ended up overwriting it.
So, let's now merge this to the main PG setup sequence.
v2: (Chris): s/BIT/REG_BIT, remove useless comment,
remove useless =0, use the right gt,
remove rc6 sequence doubt from commit message.
Fixes: 5d86923060 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org#v5.5+
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201111072859.1186070-1-rodrigo.vivi@intel.com
WAC6entrylatency is trying to fix excessive rc6 entry latency caused
by the extra delay from FBC_LLC_READ_CTRL, which is there for some
extra sync with uncore for frame buffer caching in LLC.
Reading through the hsd the recommendation was to set the FBC_LLC_FULLY_OPEN
bit to disable this extra delay entirely. This can be done whenever fb LLC
caching is not used. The alternative suggestion was to reduce the delay to
eg. 0x5 via updated BIOS programming instructions. But all the kbl/cfl
machines I've seen still have the default 0xff programmed. As we never use
fb LLC caching let's just apply the w/a to all skl derivatives to get
consistent rc6 latencies.
I was able to measure the effect of FBC_LLC_READ_CTRL to rc6 latency
via forcewake. Here's a graph of some of the results:
sleep;fw_req=1;wait fw_ack==1;sleep;fw_req=0;wait fw_ack==0
fw_ack==1 duration
160us +----------------------------------------------------------------+
| + + $$+ + + |
| $$ $ $ ******$$ ** $ $**$* #########$$######|
140us |-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$*$$$$$$$$$$$$$$$$ $$$$$$|
| $ * # |
| $ * # |
120us |$+ * # +-|
|$ * # |
|$ * # # |
100us |$+ ************######################## +-|
|$ * *# |
|$ ***** ######### |
80us |$+ * # #### ## +-|
|$ **** ### # # |
| ** #### FBC_LLC_READ_CTRL: 0x8000 ******* |
60us |-###### FBC_LLC_READ_CTRL: 0xffff #######-|
|## + + FBC_LLC_READ_CTRL: 0x400000ff $$$$$$$ |
+----------------------------------------------------------------+
0ms 10ms 20ms 30ms 40ms 50ms 60ms
sleep duration
The default FBC_LLC_READ_CTRL value of 0xff is documented to give us
a 170usec delay. That tracks well with the knees at 0xffff->~44msec and
0x8000->~22msec we see in the graph.
We can see that if we sleep longer than the FBC_LLC_READ_CTRL delay
we always observe the full (~145usec) rc6 wakeup latency. But if we sleep
for less than the FBC_LLC_READ_CTRL delay we see a quicker fw wakeup,
presumably due the hardware not having yet entered rc6 fully.
The other plateaus in the graph I suspect correspond to some shallower
internal rc states.
v2: s/usec/msec/ typo in commit msg
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716190426.17047-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
DG1 shares some workarounds with TGL and RKL and also has some
additional workarounds of its own.
v2: Correct location of Wa_1408615072 (JohnH).
v3: Apply WAs 1606700617, 18011464164 and 22010931296 to DG1 (José)
v4 (Anusha)
- Add Wa_22010271021
- s/Wa_14010096844/Wa_1409836686
v5:
- Extend Wa_14010919138 to all revs (Matt Atwood)
- Power gate media is global gen12 design. (Rodrigo)
- Rebase (Lucas)
v6: use REG_BIT() to fix checkpatch warning (Lucas)
BSpec: 53508
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201014191937.1266226-8-lucas.demarchi@intel.com
TGL made stepping a litte mess, workarounds refer to the stepping of
the IP(GT or Display) not of the GPU stepping so it would already
require the same solution as used in commit 96c5a15f9f
("drm/i915/kbl: Fix revision ID checks").
But to make things even more messy it have a different IP stepping
mapping between SKUs and the same stepping revision of GT do not match
the same HW between TGL U/Y and regular TGL.
So it was required to have 2 different macros to check GT WAs while
for Display we are able to use just one macro that uses the right
revids table.
All TGL workarounds checked and updated accordingly.
v2:
- removed TODO to check if WA 14010919138 applies to regular TGL.
- fixed display stepping in regular TGL (Anusha)
BSpec: 52890
BSpec: 55378
BSpec: 44455
Reviewed-by: Anusha Srivatsa <anusha.srivtsa@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Penne Lee <penne.y.lee@intel.com>
Cc: Guangyao Bai <guangyao.bai@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200827233943.400946-1-jose.souza@intel.com
We usually assume that increasing PCI device revision ID's translates to
newer steppings; macros like IS_KBL_REVID() that we use rely on this
behavior. Unfortunately this turns out to not be true on KBL; the
newer device 2 revision ID's sometimes go backward to older steppings.
The situation is further complicated by different GT and display
steppings associated with each revision ID.
Let's work around this by providing a table to map the revision ID to
specific GT and display steppings, and then perform our comparisons on
the mapped values.
v2:
- Move the kbl_revids[] array to intel_workarounds.c to avoid compiler
warnings about an unused variable in files that don't call the
macros (kernel test robot).
Bspec: 18329
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200811032105.2819370-1-matthew.d.roper@intel.com
Reviewed-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It's silly to have if(SKL) checks in gen9_init_clock_gating() when
we can just move those bits into skl_init_clock_gating().
I'm not entirely convinced we even need this w/a, or if we do
then maybe we want it for kbl/cfl as well. IIRC it was only
listed in the wadb, but that is now dead so can't double check
anymore. Bspec doesn't seem to have any purely skl specific
DOP clock gating workarounds listed.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716190426.17047-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Some platforms apply the FBC w/as in .init_clock_gating(), some
in fbc_activate(). Move them all to .init_clock_gating() for
consistentce. Also safer since we don't have to worry about the
RMWs clashing with any other runtime use of the same registers.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708131223.9519-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
A follow up patch will move the engine mask under the gt structure,
so get ready for that.
v2: switch the remaining gvt case using dev_priv->gt to gvt->gt (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-3-daniele.ceraolospurio@intel.com
Normally i85x/i865 3D activity will block FBC until a 2D blit
occurs. I suppose this was meant to avoid recompression while
3D activity is still going on but the frame hasn't yet been
presented. Unfortunately that also means that a page flipped
3D workload will permanently block FBC even if it only renders
a single frame and then does nothing.
Since we are using software render tracking anyway we might as
well flip the chicken bit so that 3D does not block FBC. This
will avoid the permament FBC blockage in the aforemention use
case, but thanks to the software tracking the compressor will
not disturb 3D rendering activity.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200702153723.24327-5-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
'level' here means the highest level we can't use, so when checking
the fbc watermarks we need a -1 to get at the last enabled level.
While at if refactor the code a bit to declutter
g4x_compute_pipe_wm().
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429101034.8208-12-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cometlake is a small refresh of Coffeelake, but since we have found out a
difference in the plaforms, we need to identify them as separate platforms.
Since we previously took Coffeelake/Cometlake as identical, update all
IS_COFFEELAKE() to also include IS_COMETLAKE().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200602140541.5481-1-chris@chris-wilson.co.uk
Removed duplicate include and fixed comment > 80 chars.
v2: Added newline after system include and between functions
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200522131843.20477-1-stanislav.lisovskiy@intel.com
According to BSpec max BW per slice is calculated using formula
Max BW = CDCLK * 64. Currently when calculating min CDCLK we
account only per plane requirements, however in order to avoid
FIFO underruns we need to estimate accumulated BW consumed by
all planes(ddb entries basically) residing on that particular
DBuf slice. This will allow us to put CDCLK lower and save power
when we don't need that much bandwidth or gain additional
performance once plane consumption grows.
v2: - Fix long line warning
- Limited new DBuf bw checks to only gens >= 11
v3: - Lets track used Dbuf bw per slice and per crtc in bw state
(or may be in DBuf state in future), that way we don't need
to have all crtcs in state and those only if we detect if
are actually going to change cdclk, just same way as we
do with other stuff, i.e intel_atomic_serialize_global_state
and co. Just as per Ville's paradigm.
- Made dbuf bw calculation procedure look nicer by introducing
for_each_dbuf_slice_in_mask - we often will now need to iterate
slices using mask.
- According to experimental results CDCLK * 64 accounts for
overall bandwidth across all dbufs, not per dbuf.
v4: - Fixed missing const(Ville)
- Removed spurious whitespaces(Ville)
- Fixed local variable init(reduced scope where not needed)
- Added some comments about data rate for planar formats
- Changed struct intel_crtc_bw to intel_dbuf_bw
- Moved dbuf bw calculation to intel_compute_min_cdclk(Ville)
v5: - Removed unneeded macro
v6: - Prevent too frequent CDCLK switching back and forth:
Always switch to higher CDCLK when needed to prevent bandwidth
issues, however don't switch to lower CDCLK earlier than once
in 30 minutes in order to prevent constant modeset blinking.
We could of course not switch back at all, however this is
bad from power consumption point of view.
v7: - Fixed to track cdclk using bw_state, modeset will be now
triggered only when CDCLK change is really needed.
v8: - Lock global state if bw_state->min_cdclk is changed.
- Try getting bw_state only if there are crtcs in the commit
(need to have read-locked global state)
v9: - Do not do Dbuf bw check for gens < 9 - triggers WARN
as ddb_size is 0.
v10: - Lock global state for older gens as well.
v11: - Define new bw_calc_min_cdclk hook, instead of using
a condition(Manasi Navare)
v12: - Fixed rebase conflict
v13: - Added spaces after declarations to make checkpatch happy.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200520150058.16123-1-stanislav.lisovskiy@intel.com
The current dbuf slice computation only happens when there are
active pipes. If we are turning off all the pipes we just leave
the dbuf slice mask at it's previous value, which may be something
other that BIT(S1). If runtime PM will kick in it will however
turn off everything but S1. Then on the next atomic commit (if
the new dbuf slice mask matches the stale value we left behind)
the code will not turn on the other slices we now need. This will
lead to underruns as the planes are trying to use a dbuf slice
that's not powered up.
To work around let's just just explicitly set the dbuf slice mask
to BIT(S1) when we are turning off all the pipes. Really the code
should just calculate this stuff the same way regardless whether
the pipes are on or off, but we're not quite there yet (need a
bit more work on the dbuf state for that).
v2: Let's not put the fix into dead code
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 3cf43cdc63 ("drm/i915: Introduce proper dbuf state")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200518121354.20401-1-ville.syrjala@linux.intel.com
Combine the two per-pipe dbuf debugs into one, and use the canonical
[CRTC:%d:%s] style to identify the crtc. Also use the same style as
the plane code uses for the ddb start/end, and prefix bitmask properly
with 0x to make it clear they are in fact bitmasks.
The "how many total slices we are going to use" debug we move to
outside the crtc loop so it gets printed only once at the end.
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200225171125.28885-12-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>