We can implement limited RC6 counter wrap-around protection under the
assumption that clients will be reading this value more frequently than
the wrap period on a given platform.
With the typical wrap-around period being ~90 minutes, even with the
exception of Baytrail which wraps every 13 seconds, this sounds like a
reasonable assumption.
Implementation works by storing a 64-bit software copy of a hardware RC6
counter, along with the previous HW counter snapshot. This enables it to
detect wrap is polled frequently enough and keep the software copy
monotonically incrementing.
v2:
* Missed GEN6_GT_GFX_RC6_LOCKED when considering slot sizing and
indexing.
* Fixed off-by-one in wrap-around handling. (Chris Wilson)
v3:
* Simplify index checking by using unsigned int. (Chris Wilson)
* Expand the comment to explain why indexing works.
v4:
* Use __int128 if supported.
v5:
* Use mul_u64_u32_div. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94852
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v3
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208160036.29919-1-tvrtko.ursulin@linux.intel.com
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Most of our ioctl functions have an _ioctl suffix in the name. I like
that idea since it makes it easy to figure out how the function is
going to get called. Rename the handful of exceptions to follow the
same pattern.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207164841.19431-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than having the high level ioctl interface guess the underlying
implementation details, having the implementation declare what
capabilities it exports. We define an intel_driver_caps, similar to the
intel_device_info, which instead of trying to describe the HW gives
details on what the driver itself supports. This is then populated by
the engine backend for the new scheduler capability field for use
elsewhere.
v2: Use caps.scheduler for validating CONTEXT_PARAM_SET_PRIORITY (Mika)
One less assumption of engine[RCS] \o/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-2-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Since unbannable contexts are special and supposed not to be causing GPU
hangs in the first place, make it clear when they are implicated in said
hang. In practice, most unbannable contexts are those created by igt
for the express purpose of throwing untold thousands of hangs at the GPU
and wish to keep doing so to finish the test. Normally they are cleaned
up, but it's when they or the other unbannable kernel contexts stay
stuck in an erroneous state that we need to worry and so need
highlighting.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205094139.10671-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
We're using i915_inject_load_failure() to inject dummy
faults during driver load, but since this is debug utility
we shouldn't expose it in default config as it consumes
both code and data.
add/remove: 0/1 grow/shrink: 0/2 up/down: 0/-302 (-302)
Function old new delta
__i915_inject_load_failure 61 - -61
i915_gem_init 1331 1268 -63
i915_driver_load 5923 5745 -178
Total: Before=1177454, After=1177152, chg -0.03%
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-4 (-4)
Data old new delta
i915_load_fail_count 4 - -4
Total: Before=56762, After=56758, chg -0.01%
add/remove: 4/8 grow/shrink: 0/1 up/down: 245/-591 (-346)
RO Data old new delta
__param_str_inject_load_failure 20 - -20
__UNIQUE_ID_inject_load_failuretype200 34 - -34
__param_inject_load_failure 40 - -40
__func__ 4998 4896 -102
__UNIQUE_ID_inject_load_failure201 150 - -150
Total: Before=119095, After=118749, chg -0.29%
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180201173248.3912-1-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We have the max DP link rate info available in VBT since BDB version
216, included in child device config since commit c4fb60b9ab
("drm/i915/bios: add DP max link rate to VBT child device
struct"). Parse it and use it.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a8b1364d1f2394fba3062b6ad11b474744ea4366.1517482774.git.jani.nikula@intel.com
There is no requirement for doing the PCODE request polling atomically,
so do that only for a short time switching to sleeping poll afterwards.
The specification requires a 150usec timeout for the change notification,
so let's use that for the atomic poll. Do the extra 2ms poll - needed as
a workaround on BXT/GLK - in sleeping mode.
v2:
- rebase on v2 of patchset dropping the sandybridge_pcode_read/write
refactoring (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180130142939.17983-2-imre.deak@intel.com
Currently we see sporadic timeouts during CDCLK changing both on BXT and
GLK as reported by the Bugzilla: ticket. It's easy to reproduce this by
changing the frequency in a tight loop after blanking the display. The
upper bound for the completion time is 800us based on my tests, so
increase it from the current 500us to 2ms; with that I couldn't trigger
the problem either on BXT or GLK.
Note that timeouts happened during both the change notification and the
voltage level setting PCODE request. (For the latter one BSpec doesn't
require us to wait for completion before further HW programming.)
This issue is similar to
commit 2c7d0602c8 ("drm/i915/gen9: Fix PCODE polling during CDCLK
change notification")
but there the PCODE request does complete (as shown by the mbox
busy flag), only the reply we get from PCODE indicates a failure.
So there we keep resending the request until a success reply, here we
just have to increase the timeout for the one PCODE request we send.
v2:
- s/snb_pcode_request/sandybridge_pcode_write_timeout/ (Ville)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.4+
Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103326
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180130142939.17983-1-imre.deak@intel.com
GEN9/10 had fixed DBuf block size of 512. Dbuf block size is not a
fixed number anymore in GEN11, it varies according to bits per pixel
and tiling. If 8bpp & Yf-tile surface, block size = 256 else block
size = 512
This patch addresses the same.
v2 (from Paulo):
- Make it compile.
- Fix a few coding style issues.
v3:
- Rebase on top of upstream patches
v4 (from Paulo):
- Bikeshed if statements (James).
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180130134918.32283-3-paulo.r.zanoni@intel.com
On CNP boards that are using DDI F,
bit 25 (SDE_PORTE_HOTPLUG_SPT) is representing
the Digital Port F hotplug line when the Digital
Port F hotplug detect input is enabled.
v2: Reuse all existent structure instead of adding a
new HPD_PORT_F pointing to pin of port E.
v3: Use IS_CNL_WITH_PORT_F so we can start upstreaming
this right now. If that SKU ever get a proper name
we come back and update it.
v4: Rebase on top of digital connected port using encoder
instead of port.
v5: Moved IS_CNL_WITH_PORT_F definition to the PCI IDs patch.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129232223.766-8-rodrigo.vivi@intel.com
On some Cannonlake SKUs we have a dedicated Aux for port F,
that is only the full split between port A and port E.
There is still no Aux E for Port E, as in previous platforms,
because port_E still means shared lanes with port A.
v2: Rebase.
v3: Add couple missed PORT_F cases on intel_dp.
v4: Rebase and fix commit message.
v5: Squash Imre's "drm/i915: Add missing AUX_F power well string"
v6: Rebase on top of display headers rework.
v7: s/IS_CANNONLAKE/IS_CNL_WITH_PORT_F (DK)
v8: Fix Aux bits for Port F (DK)
v9: Fix VBT definition of Port F (DK).
v10: Squash power well addition to this patch to avoid
warns as pointed by DK.
v11: Clean up squashed commit message. (David)
v12: Remove unnecessary handling for older platforms (DK)
Adding AUX_F to PG2 following other existent ones. (DK)
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129232223.766-2-rodrigo.vivi@intel.com
The only difference is that this SKUs has the full
Port A/E split named as Port F.
But since SKUs differences don't matter on the platform
definition group and ids, let's merge all off them together.
v2: Really include the PCI IDs to the picidlist[];
v3: Add the PCI Id for another SKU (Anusha).
v4: Update IDs, really include to pciidlists again.
v5: Unify all GT2 IDs.
v6: Unify in a way that we don't break early-quirks.c
v7: Remove GT reference since it doesn't matter here (Paulo)
Also move IS_CNL_WITH_PORT_F macro to this patch to
make it easier for review this part and also to get
used sooner.
v8: Rebased on top of commit 5db47e37b3 ("Revert "drm/i915:
mark all device info struct with __initconst"")
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129232223.766-1-rodrigo.vivi@intel.com
Replace the ad-hoc plane indexing scheme used by the frontbuffer
tracking with enum plane_id.
The old video overlay not being part of the plane_id namespace
will just be given the high bit.
v2: Drop the unintended whitespace change (Chris)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180123183343.9181-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
By counting the number of times we have woken up, we have a very simple
means of defining an epoch, which will come in handy if we want to
perform deferred tasks at the end of an epoch (i.e. while we are going
to sleep) without imposing on the next activity cycle.
v2: No reason to specify precise number of bits here.
v3: Take Tvrtko's advice and reserve 0 as an invalid epoch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180124113608.14909-1-chris@chris-wilson.co.uk
Add the enum additions to ICP PCH.
v2 (from Paulo): don't set any platforms to it yet since ICP support is
incomplete.
v3 (from Rodrigo): Fix ICP name.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180111180010.24357-4-paulo.r.zanoni@intel.com
Icelake is an Intel® Processor containing an Intel® Graphics
Controller.
This is just an initial Icelake definition. PCI IDs, Icelake support
and new features coming in following patches.
v2: Add .ddb_size and .has_guc (Michal Wajdeczko).
v3: Add the ICL_FEATURES macro (Kelvin Gardiner).
v4 (from Paulo): Add missing __initconst (Paulo) and say "graphics
controller" instead of something that looks like an official marketing
name but isn't (Chris).
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180111180010.24357-3-paulo.r.zanoni@intel.com
The CDCLK bypass frequency can vary on upcoming platforms, so prepare
for that now by tracking its value in the CDCLK state.
Currently on BDW+ the bypass frequency is always the reference clock and
I didn't bother with earlier platforms since it's not all that clear
what's the bypass clock on those.
I also didn't bother adding support for changing this frequency, since
atm I don't see any need for it.
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180117172508.15993-1-imre.deak@intel.com
struct timeval is deprecated because it cannot represent times
past 2038. In this driver, the only use of this structure is
to capture debug information. This is easily changed to ktime_t,
which we then format as needed when printing it later.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20180117154916.219273-1-arnd@arndb.de
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This flag has become redundant since
commit 4d90f2d507 ("drm/i915: Start tracking PSR state in crtc state")
It is set at the same place as psr.enabled, which is also exposed via
debugfs.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180103213824.1405-1-dhinakaran.pandiyan@intel.com
Once the Aksv is available in the PCH, we need to get it on the wire to
the receiver via DDC. The hardware doesn't allow us to read the value
directly, so we need to tell GMBUS to source the Aksv internally and
send it to the right offset on the receiver.
The way we do this is to initiate an indexed write where the index is
the Aksv register offset. We write dummy values to GMBUS3 as if we were
sending the key, and the hardware slips in the "real" values when it
goes out.
Changes in v2:
- None
Changes in v3:
- Uses new index write feature (Ville)
Changes in v4:
- None
Changes in v5:
- checkpatch whitespace fix
Changes in v6:
- None
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180108195545.218615-8-seanpaul@chromium.org
We have plenty of global registers and whatnot programmed without
any further locking by the modeset code. Currently non-bocking
modesets are allowed to execute in parallel which could corrupt
said registers.
To avoid the problem let's run all non-blocking modesets on an
ordered workqueue. We still put page flips etc. to system_unbound_wq
allowing page flips on one pipe to execute in parallel with page flips
or a modeset on a another pipe (assuming no known state is shared
between them, at which point they would have been added to the same
atomic commit and serialized that way).
Blocking modesets are already serialized with each other by
connection_mutex, and thus are safe. To serialize them with
non-blocking modesets we just flush the workqueue before executing
blocking modesets.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 94f050246b ("drm/i915: nonblocking commit")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113133622.8593-1-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We already have dedicated file for opregion related code, dedicated
header will make our life easier.
v2: reorder includes (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221185334.17396-4-michal.wajdeczko@intel.com
[ickle: quieten checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171221215735.30314-3-chris@chris-wilson.co.uk
Convert intel_device_info_dump into pretty printer to be
consistent with the rest of the driver code.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219114346.26308-2-michal.wajdeczko@intel.com
We dump device flags in few places (init_early, debugfs, gpu_error)
using different functions. Lets add reusable function to avoid
code duplication.
add/remove: 1/0 grow/shrink: 0/3 up/down: 1296/-3572 (-2276)
Function old new delta
intel_device_info_dump_flags - 1296 +1296
i915_capabilities 2435 1353 -1082
i915_error_state_to_str 6642 5507 -1135
intel_device_info_dump 1507 152 -1355
Total: Before=1287992, After=1285716, chg -0.18%
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219114346.26308-1-michal.wajdeczko@intel.com
Extract the timeout we use in i915_gem_idle_work_handler() and reuse it
for wait_for_engines() in i915_gem_wait_for_idle(). It too has the same
problem in sometimes having to wait for an extended period before the HW
settles, so make use of the same timeout.
References: 5427f20785 ("drm/i915: Bump wait-times for the final CS interrupt before parking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-1-chris@chris-wilson.co.uk
Keeps things consistent now that we make use of struct resource. This
should keep us covered in case we ever get huge amounts of stolen
memory.
v2: bunch of missing conversions (Chris)
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-10-matthew.auld@intel.com
Kick it out of i915_ggtt and keep it grouped with dsm and dsm_reserved,
where it makes the most sense.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-9-matthew.auld@intel.com
Now that we are using struct resource to track the stolen region, it is
more convenient if we track the reserved portion of that region in a
resource as well.
v2: s/<= end + 1/< end/ (Chris)
v3: prefer DEFINE_RES_MEM
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-7-matthew.auld@intel.com
Now that we are using struct resource to track the stolen region, it is
more convenient if we track dsm in a resource as well.
v2: check range_overflow when writing to 32b registers (Chris)
pepper in some comments (Chris)
v3: refit i915_stolen_to_dma()
v4: kill ggtt->stolen_size
v5: some more polish
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-6-matthew.auld@intel.com
It seems that the DMC likes to transition between the DC states a lot when
there are no connected displays (no active power domains) during command
submission.
This activity on DC states has a negative impact on the performance of the
chip with huge latencies observed in the interrupt handlers and elsewhere.
Simple tests like igt/gem_latency -n 0 are slowed down by a factor of
eight.
Work around it by introducing a new power domain named,
POWER_DOMAIN_GT_IRQ, associtated with the "DC off" power well, which is
held for the duration of command submission activity.
CNL has the same problem which will be addressed as a follow-up. Doing
that requires a fix for a DC6 context corruption problem in the CNL DMC
firmware which is yet to be released.
v2:
* Add commit text as comment in i915_gem_mark_busy. (Chris Wilson)
* Protect macro body with braces. (Jani Nikula)
v3:
* Add dedicated power domain for clarity. (Chris, Imre)
* Commit message and comment text updates.
* Apply to all big-core GEN9 parts apart for Skylake which is pending DMC
firmware release.
v4:
* Power domain should be inner to device runtime pm. (Chris)
* Simplify NEEDS_CSR_GT_PERF_WA macro. (Chris)
* Handle async DMC loading by moving the GT_IRQ power domain logic into
intel_runtime_pm. (Daniel, Chris)
* Include small core GEN9 as well. (Imre)
v5
* Special handling for async DMC load is not needed since on failure the
power domain reference is kept permanently taken. (Imre)
v6:
* Drop the NEEDS_CSR_GT_PERF_WA macro since all firmwares have now been
deployed. (Imre, Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
Testcase: igt/gem_exec_nop/headless
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v5)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Imre: Add note about applying the WA on CNL as a follow-up]
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171205132854.26380-1-tvrtko.ursulin@linux.intel.com
As writes through the GTT and GGTT PTE updates do not share the same
path, they are not strictly ordered and so we must explicitly flush the
indirect writes prior to modifying the PTE. We do track outstanding GGTT
writes on the object itself, but since the object may have multiple GGTT
vma, that is overly coarse as we can track and flush individual vma as
required.
Whilst here, update the GGTT flushing behaviour for Cannonlake.
v2: Hard-code ring offset to allow use during unload (after RCS may have
been freed, or never existed!)
References: https://bugs.freedesktop.org/show_bug.cgi?id=104002
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206124914.19960-2-chris@chris-wilson.co.uk
We currently have two module parameters that control GuC:
"enable_guc_loading" and "enable_guc_submission". Whenever
we need submission=1, we also need loading=1. We also need
loading=1 when we want to want to load and verify the HuC.
Lets combine above module parameters into one "enable_guc"
modparam. New supported bit values are:
0=disable GuC (no GuC submission, no HuC)
1=enable GuC submission
2=enable HuC load
Special value "-1" can be used to let driver decide what
option should be enabled for given platform based on
hardware/firmware availability or preference.
Explicit enabling any of the GuC features makes GuC load
a required step, fallback to non-GuC mode will not be
supported.
v2: Don't use -EIO
v3: define modparam bits (Chris)
v4: rely on implicit cast (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206135316.32556-6-michal.wajdeczko@intel.com
In the upcoming patch we will change the way how to recognize
when GuC is in use. Using helper macros will minimize scope
of that changes. While here, update dev_info message.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206135316.32556-3-michal.wajdeczko@intel.com
Doing HuC firmware path selection from sanitize_options function
is not perfect, while there is no problem with doing so during
early init stage as we already have all needed data.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206135316.32556-1-michal.wajdeczko@intel.com
It has been many years since the last confirmed sighting (and fix) of an
RC6 related bug (usually a system hang). Remove the parameter to stop
users from setting dangerous values, as they often set it during triage
and end up disabling the entire runtime pm instead (the option is not a
fine scalpel!).
Furthermore, it allows users to set known dangerous values which were
intended for testing and not for production use. For testing, we can
always patch in the required setting without having to expose ourselves
to random abuse.
v2: Fixup NEEDS_WaRsDisableCoarsePowerGating fumble, and document the
lack of ilk support better.
v3: Clear intel_info->rc6p if we don't support rc6 itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171201113030.18360-1-chris@chris-wilson.co.uk
Since the shrinker is registered and unregistered during
i915_driver_register and i915_driver_unregister, respectively, rename
the init/cleanup functions to match.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123115338.10270-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
It may be of interest to both compare the active HW context against the
default (aka NULL) context, to see what has been changed and if either are
corrupt.
v2: Rename the fake vma as fake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171126220901.14735-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
From: Chris Wilson <chris@chris-wilson.co.uk>
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
The first goal is to be able to measure GPU (and invidual ring) busyness
without having to poll registers from userspace. (Which not only incurs
holding the forcewake lock indefinitely, perturbing the system, but also
runs the risk of hanging the machine.) As an alternative we can use the
perf event counter interface to sample the ring registers periodically
and send those results to userspace.
Functionality we are exporting to userspace is via the existing perf PMU
API and can be exercised via the existing tools. For example:
perf stat -a -e i915/rcs0-busy/ -I 1000
Will print the render engine busynnes once per second. All the performance
counters can be enumerated (perf list) and have their unit of measure
correctly reported in sysfs.
v1-v2 (Chris Wilson):
v2: Use a common timer for the ring sampling.
v3: (Tvrtko Ursulin)
* Decouple uAPI from i915 engine ids.
* Complete uAPI defines.
* Refactor some code to helpers for clarity.
* Skip sampling disabled engines.
* Expose counters in sysfs.
* Pass in fake regs to avoid null ptr deref in perf core.
* Convert to class/instance uAPI.
* Use shared driver code for rc6 residency, power and frequency.
v4: (Dmitry Rogozhkin)
* Register PMU with .task_ctx_nr=perf_invalid_context
* Expose cpumask for the PMU with the single CPU in the mask
* Properly support pmu->stop(): it should call pmu->read()
* Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE)
* Introduce refcounting of event subscriptions.
* Make pmu.busy_stats a refcounter to avoid busy stats going away
with some deleted event.
* Expose cpumask for i915 PMU to avoid multiple events creation of
the same type followed by counter aggregation by perf-stat.
* Track CPUs getting online/offline to migrate perf context. If (likely)
cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be
needed to see effect of CPU status tracking.
* End result is that only global events are supported and perf stat
works correctly.
* Deny perf driver level sampling - it is prohibited for uncore PMU.
v5: (Tvrtko Ursulin)
* Don't hardcode number of engine samplers.
* Rewrite event ref-counting for correctness and simplicity.
* Store initial counter value when starting already enabled events
to correctly report values to all listeners.
* Fix RC6 residency readout.
* Comments, GPL header.
v6:
* Add missing entry to v4 changelog.
* Fix accounting in CPU hotplug case by copying the approach from
arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin)
v7:
* Log failure message only on failure.
* Remove CPU hotplug notification state on unregister.
v8:
* Fix error unwind on failed registration.
* Checkpatch cleanup.
v9:
* Drop the energy metric, it is available via intel_rapl_perf.
(Ville Syrjälä)
* Use HAS_RC6(p). (Chris Wilson)
* Handle unsupported non-engine events. (Dmitry Rogozhkin)
* Rebase for intel_rc6_residency_ns needing caller managed
runtime pm.
* Drop HAS_RC6 checks from the read callback since creating those
events will be rejected at init time already.
* Add counter units to sysfs so perf stat output is nicer.
* Cleanup the attribute tables for brevity and readability.
v10:
* Fixed queued accounting.
v11:
* Move intel_engine_lookup_user to intel_engine_cs.c
* Commit update. (Joonas Lahtinen)
v12:
* More accurate sampling. (Chris Wilson)
* Store and report frequency in MHz for better usability from
perf stat.
* Removed metrics: queued, interrupts, rc6 counters.
* Sample engine busyness based on seqno difference only
for less MMIO (and forcewake) on all platforms. (Chris Wilson)
v13:
* Comment spelling, use mul_u32_u32 to work around potential GCC
issue and somne code alignment changes. (Chris Wilson)
v14:
* Rebase.
v15:
* Rebase for RPS refactoring.
v16:
* Use the dynamic slot in the CPU hotplug state machine so that we are
free to setup our state as multi-instance. Previously we were re-using
the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as
multi-instance, nor owned by our driver to start with.
* Register the CPU hotplug handlers after the PMU, otherwise the callback
will get called before the PMU is initialized which can end up in
perf_pmu_migrate_context with an un-initialized base.
* Added workaround for a probable bug in cpuhp core.
v17:
* Remove workaround for the cpuhp bug.
v18:
* Rebase for drm_i915_gem_engine_class getting upstream before us.
v19:
* Rebase. (trivial)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
Stop using the old for_each_intel_plane_in_state() type iteration
macro and replace it with for_each_new_intel_plane_in_state().
And similarly replace drm_atomic_get_existing_crtc_state() with
intel_atomic_get_new_crtc_state(). Switch over to intel_ types
as well to make the code less cluttered.
v2: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-8-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Replace the 0 and 1 with PLANE_A and PLANE_B in the pre-g4x wm code.
v2: s/old_plane_id/i9xx_plane_id/ (Daniel)
v3: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-5-ville.syrjala@linux.intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Rename enum plane to enum i9xx_plane_id to make it clear that it only
applies to pre-SKL platforms.
enum i9xx_plane_id is a global identifier, whereas enum plane_id is
per-pipe. We need the old global identifier to index the primary plane
(and the pre-g4x sprite C if we ever expose it) registers on pre-SKL
platforms.
v2: Reorder patches
v3: s/old_plane_id/i9xx_plane_id/ (Daniel)
Pimp the commit message a bit
Note that i9xx_plane_id doesn't apply to SKL+
v4: Rebase due to power domain handling in plane readout
v5: Rebase due to crtc->dspaddr_offset removal
v6: s/plane/i9xx_plane/ etc. (James)
Cc: James Ausmus <james.ausmus@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-4-ville.syrjala@linux.intel.com
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
Since removing the module parameter to force selection of ringbuffer
emission for gen8, the code is defunct. Remove it.
To put the difference into perspective, a couple of microbenchmarks
(bdw i7-5557u, 20170324):
ring execlists
exec continuous nops on all rings: 1.491us 2.223us
exec sequential nops on each ring: 12.508us 53.682us
single nop + sync: 9.272us 30.291us
vblank_mode=0 glxgears: ~11000fps ~9000fps
Since the earlier submission, gen8 ringbuffer submission has fallen
further and further behind in features. So while ringbuffer may hold the
throughput crown, in terms of interactive latency, execlists is much
better. Alas, we have no convenient metrics for such, other than
demonstrating things we can do with execlists but can not using
legacy ringbuffer submission.
We have made a few improvements to lowlevel execlists throughput,
and ringbuffer currently panics on boot! (bdw i7-5557u, 20171026):
ring execlists
exec continuous nops on all rings: n/a 1.921us
exec sequential nops on each ring: n/a 44.621us
single nop + sync: n/a 21.953us
vblank_mode=0 glxgears: n/a ~18500fps
References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk
Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit) and obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk
Now that we have this stored in the device info, we can drop it from perf
part of the driver.
Note that this requires to init perf after we've computed the frequency,
hence why we move i915_perf_init() from i915_driver_init_early() to after
intel_device_info_runtime_init().
v2: Use div_u64 (Chris)
v3: Drop u64 divs by switching to kHz (Chris/Ville)
Move i915_perf_fini to i915_driver_cleanup_hw (Matthew)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113181902.12411-2-lionel.g.landwerlin@intel.com
ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
ERROR: "__divdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
Store the frequency in kHz and drop 64bit divisions.
v2: Use div64_u64 (Matthew)
v3: store frequency in kHz to avoid 64bit divs (Chris/Ville)
Fixes: dab9178333 ("drm/i915: expose command stream timestamp frequency to userspace")
Reported-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171113233455.12085-3-lionel.g.landwerlin@intel.com
Reviewed-by: Ewelina Musial <ewelina.musial@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
We use to have this fixed per generation, but starting with CNL userspace
cannot tell just off the PCI ID. Let's make this information available. This
is particularly useful for performance monitoring where much of the
normalization work is done using those timestamps (this include pipeline
statistics in both GL & Vulkan as well as OA reports).
v2: Use variables for 24MHz/19.2MHz values (Ewelina)
Renamed function & coding style (Sagar)
v3: Fix frequency read on Broadwell (Sagar)
Fix missing divide by 4 on <= gen4 (Sagar)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-7-lionel.g.landwerlin@intel.com
Now that we always execute a context switch upon module load, there is
no need to queue a delayed task for doing so. The purpose of the delayed
task is to enable GT powersaving, for which we need the HW state to be
valid (i.e. having loaded a context and initialised basic state). We
used to defer this operation as historically it was slow (due to slow
register polling, fixed with commit 1758b90e38 ("drm/i915: Use a hybrid
scheme for fast register waits")) but now we have a requirement to save
the default HW state.
v2: Load the kernel context (to provide the power context) upon resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171112112738.1463-3-chris@chris-wilson.co.uk
As we now record the default HW state and so only emit the "golden"
renderstate once to prepare the HW, there is no advantage in keeping the
renderstate batch around as it will never be used again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk
intel_modeset_gem_init() now only sets up the legacy overlay, so let's
remove the function and call the setup directly during driver load. This
should help us find a better point in the initialisation sequence for it
later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-5-chris@chris-wilson.co.uk
Rather than digging through encoder->crtc and crtc->config in the
DPIO PHY functions, pass down the correct crtc state from the caller.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031205123.13123-6-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
We keep details of GuC and HuC in separate error state struct.
Make GuC log part of it to group all related data together.
Since we are printing uC details at the end, with this change
GuC log will be moved there too.
v2: comment on new placement of the log (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026173657.49648-2-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Include GuC and HuC firmware details in captured error state
to provide additional debug information. To reuse existing
uc firmware pretty printer, introduce new drm-printer variant
that works with our i915_error_state_buf output. Also update
uc firmware pretty printer to accept const input.
v2: don't rely on current caps (Chris)
dump correct fw info (Michal)
v3: simplify capture of custom paths (Chris)
v4: improve 'why' comment (Joonas)
trim output if no fw path (Michal)
group code around uc error state (Michal)
v5: use error in cleanup_uc (Michal)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026173657.49648-1-michal.wajdeczko@intel.com
[ickle: allow printing uc_fw after allocation failure]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.
In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before
resetting it.
Once the reset is successful, i915 takes over again and handles the
resubmission. The scheduler in i915 knows which requests are pending so
after resetting a engine, pending workloads/requests are resubmitted
again.
v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.
v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)
v4: Rebase.
v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031225309.10888-1-michel.thierry@intel.com
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_guc_reset sounds more like the microcontroller is the one performing
a reset, while in this case is the opposite. intel_reset_guc not only
makes it clearer, it follows the other intel_reset functions available.
v2: Print error message in English (Tvrtko).
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171030185616.32836-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Explicitly pass the crtc and connector states into the audio
code enable/disable hooks, and plumb them all the way down.
This gets rid of almost all crtc->config and encoder->crtc
uses. The one place where we still use them is
i915_audio_component_sync_audio_rate() since that gets called from
the audio driver and we don't have explicit states around then.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171030184654.17429-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Starting from version 204 VBT can specify the max TMDS clock we are
allowed to use with HDMI ports. Parse that information and take it
into account when filtering modes and computing a crtc state.
Also take the opportunity to sort the platform check if ladder
from new to old.
v2: Add defines for the values into intel_vbt_defs.h (Jani)
Don't fall back to 0 silently for unknown values (Jani)
Skip the debug print for the 0 case (Jani)
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171030145702.23662-1-ville.syrjala@linux.intel.com
Call the DDI .pre_pll_enable() hook from the MST code so that BXT gets
the correct lane latency optimal setting applied. And we obviously need
to compute the correct value, and read it out to keep the state checker
happy.
While at it drop the useless 'encoder' parameter to
bxt_ddi_phy_calc_lane_lat_optim_mask()
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171027134348.31190-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
We would also like to make use of execlist_cancel_port_requests and
unwind_incomplete_requests in GuC preemption backend.
Let's rename the functions to use the correct prefixes, so that we can
simply add the declarations in the following patch.
Similar thing for applies for can_preempt, except we're introducing
HAS_LOGICAL_RING_PREEMPTION macro instad, converting other users that
were previously touching device info directly.
v2: s/intel_engine/execlists and pass execlists to unwind (Chris)
v3: use locked version for exporting, drop const qual (Chris)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-11-michal.winiarski@intel.com
On CNL we may need to bump up the system agent voltage not only due
to CDCLK but also when driving DDI port with a sufficiently high clock.
To that end start tracking the minimum acceptable voltage for each crtc.
We do the tracking via crtcs because we don't have any kind of encoder
state. Also there's no downside to doing it this way, and it matches how
we track cdclk requirements on account of pixel rate.
v2: Allow disabled crtcs to use the min voltage
Add IS_CNL check to intel_ddi_compute_min_voltage() since
we're using CNL specific values there
s/intel_compute_min_voltage/cnl_compute_min_voltage/ since
the function makes hw specific assumptions about the voltage
values
v3: Drop the test hack leftovers from skl_modeset_calc_cdclk()
v4: s/voltage/voltage_level/ (Rodrigo)
Replace DPLL DVFS FIXMEs with an explanation why we don't
do anything there (Rodrigo)
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024095216.1638-9-ville.syrjala@linux.intel.com
For CNL we'll need to start considering the port clocks when we select
the voltage level for the system agent. To that end start tracking the
voltage in the cdclk state (since that already has to adjust it).
v2: s/voltage/voltage_level/ (Rodrigo)
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024095216.1638-3-ville.syrjala@linux.intel.com
This patch parse DSI backlight/cabc ports info from
VBT and save them inside local structure. This saved info
can be directly used while initializing DSI for different
platforms instead of parsing for each platform.
V2: Changes:
- Typo fix in commit message.
- Move up newly added port variables (Jani N)
- Remove redundant initialization (Jani N)
- Don't parse CABC ports if not supported (Jani N)
V3: Patch restructure (Suggested by Jani N)
Credits-to: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Madhav Chauhan <madhav.chauhan@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1507898700-20016-1-git-send-email-madhav.chauhan@intel.com
They're unused and unsupported. Leave the reduced_clock pointers in
place still, should they prove useful later on.
v2: go from nuking DDI lowfreq_avail to nuking it entirely (Ville)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171017140234.20677-1-jani.nikula@intel.com
Now that we write RING_FORCE_TO_NONPRIV registers directly to hardware,
[commit 32ced39 ("drm/i915: Transform whitelisting WAs into a simple reg
write")] there is no need to save space for them in the list of context
workarounds.
v2: Refer to previous commit in commit message (Michel)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1508272071-15125-1-git-send-email-oscar.mateo@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Inspired by Tvrtko's critique of the reaping of the stale contexts
before allocating a new one, also limit the freed object reaping to the
oldest stale object before allocating a fresh object. Unlike contexts,
objects may have radically different sizes of backing storage, but
similar to contexts, while we want to prevent starvation due to
excessive freed lists, we also do not want to delay fresh allocations
for too long. Only freeing the oldest on the freed object list before
each allocation is a reasonable compromise.
v2: Only a single consumer of llist_del_first() is allowed (although
multiple llist_add are still allowed in parallel). Unlike
i915_gem_context, i915_gem_flush_free_objects() is itself not serialized
and so we need to add our own spinlock. Otherwise KASAN eventually spots
a use-after-free for the race on *first->next.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-8-chris@chris-wilson.co.uk
Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit
bed50aea61 ("drm/i915/shrinker: Flush active on objects before
counting")).
v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016114037.5556-1-chris@chris-wilson.co.uk
Since we occasionally stuff an error pointer into obj->mm.pages for a
semi-permanent or even permanent failure, we have to be more careful and
not just test against NULL when deciding if the object has a complete
set of its concurrent pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-1-chris@chris-wilson.co.uk
Apparently RC6 residency is lower than expected
with EI mode for most of the cases on CNL A0, B0 and C0.
This Wa doesn't solve our lower residency, but I
believe it is better to have it since EI is not
expected to work by HW engineers anyways.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822235828.18322-1-rodrigo.vivi@intel.com
Defined new struct intel_rc6 to hold RC6 specific state and
intel_ring_pstate to hold ring specific state.
v2: s/intel_ring_pstate/intel_llc_pstate. Removed checks from
autoenable_* functions. (Chris)
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-13-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-12-chris@chris-wilson.co.uk
Prepared substructure rps for RPS related state. autoenable_work is
used for RC6 too hence it is defined outside rps structure. As we do
this lot many functions are refactored to use intel_rps *rps to access
rps related members. Hence renamed intel_rps_client pointer variables
to rps_client in various functions.
v2: Rebase.
v3: s/pm/gt_pm (Chris)
Refactored access to rps structure by declaring struct intel_rps * in
many functions.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> #1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-9-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-8-chris@chris-wilson.co.uk
In order to separate GT PM related functionality into new structure
we are updating rps structure. hw_lock in it is used for display
related PCU communication too hence move it to dev_priv.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-8-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-7-chris@chris-wilson.co.uk
It's a little unclear what the sg_mask actually is, so prefer the more
meaningful name of sg_page_sizes.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110024.29114-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Following the pattern now used for obj->mm.pages, use just pin_fence and
unpin_fence to control access to the fence registers. I.e. instead of
calling get_fence(); pin_fence(), we now just need to call pin_fence().
This will make it easier to reduce the locking requirements around
fence registers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
In preparation for supporting huge gtt pages for the ppgtt, we introduce
page size members for gem objects. We fill in the page sizes by
scanning the sg table.
v2: pass the sg_mask to set_pages
v3: calculate the sg_mask inline with populating the sg_table where
possible, and pass to set_pages along with the pages.
v4: bunch of improvements from Joonas
v5: fix num_pages blunder
introduce i915_sg_page_sizes helper
v6: prefer GEM_BUG_ON(sizes == 0)
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-7-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-6-chris@chris-wilson.co.uk
In preparation for huge gtt pages expose page_sizes as part of the
device info, to indicate the page sizes supported by the HW. Currently
only 4K is supported.
v2: s/page_size_mask/page_sizes/
v3: introduce I915_GTT_MAX_PAGE_SIZE
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-5-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-4-chris@chris-wilson.co.uk
Not a fully blown gemfs, just our very own tmpfs kernel mount. Doing so
moves us away from the shmemfs shm_mnt, and gives us the much needed
flexibility to do things like set our own mount options, namely huge=
which should allow us to enable the use of transparent-huge-pages for
our shmem backed objects.
v2: various improvements suggested by Joonas
v3: move gemfs instance to i915.mm and simplify now that we have
file_setup_with_mnt
v4: fallback to tmpfs shm_mnt upon failure to setup gemfs
v5: make tmpfs fallback kinder
v5: better gemfs failure message
flags variable
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-3-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-2-chris@chris-wilson.co.uk
With preemption, we will want to "unsubmit" a request, taking it back
from the hw and returning it to the priority sorted execution list. In
order to know where to insert it into that list, we need to remember
its adjust priority (which may change even as it was being executed).
This also affects reset for execlists as we are now unsubmitting the
requests following the reset (rather than directly writing the ELSP for
the inflight contexts). This turns reset into an accidental preemption
point, as after the reset we may choose a different pair of contexts to
submit to hw.
GuC is not updated as this series doesn't add preemption to the GuC
submission, and so it can keep benefiting from the early pruning of the
DFS inside execlists_schedule() for a little longer. We also need to
find a way of reducing the cost of that DFS...
v2: Include priority in error-state
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-6-chris@chris-wilson.co.uk
Add another perma-pinned context for using for preemption at any time.
We cannot just reuse the existing kernel context, as first and foremost
we need to ensure that we can preempt the kernel context itself, so
require a distinct context id. Similar to the kernel context, we may
want to interrupt execution and switch to the preempt context at any
time, and so it needs to be permanently pinned and available.
To compensate for yet another permanent allocation, we shrink the
existing context and the new context by reducing their ringbuffer to the
minimum.
v2: Assert that we never allocate a request from the preemption context.
v3: Limit perma-pin to engines that may preempt.
v4: Onion cleanup for early driver death
v5: Onion ordering in main driver cleanup as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-4-chris@chris-wilson.co.uk
If we store the platform as a bitmask, and convert the
IS_PLATFORM macro to use it, we allow the compiler to
merge the IS_PLATFORM(a) || IS_PLATFORM(b) || ... checks
into a single conditional.
As a secondary benefit this saves almost 1k of text:
text data bss dec hex filename
-1460254 60014 3656 1523924 1740d4 drivers/gpu/drm/i915/i915.ko
+1459260 60026 3656 1522942 173cfe drivers/gpu/drm/i915/i915.ko
v2: Removed the infamous -1.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170927164138.15474-1-tvrtko.ursulin@linux.intel.com
Getting started with v4.15 features:
- Cannonlake workarounds (Rodrigo, Oscar)
- Infoframe refactoring and fixes to enable infoframes for DP (Ville)
- VBT definition updates (Jani)
- Sparse warning fixes (Ville, Chris)
- Crtc state usage fixes and cleanups (Ville)
- DP vswing, pre-emph and buffer translation refactoring and fixes (Rodrigo)
- Prevent IPS from interfering with CRC capture (Ville, Marta)
- Enable Mesa to advertise ARB_timer_query (Nanley)
- Refactor GT number into intel_device_info (Lionel)
- Avoid eDP DP AUX CH timeouts harder (Manasi)
- CDCLK check improvements (Ville)
- Restore GPU clock boost on missed pageflip vblanks (Chris)
- Fence register reservation API for vGPU (Changbin)
- First batch of CCS fixes (Ville)
- Finally, numerous GEM fixes, cleanups and improvements (Chris)
* tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel: (100 commits)
drm/i915: Update DRIVER_DATE to 20170907
drm/i915/cnl: WaThrottleEUPerfToAvoidTDBackPressure:cnl(pre-prod)
drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
drm/i915/cnl: Allow the reg_read ioctl to read the RCS TIMESTAMP register
drm/i915: Move device_info.has_snoop into the static tables
drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
drm/i915: Re-enable GTT following a device reset
drm/i915/cnp: Wa 1181: Fix Backlight issue
drm/i915: Annotate user relocs with __user
drm/i915: Constify load detect mode
drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config
drm/i915: Silence sparse by using gfp_t
drm/i915: io unmap functions want __iomem
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Wake up the device for the fbdev setup
drm/i915: Add interface to reserve fence registers for vGPU
drm/i915: Use correct path to trace include
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
...
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- DP SDP defines (Ville)
- polish for scdc helpers (Thierry Reding)
- fix lifetimes for connector/plane state across crtc changes (Maarten
Lankhorst).
- sparse fixes (Ville+Thierry)
- make legacy kms ioctls all interruptible (Maarten)
- push edid override into the edid helpers (out of probe helpers)
(Jani)
- DP ESI defines for link status (DK)
Driver Changes:
- drm-panel is now in drm-misc!
- minor panel-simple cleanups/refactoring by various folks
- drm_bridge_add cleanup (Inki Dae)
- constify a few i2c_device_id structs (Arvind Yadav)
- More patches from Noralf's fb/gem helper cleanup
- bridge/synopsis: reset fix (Philippe Cornu)
- fix tracepoint include handling in drivers (Thierry)
- rockchip: lvds support (Sandy Huang)
- move sun4i into drm-misc fold (Maxime Ripard)
- sun4i: refactor driver load + support TCON backend/layer muxing
(Chen-Yu Tsai)
- pl111: support more pl11x variants (Linus Walleij)
- bridge/adv7511: robustify probing/edid handling (Lars-Petersen
Clausen)
New hw support:
- S6E63J0X03 panel (Hoegeun Kwon)
- OTM8009A panel (Philippe CORNU)
- Seiko 43WVF1G panel (Marco Franchi)
- tve200 driver (Linus Walleij)
Plus assorted of tiny patches all over, including our first outreachy
patches from applicants for the winter round!
* tag 'drm-misc-next-2017-09-20' of git://anongit.freedesktop.org/git/drm-misc: (101 commits)
drm: add backwards compatibility support for drm_kms_helper.edid_firmware
drm: handle override and firmware EDID at drm_do_get_edid() level
drm/dp: DPCD register defines for link status within ESI field
drm/rockchip: Replace dev_* with DRM_DEV_*
drm/tinydrm: Drop driver registered message
drm/gem-fb-helper: Use debug message on gem lookup failure
drm/imx: Use drm_gem_fb_create() and drm_gem_fb_prepare_fb()
drm/bridge: adv7511: Constify HDMI CODEC platform data
drm/bridge: adv7511: Enable connector polling when no interrupt is specified
drm/bridge: adv7511: Remove private copy of the EDID
drm/bridge: adv7511: Properly update EDID when no EDID was found
drm/crtc: Convert setcrtc ioctl locking to interruptible.
drm/atomic: Convert pageflip ioctl locking to interruptible.
drm/legacy: Convert setplane ioctl locking to interruptible.
drm/legacy: Convert cursor ioctl locking to interruptible.
drm/atomic: Convert atomic ioctl locking to interruptible.
drm/atomic: Prepare drm_modeset_lock infrastructure for interruptible waiting, v2.
drm/tve200: Clean up panel bridging
drm/doc: Update todo.rst
drm/dp/mst: Sideband message transaction to power up/down nodes
...
More effort to align members on 4-byte boundary helps with
code size a tiny bit:
text data bss dec hex filename
-1460454 60014 3656 1524124 17419c drivers/gpu/drm/i915/i915.ko
+1460254 60014 3656 1523924 1740d4 drivers/gpu/drm/i915/i915.ko
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170920092701.17963-3-tvrtko.ursulin@linux.intel.com
i830 seems to occasionally forget the PIPESTAT enable bits when
we read the register. These aren't the only registers on i830 that
have problems with RMW, as reading the double buffered plane
registers returns the latched value rather than the last written
value. So something similar is perhaps going on with PIPESTAT.
This corruption results on vblank interrupts occasionally turning off
on their own, which leads to vblank timeouts and generally a stuck
display subsystem.
So let's not RMW the pipestat enable bits, and instead use the cached
copy we have around.
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914151731.5034-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
As we emulate execlists on top of the GuC workqueue, it is not
restricted to just 2 ports and we can increase that number arbitrarily
to trade-off queue depth (i.e. scheduling latency) against pipeline
bubbles.
v2: rebase. better commit msg (Chris)
v3: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.
v5: pure rename
v6: fix
Credits-to: Coccinelle
@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
Now that we're not using MSI anymore on gen4 we can start
using GMBUS and AUX interrupts again. These were disabled on
account of them causing the hardware to somehow generate
legacy interrupts even when MSI was enabled.
See commit c12aba5aa0 ("drm/i915: stop using GMBUS IRQs on Gen4
chips") and commit 4e6b788c3f ("drm/i915: Disable dp aux irq on
g4x") for more details.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-17-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
The private PAT management is to support PPAT entry manipulation. Two
APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
and intel_ppat_put.
intel_ppat_get will search for an existing PPAT entry which perfectly
matches the required PPAT value. If not, it will try to allocate a new
entry if there is any available PPAT indexs, or return a partially
matched PPAT entry if there is no available PPAT indexes.
intel_ppat_put will put back the PPAT entry which comes from
intel_ppat_get. If it's dynamically allocated, the reference count will
be decreased. If the reference count turns into zero, the PPAT index is
freed again.
Besides, another two callbacks are introduced to support the private PAT
management framework. One is ppat->update_hw(), which writes the PPAT
configurations in ppat->entries into HW. Another one is ppat->match, which
will return a score to show how two PPAT values match with each other.
v17:
- Refine the comparision of score of BDW. (Joonas)
v16:
- Fix a bug in PPAT match function of BDW. (Joonas)
v15:
- Refine some code flow. (Joonas)
v12:
- Fix a problem "not returning the entry of best score". (Zhenyu)
v7:
- Keep all the register writes unchanged in this patch. (Joonas)
v6:
- Address all comments from Chris:
http://www.spinics.net/lists/intel-gfx/msg136850.html
- Address all comments from Joonas:
http://www.spinics.net/lists/intel-gfx/msg136845.html
v5:
- Add check and warnnings for those platforms which don't have PPAT.
v3:
- Introduce dirty bitmap for PPAT registers. (Chris)
- Change the name of the pointer "dev_priv" to "i915". (Chris)
- intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)
v2:
- API re-design. (Chris)
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v7
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[Joonas: Use BIT() in the enum in bdw_private_pat_match]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1505392783-4084-1-git-send-email-zhi.a.wang@intel.com
Split INTEL_GEN_MASK out of IS_GEN macro, and make it usable
within static declarations (unlike compound statements).
v2:
- s/combound/compound/ (Tvrtko)
- Fix whitespace (yes, we need automatic checkpatch.pl)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913115255.13851-1-joonas.lahtinen@linux.intel.com
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The sgt iterators cause an
drivers/gpu/drm/i915/i915_gpu_error.c:846 i915_error_object_create() warn: statement has no effect 7
everywhere they are used. If we change the code slightly, we can achieve
the same increment without altering the output or raising a warning.
text data bss dec hex filename
1267906 20587 3168 1291661 13b58d before
1267906 20587 3168 1291661 13b58d after
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913105754.4423-1-chris@chris-wilson.co.uk
Continue on VLV PSR split with vfunc, let's also create one
for enabling source.
Also since we are touching *_enable_source functions let's
fix a comment with wrong name for vlv's one.
v2: Fix typo on commit message (DK).
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Vathsala Nagaraju <vathsala.nagaraju@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907230041.22978-12-rodrigo.vivi@intel.com
Continue on VLV PSR split with vfunc, let's also create
one for setting up VSC.
v2: Rebased on top of commit d2419ffc10 ("drm/i915: Plumb
crtc_state to PSR enable/disable")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Vathsala Nagaraju <vathsala.nagaraju@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907230041.22978-10-rodrigo.vivi@intel.com
VLV/CHV has a total different PSR implementation than the
other platforms, so let's start moving that to vfuncs.
Let's start with disable_src one.
v2: Rebased on top of commit d2419ffc10 ("drm/i915: Plumb
crtc_state to PSR enable/disable")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Vathsala Nagaraju <vathsala.nagaraju@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907230041.22978-3-rodrigo.vivi@intel.com
The next commit removes the wait for flip_done in in
drm_atomic_helper_commit_cleanup_done, but we need it for the tests
to pass. Instead of using complicated vblank tracking which ends
up being ignored anyway, call the correct atomic helper. :)
Changes since v1:
- Always call drm_atomic_helper_wait_for_flip_done,
even for legacy cursor updates. (danvet)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170904104838.23822-2-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull i916 drm fixes from Rodrigo Vivi:
"Since Dave is on paternity leave we are sending drm/i915 fixes for
v4.14-rc1 directly to you as he had asked us to do.
The most critical ones are the GPU reset fix for gen2-4 and GVT fix
for a regression that is blocking gvt init to work on your tree.
The rest is general fixes for patches coming from drm-next"
Acked-by: Dave Airlie <airlied@redhat.com>
* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Re-enable GTT following a device reset
drm/i915: Annotate user relocs with __user
drm/i915: Silence sparse by using gfp_t
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915/gvt: Remove one duplicated MMIO
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
drm/i915: Make i2c lock ops static
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
drm/i915: Skip fence alignemnt check for the CCS plane
drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
drm/i915: Always wake the device to flush the GTT
drm/i915: Recreate vmapping even when the object is pinned
drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
New Isochronous Priority Control (IPC) capability is introduced in newer
GEN platforms. This patch adds a device info flag to indicate if platform
supports IPC. Patch also sets this flag in supported platforms.
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170817134529.2839-7-mahesh1.kumar@intel.com
Plane configuration parameters doesn't change for each WM-level
calculation. Currently we compute same parameters 8 times for each
wm-level.
This patch optimizes it by calculating these parameters in beginning
& reuse during each level-wm calculation.
Changes since V1:
- rebase on top of Rodrigo's series for CNL
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170817134529.2839-3-mahesh1.kumar@intel.com
As per suggestion from Jani, cleanup the code. Cleanup includes
- Instead of left shifting & check, compare with U32/16_MAX
- Use typecast instead of clamp_t
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170817134529.2839-2-mahesh1.kumar@intel.com
With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating backing store here.
Similar to 871dfbd67d ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.
v2:
* Rename helper to i915_sg_segment_size and fix swiotlb override.
* Commit message update.
v3:
* Actually include the swiotlb override fix.
v4:
* Regroup parameters a bit. (Chris Wilson)
v5:
* Rebase for swiotlb_max_segment.
* Add DMA map failure handling as in abb0deacb5
("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").
v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)
v7: Rebase.
v8: Commit spelling fix.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803091417.23677-1-tvrtko.ursulin@linux.intel.com
shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan). As
report the number of pages owned by each GEM object as a separate item
to the shrinker, we cannot precisely control the number of shrinker
objects we scan on each pass; and indeed may free more than requested.
If we fail to tell the shrinker about the number of objects we process,
it will continue to hold a grudge against us as any objects left
unscanned are added to the next reclaim -- and so we will keep on
"unfairly" shrinking our own slab in comparison to other slabs.
Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The early gen3 machines (i915g/Grantsdale and i915gm/Alviso) share a lot
of characteristics in their MI/GTT blocks with gen2, and in particular
can only use physical addresses in MI_STORE_DATA_IMM. This makes it
incompatible with our usage, so include those two machines in the
blacklist to prevent usage.
v2: Make it easy for gcc and rewrite it as a switch to save some space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20170906152859.5304-1-chris@chris-wilson.co.uk
In the past, vGPU alloc fence registers by walking through mm.fence_list
to find fence which pin_count = 0 and vma is empty. vGPU may not find
enough fence registers this way. Because a fence can be bind to vma even
though it is not in using. We have found such failure many times these
days.
An option to resolve this issue is that we can force-remove fence from
vma in this case.
This patch added two new api to the fence management code:
- i915_reserve_fence() will try to find a free fence from fence_list
and force-remove vma if need.
- i915_unreserve_fence() reclaim a reserved fence after vGPU has
finished.
With this change, the fence management is more clear to work with vGPU.
GVTg do not need remove fence from fence_list in private.
v3: (Chris)
- Add struct_mutex lock assertion.
- Only count for unpinned fence.
v2: (Chris)
- Rename the new api for symmetry.
- Add safeguard to ensure at least 1 fence remained for host display.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1504512061-5892-1-git-send-email-changbin.du@intel.com
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Up to Coffeelake we could deduce this GT number from the device ID.
This doesn't seem to be the case anymore. This change reorders pciids
per GT and adds a gt field to intel_device_info. We set this field on
the following platforms :
- SNB/IVB/HSW/BDW/SKL/KBL/CFL/CNL
Before & After :
$ modinfo drivers/gpu/drm/i915/i915.ko | grep ^alias | wc -l
209
v2: Add SNB & IVB (Chris)
v3: Fix compilation error in early-quirks (Lionel)
v4: Fix inconsistency between FEATURE/PLATFORM macros (Ville)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170830161208.29221-2-lionel.g.landwerlin@intel.com
Make the min_pixclk thing less confusing by changing it to track
the minimum acceptable cdclk frequency instead. This means moving
the application of the guardbands to a slightly higher level from
the low level platform specific calc_cdclk() functions.
The immediate benefit is elimination of the confusing 2x factors
on GLK/CNL+ in the audio workarounds (which stems from the fact
that the pipes produce two pixels per clock).
v2: Keep cdclk higher on CNL to workaround missing DDI clock voltage handling
v3: Squash with the CNL cdclk limits patch (DK)
v4: s/intel_min_cdclk/intel_pixel_rate_to_cdclk/ (DK)
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170830185703.8189-1-ville.syrjala@linux.intel.com
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.
This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).
We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.
v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a575c67617)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.
This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).
We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.
v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When FBC is enabled for linear, legacy Y-tiled and Yf-tiled
surfaces on gen9, the cfb stride must be programmed by SW as
cfb_stride = ceiling[(at least plane width in pixels)/
(32 * compression limit factor)] * 8
v2: Minor fix for a build error
v3: Fixed subject, register name and platform check (Ville)
v4: Added WA details in comment (Paulo)
v5:
- Read modified reg write to preserve other bit values (Paulo)
- Store modified stride value in reg_params (Paulo)
- Keep GLK out of the WA (Paulo)
v6:
- added additional field in reg_params for gen9_wa_cfb_stride (Paulo)
- Used appropriate bit mask while writing the register (Paulo)
v7 (from Paulo):
- Fix coding style and spacing issues.
- Mask the old values before writing.
- Bikeshed comments and unnecessary checks.
Signed-off-by: Praveen Paneri <praveen.paneri@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1502389833-32621-1-git-send-email-praveen.paneri@intel.com
All the child device config fields, including legacy, are now available
in the same struct, so use it for everything.
As this change touches plenty of code with "p_child", rename them to
"child" while at it. Also do some simple unification and constification
where not intrusive. This in the name of avoiding extra cleanup churn
for the same lines as here.
No functional changes.
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/103300a9ae8629624619fc8df2c533e745cc5a78.1503600621.git.jani.nikula@intel.com
We use WC pages for coherent writes into the ppGTT on !llc
architectures. However, to create a WC page requires a stop_machine(),
i.e. is very slow. To compensate we currently keep a per-vm cache of
recently freed pages, but we still see the slow startup of new contexts.
We can amoritize that cost slightly by allocating WC pages in small
batches (PAGEVEC_SIZE == 14) and since creating a WC page implies a
stop_machine() there is no penalty for keeping that stash global.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822173828.5932-1-chris@chris-wilson.co.uk
This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
MI_STORE_DWORD_IMM just doesn't work on the video decode engine under
Sandybridge, so refrain from using it. Then switch the selftests over to
using the now common test prior to using MI_STORE_DWORD_IMM.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Sometimes it would be most enlightening to debug systems by replacing
the VBT to be used. For example, in the referenced bug the BIOS provides
different VBT depending on the boot mode (UEFI vs. legacy). It would be
interesting to try the failing boot mode with the VBT from the working
boot, and see if that makes a difference.
Add a module parameter to load the VBT using the firmware loader, not
unlike the EDID firmware mechanism.
As a starting point for experimenting, one can pick up the BIOS provided
VBT from /sys/kernel/debug/dri/0/i915_opregion/i915_vbt.
v2: clarify firmware load return value check (Bob)
v3: kfree the loaded firmware blob
References: https://bugs.freedesktop.org/show_bug.cgi?id=97822#c83
Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170817115209.25912-1-jani.nikula@intel.com
The wait-ioctl is optionally supplied a timeout with nanosecond
precision in a s64 field. We use nsecs_to_jiffies64() to convert that
into the jiffies consumed by the scheduler, but internally
nsecs_to_jiffies64() does not guard against overflow (as it's purpose is
for use by the scheduler and not drivers!). So we must guard against the
overflow ourselves, and in the process note that we may then return
much earlier than the timeout selected by the user, so don't report
ETIME unless we do hit the timeout. (Woe betold us though if the user
waits for a year (32bit) and the request is still not complete!)
v2: Refine overflow detection (to not include an overffow itself)
Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.
See commit e27ab73d17 ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.
v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
gvt-next-2017-08-15
gvt update for 4.14
- MMIO save/restore optimization (Changbin)
- Split workload scan vs. dispatch for more parallel exec (Ping)
- vGPU full 48bit ppgtt support (Joonas, Tina)
- vGPU hw id expose for perf (Zhenyu)
- other misc cleanup and fixes
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815023940.skhjfcsyrao7axqi@zhen-hp.sh.intel.com
Enable the guest i915 full ppgtt functionality when host can provide this
capability. vgt_caps is introduced to guest i915 driver to get the vgpu
capabilities from the device model. VGT_CPAS_FULL_PPGTT is one of the
capabilities type to let guest i915 dirver know that the guest i915 full
ppgtt is supported by device model.
Notice that the minor version of pvinfo isn't bumped because of this
vgt_caps introduction, due to older guest would be broken by simply
increasing the pvinfo version. Although the pvinfo minor version doesn't
increase, the compatibility won't be blocked. The compatibility is ensured
by checking the value of caps field in pvinfo. Zero means no full ppgtt
support and BIT(2) means this feature is provided.
Changes since v1:
- Use u32 instead of uint32_t (Joonas)
- Move VGT_CAPS_FULL_PPGTT introduction to this patch and use #define
instead of enum (Joonas)
- Rewrite the vgpu full ppgtt capability checking logic. (Joonas)
- Some coding style refine. (Joonas)
Changes since v2:
- Divide the whole patch set into two separate patch series, with one
patch in i915 side to check guest i915 full ppgtt capability and enable
it when this capability is supported by the device model, and the other
one in gvt side which fixs the blocking issue and enables the device
model to provide the capability to guest. And this patch focuses on guest
i915 side. (Joonas)
- Change the title from "introduce vgt_caps to pvinfo" to
"Enable guest i915 full ppgtt functionality". (Tina)
Change since v3:
- Add some comments about pvinfo caps and version. (Joonas)
Change since v4:
- Tested by Tina Zhang.
Change since v5:
- Add limitation about supporting 32bit full ppgtt.
Change since v6:
- Change the fallback to 48bit full ppgtt if i915.ppgtt_enable=2. (Zhenyu)
Change in v9:
- Remove the fixme comment due to no plan for 32bit full ppgtt
support. (Zhenyu)
- Reorder the patch-set to fix compiling issue with git-bisect. (Zhenyu)
- Add print log when forcing guest 48bit full ppgtt. (Zhenyu)
v10:
- Update against Joonas's has_full_ppgtt and has_full_48bit_ppgtt disconnect
change. (Zhenyu)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> # in v2
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>