2005-04-17 06:20:36 +08:00
|
|
|
/* i915_drv.h -- Private header for the I915 driver -*- linux-c -*-
|
|
|
|
*/
|
2006-01-02 17:14:23 +08:00
|
|
|
/*
|
2005-06-23 20:46:46 +08:00
|
|
|
*
|
2005-04-17 06:20:36 +08:00
|
|
|
* Copyright 2003 Tungsten Graphics, Inc., Cedar Park, Texas.
|
|
|
|
* All Rights Reserved.
|
2005-06-23 20:46:46 +08:00
|
|
|
*
|
|
|
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
|
|
|
* copy of this software and associated documentation files (the
|
|
|
|
* "Software"), to deal in the Software without restriction, including
|
|
|
|
* without limitation the rights to use, copy, modify, merge, publish,
|
|
|
|
* distribute, sub license, and/or sell copies of the Software, and to
|
|
|
|
* permit persons to whom the Software is furnished to do so, subject to
|
|
|
|
* the following conditions:
|
|
|
|
*
|
|
|
|
* The above copyright notice and this permission notice (including the
|
|
|
|
* next paragraph) shall be included in all copies or substantial portions
|
|
|
|
* of the Software.
|
|
|
|
*
|
|
|
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
|
|
|
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
|
|
|
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
|
|
|
|
* IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
|
|
|
|
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
|
|
|
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
|
|
|
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
|
|
*
|
2006-01-02 17:14:23 +08:00
|
|
|
*/
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
#ifndef _I915_DRV_H_
|
|
|
|
#define _I915_DRV_H_
|
|
|
|
|
2012-12-04 05:03:14 +08:00
|
|
|
#include <uapi/drm/i915_drm.h>
|
2015-02-11 01:16:05 +08:00
|
|
|
#include <uapi/drm/drm_fourcc.h>
|
2012-12-04 05:03:14 +08:00
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
#include <linux/io-mapping.h>
|
2010-07-21 06:44:45 +08:00
|
|
|
#include <linux/i2c.h>
|
2012-02-28 07:43:09 +08:00
|
|
|
#include <linux/i2c-algo-bit.h>
|
2011-08-12 18:11:33 +08:00
|
|
|
#include <linux/backlight.h>
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
#include <linux/hashtable.h>
|
2012-04-06 05:47:36 +08:00
|
|
|
#include <linux/intel-iommu.h>
|
2012-04-27 21:17:39 +08:00
|
|
|
#include <linux/kref.h>
|
drm/i915: irq-drive the dp aux communication
At least on the platforms that have a dp aux irq and also have it
enabled - vlvhsw should have one, too. But I don't have a machine to
test this on. Judging from docs there's no dp aux interrupt for gm45.
Also, I only have an ivb cpu edp machine, so the dp aux A code for
snb/ilk is untested.
For dpcd probing when nothing is connected it slashes about 5ms of cpu
time (cpu time is now negligible), which agrees with 3 * 5 400 usec
timeouts.
A previous version of this patch increases the time required to go
through the dp_detect cycle (which includes reading the edid) from
around 33 ms to around 40 ms. Experiments indicated that this is
purely due to the irq latency - the hw doesn't allow us to queue up
dp aux transactions and hence irq latency directly affects throughput.
gmbus is much better, there we have a 8 byte buffer, and we get the
irq once another 4 bytes can be queued up.
But by using the pm_qos interface to request the lowest possible cpu
wake-up latency this slowdown completely disappeared.
Since all our output detection logic is single-threaded with the
mode_config mutex right now anyway, I've decide not ot play fancy and
to just reuse the gmbus wait queue. But this would definitely prep the
way to run dp detection on different ports in parallel
v2: Add a timeout for dp aux transfers when using interrupts - the hw
_does_ prevent this with the hw-based 400 usec timeout, but if the
irq somehow doesn't arrive we're screwed. Lesson learned while
developing this ;-)
v3: While at it also convert the busy-loop to wait_for_atomic, so that
we don't run the risk of an infinite loop any more.
v4: Ensure we have the smallest possible irq latency by using the
pm_qos interface.
v5: Add a comment to the code to explain why we frob pm_qos. Suggested
by Chris Wilson.
v6: Disable dp irq for vlv, that's easier than trying to get at docs
and hw.
v7: Squash in a fix for Haswell that Paulo Zanoni tracked down - the
dp aux registers aren't at a fixed offset any more, but can be on the
PCH while the DP port is on the cpu die.
Reviewed-by: Imre Deak <imre.deak@intel.com> (v6)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-01 20:53:48 +08:00
|
|
|
#include <linux/pm_qos.h>
|
2016-04-14 00:35:01 +08:00
|
|
|
#include <linux/shmem_fs.h>
|
|
|
|
|
|
|
|
#include <drm/drmP.h>
|
|
|
|
#include <drm/intel-gtt.h>
|
|
|
|
#include <drm/drm_legacy.h> /* for struct drm_dma_handle */
|
|
|
|
#include <drm/drm_gem.h>
|
2016-06-21 16:54:22 +08:00
|
|
|
#include <drm/drm_auth.h>
|
2016-04-14 00:35:01 +08:00
|
|
|
|
|
|
|
#include "i915_params.h"
|
|
|
|
#include "i915_reg.h"
|
|
|
|
|
|
|
|
#include "intel_bios.h"
|
2016-03-08 23:46:19 +08:00
|
|
|
#include "intel_dpll_mgr.h"
|
2016-04-14 00:35:01 +08:00
|
|
|
#include "intel_guc.h"
|
|
|
|
#include "intel_lrc.h"
|
|
|
|
#include "intel_ringbuffer.h"
|
|
|
|
|
2016-04-14 00:35:02 +08:00
|
|
|
#include "i915_gem.h"
|
2016-04-14 00:35:01 +08:00
|
|
|
#include "i915_gem_gtt.h"
|
|
|
|
#include "i915_gem_render_state.h"
|
2008-07-30 02:54:06 +08:00
|
|
|
|
drm/i915: gvt: Introduce the basic architecture of GVT-g
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 20:07:00 +08:00
|
|
|
#include "intel_gvt.h"
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/* General customization:
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define DRIVER_NAME "i915"
|
|
|
|
#define DRIVER_DESC "Intel Graphics"
|
2016-07-11 15:18:31 +08:00
|
|
|
#define DRIVER_DATE "20160711"
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2014-10-28 23:32:30 +08:00
|
|
|
#undef WARN_ON
|
2014-12-08 23:40:10 +08:00
|
|
|
/* Many gcc seem to no see through this and fall over :( */
|
|
|
|
#if 0
|
|
|
|
#define WARN_ON(x) ({ \
|
|
|
|
bool __i915_warn_cond = (x); \
|
|
|
|
if (__builtin_constant_p(__i915_warn_cond)) \
|
|
|
|
BUILD_BUG_ON(__i915_warn_cond); \
|
|
|
|
WARN(__i915_warn_cond, "WARN_ON(" #x ")"); })
|
|
|
|
#else
|
2015-12-18 20:27:27 +08:00
|
|
|
#define WARN_ON(x) WARN((x), "%s", "WARN_ON(" __stringify(x) ")")
|
2014-12-08 23:40:10 +08:00
|
|
|
#endif
|
|
|
|
|
2015-03-12 19:01:12 +08:00
|
|
|
#undef WARN_ON_ONCE
|
2015-12-18 20:27:27 +08:00
|
|
|
#define WARN_ON_ONCE(x) WARN_ONCE((x), "%s", "WARN_ON_ONCE(" __stringify(x) ")")
|
2015-03-12 19:01:12 +08:00
|
|
|
|
2014-12-08 23:40:10 +08:00
|
|
|
#define MISSING_CASE(x) WARN(1, "Missing switch case (%lu) in %s\n", \
|
|
|
|
(long) (x), __func__);
|
2014-10-28 23:32:30 +08:00
|
|
|
|
2014-12-16 02:56:32 +08:00
|
|
|
/* Use I915_STATE_WARN(x) and I915_STATE_WARN_ON() (rather than WARN() and
|
|
|
|
* WARN_ON()) for hw state sanity checks to check for unexpected conditions
|
|
|
|
* which may not necessarily be a user visible problem. This will either
|
|
|
|
* WARN() or DRM_ERROR() depending on the verbose_checks moduleparam, to
|
|
|
|
* enable distros and users to tailor their preferred amount of i915 abrt
|
|
|
|
* spam.
|
|
|
|
*/
|
|
|
|
#define I915_STATE_WARN(condition, format...) ({ \
|
|
|
|
int __ret_warn_on = !!(condition); \
|
2015-12-18 20:27:26 +08:00
|
|
|
if (unlikely(__ret_warn_on)) \
|
|
|
|
if (!WARN(i915.verbose_state_checks, format)) \
|
2014-12-16 02:56:32 +08:00
|
|
|
DRM_ERROR(format); \
|
|
|
|
unlikely(__ret_warn_on); \
|
|
|
|
})
|
|
|
|
|
2015-12-18 20:27:27 +08:00
|
|
|
#define I915_STATE_WARN_ON(x) \
|
|
|
|
I915_STATE_WARN((x), "%s", "WARN_ON(" __stringify(x) ")")
|
2014-10-28 23:32:30 +08:00
|
|
|
|
2016-03-16 19:39:08 +08:00
|
|
|
bool __i915_inject_load_failure(const char *func, int line);
|
|
|
|
#define i915_inject_load_failure() \
|
|
|
|
__i915_inject_load_failure(__func__, __LINE__)
|
|
|
|
|
2015-08-27 21:23:30 +08:00
|
|
|
static inline const char *yesno(bool v)
|
|
|
|
{
|
|
|
|
return v ? "yes" : "no";
|
|
|
|
}
|
|
|
|
|
2016-01-14 18:53:34 +08:00
|
|
|
static inline const char *onoff(bool v)
|
|
|
|
{
|
|
|
|
return v ? "on" : "off";
|
|
|
|
}
|
|
|
|
|
2008-08-26 06:11:06 +08:00
|
|
|
enum pipe {
|
2013-11-01 00:55:49 +08:00
|
|
|
INVALID_PIPE = -1,
|
2008-08-26 06:11:06 +08:00
|
|
|
PIPE_A = 0,
|
|
|
|
PIPE_B,
|
2011-02-08 04:26:52 +08:00
|
|
|
PIPE_C,
|
drm/i915: Reorganize display pipe register accesses
RFCv2: Reorganize array indexing so that full offsets can be used as
is. It makes grepping for registers in i915_reg.h much easier. Also
move offset arrays to intel_device_info.
v1: Fixed offsets for VLV, proper eDP handling
v2: Fixed BCLRPAT, PIPESRC, PIPECONF and DSP* macros.
v3: Added EDP pipe comment, removed redundant offset arrays for
MSA_MISC and DDI_FUNC_CTL.
v4: Rename patch and report object size increase.
v5: Change location of commas, add PIPE_EDP into enum pipe
v6: Insert PIPE_EDP_OFFSET into pipe offset array
v7: Set I915_MAX_PIPES back to 3, change more registers accessors
to use the new macros, get rid of _PIPE_INC and add dev_priv
as a parameter where required by the new macros.
Upcoming hardware will not have the various display pipe register
ranges evenly spaced in memory. Change register address calculations
into array lookups.
Tested on SNB, VLV, IVB, Gen2 and HSW w/eDP.
I left the UMS cruft untouched.
Size differences:
text data bss dec hex filename
596431 4634 56 601121 92c21 i915.ko (new)
593199 4634 56 597889 91f81 i915.ko (old)
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 20:22:24 +08:00
|
|
|
_PIPE_EDP,
|
|
|
|
I915_MAX_PIPES = _PIPE_EDP
|
2008-08-26 06:11:06 +08:00
|
|
|
};
|
2011-02-08 04:26:52 +08:00
|
|
|
#define pipe_name(p) ((p) + 'A')
|
2008-08-26 06:11:06 +08:00
|
|
|
|
drm/i915: add TRANSCODER_EDP
Before Haswell we used to have the CPU pipes and the PCH transcoders.
We had the same amount of pipes and transcoders, and there was a 1:1
mapping between them. After Haswell what we used to call CPU pipe was
split into CPU pipe and CPU transcoder. So now we have 3 CPU pipes (A,
B and C), 4 CPU transcoders (A, B, C and EDP) and 1 PCH transcoder
(only used for VGA).
For all the outputs except for EDP we have an 1:1 mapping on the CPU
pipes and CPU transcoders, so if you're using CPU pipe A you have to
use CPU transcoder A. When have an eDP output you have to use
transcoder EDP and you can attach this CPU transcoder to any of the 3
CPU pipes. When using VGA you need to select a pair of matching CPU
pipes/transcoders (A/A, B/B, C/C) and you also need to enable/use the
PCH transcoder.
For now we're just creating the cpu_transcoder definitions and setting
cpu_transcoder to TRANSCODER_EDP on DDI eDP code, but none of the
registers was ported to use transcoder instead of pipe. The goal is to
keep the code backwards-compatible since on all cases except when
using eDP we must have pipe == cpu_transcoder.
V2: Comment the haswell_crtc_off chunk, suggested by Damien Lespiau
and Daniel Vetter.
We currently need the haswell_crtc_off chunk because TRANSCODER_EDP
can be used by any CRTC, so when you stop using it you have to stop
saying you're using it, otherwise you may have at some point 2 CRTCs
claiming they're using TRANSCODER_EDP (a disabled CRTC and an enabled
one), then the HW state readout code will get completely confused.
In other words:
Imagine the following case:
xrandr --output eDP1 --auto --crtc 0
xrandr --output eDP1 --off
xrandr --output eDP1 --auto --crtc 2
After the last command you could get a "pipe A assertion failure
(expected off, current on)" because CRTC 0 still claims it's using
TRANSCODER_EDP, so the HW state readout function will read it
(through PIPECONF) and expect it to be off, when it's actually on
because it's being used by CRTC 2.
So when we make "intel_crtc->cpu_transcoder = intel_crtc->pipe" we
make sure we're pointing to our own original CRTC which is certainly
not used by any other CRTC.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-25 01:59:34 +08:00
|
|
|
enum transcoder {
|
|
|
|
TRANSCODER_A = 0,
|
|
|
|
TRANSCODER_B,
|
|
|
|
TRANSCODER_C,
|
drm/i915: Reorganize display pipe register accesses
RFCv2: Reorganize array indexing so that full offsets can be used as
is. It makes grepping for registers in i915_reg.h much easier. Also
move offset arrays to intel_device_info.
v1: Fixed offsets for VLV, proper eDP handling
v2: Fixed BCLRPAT, PIPESRC, PIPECONF and DSP* macros.
v3: Added EDP pipe comment, removed redundant offset arrays for
MSA_MISC and DDI_FUNC_CTL.
v4: Rename patch and report object size increase.
v5: Change location of commas, add PIPE_EDP into enum pipe
v6: Insert PIPE_EDP_OFFSET into pipe offset array
v7: Set I915_MAX_PIPES back to 3, change more registers accessors
to use the new macros, get rid of _PIPE_INC and add dev_priv
as a parameter where required by the new macros.
Upcoming hardware will not have the various display pipe register
ranges evenly spaced in memory. Change register address calculations
into array lookups.
Tested on SNB, VLV, IVB, Gen2 and HSW w/eDP.
I left the UMS cruft untouched.
Size differences:
text data bss dec hex filename
596431 4634 56 601121 92c21 i915.ko (new)
593199 4634 56 597889 91f81 i915.ko (old)
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 20:22:24 +08:00
|
|
|
TRANSCODER_EDP,
|
2016-03-18 23:05:42 +08:00
|
|
|
TRANSCODER_DSI_A,
|
|
|
|
TRANSCODER_DSI_C,
|
drm/i915: Reorganize display pipe register accesses
RFCv2: Reorganize array indexing so that full offsets can be used as
is. It makes grepping for registers in i915_reg.h much easier. Also
move offset arrays to intel_device_info.
v1: Fixed offsets for VLV, proper eDP handling
v2: Fixed BCLRPAT, PIPESRC, PIPECONF and DSP* macros.
v3: Added EDP pipe comment, removed redundant offset arrays for
MSA_MISC and DDI_FUNC_CTL.
v4: Rename patch and report object size increase.
v5: Change location of commas, add PIPE_EDP into enum pipe
v6: Insert PIPE_EDP_OFFSET into pipe offset array
v7: Set I915_MAX_PIPES back to 3, change more registers accessors
to use the new macros, get rid of _PIPE_INC and add dev_priv
as a parameter where required by the new macros.
Upcoming hardware will not have the various display pipe register
ranges evenly spaced in memory. Change register address calculations
into array lookups.
Tested on SNB, VLV, IVB, Gen2 and HSW w/eDP.
I left the UMS cruft untouched.
Size differences:
text data bss dec hex filename
596431 4634 56 601121 92c21 i915.ko (new)
593199 4634 56 597889 91f81 i915.ko (old)
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 20:22:24 +08:00
|
|
|
I915_MAX_TRANSCODERS
|
drm/i915: add TRANSCODER_EDP
Before Haswell we used to have the CPU pipes and the PCH transcoders.
We had the same amount of pipes and transcoders, and there was a 1:1
mapping between them. After Haswell what we used to call CPU pipe was
split into CPU pipe and CPU transcoder. So now we have 3 CPU pipes (A,
B and C), 4 CPU transcoders (A, B, C and EDP) and 1 PCH transcoder
(only used for VGA).
For all the outputs except for EDP we have an 1:1 mapping on the CPU
pipes and CPU transcoders, so if you're using CPU pipe A you have to
use CPU transcoder A. When have an eDP output you have to use
transcoder EDP and you can attach this CPU transcoder to any of the 3
CPU pipes. When using VGA you need to select a pair of matching CPU
pipes/transcoders (A/A, B/B, C/C) and you also need to enable/use the
PCH transcoder.
For now we're just creating the cpu_transcoder definitions and setting
cpu_transcoder to TRANSCODER_EDP on DDI eDP code, but none of the
registers was ported to use transcoder instead of pipe. The goal is to
keep the code backwards-compatible since on all cases except when
using eDP we must have pipe == cpu_transcoder.
V2: Comment the haswell_crtc_off chunk, suggested by Damien Lespiau
and Daniel Vetter.
We currently need the haswell_crtc_off chunk because TRANSCODER_EDP
can be used by any CRTC, so when you stop using it you have to stop
saying you're using it, otherwise you may have at some point 2 CRTCs
claiming they're using TRANSCODER_EDP (a disabled CRTC and an enabled
one), then the HW state readout code will get completely confused.
In other words:
Imagine the following case:
xrandr --output eDP1 --auto --crtc 0
xrandr --output eDP1 --off
xrandr --output eDP1 --auto --crtc 2
After the last command you could get a "pipe A assertion failure
(expected off, current on)" because CRTC 0 still claims it's using
TRANSCODER_EDP, so the HW state readout function will read it
(through PIPECONF) and expect it to be off, when it's actually on
because it's being used by CRTC 2.
So when we make "intel_crtc->cpu_transcoder = intel_crtc->pipe" we
make sure we're pointing to our own original CRTC which is certainly
not used by any other CRTC.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-25 01:59:34 +08:00
|
|
|
};
|
2016-03-16 03:51:10 +08:00
|
|
|
|
|
|
|
static inline const char *transcoder_name(enum transcoder transcoder)
|
|
|
|
{
|
|
|
|
switch (transcoder) {
|
|
|
|
case TRANSCODER_A:
|
|
|
|
return "A";
|
|
|
|
case TRANSCODER_B:
|
|
|
|
return "B";
|
|
|
|
case TRANSCODER_C:
|
|
|
|
return "C";
|
|
|
|
case TRANSCODER_EDP:
|
|
|
|
return "EDP";
|
2016-03-18 23:05:42 +08:00
|
|
|
case TRANSCODER_DSI_A:
|
|
|
|
return "DSI A";
|
|
|
|
case TRANSCODER_DSI_C:
|
|
|
|
return "DSI C";
|
2016-03-16 03:51:10 +08:00
|
|
|
default:
|
|
|
|
return "<invalid>";
|
|
|
|
}
|
|
|
|
}
|
drm/i915: add TRANSCODER_EDP
Before Haswell we used to have the CPU pipes and the PCH transcoders.
We had the same amount of pipes and transcoders, and there was a 1:1
mapping between them. After Haswell what we used to call CPU pipe was
split into CPU pipe and CPU transcoder. So now we have 3 CPU pipes (A,
B and C), 4 CPU transcoders (A, B, C and EDP) and 1 PCH transcoder
(only used for VGA).
For all the outputs except for EDP we have an 1:1 mapping on the CPU
pipes and CPU transcoders, so if you're using CPU pipe A you have to
use CPU transcoder A. When have an eDP output you have to use
transcoder EDP and you can attach this CPU transcoder to any of the 3
CPU pipes. When using VGA you need to select a pair of matching CPU
pipes/transcoders (A/A, B/B, C/C) and you also need to enable/use the
PCH transcoder.
For now we're just creating the cpu_transcoder definitions and setting
cpu_transcoder to TRANSCODER_EDP on DDI eDP code, but none of the
registers was ported to use transcoder instead of pipe. The goal is to
keep the code backwards-compatible since on all cases except when
using eDP we must have pipe == cpu_transcoder.
V2: Comment the haswell_crtc_off chunk, suggested by Damien Lespiau
and Daniel Vetter.
We currently need the haswell_crtc_off chunk because TRANSCODER_EDP
can be used by any CRTC, so when you stop using it you have to stop
saying you're using it, otherwise you may have at some point 2 CRTCs
claiming they're using TRANSCODER_EDP (a disabled CRTC and an enabled
one), then the HW state readout code will get completely confused.
In other words:
Imagine the following case:
xrandr --output eDP1 --auto --crtc 0
xrandr --output eDP1 --off
xrandr --output eDP1 --auto --crtc 2
After the last command you could get a "pipe A assertion failure
(expected off, current on)" because CRTC 0 still claims it's using
TRANSCODER_EDP, so the HW state readout function will read it
(through PIPECONF) and expect it to be off, when it's actually on
because it's being used by CRTC 2.
So when we make "intel_crtc->cpu_transcoder = intel_crtc->pipe" we
make sure we're pointing to our own original CRTC which is certainly
not used by any other CRTC.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-25 01:59:34 +08:00
|
|
|
|
2016-03-18 23:05:42 +08:00
|
|
|
static inline bool transcoder_is_dsi(enum transcoder transcoder)
|
|
|
|
{
|
|
|
|
return transcoder == TRANSCODER_DSI_A || transcoder == TRANSCODER_DSI_C;
|
|
|
|
}
|
|
|
|
|
2014-03-28 02:48:32 +08:00
|
|
|
/*
|
2015-09-25 06:53:09 +08:00
|
|
|
* I915_MAX_PLANES in the enum below is the maximum (across all platforms)
|
|
|
|
* number of planes per CRTC. Not all platforms really have this many planes,
|
|
|
|
* which means some arrays of size I915_MAX_PLANES may have unused entries
|
|
|
|
* between the topmost sprite plane and the cursor plane.
|
2014-03-28 02:48:32 +08:00
|
|
|
*/
|
2009-09-11 06:28:06 +08:00
|
|
|
enum plane {
|
|
|
|
PLANE_A = 0,
|
|
|
|
PLANE_B,
|
2011-02-08 04:26:52 +08:00
|
|
|
PLANE_C,
|
2015-09-25 06:53:09 +08:00
|
|
|
PLANE_CURSOR,
|
|
|
|
I915_MAX_PLANES,
|
2009-09-11 06:28:06 +08:00
|
|
|
};
|
2011-02-08 04:26:52 +08:00
|
|
|
#define plane_name(p) ((p) + 'A')
|
2008-11-19 01:30:25 +08:00
|
|
|
|
2014-03-04 01:31:48 +08:00
|
|
|
#define sprite_name(p, s) ((p) * INTEL_INFO(dev)->num_sprites[(p)] + (s) + 'A')
|
2013-04-17 22:48:51 +08:00
|
|
|
|
2012-03-29 23:32:22 +08:00
|
|
|
enum port {
|
|
|
|
PORT_A = 0,
|
|
|
|
PORT_B,
|
|
|
|
PORT_C,
|
|
|
|
PORT_D,
|
|
|
|
PORT_E,
|
|
|
|
I915_MAX_PORTS
|
|
|
|
};
|
|
|
|
#define port_name(p) ((p) + 'A')
|
|
|
|
|
2014-04-09 18:28:14 +08:00
|
|
|
#define I915_NUM_PHYS_VLV 2
|
2013-11-06 14:36:35 +08:00
|
|
|
|
|
|
|
enum dpio_channel {
|
|
|
|
DPIO_CH0,
|
|
|
|
DPIO_CH1
|
|
|
|
};
|
|
|
|
|
|
|
|
enum dpio_phy {
|
|
|
|
DPIO_PHY0,
|
|
|
|
DPIO_PHY1
|
|
|
|
};
|
|
|
|
|
2013-05-03 23:15:36 +08:00
|
|
|
enum intel_display_power_domain {
|
|
|
|
POWER_DOMAIN_PIPE_A,
|
|
|
|
POWER_DOMAIN_PIPE_B,
|
|
|
|
POWER_DOMAIN_PIPE_C,
|
|
|
|
POWER_DOMAIN_PIPE_A_PANEL_FITTER,
|
|
|
|
POWER_DOMAIN_PIPE_B_PANEL_FITTER,
|
|
|
|
POWER_DOMAIN_PIPE_C_PANEL_FITTER,
|
|
|
|
POWER_DOMAIN_TRANSCODER_A,
|
|
|
|
POWER_DOMAIN_TRANSCODER_B,
|
|
|
|
POWER_DOMAIN_TRANSCODER_C,
|
2013-10-16 22:25:48 +08:00
|
|
|
POWER_DOMAIN_TRANSCODER_EDP,
|
2016-03-18 23:05:42 +08:00
|
|
|
POWER_DOMAIN_TRANSCODER_DSI_A,
|
|
|
|
POWER_DOMAIN_TRANSCODER_DSI_C,
|
2015-11-09 23:48:21 +08:00
|
|
|
POWER_DOMAIN_PORT_DDI_A_LANES,
|
|
|
|
POWER_DOMAIN_PORT_DDI_B_LANES,
|
|
|
|
POWER_DOMAIN_PORT_DDI_C_LANES,
|
|
|
|
POWER_DOMAIN_PORT_DDI_D_LANES,
|
|
|
|
POWER_DOMAIN_PORT_DDI_E_LANES,
|
2014-03-05 01:22:57 +08:00
|
|
|
POWER_DOMAIN_PORT_DSI,
|
|
|
|
POWER_DOMAIN_PORT_CRT,
|
|
|
|
POWER_DOMAIN_PORT_OTHER,
|
2013-09-16 22:38:30 +08:00
|
|
|
POWER_DOMAIN_VGA,
|
2013-11-25 23:15:28 +08:00
|
|
|
POWER_DOMAIN_AUDIO,
|
2014-07-04 22:27:38 +08:00
|
|
|
POWER_DOMAIN_PLLS,
|
2015-01-16 23:57:51 +08:00
|
|
|
POWER_DOMAIN_AUX_A,
|
|
|
|
POWER_DOMAIN_AUX_B,
|
|
|
|
POWER_DOMAIN_AUX_C,
|
|
|
|
POWER_DOMAIN_AUX_D,
|
2015-11-09 23:48:19 +08:00
|
|
|
POWER_DOMAIN_GMBUS,
|
2015-11-09 23:48:22 +08:00
|
|
|
POWER_DOMAIN_MODESET,
|
2013-10-25 22:36:48 +08:00
|
|
|
POWER_DOMAIN_INIT,
|
2013-10-16 22:25:49 +08:00
|
|
|
|
|
|
|
POWER_DOMAIN_NUM,
|
2013-05-03 23:15:36 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
#define POWER_DOMAIN_PIPE(pipe) ((pipe) + POWER_DOMAIN_PIPE_A)
|
|
|
|
#define POWER_DOMAIN_PIPE_PANEL_FITTER(pipe) \
|
|
|
|
((pipe) + POWER_DOMAIN_PIPE_A_PANEL_FITTER)
|
2013-10-16 22:25:48 +08:00
|
|
|
#define POWER_DOMAIN_TRANSCODER(tran) \
|
|
|
|
((tran) == TRANSCODER_EDP ? POWER_DOMAIN_TRANSCODER_EDP : \
|
|
|
|
(tran) + POWER_DOMAIN_TRANSCODER_A)
|
2013-05-03 23:15:36 +08:00
|
|
|
|
2013-02-26 01:06:49 +08:00
|
|
|
enum hpd_pin {
|
|
|
|
HPD_NONE = 0,
|
|
|
|
HPD_TV = HPD_NONE, /* TV is known to be unreliable */
|
|
|
|
HPD_CRT,
|
|
|
|
HPD_SDVO_B,
|
|
|
|
HPD_SDVO_C,
|
2015-07-22 06:32:45 +08:00
|
|
|
HPD_PORT_A,
|
2013-02-26 01:06:49 +08:00
|
|
|
HPD_PORT_B,
|
|
|
|
HPD_PORT_C,
|
|
|
|
HPD_PORT_D,
|
2015-08-17 15:55:50 +08:00
|
|
|
HPD_PORT_E,
|
2013-02-26 01:06:49 +08:00
|
|
|
HPD_NUM_PINS
|
|
|
|
};
|
|
|
|
|
2015-05-28 20:43:48 +08:00
|
|
|
#define for_each_hpd_pin(__pin) \
|
|
|
|
for ((__pin) = (HPD_NONE + 1); (__pin) < HPD_NUM_PINS; (__pin)++)
|
|
|
|
|
2015-05-27 20:03:42 +08:00
|
|
|
struct i915_hotplug {
|
|
|
|
struct work_struct hotplug_work;
|
|
|
|
|
|
|
|
struct {
|
|
|
|
unsigned long last_jiffies;
|
|
|
|
int count;
|
|
|
|
enum {
|
|
|
|
HPD_ENABLED = 0,
|
|
|
|
HPD_DISABLED = 1,
|
|
|
|
HPD_MARK_DISABLED = 2
|
|
|
|
} state;
|
|
|
|
} stats[HPD_NUM_PINS];
|
|
|
|
u32 event_bits;
|
|
|
|
struct delayed_work reenable_work;
|
|
|
|
|
|
|
|
struct intel_digital_port *irq_port[I915_MAX_PORTS];
|
|
|
|
u32 long_port_mask;
|
|
|
|
u32 short_port_mask;
|
|
|
|
struct work_struct dig_port_work;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* if we get a HPD irq from DP and a HPD irq from non-DP
|
|
|
|
* the non-DP HPD could block the workqueue on a mode config
|
|
|
|
* mutex getting, that userspace may have taken. However
|
|
|
|
* userspace is waiting on the DP workqueue to run which is
|
|
|
|
* blocked behind the non-DP one.
|
|
|
|
*/
|
|
|
|
struct workqueue_struct *dp_wq;
|
|
|
|
};
|
|
|
|
|
2012-12-03 19:49:06 +08:00
|
|
|
#define I915_GEM_GPU_DOMAINS \
|
|
|
|
(I915_GEM_DOMAIN_RENDER | \
|
|
|
|
I915_GEM_DOMAIN_SAMPLER | \
|
|
|
|
I915_GEM_DOMAIN_COMMAND | \
|
|
|
|
I915_GEM_DOMAIN_INSTRUCTION | \
|
|
|
|
I915_GEM_DOMAIN_VERTEX)
|
2010-05-22 04:26:39 +08:00
|
|
|
|
2014-08-18 20:49:10 +08:00
|
|
|
#define for_each_pipe(__dev_priv, __p) \
|
|
|
|
for ((__p) = 0; (__p) < INTEL_INFO(__dev_priv)->num_pipes; (__p)++)
|
2016-02-20 02:47:31 +08:00
|
|
|
#define for_each_pipe_masked(__dev_priv, __p, __mask) \
|
|
|
|
for ((__p) = 0; (__p) < INTEL_INFO(__dev_priv)->num_pipes; (__p)++) \
|
|
|
|
for_each_if ((__mask) & (1 << (__p)))
|
2015-02-28 22:54:08 +08:00
|
|
|
#define for_each_plane(__dev_priv, __pipe, __p) \
|
|
|
|
for ((__p) = 0; \
|
|
|
|
(__p) < INTEL_INFO(__dev_priv)->num_sprites[(__pipe)] + 1; \
|
|
|
|
(__p)++)
|
2015-02-28 22:54:09 +08:00
|
|
|
#define for_each_sprite(__dev_priv, __p, __s) \
|
|
|
|
for ((__s) = 0; \
|
|
|
|
(__s) < INTEL_INFO(__dev_priv)->num_sprites[(__p)]; \
|
|
|
|
(__s)++)
|
2011-02-08 04:26:52 +08:00
|
|
|
|
2016-03-16 03:51:09 +08:00
|
|
|
#define for_each_port_masked(__port, __ports_mask) \
|
|
|
|
for ((__port) = PORT_A; (__port) < I915_MAX_PORTS; (__port)++) \
|
|
|
|
for_each_if ((__ports_mask) & (1 << (__port)))
|
|
|
|
|
2014-05-14 06:32:23 +08:00
|
|
|
#define for_each_crtc(dev, crtc) \
|
2016-07-05 17:40:23 +08:00
|
|
|
list_for_each_entry(crtc, &(dev)->mode_config.crtc_list, head)
|
2014-05-14 06:32:23 +08:00
|
|
|
|
2015-04-21 22:12:52 +08:00
|
|
|
#define for_each_intel_plane(dev, intel_plane) \
|
|
|
|
list_for_each_entry(intel_plane, \
|
2016-07-05 17:40:23 +08:00
|
|
|
&(dev)->mode_config.plane_list, \
|
2015-04-21 22:12:52 +08:00
|
|
|
base.head)
|
|
|
|
|
2016-05-12 22:06:01 +08:00
|
|
|
#define for_each_intel_plane_mask(dev, intel_plane, plane_mask) \
|
2016-07-05 17:40:23 +08:00
|
|
|
list_for_each_entry(intel_plane, \
|
|
|
|
&(dev)->mode_config.plane_list, \
|
2016-05-12 22:06:01 +08:00
|
|
|
base.head) \
|
|
|
|
for_each_if ((plane_mask) & \
|
|
|
|
(1 << drm_plane_index(&intel_plane->base)))
|
|
|
|
|
2015-06-25 03:00:04 +08:00
|
|
|
#define for_each_intel_plane_on_crtc(dev, intel_crtc, intel_plane) \
|
|
|
|
list_for_each_entry(intel_plane, \
|
|
|
|
&(dev)->mode_config.plane_list, \
|
|
|
|
base.head) \
|
2015-11-25 03:21:56 +08:00
|
|
|
for_each_if ((intel_plane)->pipe == (intel_crtc)->pipe)
|
2015-06-25 03:00:04 +08:00
|
|
|
|
2016-07-05 17:40:23 +08:00
|
|
|
#define for_each_intel_crtc(dev, intel_crtc) \
|
|
|
|
list_for_each_entry(intel_crtc, \
|
|
|
|
&(dev)->mode_config.crtc_list, \
|
|
|
|
base.head)
|
2014-05-14 06:32:21 +08:00
|
|
|
|
2016-07-05 17:40:23 +08:00
|
|
|
#define for_each_intel_crtc_mask(dev, intel_crtc, crtc_mask) \
|
|
|
|
list_for_each_entry(intel_crtc, \
|
|
|
|
&(dev)->mode_config.crtc_list, \
|
|
|
|
base.head) \
|
2016-05-12 22:06:03 +08:00
|
|
|
for_each_if ((crtc_mask) & (1 << drm_crtc_index(&intel_crtc->base)))
|
|
|
|
|
2014-08-05 18:29:37 +08:00
|
|
|
#define for_each_intel_encoder(dev, intel_encoder) \
|
|
|
|
list_for_each_entry(intel_encoder, \
|
|
|
|
&(dev)->mode_config.encoder_list, \
|
|
|
|
base.head)
|
|
|
|
|
2015-03-03 21:21:56 +08:00
|
|
|
#define for_each_intel_connector(dev, intel_connector) \
|
|
|
|
list_for_each_entry(intel_connector, \
|
2016-07-05 17:40:23 +08:00
|
|
|
&(dev)->mode_config.connector_list, \
|
2015-03-03 21:21:56 +08:00
|
|
|
base.head)
|
|
|
|
|
2012-07-05 15:50:24 +08:00
|
|
|
#define for_each_encoder_on_crtc(dev, __crtc, intel_encoder) \
|
|
|
|
list_for_each_entry((intel_encoder), &(dev)->mode_config.encoder_list, base.head) \
|
2015-11-25 03:21:56 +08:00
|
|
|
for_each_if ((intel_encoder)->base.crtc == (__crtc))
|
2012-07-05 15:50:24 +08:00
|
|
|
|
2014-02-08 04:48:15 +08:00
|
|
|
#define for_each_connector_on_encoder(dev, __encoder, intel_connector) \
|
|
|
|
list_for_each_entry((intel_connector), &(dev)->mode_config.connector_list, base.head) \
|
2015-11-25 03:21:56 +08:00
|
|
|
for_each_if ((intel_connector)->base.encoder == (__encoder))
|
2014-02-08 04:48:15 +08:00
|
|
|
|
2014-07-12 12:32:27 +08:00
|
|
|
#define for_each_power_domain(domain, mask) \
|
|
|
|
for ((domain) = 0; (domain) < POWER_DOMAIN_NUM; (domain)++) \
|
2015-11-25 03:21:56 +08:00
|
|
|
for_each_if ((1 << (domain)) & (mask))
|
2014-07-12 12:32:27 +08:00
|
|
|
|
2013-06-05 19:34:14 +08:00
|
|
|
struct drm_i915_private;
|
2014-08-07 21:20:40 +08:00
|
|
|
struct i915_mm_struct;
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
struct i915_mmu_object;
|
2013-06-05 19:34:14 +08:00
|
|
|
|
2015-04-27 20:41:20 +08:00
|
|
|
struct drm_i915_file_private {
|
|
|
|
struct drm_i915_private *dev_priv;
|
|
|
|
struct drm_file *file;
|
|
|
|
|
|
|
|
struct {
|
|
|
|
spinlock_t lock;
|
|
|
|
struct list_head request_list;
|
2015-05-22 04:01:48 +08:00
|
|
|
/* 20ms is a fairly arbitrary limit (greater than the average frame time)
|
|
|
|
* chosen to prevent the CPU getting more than a frame ahead of the GPU
|
|
|
|
* (when using lax throttling for the frontbuffer). We also use it to
|
|
|
|
* offer free GPU waitboosts for severely congested workloads.
|
|
|
|
*/
|
|
|
|
#define DRM_I915_THROTTLE_JIFFIES msecs_to_jiffies(20)
|
2015-04-27 20:41:20 +08:00
|
|
|
} mm;
|
|
|
|
struct idr context_idr;
|
|
|
|
|
2015-04-27 20:41:22 +08:00
|
|
|
struct intel_rps_client {
|
|
|
|
struct list_head link;
|
|
|
|
unsigned boosts;
|
|
|
|
} rps;
|
2015-04-27 20:41:20 +08:00
|
|
|
|
2016-01-15 23:12:50 +08:00
|
|
|
unsigned int bsd_ring;
|
2015-04-27 20:41:20 +08:00
|
|
|
};
|
|
|
|
|
2012-11-29 22:59:36 +08:00
|
|
|
/* Used by dp and fdi links */
|
|
|
|
struct intel_link_m_n {
|
|
|
|
uint32_t tu;
|
|
|
|
uint32_t gmch_m;
|
|
|
|
uint32_t gmch_n;
|
|
|
|
uint32_t link_m;
|
|
|
|
uint32_t link_n;
|
|
|
|
};
|
|
|
|
|
|
|
|
void intel_link_compute_m_n(int bpp, int nlanes,
|
|
|
|
int pixel_clock, int link_clock,
|
|
|
|
struct intel_link_m_n *m_n);
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/* Interface history:
|
|
|
|
*
|
|
|
|
* 1.1: Original.
|
2006-01-02 17:14:23 +08:00
|
|
|
* 1.2: Add Power Management
|
|
|
|
* 1.3: Add vblank support
|
2006-01-25 12:31:43 +08:00
|
|
|
* 1.4: Fix cmdbuffer path, add heap destroy
|
2006-06-24 15:07:34 +08:00
|
|
|
* 1.5: Add vblank pipe configuration
|
2006-10-24 23:05:09 +08:00
|
|
|
* 1.6: - New ioctl for scheduling buffer swaps on vertical blank
|
|
|
|
* - Support vertical blank on secondary display pipe
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
|
|
|
#define DRIVER_MAJOR 1
|
2006-10-24 23:05:09 +08:00
|
|
|
#define DRIVER_MINOR 6
|
2005-04-17 06:20:36 +08:00
|
|
|
#define DRIVER_PATCHLEVEL 0
|
|
|
|
|
2010-09-29 23:10:57 +08:00
|
|
|
#define WATCH_LISTS 0
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2008-10-01 03:14:26 +08:00
|
|
|
struct opregion_header;
|
|
|
|
struct opregion_acpi;
|
|
|
|
struct opregion_swsci;
|
|
|
|
struct opregion_asle;
|
|
|
|
|
2008-08-06 02:37:25 +08:00
|
|
|
struct intel_opregion {
|
2015-10-13 05:12:57 +08:00
|
|
|
struct opregion_header *header;
|
|
|
|
struct opregion_acpi *acpi;
|
|
|
|
struct opregion_swsci *swsci;
|
2013-09-02 15:38:59 +08:00
|
|
|
u32 swsci_gbda_sub_functions;
|
|
|
|
u32 swsci_sbcb_sub_functions;
|
2015-10-13 05:12:57 +08:00
|
|
|
struct opregion_asle *asle;
|
2015-12-15 19:18:00 +08:00
|
|
|
void *rvda;
|
2015-12-14 18:50:52 +08:00
|
|
|
const void *vbt;
|
2015-12-15 19:17:12 +08:00
|
|
|
u32 vbt_size;
|
2015-10-13 05:12:57 +08:00
|
|
|
u32 *lid_state;
|
2013-11-01 00:55:48 +08:00
|
|
|
struct work_struct asle_work;
|
2008-08-06 02:37:25 +08:00
|
|
|
};
|
2010-08-19 23:09:23 +08:00
|
|
|
#define OPREGION_SIZE (8*1024)
|
2008-08-06 02:37:25 +08:00
|
|
|
|
2010-08-05 03:26:07 +08:00
|
|
|
struct intel_overlay;
|
|
|
|
struct intel_overlay_error_state;
|
|
|
|
|
2008-11-13 02:03:55 +08:00
|
|
|
#define I915_FENCE_REG_NONE -1
|
2013-04-09 18:02:47 +08:00
|
|
|
#define I915_MAX_NUM_FENCES 32
|
|
|
|
/* 32 fences + sign bit for FENCE_REG_NONE */
|
|
|
|
#define I915_MAX_NUM_FENCE_BITS 6
|
2008-11-13 02:03:55 +08:00
|
|
|
|
|
|
|
struct drm_i915_fence_reg {
|
2010-04-28 17:02:31 +08:00
|
|
|
struct list_head lru_list;
|
2010-11-12 21:53:37 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2011-12-14 20:57:08 +08:00
|
|
|
int pin_count;
|
2008-11-13 02:03:55 +08:00
|
|
|
};
|
2008-11-28 12:22:24 +08:00
|
|
|
|
2009-05-31 17:17:17 +08:00
|
|
|
struct sdvo_device_mapping {
|
2010-09-24 19:52:03 +08:00
|
|
|
u8 initialized;
|
2009-05-31 17:17:17 +08:00
|
|
|
u8 dvo_port;
|
|
|
|
u8 slave_addr;
|
|
|
|
u8 dvo_wiring;
|
2010-09-24 19:52:03 +08:00
|
|
|
u8 i2c_pin;
|
2010-04-24 04:07:40 +08:00
|
|
|
u8 ddc_pin;
|
2009-05-31 17:17:17 +08:00
|
|
|
};
|
|
|
|
|
2010-11-21 21:12:35 +08:00
|
|
|
struct intel_display_error_state;
|
|
|
|
|
2009-06-19 07:56:52 +08:00
|
|
|
struct drm_i915_error_state {
|
2012-04-27 21:17:39 +08:00
|
|
|
struct kref ref;
|
2014-01-30 16:19:37 +08:00
|
|
|
struct timeval time;
|
|
|
|
|
2014-02-25 23:11:25 +08:00
|
|
|
char error_msg[128];
|
2016-07-04 15:08:39 +08:00
|
|
|
bool simulated;
|
2015-08-08 03:24:15 +08:00
|
|
|
int iommu;
|
2014-02-25 23:11:27 +08:00
|
|
|
u32 reset_count;
|
2014-02-25 23:11:28 +08:00
|
|
|
u32 suspend_count;
|
2014-02-25 23:11:25 +08:00
|
|
|
|
2014-01-30 16:19:37 +08:00
|
|
|
/* Generic register state */
|
2009-06-19 07:56:52 +08:00
|
|
|
u32 eir;
|
|
|
|
u32 pgtbl_er;
|
2012-04-27 07:03:00 +08:00
|
|
|
u32 ier;
|
2014-08-06 01:07:13 +08:00
|
|
|
u32 gtier[4];
|
2012-06-05 05:42:52 +08:00
|
|
|
u32 ccid;
|
2013-01-15 20:05:55 +08:00
|
|
|
u32 derrmr;
|
|
|
|
u32 forcewake;
|
2014-01-30 16:19:37 +08:00
|
|
|
u32 error; /* gen6+ */
|
|
|
|
u32 err_int; /* gen7 */
|
2015-03-24 20:54:19 +08:00
|
|
|
u32 fault_data0; /* gen8, gen9 */
|
|
|
|
u32 fault_data1; /* gen8, gen9 */
|
2014-01-30 16:19:37 +08:00
|
|
|
u32 done_reg;
|
2014-01-30 16:19:39 +08:00
|
|
|
u32 gac_eco;
|
|
|
|
u32 gam_ecochk;
|
|
|
|
u32 gab_ctl;
|
|
|
|
u32 gfx_mode;
|
2014-01-30 16:19:37 +08:00
|
|
|
u32 extra_instdone[I915_NUM_INSTDONE_REG];
|
|
|
|
u64 fence[I915_MAX_NUM_FENCES];
|
|
|
|
struct intel_overlay_error_state *overlay;
|
|
|
|
struct intel_display_error_state *display;
|
2014-07-01 00:53:41 +08:00
|
|
|
struct drm_i915_error_object *semaphore_obj;
|
2014-01-30 16:19:37 +08:00
|
|
|
|
2012-02-15 19:25:37 +08:00
|
|
|
struct drm_i915_error_ring {
|
2014-01-27 21:52:34 +08:00
|
|
|
bool valid;
|
2014-01-30 16:19:38 +08:00
|
|
|
/* Software tracked state */
|
|
|
|
bool waiting;
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
int num_waiters;
|
2014-01-30 16:19:38 +08:00
|
|
|
int hangcheck_score;
|
|
|
|
enum intel_ring_hangcheck_action hangcheck_action;
|
|
|
|
int num_requests;
|
|
|
|
|
|
|
|
/* our own tracking of ring head and tail */
|
|
|
|
u32 cpu_ring_head;
|
|
|
|
u32 cpu_ring_tail;
|
|
|
|
|
2016-04-07 14:29:10 +08:00
|
|
|
u32 last_seqno;
|
2016-03-16 19:00:39 +08:00
|
|
|
u32 semaphore_seqno[I915_NUM_ENGINES - 1];
|
2014-01-30 16:19:38 +08:00
|
|
|
|
|
|
|
/* Register state */
|
2015-04-07 23:20:47 +08:00
|
|
|
u32 start;
|
2014-01-30 16:19:38 +08:00
|
|
|
u32 tail;
|
|
|
|
u32 head;
|
|
|
|
u32 ctl;
|
|
|
|
u32 hws;
|
|
|
|
u32 ipeir;
|
|
|
|
u32 ipehr;
|
|
|
|
u32 instdone;
|
|
|
|
u32 bbstate;
|
|
|
|
u32 instpm;
|
|
|
|
u32 instps;
|
|
|
|
u32 seqno;
|
|
|
|
u64 bbaddr;
|
2014-03-21 20:41:53 +08:00
|
|
|
u64 acthd;
|
2014-01-30 16:19:38 +08:00
|
|
|
u32 fault_reg;
|
2014-04-02 07:31:07 +08:00
|
|
|
u64 faddr;
|
2014-01-30 16:19:38 +08:00
|
|
|
u32 rc_psmi; /* sleep state */
|
2016-03-16 19:00:39 +08:00
|
|
|
u32 semaphore_mboxes[I915_NUM_ENGINES - 1];
|
2014-01-30 16:19:38 +08:00
|
|
|
|
2012-02-15 19:25:37 +08:00
|
|
|
struct drm_i915_error_object {
|
|
|
|
int page_count;
|
2015-07-30 00:23:56 +08:00
|
|
|
u64 gtt_offset;
|
2012-02-15 19:25:37 +08:00
|
|
|
u32 *pages[0];
|
2014-02-25 23:11:24 +08:00
|
|
|
} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
|
2014-01-30 16:19:38 +08:00
|
|
|
|
2016-03-01 19:24:36 +08:00
|
|
|
struct drm_i915_error_object *wa_ctx;
|
|
|
|
|
2012-02-15 19:25:37 +08:00
|
|
|
struct drm_i915_error_request {
|
|
|
|
long jiffies;
|
|
|
|
u32 seqno;
|
2012-02-15 19:25:38 +08:00
|
|
|
u32 tail;
|
2012-02-15 19:25:37 +08:00
|
|
|
} *requests;
|
2014-01-30 16:19:40 +08:00
|
|
|
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
struct drm_i915_error_waiter {
|
|
|
|
char comm[TASK_COMM_LEN];
|
|
|
|
pid_t pid;
|
|
|
|
u32 seqno;
|
|
|
|
} *waiters;
|
|
|
|
|
2014-01-30 16:19:40 +08:00
|
|
|
struct {
|
|
|
|
u32 gfx_mode;
|
|
|
|
union {
|
|
|
|
u64 pdp[4];
|
|
|
|
u32 pp_dir_base;
|
|
|
|
};
|
|
|
|
} vm_info;
|
2014-02-25 23:11:24 +08:00
|
|
|
|
|
|
|
pid_t pid;
|
|
|
|
char comm[TASK_COMM_LEN];
|
2016-03-16 19:00:39 +08:00
|
|
|
} ring[I915_NUM_ENGINES];
|
2014-08-13 03:05:47 +08:00
|
|
|
|
2010-02-18 18:24:56 +08:00
|
|
|
struct drm_i915_error_buffer {
|
2011-01-10 05:07:49 +08:00
|
|
|
u32 size;
|
2010-02-18 18:24:56 +08:00
|
|
|
u32 name;
|
2016-03-16 19:00:39 +08:00
|
|
|
u32 rseqno[I915_NUM_ENGINES], wseqno;
|
2015-07-30 00:23:56 +08:00
|
|
|
u64 gtt_offset;
|
2010-02-18 18:24:56 +08:00
|
|
|
u32 read_domains;
|
|
|
|
u32 write_domain;
|
2011-10-10 03:52:02 +08:00
|
|
|
s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
|
2010-02-18 18:24:56 +08:00
|
|
|
s32 pinned:2;
|
|
|
|
u32 tiling:2;
|
|
|
|
u32 dirty:1;
|
|
|
|
u32 purgeable:1;
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
u32 userptr:1;
|
2012-02-16 18:03:29 +08:00
|
|
|
s32 ring:4;
|
2013-09-25 17:23:19 +08:00
|
|
|
u32 cache_level:3;
|
2013-08-01 08:00:15 +08:00
|
|
|
} **active_bo, **pinned_bo;
|
2014-01-30 16:19:40 +08:00
|
|
|
|
2013-08-01 08:00:15 +08:00
|
|
|
u32 *active_bo_count, *pinned_bo_count;
|
2014-08-13 03:05:47 +08:00
|
|
|
u32 vm_count;
|
2009-06-19 07:56:52 +08:00
|
|
|
};
|
|
|
|
|
2013-11-08 22:48:56 +08:00
|
|
|
struct intel_connector;
|
2014-10-27 22:26:47 +08:00
|
|
|
struct intel_encoder;
|
2015-01-15 20:55:21 +08:00
|
|
|
struct intel_crtc_state;
|
2015-01-20 20:51:52 +08:00
|
|
|
struct intel_initial_plane_config;
|
2013-03-28 17:42:00 +08:00
|
|
|
struct intel_crtc;
|
2013-06-04 04:40:22 +08:00
|
|
|
struct intel_limit;
|
|
|
|
struct dpll;
|
2013-03-27 07:44:50 +08:00
|
|
|
|
2009-09-22 01:42:27 +08:00
|
|
|
struct drm_i915_display_funcs {
|
|
|
|
int (*get_display_clock_speed)(struct drm_device *dev);
|
|
|
|
int (*get_fifo_size)(struct drm_device *dev, int plane);
|
2016-03-01 18:07:22 +08:00
|
|
|
int (*compute_pipe_wm)(struct intel_crtc_state *cstate);
|
drm/i915: Add two-stage ILK-style watermark programming (v11)
In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time. These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank). Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
v2: Significant rebasing/rewriting.
v3:
- Move 'need_postvbl_update' flag to CRTC state (Daniel)
- Don't forget to check intermediate watermark values for validity
(Maarten)
- Don't due async watermark optimization; just do it at the end of the
atomic transaction, after waiting for vblanks. We do want it to be
async eventually, but adding that now will cause more trouble for
Maarten's in-progress work. (Maarten)
- Don't allocate space in crtc_state for intermediate watermarks on
platforms that don't need it (gen9+).
- Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit
now that ilk_update_wm is gone.
v4:
- Add a wm_mutex to cover updates to intel_crtc->active and the
need_postvbl_update flag. Since we don't have async yet it isn't
terribly important yet, but might as well add it now.
- Change interface to program watermarks. Platforms will now expose
.initial_watermarks() and .optimize_watermarks() functions to do
watermark programming. These should lock wm_mutex, copy the
appropriate state values into intel_crtc->active, and then call
the internal program watermarks function.
v5:
- Skip intermediate watermark calculation/check during initial hardware
readout since we don't trust the existing HW values (and don't have
valid values of our own yet).
- Don't try to call .optimize_watermarks() on platforms that don't have
atomic watermarks yet. (Maarten)
v6:
- Rebase
v7:
- Further rebase
v8:
- A few minor indentation and line length fixes
v9:
- Yet another rebase since Maarten's patches reworked a bunch of the
code (wm_pre, wm_post, etc.) that this was previously based on.
v10:
- Move wm_mutex to dev_priv to protect against racing commits against
disjoint CRTC sets. (Maarten)
- Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten)
v11:
- Now that we've moved to atomic watermark updates, make sure we call
the proper function to program watermarks in
{ironlake,haswell}_crtc_enable(); the failure to do so on the
previous patch iteration led to us not actually programming the
watermarks before turning on the CRTC, which was the cause of the
underruns that the CI system was seeing.
- Fix inverted logic for determining when to optimize watermarks. We
were needlessly optimizing when the intermediate/optimal values were
the same (harmless), but not actually optimizing when they differed
(also harmless, but wasteful from a power/bandwidth perspective).
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
2016-02-24 09:20:13 +08:00
|
|
|
int (*compute_intermediate_wm)(struct drm_device *dev,
|
|
|
|
struct intel_crtc *intel_crtc,
|
|
|
|
struct intel_crtc_state *newstate);
|
|
|
|
void (*initial_watermarks)(struct intel_crtc_state *cstate);
|
|
|
|
void (*optimize_watermarks)(struct intel_crtc_state *cstate);
|
2016-05-12 22:06:03 +08:00
|
|
|
int (*compute_global_watermarks)(struct drm_atomic_state *state);
|
2013-09-10 16:40:40 +08:00
|
|
|
void (*update_wm)(struct drm_crtc *crtc);
|
2015-06-15 18:33:56 +08:00
|
|
|
int (*modeset_calc_cdclk)(struct drm_atomic_state *state);
|
|
|
|
void (*modeset_commit_cdclk)(struct drm_atomic_state *state);
|
2013-03-28 17:42:00 +08:00
|
|
|
/* Returns the active state of the crtc, and if the crtc is active,
|
|
|
|
* fills out the pipe-config with the hw state. */
|
|
|
|
bool (*get_pipe_config)(struct intel_crtc *,
|
2015-01-15 20:55:21 +08:00
|
|
|
struct intel_crtc_state *);
|
2015-01-20 20:51:52 +08:00
|
|
|
void (*get_initial_plane_config)(struct intel_crtc *,
|
|
|
|
struct intel_initial_plane_config *);
|
2015-01-15 20:55:23 +08:00
|
|
|
int (*crtc_compute_clock)(struct intel_crtc *crtc,
|
|
|
|
struct intel_crtc_state *crtc_state);
|
2012-06-30 04:39:33 +08:00
|
|
|
void (*crtc_enable)(struct drm_crtc *crtc);
|
|
|
|
void (*crtc_disable)(struct drm_crtc *crtc);
|
2014-10-27 22:26:50 +08:00
|
|
|
void (*audio_codec_enable)(struct drm_connector *connector,
|
|
|
|
struct intel_encoder *encoder,
|
2015-09-25 21:37:43 +08:00
|
|
|
const struct drm_display_mode *adjusted_mode);
|
2014-10-27 22:26:50 +08:00
|
|
|
void (*audio_codec_disable)(struct intel_encoder *encoder);
|
2011-04-29 05:27:04 +08:00
|
|
|
void (*fdi_link_train)(struct drm_crtc *crtc);
|
2011-04-29 06:04:31 +08:00
|
|
|
void (*init_clock_gating)(struct drm_device *dev);
|
2016-05-24 23:13:53 +08:00
|
|
|
int (*queue_flip)(struct drm_device *dev, struct drm_crtc *crtc,
|
|
|
|
struct drm_framebuffer *fb,
|
|
|
|
struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_request *req,
|
|
|
|
uint32_t flags);
|
2016-05-06 21:48:28 +08:00
|
|
|
void (*hpd_irq_setup)(struct drm_i915_private *dev_priv);
|
2009-09-22 01:42:27 +08:00
|
|
|
/* clock updates for mode set */
|
|
|
|
/* cursor updates */
|
|
|
|
/* render clock increase/decrease */
|
|
|
|
/* display clock increase/decrease */
|
|
|
|
/* pll clock increase/decrease */
|
2016-03-16 18:57:14 +08:00
|
|
|
|
2016-03-30 23:16:34 +08:00
|
|
|
void (*load_csc_matrix)(struct drm_crtc_state *crtc_state);
|
|
|
|
void (*load_luts)(struct drm_crtc_state *crtc_state);
|
2009-09-22 01:42:27 +08:00
|
|
|
};
|
|
|
|
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domain_id {
|
|
|
|
FW_DOMAIN_ID_RENDER = 0,
|
|
|
|
FW_DOMAIN_ID_BLITTER,
|
|
|
|
FW_DOMAIN_ID_MEDIA,
|
|
|
|
|
|
|
|
FW_DOMAIN_ID_COUNT
|
|
|
|
};
|
|
|
|
|
|
|
|
enum forcewake_domains {
|
|
|
|
FORCEWAKE_RENDER = (1 << FW_DOMAIN_ID_RENDER),
|
|
|
|
FORCEWAKE_BLITTER = (1 << FW_DOMAIN_ID_BLITTER),
|
|
|
|
FORCEWAKE_MEDIA = (1 << FW_DOMAIN_ID_MEDIA),
|
|
|
|
FORCEWAKE_ALL = (FORCEWAKE_RENDER |
|
|
|
|
FORCEWAKE_BLITTER |
|
|
|
|
FORCEWAKE_MEDIA)
|
|
|
|
};
|
|
|
|
|
2016-04-12 21:37:31 +08:00
|
|
|
#define FW_REG_READ (1)
|
|
|
|
#define FW_REG_WRITE (2)
|
|
|
|
|
|
|
|
enum forcewake_domains
|
|
|
|
intel_uncore_forcewake_for_reg(struct drm_i915_private *dev_priv,
|
|
|
|
i915_reg_t reg, unsigned int op);
|
|
|
|
|
2013-07-20 03:36:52 +08:00
|
|
|
struct intel_uncore_funcs {
|
2013-11-23 17:25:42 +08:00
|
|
|
void (*force_wake_get)(struct drm_i915_private *dev_priv,
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domains domains);
|
2013-11-23 17:25:42 +08:00
|
|
|
void (*force_wake_put)(struct drm_i915_private *dev_priv,
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domains domains);
|
2013-10-05 12:22:51 +08:00
|
|
|
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
uint8_t (*mmio_readb)(struct drm_i915_private *dev_priv, i915_reg_t r, bool trace);
|
|
|
|
uint16_t (*mmio_readw)(struct drm_i915_private *dev_priv, i915_reg_t r, bool trace);
|
|
|
|
uint32_t (*mmio_readl)(struct drm_i915_private *dev_priv, i915_reg_t r, bool trace);
|
|
|
|
uint64_t (*mmio_readq)(struct drm_i915_private *dev_priv, i915_reg_t r, bool trace);
|
2013-10-05 12:22:51 +08:00
|
|
|
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
void (*mmio_writeb)(struct drm_i915_private *dev_priv, i915_reg_t r,
|
2013-10-05 12:22:51 +08:00
|
|
|
uint8_t val, bool trace);
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
void (*mmio_writew)(struct drm_i915_private *dev_priv, i915_reg_t r,
|
2013-10-05 12:22:51 +08:00
|
|
|
uint16_t val, bool trace);
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
void (*mmio_writel)(struct drm_i915_private *dev_priv, i915_reg_t r,
|
2013-10-05 12:22:51 +08:00
|
|
|
uint32_t val, bool trace);
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
void (*mmio_writeq)(struct drm_i915_private *dev_priv, i915_reg_t r,
|
2013-10-05 12:22:51 +08:00
|
|
|
uint64_t val, bool trace);
|
2012-07-02 22:51:02 +08:00
|
|
|
};
|
|
|
|
|
2013-07-20 03:36:52 +08:00
|
|
|
struct intel_uncore {
|
|
|
|
spinlock_t lock; /** lock is also taken in irq contexts. */
|
|
|
|
|
|
|
|
struct intel_uncore_funcs funcs;
|
|
|
|
|
|
|
|
unsigned fifo_count;
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domains fw_domains;
|
2015-01-16 17:34:37 +08:00
|
|
|
|
|
|
|
struct intel_uncore_forcewake_domain {
|
|
|
|
struct drm_i915_private *i915;
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domain_id id;
|
2016-04-08 00:04:33 +08:00
|
|
|
enum forcewake_domains mask;
|
2015-01-16 17:34:37 +08:00
|
|
|
unsigned wake_count;
|
drm/i915: Use consistent forcewake auto-release timeout across kernel configs
Because it is based on jiffies, current implementation releases the
forcewake at any time between straight away and between 1ms and 10ms,
depending on the kernel configuration (CONFIG_HZ).
This is probably not what has been desired, since the dynamics of keeping
parts of the GPU awake should not be correlated with this kernel
configuration parameter.
Change the auto-release mechanism to use hrtimers and set the timeout to
1ms with a 1ms of slack. This should make the GPU power consistent
across kernel configs, and timer slack should enable some timer coalescing
where multiple force-wake domains exist, or with unrelated timers.
For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360 (HZ=1000 kernel).
Even though this reduction can be attributed to the average release period
extending from 0-1ms to 1-2ms, as discussed above, it will make the
forcewake timeout consistent for different CONFIG_HZ values.
Real life measurements with the above workload has shown that, with this
patch, both manage to auto-release the forcewake between 2-4 times per
10ms, even though the number of forcewake gets is dramatically different.
T-Rex requests between 5-10 explicit gets and 5-10 implict gets in each
10ms period, while the OGL/OCL test requests 250 and 380 times in the same
period.
The two data points together suggest that the nature of the forwake
accesses is bursty and that further changes and potential timeout
extensions, or moving the start of timeout from the first to the last
automatic forcewake grab, should be carefully measured for power and
performance effects.
v2:
* Commit spelling. (Dave Gordon)
* More discussion on numbers in the commit. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-08 00:04:32 +08:00
|
|
|
struct hrtimer timer;
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t reg_set;
|
2015-01-19 22:20:43 +08:00
|
|
|
u32 val_set;
|
|
|
|
u32 val_clear;
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t reg_ack;
|
|
|
|
i915_reg_t reg_post;
|
2015-01-19 22:20:43 +08:00
|
|
|
u32 val_reset;
|
2015-01-16 17:34:37 +08:00
|
|
|
} fw_domain[FW_DOMAIN_ID_COUNT];
|
2015-12-16 15:26:48 +08:00
|
|
|
|
|
|
|
int unclaimed_mmio_check;
|
2015-01-16 17:34:37 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/* Iterate over initialised fw domains */
|
2016-04-08 00:04:33 +08:00
|
|
|
#define for_each_fw_domain_masked(domain__, mask__, dev_priv__) \
|
|
|
|
for ((domain__) = &(dev_priv__)->uncore.fw_domain[0]; \
|
|
|
|
(domain__) < &(dev_priv__)->uncore.fw_domain[FW_DOMAIN_ID_COUNT]; \
|
|
|
|
(domain__)++) \
|
|
|
|
for_each_if ((mask__) & (domain__)->mask)
|
|
|
|
|
|
|
|
#define for_each_fw_domain(domain__, dev_priv__) \
|
|
|
|
for_each_fw_domain_masked(domain__, FORCEWAKE_ALL, dev_priv__)
|
2013-07-20 03:36:52 +08:00
|
|
|
|
2015-10-27 20:46:59 +08:00
|
|
|
#define CSR_VERSION(major, minor) ((major) << 16 | (minor))
|
|
|
|
#define CSR_VERSION_MAJOR(version) ((version) >> 16)
|
|
|
|
#define CSR_VERSION_MINOR(version) ((version) & 0xffff)
|
|
|
|
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
struct intel_csr {
|
2015-10-29 05:59:04 +08:00
|
|
|
struct work_struct work;
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
const char *fw_path;
|
2015-08-04 00:25:32 +08:00
|
|
|
uint32_t *dmc_payload;
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
uint32_t dmc_fw_size;
|
2015-10-27 20:46:59 +08:00
|
|
|
uint32_t version;
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
uint32_t mmio_count;
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t mmioaddr[8];
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
uint32_t mmiodata[8];
|
2016-02-18 23:21:11 +08:00
|
|
|
uint32_t dc_state;
|
2016-03-01 04:49:03 +08:00
|
|
|
uint32_t allowed_dc_mask;
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
};
|
|
|
|
|
2013-04-23 23:37:17 +08:00
|
|
|
#define DEV_INFO_FOR_EACH_FLAG(func, sep) \
|
|
|
|
func(is_mobile) sep \
|
|
|
|
func(is_i85x) sep \
|
|
|
|
func(is_i915g) sep \
|
|
|
|
func(is_i945gm) sep \
|
|
|
|
func(is_g33) sep \
|
|
|
|
func(need_gfx_hws) sep \
|
|
|
|
func(is_g4x) sep \
|
|
|
|
func(is_pineview) sep \
|
|
|
|
func(is_broadwater) sep \
|
|
|
|
func(is_crestline) sep \
|
|
|
|
func(is_ivybridge) sep \
|
|
|
|
func(is_valleyview) sep \
|
2015-12-10 04:29:35 +08:00
|
|
|
func(is_cherryview) sep \
|
2013-04-23 23:37:17 +08:00
|
|
|
func(is_haswell) sep \
|
2016-05-10 17:57:05 +08:00
|
|
|
func(is_broadwell) sep \
|
2014-04-02 13:54:50 +08:00
|
|
|
func(is_skylake) sep \
|
2015-10-28 01:14:54 +08:00
|
|
|
func(is_broxton) sep \
|
2015-10-28 19:16:45 +08:00
|
|
|
func(is_kabylake) sep \
|
2013-08-24 07:00:07 +08:00
|
|
|
func(is_preliminary) sep \
|
2013-04-23 23:37:17 +08:00
|
|
|
func(has_fbc) sep \
|
|
|
|
func(has_pipe_cxsr) sep \
|
|
|
|
func(has_hotplug) sep \
|
|
|
|
func(cursor_needs_physical) sep \
|
|
|
|
func(has_overlay) sep \
|
|
|
|
func(overlay_needs_physical) sep \
|
|
|
|
func(supports_tv) sep \
|
2013-04-23 01:40:39 +08:00
|
|
|
func(has_llc) sep \
|
2016-03-02 20:10:31 +08:00
|
|
|
func(has_snoop) sep \
|
2013-04-23 01:40:41 +08:00
|
|
|
func(has_ddi) sep \
|
2016-06-03 13:34:33 +08:00
|
|
|
func(has_fpga_dbg) sep \
|
|
|
|
func(has_pooled_eu)
|
2012-08-09 04:01:51 +08:00
|
|
|
|
2013-04-23 01:40:38 +08:00
|
|
|
#define DEFINE_FLAG(name) u8 name:1
|
|
|
|
#define SEP_SEMICOLON ;
|
2012-08-09 04:01:51 +08:00
|
|
|
|
2009-12-17 04:16:16 +08:00
|
|
|
struct intel_device_info {
|
2013-01-24 21:29:28 +08:00
|
|
|
u32 display_mmio_offset;
|
2014-08-10 02:18:42 +08:00
|
|
|
u16 device_id;
|
2016-05-10 17:57:07 +08:00
|
|
|
u8 num_pipes;
|
2014-03-04 01:31:48 +08:00
|
|
|
u8 num_sprites[I915_MAX_PIPES];
|
2010-08-11 16:59:24 +08:00
|
|
|
u8 gen;
|
2016-05-10 17:57:04 +08:00
|
|
|
u16 gen_mask;
|
2013-10-16 01:02:57 +08:00
|
|
|
u8 ring_mask; /* Rings supported by the HW */
|
2013-04-23 01:40:38 +08:00
|
|
|
DEV_INFO_FOR_EACH_FLAG(DEFINE_FLAG, SEP_SEMICOLON);
|
drm/i915: Reorganize display pipe register accesses
RFCv2: Reorganize array indexing so that full offsets can be used as
is. It makes grepping for registers in i915_reg.h much easier. Also
move offset arrays to intel_device_info.
v1: Fixed offsets for VLV, proper eDP handling
v2: Fixed BCLRPAT, PIPESRC, PIPECONF and DSP* macros.
v3: Added EDP pipe comment, removed redundant offset arrays for
MSA_MISC and DDI_FUNC_CTL.
v4: Rename patch and report object size increase.
v5: Change location of commas, add PIPE_EDP into enum pipe
v6: Insert PIPE_EDP_OFFSET into pipe offset array
v7: Set I915_MAX_PIPES back to 3, change more registers accessors
to use the new macros, get rid of _PIPE_INC and add dev_priv
as a parameter where required by the new macros.
Upcoming hardware will not have the various display pipe register
ranges evenly spaced in memory. Change register address calculations
into array lookups.
Tested on SNB, VLV, IVB, Gen2 and HSW w/eDP.
I left the UMS cruft untouched.
Size differences:
text data bss dec hex filename
596431 4634 56 601121 92c21 i915.ko (new)
593199 4634 56 597889 91f81 i915.ko (old)
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 20:22:24 +08:00
|
|
|
/* Register offsets for the various display pipes and transcoders */
|
|
|
|
int pipe_offsets[I915_MAX_TRANSCODERS];
|
|
|
|
int trans_offsets[I915_MAX_TRANSCODERS];
|
|
|
|
int palette_offsets[I915_MAX_PIPES];
|
2014-04-09 18:28:53 +08:00
|
|
|
int cursor_offsets[I915_MAX_PIPES];
|
2015-02-14 00:27:54 +08:00
|
|
|
|
|
|
|
/* Slice/subslice/EU info */
|
|
|
|
u8 slice_total;
|
|
|
|
u8 subslice_total;
|
|
|
|
u8 subslice_per_slice;
|
|
|
|
u8 eu_total;
|
|
|
|
u8 eu_per_subslice;
|
2016-06-03 13:34:33 +08:00
|
|
|
u8 min_eu_in_pool;
|
2015-02-15 02:30:29 +08:00
|
|
|
/* For each slice, which subslice(s) has(have) 7 EUs (bitfield)? */
|
|
|
|
u8 subslice_7eu[3];
|
2015-02-14 00:27:54 +08:00
|
|
|
u8 has_slice_pg:1;
|
|
|
|
u8 has_subslice_pg:1;
|
|
|
|
u8 has_eu_pg:1;
|
2016-03-16 18:57:16 +08:00
|
|
|
|
|
|
|
struct color_luts {
|
|
|
|
u16 degamma_lut_size;
|
|
|
|
u16 gamma_lut_size;
|
|
|
|
} color;
|
2009-12-17 04:16:16 +08:00
|
|
|
};
|
|
|
|
|
2013-04-23 01:40:38 +08:00
|
|
|
#undef DEFINE_FLAG
|
|
|
|
#undef SEP_SEMICOLON
|
|
|
|
|
2013-01-25 06:44:55 +08:00
|
|
|
enum i915_cache_level {
|
|
|
|
I915_CACHE_NONE = 0,
|
2013-08-06 20:17:02 +08:00
|
|
|
I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
|
|
|
|
I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
|
|
|
|
caches, eg sampler/render caches, and the
|
|
|
|
large Last-Level-Cache. LLC is coherent with
|
|
|
|
the CPU, but L3 is only visible to the GPU. */
|
2013-08-08 21:41:10 +08:00
|
|
|
I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
|
2013-01-25 06:44:55 +08:00
|
|
|
};
|
|
|
|
|
2013-06-12 17:35:28 +08:00
|
|
|
struct i915_ctx_hang_stats {
|
|
|
|
/* This context had batch pending when hang was declared */
|
|
|
|
unsigned batch_pending;
|
|
|
|
|
|
|
|
/* This context had batch active when hang was declared */
|
|
|
|
unsigned batch_active;
|
2013-08-30 21:19:28 +08:00
|
|
|
|
|
|
|
/* Time when this context was last blamed for a GPU reset */
|
|
|
|
unsigned long guilty_ts;
|
|
|
|
|
2014-12-25 00:13:39 +08:00
|
|
|
/* If the contexts causes a second GPU hang within this time,
|
|
|
|
* it is permanently banned from submitting any more work.
|
|
|
|
*/
|
|
|
|
unsigned long ban_period_seconds;
|
|
|
|
|
2013-08-30 21:19:28 +08:00
|
|
|
/* This context is banned to submit more work */
|
|
|
|
bool banned;
|
2013-06-12 17:35:28 +08:00
|
|
|
};
|
2012-06-05 05:42:43 +08:00
|
|
|
|
|
|
|
/* This must match up with the value previously used for execbuf2.rsvd1. */
|
drm/i915: Emphasize that ctx->id is merely a user handle
This is an Execlists preparatory patch, since they make context ID become an
overloaded term:
- In the software, it was used to distinguish which context userspace was
trying to use.
- In the BSpec, the term is used to describe the 20-bits long field the
hardware uses to it to discriminate the contexts that are submitted to
the ELSP and inform the driver about their current status (via Context
Switch Interrupts and Context Status Buffers).
Initially, I tried to make the different meanings converge, but it proved
impossible:
- The software ctx->id is per-filp, while the hardware one needs to be
globally unique.
- Also, we multiplex several backing states objects per intel_context,
and all of them need unique HW IDs.
- I tried adding a per-filp ID and then composing the HW context ID as:
ctx->id + file_priv->id + ring->id, but the fact that the hardware only
uses 20-bits means we have to artificially limit the number of filps or
contexts the userspace can create.
The ctx->user_handle renaming bits are done with this Cocci patch (plus
manual frobbing of the struct declaration):
@@
struct intel_context c;
@@
- (c).id
+ c.user_handle
@@
struct intel_context *c;
@@
- (c)->id
+ c->user_handle
Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
change the type to unsigned 32 bits.
v2: s/handle/user_handle and change the type to uint32_t as suggested by
Chris Wilson.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-03 23:28:00 +08:00
|
|
|
#define DEFAULT_CONTEXT_HANDLE 0
|
2015-05-20 22:00:13 +08:00
|
|
|
|
2014-07-03 23:28:01 +08:00
|
|
|
/**
|
2016-05-24 21:53:34 +08:00
|
|
|
* struct i915_gem_context - as the name implies, represents a context.
|
2014-07-03 23:28:01 +08:00
|
|
|
* @ref: reference count.
|
|
|
|
* @user_handle: userspace tracking identity for this context.
|
|
|
|
* @remap_slice: l3 row remapping information.
|
2015-05-20 22:00:13 +08:00
|
|
|
* @flags: context specific flags:
|
|
|
|
* CONTEXT_NO_ZEROMAP: do not allow mapping things to page 0.
|
2014-07-03 23:28:01 +08:00
|
|
|
* @file_priv: filp associated with this context (NULL for global default
|
|
|
|
* context).
|
|
|
|
* @hang_stats: information about the role of this context in possible GPU
|
|
|
|
* hangs.
|
2015-04-17 19:49:07 +08:00
|
|
|
* @ppgtt: virtual memory space used by this context.
|
2014-07-03 23:28:01 +08:00
|
|
|
* @legacy_hw_ctx: render context backing object and whether it is correctly
|
|
|
|
* initialized (legacy ring submission mechanism only).
|
|
|
|
* @link: link in the global list of contexts.
|
|
|
|
*
|
|
|
|
* Contexts are memory images used by the hardware to store copies of their
|
|
|
|
* internal state.
|
|
|
|
*/
|
2016-05-24 21:53:34 +08:00
|
|
|
struct i915_gem_context {
|
2013-04-30 18:30:33 +08:00
|
|
|
struct kref ref;
|
2015-05-05 16:17:29 +08:00
|
|
|
struct drm_i915_private *i915;
|
2012-06-05 05:42:43 +08:00
|
|
|
struct drm_i915_file_private *file_priv;
|
2014-08-06 21:04:53 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt;
|
2013-09-18 12:12:45 +08:00
|
|
|
|
2016-05-24 21:53:42 +08:00
|
|
|
struct i915_ctx_hang_stats hang_stats;
|
|
|
|
|
2016-04-28 16:56:51 +08:00
|
|
|
/* Unique identifier for this context, used by the hw for tracking */
|
2016-05-24 21:53:42 +08:00
|
|
|
unsigned long flags;
|
2016-07-04 15:08:39 +08:00
|
|
|
#define CONTEXT_NO_ZEROMAP BIT(0)
|
|
|
|
#define CONTEXT_NO_ERROR_CAPTURE BIT(1)
|
2016-04-28 16:56:51 +08:00
|
|
|
unsigned hw_id;
|
2016-05-24 21:53:42 +08:00
|
|
|
u32 user_handle;
|
2016-04-28 16:56:51 +08:00
|
|
|
|
2016-06-24 21:55:53 +08:00
|
|
|
u32 ggtt_alignment;
|
|
|
|
|
2016-05-24 21:53:37 +08:00
|
|
|
struct intel_context {
|
drm/i915/bdw: Introduce one context backing object per engine
A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).
In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we nee.
With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.
Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.
v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).
v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-25 00:04:13 +08:00
|
|
|
struct drm_i915_gem_object *state;
|
2014-07-25 00:04:15 +08:00
|
|
|
struct intel_ringbuffer *ringbuf;
|
2016-01-15 23:10:27 +08:00
|
|
|
struct i915_vma *lrc_vma;
|
2016-01-16 01:12:45 +08:00
|
|
|
uint32_t *lrc_reg_state;
|
2016-05-24 21:53:42 +08:00
|
|
|
u64 lrc_desc;
|
|
|
|
int pin_count;
|
2016-04-28 16:56:53 +08:00
|
|
|
bool initialised;
|
2016-03-16 19:00:39 +08:00
|
|
|
} engine[I915_NUM_ENGINES];
|
2016-06-16 20:07:01 +08:00
|
|
|
u32 ring_size;
|
2016-06-16 20:07:02 +08:00
|
|
|
u32 desc_template;
|
2016-06-16 20:07:03 +08:00
|
|
|
struct atomic_notifier_head status_notifier;
|
2016-06-16 20:07:04 +08:00
|
|
|
bool execlists_force_single_submission;
|
drm/i915/bdw: Introduce one context backing object per engine
A context backing object only makes sense for a given engine (because
it holds state data specific to that engine).
In legacy ringbuffer sumission mode, the only MI_SET_CONTEXT we really
perform is for the render engine, so one backing object is all we nee.
With Execlists, however, we need backing objects for every engine, as
contexts become the only way to submit workloads to the GPU. To tackle
this problem, we multiplex the context struct to contain <no-of-engines>
objects.
Originally, I colored this code by instantiating one new context for
every engine I wanted to use, but this change suggested by Brad Volkin
makes it more elegant.
v2: Leave the old backing object pointer behind. Daniel Vetter suggested
using a union, but it makes more sense to keep rcs_state as a NULL
pointer behind, to make sure no one uses it incorrectly when Execlists
are enabled, similar to what he suggested for ring->buffer (Rusty's API
level 5).
v3: Use the name "state" instead of the too-generic "obj", so that it
mirrors the name choice for the legacy rcs_state.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-25 00:04:13 +08:00
|
|
|
|
2013-09-18 12:12:45 +08:00
|
|
|
struct list_head link;
|
2016-05-24 21:53:42 +08:00
|
|
|
|
|
|
|
u8 remap_slice;
|
2012-06-05 05:42:43 +08:00
|
|
|
};
|
|
|
|
|
2015-02-14 03:23:44 +08:00
|
|
|
enum fb_op_origin {
|
|
|
|
ORIGIN_GTT,
|
|
|
|
ORIGIN_CPU,
|
|
|
|
ORIGIN_CS,
|
|
|
|
ORIGIN_FLIP,
|
2015-07-15 03:29:14 +08:00
|
|
|
ORIGIN_DIRTYFB,
|
2015-02-14 03:23:44 +08:00
|
|
|
};
|
|
|
|
|
2016-01-12 03:44:36 +08:00
|
|
|
struct intel_fbc {
|
2015-07-03 06:25:10 +08:00
|
|
|
/* This is always the inner lock when overlapping with struct_mutex and
|
|
|
|
* it's the outer lock when overlapping with stolen_lock. */
|
|
|
|
struct mutex lock;
|
2014-07-01 01:41:24 +08:00
|
|
|
unsigned threshold;
|
2015-02-14 03:23:46 +08:00
|
|
|
unsigned int possible_framebuffer_bits;
|
|
|
|
unsigned int busy_bits;
|
2016-01-19 21:35:48 +08:00
|
|
|
unsigned int visible_pipes_mask;
|
2015-02-10 00:46:29 +08:00
|
|
|
struct intel_crtc *crtc;
|
2013-06-28 07:30:21 +08:00
|
|
|
|
2014-06-20 03:06:10 +08:00
|
|
|
struct drm_mm_node compressed_fb;
|
2013-06-28 07:30:21 +08:00
|
|
|
struct drm_mm_node *compressed_llb;
|
|
|
|
|
2014-08-01 17:04:45 +08:00
|
|
|
bool false_color;
|
|
|
|
|
2015-10-15 21:44:46 +08:00
|
|
|
bool enabled;
|
2015-10-15 04:45:36 +08:00
|
|
|
bool active;
|
2014-09-20 03:04:55 +08:00
|
|
|
|
2016-01-19 21:35:42 +08:00
|
|
|
struct intel_fbc_state_cache {
|
|
|
|
struct {
|
|
|
|
unsigned int mode_flags;
|
|
|
|
uint32_t hsw_bdw_pixel_rate;
|
|
|
|
} crtc;
|
|
|
|
|
|
|
|
struct {
|
|
|
|
unsigned int rotation;
|
|
|
|
int src_w;
|
|
|
|
int src_h;
|
|
|
|
bool visible;
|
|
|
|
} plane;
|
|
|
|
|
|
|
|
struct {
|
|
|
|
u64 ilk_ggtt_offset;
|
|
|
|
uint32_t pixel_format;
|
|
|
|
unsigned int stride;
|
|
|
|
int fence_reg;
|
|
|
|
unsigned int tiling_mode;
|
|
|
|
} fb;
|
|
|
|
} state_cache;
|
|
|
|
|
2015-12-24 04:28:11 +08:00
|
|
|
struct intel_fbc_reg_params {
|
|
|
|
struct {
|
|
|
|
enum pipe pipe;
|
|
|
|
enum plane plane;
|
|
|
|
unsigned int fence_y_offset;
|
|
|
|
} crtc;
|
|
|
|
|
|
|
|
struct {
|
|
|
|
u64 ggtt_offset;
|
|
|
|
uint32_t pixel_format;
|
|
|
|
unsigned int stride;
|
|
|
|
int fence_reg;
|
|
|
|
} fb;
|
|
|
|
|
|
|
|
int cfb_size;
|
|
|
|
} params;
|
|
|
|
|
2013-06-28 07:30:21 +08:00
|
|
|
struct intel_fbc_work {
|
drm/i915: use a single intel_fbc_work struct
This was already on my TODO list, and was requested both by Chris and
Ville, for different reasons. The advantages are avoiding a frequent
malloc/free pair, and the locality of having the work structure
embedded in dev_priv. The maximum used memory is also smaller since
previously we could have multiple allocated intel_fbc_work structs at
the same time, and now we'll always have a single one - the one
embedded on dev_priv. Of course, we're now using a little more memory
on the cases where there's nothing scheduled.
The biggest challenge here is to keep everything synchronized the way
it was before.
Currently, when we try to activate FBC, we allocate a new
intel_fbc_work structure. Then later when we conclude we must delay
the FBC activation a little more, we allocate a new intel_fbc_work
struct, and then adjust dev_priv->fbc.fbc_work to point to the new
struct. So when the old work runs - at intel_fbc_work_fn() - it will
check that dev_priv->fbc.fbc_work points to something else, so it does
nothing. Everything is also protected by fbc.lock.
Just cancelling the old delayed work doesn't work because we might
just cancel it after the work function already started to run, but
while it is still waiting to grab fbc.lock. That's why we use the
"dev_priv->fbc.fbc_work == work" check described in the paragraph
above.
So now that we have a single work struct we have to introduce a new
way to synchronize everything. So we're making the work function a
normal work instead of a delayed work, and it will be responsible for
sleeping the appropriate amount of time itself. This way, after it
wakes up it can grab the lock, ask "were we delayed or cancelled?" and
then go back to sleep, enable FBC or give up.
v2:
- Spelling fixes.
- Rebase after changing the patch order.
- Fix ms/jiffies confusion.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
2015-10-27 02:27:49 +08:00
|
|
|
bool scheduled;
|
2016-01-22 04:03:05 +08:00
|
|
|
u32 scheduled_vblank;
|
drm/i915: use a single intel_fbc_work struct
This was already on my TODO list, and was requested both by Chris and
Ville, for different reasons. The advantages are avoiding a frequent
malloc/free pair, and the locality of having the work structure
embedded in dev_priv. The maximum used memory is also smaller since
previously we could have multiple allocated intel_fbc_work structs at
the same time, and now we'll always have a single one - the one
embedded on dev_priv. Of course, we're now using a little more memory
on the cases where there's nothing scheduled.
The biggest challenge here is to keep everything synchronized the way
it was before.
Currently, when we try to activate FBC, we allocate a new
intel_fbc_work structure. Then later when we conclude we must delay
the FBC activation a little more, we allocate a new intel_fbc_work
struct, and then adjust dev_priv->fbc.fbc_work to point to the new
struct. So when the old work runs - at intel_fbc_work_fn() - it will
check that dev_priv->fbc.fbc_work points to something else, so it does
nothing. Everything is also protected by fbc.lock.
Just cancelling the old delayed work doesn't work because we might
just cancel it after the work function already started to run, but
while it is still waiting to grab fbc.lock. That's why we use the
"dev_priv->fbc.fbc_work == work" check described in the paragraph
above.
So now that we have a single work struct we have to introduce a new
way to synchronize everything. So we're making the work function a
normal work instead of a delayed work, and it will be responsible for
sleeping the appropriate amount of time itself. This way, after it
wakes up it can grab the lock, ask "were we delayed or cancelled?" and
then go back to sleep, enable FBC or give up.
v2:
- Spelling fixes.
- Rebase after changing the patch order.
- Fix ms/jiffies confusion.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
2015-10-27 02:27:49 +08:00
|
|
|
struct work_struct work;
|
|
|
|
} work;
|
2013-06-28 07:30:21 +08:00
|
|
|
|
2015-10-28 00:50:03 +08:00
|
|
|
const char *no_fbc_reason;
|
2010-02-06 04:42:41 +08:00
|
|
|
};
|
|
|
|
|
2015-01-10 04:55:56 +08:00
|
|
|
/**
|
|
|
|
* HIGH_RR is the highest eDP panel refresh rate read from EDID
|
|
|
|
* LOW_RR is the lowest eDP panel refresh rate found from EDID
|
|
|
|
* parsing for same resolution.
|
|
|
|
*/
|
|
|
|
enum drrs_refresh_rate_type {
|
|
|
|
DRRS_HIGH_RR,
|
|
|
|
DRRS_LOW_RR,
|
|
|
|
DRRS_MAX_RR, /* RR count */
|
|
|
|
};
|
|
|
|
|
|
|
|
enum drrs_support_type {
|
|
|
|
DRRS_NOT_SUPPORTED = 0,
|
|
|
|
STATIC_DRRS_SUPPORT = 1,
|
|
|
|
SEAMLESS_DRRS_SUPPORT = 2
|
2014-04-05 14:43:28 +08:00
|
|
|
};
|
|
|
|
|
2014-07-12 01:30:11 +08:00
|
|
|
struct intel_dp;
|
2015-01-10 04:55:56 +08:00
|
|
|
struct i915_drrs {
|
|
|
|
struct mutex mutex;
|
|
|
|
struct delayed_work work;
|
|
|
|
struct intel_dp *dp;
|
|
|
|
unsigned busy_frontbuffer_bits;
|
|
|
|
enum drrs_refresh_rate_type refresh_rate_type;
|
|
|
|
enum drrs_support_type type;
|
|
|
|
};
|
|
|
|
|
2013-10-04 03:15:06 +08:00
|
|
|
struct i915_psr {
|
2014-07-12 01:30:15 +08:00
|
|
|
struct mutex lock;
|
2013-10-04 03:15:06 +08:00
|
|
|
bool sink_support;
|
|
|
|
bool source_ok;
|
2014-07-12 01:30:11 +08:00
|
|
|
struct intel_dp *enabled;
|
2014-06-13 20:10:03 +08:00
|
|
|
bool active;
|
|
|
|
struct delayed_work work;
|
2014-07-12 01:30:16 +08:00
|
|
|
unsigned busy_frontbuffer_bits;
|
2015-04-02 13:32:44 +08:00
|
|
|
bool psr2_support;
|
|
|
|
bool aux_frame_sync;
|
2016-02-02 04:02:07 +08:00
|
|
|
bool link_standby;
|
2013-07-12 05:45:00 +08:00
|
|
|
};
|
2013-06-28 07:30:21 +08:00
|
|
|
|
2010-04-07 16:15:53 +08:00
|
|
|
enum intel_pch {
|
2012-07-04 05:48:16 +08:00
|
|
|
PCH_NONE = 0, /* No PCH present */
|
2010-04-07 16:15:53 +08:00
|
|
|
PCH_IBX, /* Ibexpeak PCH */
|
|
|
|
PCH_CPT, /* Cougarpoint PCH */
|
2012-03-29 23:32:20 +08:00
|
|
|
PCH_LPT, /* Lynxpoint PCH */
|
2014-04-09 13:38:57 +08:00
|
|
|
PCH_SPT, /* Sunrisepoint PCH */
|
2016-07-02 08:07:12 +08:00
|
|
|
PCH_KBP, /* Kabypoint PCH */
|
2013-04-06 04:12:40 +08:00
|
|
|
PCH_NOP,
|
2010-04-07 16:15:53 +08:00
|
|
|
};
|
|
|
|
|
2012-12-01 22:04:24 +08:00
|
|
|
enum intel_sbi_destination {
|
|
|
|
SBI_ICLK,
|
|
|
|
SBI_MPHY,
|
|
|
|
};
|
|
|
|
|
2010-07-20 04:53:12 +08:00
|
|
|
#define QUIRK_PIPEA_FORCE (1<<0)
|
2011-07-13 05:56:22 +08:00
|
|
|
#define QUIRK_LVDS_SSC_DISABLE (1<<1)
|
2012-03-15 22:56:26 +08:00
|
|
|
#define QUIRK_INVERT_BRIGHTNESS (1<<2)
|
2014-07-04 07:27:50 +08:00
|
|
|
#define QUIRK_BACKLIGHT_PRESENT (1<<3)
|
2014-08-15 06:22:07 +08:00
|
|
|
#define QUIRK_PIPEB_FORCE (1<<4)
|
2014-11-20 16:26:30 +08:00
|
|
|
#define QUIRK_PIN_SWIZZLED_PAGES (1<<5)
|
2010-07-20 04:53:12 +08:00
|
|
|
|
2010-03-30 13:34:14 +08:00
|
|
|
struct intel_fbdev;
|
2011-07-08 19:22:42 +08:00
|
|
|
struct intel_fbc_work;
|
2010-03-30 13:34:13 +08:00
|
|
|
|
2012-02-15 05:37:19 +08:00
|
|
|
struct intel_gmbus {
|
|
|
|
struct i2c_adapter adapter;
|
2016-03-07 23:56:59 +08:00
|
|
|
#define GMBUS_FORCE_BIT_RETRY (1U << 31)
|
2012-11-10 23:58:21 +08:00
|
|
|
u32 force_bit;
|
2012-02-15 05:37:19 +08:00
|
|
|
u32 reg0;
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t gpio_reg;
|
2012-02-28 07:43:09 +08:00
|
|
|
struct i2c_algo_bit_data bit_algo;
|
2012-02-15 05:37:19 +08:00
|
|
|
struct drm_i915_private *dev_priv;
|
|
|
|
};
|
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
struct i915_suspend_saved_registers {
|
2008-05-07 10:27:53 +08:00
|
|
|
u32 saveDSPARB;
|
2007-11-22 12:14:14 +08:00
|
|
|
u32 saveLVDS;
|
2008-07-30 02:54:06 +08:00
|
|
|
u32 savePP_ON_DELAYS;
|
|
|
|
u32 savePP_OFF_DELAYS;
|
2007-11-22 12:14:14 +08:00
|
|
|
u32 savePP_ON;
|
|
|
|
u32 savePP_OFF;
|
|
|
|
u32 savePP_CONTROL;
|
2008-07-30 02:54:06 +08:00
|
|
|
u32 savePP_DIVISOR;
|
2007-11-22 12:14:14 +08:00
|
|
|
u32 saveFBC_CONTROL;
|
2008-02-17 11:19:29 +08:00
|
|
|
u32 saveCACHE_MODE_0;
|
|
|
|
u32 saveMI_ARB_STATE;
|
2007-11-22 12:14:14 +08:00
|
|
|
u32 saveSWF0[16];
|
|
|
|
u32 saveSWF1[16];
|
2015-09-19 01:03:43 +08:00
|
|
|
u32 saveSWF3[3];
|
2011-10-10 03:52:02 +08:00
|
|
|
uint64_t saveFENCE[I915_MAX_NUM_FENCES];
|
2011-07-27 04:53:06 +08:00
|
|
|
u32 savePCH_PORT_HOTPLUG;
|
2014-12-11 04:16:05 +08:00
|
|
|
u16 saveGCDGMBUS;
|
2012-11-03 02:55:02 +08:00
|
|
|
};
|
2012-11-03 02:55:03 +08:00
|
|
|
|
2014-05-05 20:19:56 +08:00
|
|
|
struct vlv_s0ix_state {
|
|
|
|
/* GAM */
|
|
|
|
u32 wr_watermark;
|
|
|
|
u32 gfx_prio_ctrl;
|
|
|
|
u32 arb_mode;
|
|
|
|
u32 gfx_pend_tlb0;
|
|
|
|
u32 gfx_pend_tlb1;
|
|
|
|
u32 lra_limits[GEN7_LRA_LIMITS_REG_NUM];
|
|
|
|
u32 media_max_req_count;
|
|
|
|
u32 gfx_max_req_count;
|
|
|
|
u32 render_hwsp;
|
|
|
|
u32 ecochk;
|
|
|
|
u32 bsd_hwsp;
|
|
|
|
u32 blt_hwsp;
|
|
|
|
u32 tlb_rd_addr;
|
|
|
|
|
|
|
|
/* MBC */
|
|
|
|
u32 g3dctl;
|
|
|
|
u32 gsckgctl;
|
|
|
|
u32 mbctl;
|
|
|
|
|
|
|
|
/* GCP */
|
|
|
|
u32 ucgctl1;
|
|
|
|
u32 ucgctl3;
|
|
|
|
u32 rcgctl1;
|
|
|
|
u32 rcgctl2;
|
|
|
|
u32 rstctl;
|
|
|
|
u32 misccpctl;
|
|
|
|
|
|
|
|
/* GPM */
|
|
|
|
u32 gfxpause;
|
|
|
|
u32 rpdeuhwtc;
|
|
|
|
u32 rpdeuc;
|
|
|
|
u32 ecobus;
|
|
|
|
u32 pwrdwnupctl;
|
|
|
|
u32 rp_down_timeout;
|
|
|
|
u32 rp_deucsw;
|
|
|
|
u32 rcubmabdtmr;
|
|
|
|
u32 rcedata;
|
|
|
|
u32 spare2gh;
|
|
|
|
|
|
|
|
/* Display 1 CZ domain */
|
|
|
|
u32 gt_imr;
|
|
|
|
u32 gt_ier;
|
|
|
|
u32 pm_imr;
|
|
|
|
u32 pm_ier;
|
|
|
|
u32 gt_scratch[GEN7_GT_SCRATCH_REG_NUM];
|
|
|
|
|
|
|
|
/* GT SA CZ domain */
|
|
|
|
u32 tilectl;
|
|
|
|
u32 gt_fifoctl;
|
|
|
|
u32 gtlc_wake_ctrl;
|
|
|
|
u32 gtlc_survive;
|
|
|
|
u32 pmwgicz;
|
|
|
|
|
|
|
|
/* Display 2 CZ domain */
|
|
|
|
u32 gu_ctl0;
|
|
|
|
u32 gu_ctl1;
|
2015-04-02 05:22:57 +08:00
|
|
|
u32 pcbr;
|
2014-05-05 20:19:56 +08:00
|
|
|
u32 clock_gate_dis2;
|
|
|
|
};
|
|
|
|
|
2014-07-11 03:31:18 +08:00
|
|
|
struct intel_rps_ei {
|
|
|
|
u32 cz_clock;
|
|
|
|
u32 render_c0;
|
|
|
|
u32 media_c0;
|
2014-07-04 05:33:01 +08:00
|
|
|
};
|
|
|
|
|
2012-11-03 02:55:03 +08:00
|
|
|
struct intel_gen6_power_mgmt {
|
drm/i915: sanitize rps irq disabling
When disabling the RPS interrupts there is a tricky dependency between
the thread disabling the interrupts, the RPS interrupt handler and the
corresponding RPS work. The RPS work can reenable the interrupts, so
there is no straightforward order in the disabling thread to (1) make
sure that any RPS work is flushed and to (2) disable all RPS
interrupts. Currently this is solved by masking the interrupts using two
separate mask registers (first level display IMR and PM IMR) and doing
the disabling when all first level interrupts are disabled.
This works, but the requirement to run with all first level interrupts
disabled is unnecessary making the suspend / unload time ordering of RPS
disabling wrt. other unitialization steps difficult and error prone.
Removing this restriction allows us to disable RPS early during suspend
/ unload and forget about it for the rest of the sequence. By adding a
more explicit method for avoiding the above race, it also becomes easier
to prove its correctness. Finally currently we can hit the WARN in
snb_update_pm_irq(), when a final RPS work runs with the first level
interrupts already disabled. This won't lead to any problem (due to the
separate interrupt masks), but with the change in this and the next
patch we can get rid of the WARN, while leaving it in place for other
scenarios.
To address the above points, add a new RPS interrupts_enabled flag and
use this during RPS disabling to avoid requeuing the RPS work and
reenabling of the RPS interrupts. Since the interrupt disabling happens
now in intel_suspend_gt_powersave(), we will disable RPS interrupts
explicitly during suspend (and not just through the first level mask),
but there is no problem doing so, it's also more consistent and allows
us to unify more of the RPS disabling during suspend and unload time in
the next patch.
v2/v3:
- rebase on patch "drm/i915: move rps irq disable one level up" in the
patchset
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 21:30:04 +08:00
|
|
|
/*
|
|
|
|
* work, interrupts_enabled and pm_iir are protected by
|
|
|
|
* dev_priv->irq_lock
|
|
|
|
*/
|
2012-11-03 02:55:03 +08:00
|
|
|
struct work_struct work;
|
drm/i915: sanitize rps irq disabling
When disabling the RPS interrupts there is a tricky dependency between
the thread disabling the interrupts, the RPS interrupt handler and the
corresponding RPS work. The RPS work can reenable the interrupts, so
there is no straightforward order in the disabling thread to (1) make
sure that any RPS work is flushed and to (2) disable all RPS
interrupts. Currently this is solved by masking the interrupts using two
separate mask registers (first level display IMR and PM IMR) and doing
the disabling when all first level interrupts are disabled.
This works, but the requirement to run with all first level interrupts
disabled is unnecessary making the suspend / unload time ordering of RPS
disabling wrt. other unitialization steps difficult and error prone.
Removing this restriction allows us to disable RPS early during suspend
/ unload and forget about it for the rest of the sequence. By adding a
more explicit method for avoiding the above race, it also becomes easier
to prove its correctness. Finally currently we can hit the WARN in
snb_update_pm_irq(), when a final RPS work runs with the first level
interrupts already disabled. This won't lead to any problem (due to the
separate interrupt masks), but with the change in this and the next
patch we can get rid of the WARN, while leaving it in place for other
scenarios.
To address the above points, add a new RPS interrupts_enabled flag and
use this during RPS disabling to avoid requeuing the RPS work and
reenabling of the RPS interrupts. Since the interrupt disabling happens
now in intel_suspend_gt_powersave(), we will disable RPS interrupts
explicitly during suspend (and not just through the first level mask),
but there is no problem doing so, it's also more consistent and allows
us to unify more of the RPS disabling during suspend and unload time in
the next patch.
v2/v3:
- rebase on patch "drm/i915: move rps irq disable one level up" in the
patchset
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 21:30:04 +08:00
|
|
|
bool interrupts_enabled;
|
2012-11-03 02:55:03 +08:00
|
|
|
u32 pm_iir;
|
2013-07-05 05:35:28 +08:00
|
|
|
|
2016-05-31 16:28:27 +08:00
|
|
|
u32 pm_intr_keep;
|
|
|
|
|
2014-03-20 09:31:11 +08:00
|
|
|
/* Frequencies are stored in potentially platform dependent multiples.
|
|
|
|
* In other words, *_freq needs to be multiplied by X to be interesting.
|
|
|
|
* Soft limits are those which are used for the dynamic reclocking done
|
|
|
|
* by the driver (raise frequencies under heavy loads, and lower for
|
|
|
|
* lighter loads). Hard limits are those imposed by the hardware.
|
|
|
|
*
|
|
|
|
* A distinction is made for overclocking, which is never enabled by
|
|
|
|
* default, and is considered to be above the hard limit if it's
|
|
|
|
* possible at all.
|
|
|
|
*/
|
|
|
|
u8 cur_freq; /* Current frequency (cached, may not == HW) */
|
|
|
|
u8 min_freq_softlimit; /* Minimum frequency permitted by the driver */
|
|
|
|
u8 max_freq_softlimit; /* Max frequency permitted by the driver */
|
|
|
|
u8 max_freq; /* Maximum frequency, RP0 if not overclocking */
|
|
|
|
u8 min_freq; /* AKA RPn. Minimum frequency */
|
2015-03-18 17:48:21 +08:00
|
|
|
u8 idle_freq; /* Frequency to request when we are idle */
|
2014-03-20 09:31:11 +08:00
|
|
|
u8 efficient_freq; /* AKA RPe. Pre-determined balanced frequency */
|
|
|
|
u8 rp1_freq; /* "less than" RP0 power/freqency */
|
|
|
|
u8 rp0_freq; /* Non-overclocked max frequency. */
|
2016-03-05 03:43:02 +08:00
|
|
|
u16 gpll_ref_freq; /* vlv/chv GPLL reference frequency */
|
2012-11-03 02:14:00 +08:00
|
|
|
|
2015-04-07 23:20:28 +08:00
|
|
|
u8 up_threshold; /* Current %busy required to uplock */
|
|
|
|
u8 down_threshold; /* Current %busy required to downclock */
|
|
|
|
|
drm/i915: Tweak RPS thresholds to more aggressively downclock
After applying wait-boost we often find ourselves stuck at higher clocks
than required. The current threshold value requires the GPU to be
continuously and completely idle for 313ms before it is dropped by one
bin. Conversely, we require the GPU to be busy for an average of 90% over
a 84ms period before we upclock. So the current thresholds almost never
downclock the GPU, and respond very slowly to sudden demands for more
power. It is easy to observe that we currently lock into the wrong bin
and both underperform in benchmarks and consume more power than optimal
(just by repeating the task and measuring the different results).
An alternative approach, as discussed in the bspec, is to use a
continuous threshold for upclocking, and an average value for downclocking.
This is good for quickly detecting and reacting to state changes within a
frame, however it fails with the common throttling method of waiting
upon the outstanding frame - at least it is difficult to choose a
threshold that works well at 15,000fps and at 60fps. So continue to use
average busy/idle loads to determine frequency change.
v2: Use 3 power zones to keep frequencies low in steady-state mostly
idle (e.g. scrolling, interactive 2D drawing), and frequencies high
for demanding games. In between those end-states, we use a
fast-reclocking algorithm to converge more quickly on the desired bin.
v3: Bug fixes - make sure we reset adj after switching power zones.
v4: Tune - drop the continuous busy thresholds as it prevents us from
choosing the right frequency for glxgears style swap benchmarks. Instead
the goal is to be able to find the right clocks irrespective of the
wait-boost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:57 +08:00
|
|
|
int last_adj;
|
|
|
|
enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
|
|
|
|
|
2015-05-22 04:01:47 +08:00
|
|
|
spinlock_t client_lock;
|
|
|
|
struct list_head clients;
|
|
|
|
bool client_boost;
|
|
|
|
|
2013-10-11 04:58:50 +08:00
|
|
|
bool enabled;
|
2012-11-03 02:14:00 +08:00
|
|
|
struct delayed_work delayed_resume_work;
|
2015-04-07 23:20:32 +08:00
|
|
|
unsigned boosts;
|
2012-11-03 02:14:01 +08:00
|
|
|
|
2015-04-27 20:41:22 +08:00
|
|
|
struct intel_rps_client semaphores, mmioflips;
|
2015-04-27 20:41:20 +08:00
|
|
|
|
2014-07-11 03:31:18 +08:00
|
|
|
/* manual wa residency calculations */
|
|
|
|
struct intel_rps_ei up_ei, down_ei;
|
|
|
|
|
2012-11-03 02:14:01 +08:00
|
|
|
/*
|
|
|
|
* Protects RPS/RC6 register access and PCU communication.
|
2015-05-22 04:01:47 +08:00
|
|
|
* Must be taken after struct_mutex if nested. Note that
|
|
|
|
* this lock may be held for long periods of time when
|
|
|
|
* talking to hw - so only take it when talking to hw!
|
2012-11-03 02:14:01 +08:00
|
|
|
*/
|
|
|
|
struct mutex hw_lock;
|
2012-11-03 02:55:03 +08:00
|
|
|
};
|
|
|
|
|
2012-11-30 05:18:51 +08:00
|
|
|
/* defined intel_pm.c */
|
|
|
|
extern spinlock_t mchdev_lock;
|
|
|
|
|
2012-11-03 02:55:03 +08:00
|
|
|
struct intel_ilk_power_mgmt {
|
|
|
|
u8 cur_delay;
|
|
|
|
u8 min_delay;
|
|
|
|
u8 max_delay;
|
|
|
|
u8 fmax;
|
|
|
|
u8 fstart;
|
|
|
|
|
|
|
|
u64 last_count1;
|
|
|
|
unsigned long last_time1;
|
|
|
|
unsigned long chipset_power;
|
|
|
|
u64 last_count2;
|
2014-07-17 05:05:06 +08:00
|
|
|
u64 last_time2;
|
2012-11-03 02:55:03 +08:00
|
|
|
unsigned long gfx_power;
|
|
|
|
u8 corr;
|
|
|
|
|
|
|
|
int c_m;
|
|
|
|
int r_t;
|
|
|
|
};
|
|
|
|
|
2014-03-05 01:22:55 +08:00
|
|
|
struct drm_i915_private;
|
|
|
|
struct i915_power_well;
|
|
|
|
|
|
|
|
struct i915_power_well_ops {
|
|
|
|
/*
|
|
|
|
* Synchronize the well's hw state to match the current sw state, for
|
|
|
|
* example enable/disable it based on the current refcount. Called
|
|
|
|
* during driver init and resume time, possibly after first calling
|
|
|
|
* the enable/disable handlers.
|
|
|
|
*/
|
|
|
|
void (*sync_hw)(struct drm_i915_private *dev_priv,
|
|
|
|
struct i915_power_well *power_well);
|
|
|
|
/*
|
|
|
|
* Enable the well and resources that depend on it (for example
|
|
|
|
* interrupts located on the well). Called after the 0->1 refcount
|
|
|
|
* transition.
|
|
|
|
*/
|
|
|
|
void (*enable)(struct drm_i915_private *dev_priv,
|
|
|
|
struct i915_power_well *power_well);
|
|
|
|
/*
|
|
|
|
* Disable the well and resources that depend on it. Called after
|
|
|
|
* the 1->0 refcount transition.
|
|
|
|
*/
|
|
|
|
void (*disable)(struct drm_i915_private *dev_priv,
|
|
|
|
struct i915_power_well *power_well);
|
|
|
|
/* Returns the hw enabled state. */
|
|
|
|
bool (*is_enabled)(struct drm_i915_private *dev_priv,
|
|
|
|
struct i915_power_well *power_well);
|
|
|
|
};
|
|
|
|
|
2013-05-30 22:07:11 +08:00
|
|
|
/* Power well structure for haswell */
|
|
|
|
struct i915_power_well {
|
2013-11-25 23:15:29 +08:00
|
|
|
const char *name;
|
2013-11-25 23:15:30 +08:00
|
|
|
bool always_on;
|
2013-05-30 22:07:11 +08:00
|
|
|
/* power well enable/disable usage count */
|
|
|
|
int count;
|
2014-06-06 01:31:47 +08:00
|
|
|
/* cached hw enabled state */
|
|
|
|
bool hw_enabled;
|
2013-11-25 23:15:29 +08:00
|
|
|
unsigned long domains;
|
2014-03-05 22:20:56 +08:00
|
|
|
unsigned long data;
|
2014-03-05 01:22:55 +08:00
|
|
|
const struct i915_power_well_ops *ops;
|
2013-05-30 22:07:11 +08:00
|
|
|
};
|
|
|
|
|
2013-10-25 22:36:47 +08:00
|
|
|
struct i915_power_domains {
|
2013-10-25 22:36:48 +08:00
|
|
|
/*
|
|
|
|
* Power wells needed for initialization at driver init and suspend
|
|
|
|
* time are on. They are kept on until after the first modeset.
|
|
|
|
*/
|
|
|
|
bool init_power_on;
|
2014-04-25 18:19:05 +08:00
|
|
|
bool initializing;
|
2013-11-25 23:15:29 +08:00
|
|
|
int power_well_count;
|
2013-10-25 22:36:48 +08:00
|
|
|
|
2013-10-25 22:36:47 +08:00
|
|
|
struct mutex lock;
|
2013-11-25 23:15:35 +08:00
|
|
|
int domain_use_count[POWER_DOMAIN_NUM];
|
2013-11-25 23:15:29 +08:00
|
|
|
struct i915_power_well *power_wells;
|
2013-10-25 22:36:47 +08:00
|
|
|
};
|
|
|
|
|
2013-09-20 02:13:41 +08:00
|
|
|
#define MAX_L3_SLICES 2
|
2012-11-03 02:55:07 +08:00
|
|
|
struct intel_l3_parity {
|
2013-09-20 02:13:41 +08:00
|
|
|
u32 *remap_info[MAX_L3_SLICES];
|
2012-11-03 02:55:07 +08:00
|
|
|
struct work_struct error_work;
|
2013-09-20 02:13:41 +08:00
|
|
|
int which_slice;
|
2012-11-03 02:55:07 +08:00
|
|
|
};
|
|
|
|
|
2012-11-15 00:14:03 +08:00
|
|
|
struct i915_gem_mm {
|
|
|
|
/** Memory allocator for GTT stolen memory */
|
|
|
|
struct drm_mm stolen;
|
2015-07-03 06:25:09 +08:00
|
|
|
/** Protects the usage of the GTT stolen memory allocator. This is
|
|
|
|
* always the inner lock when overlapping with struct_mutex. */
|
|
|
|
struct mutex stolen_lock;
|
|
|
|
|
2012-11-15 00:14:03 +08:00
|
|
|
/** List of all objects in gtt_space. Used to restore gtt
|
|
|
|
* mappings on resume */
|
|
|
|
struct list_head bound_list;
|
|
|
|
/**
|
|
|
|
* List of objects which are not bound to the GTT (thus
|
|
|
|
* are idle and not used by the GPU) but still have
|
|
|
|
* (presumably uncached) pages still attached.
|
|
|
|
*/
|
|
|
|
struct list_head unbound_list;
|
|
|
|
|
|
|
|
/** Usable portion of the GTT for GEM */
|
|
|
|
unsigned long stolen_base; /* limited to low memory (32-bit) */
|
|
|
|
|
|
|
|
/** PPGTT used for aliasing the PPGTT with the GTT */
|
|
|
|
struct i915_hw_ppgtt *aliasing_ppgtt;
|
|
|
|
|
2014-05-20 15:28:43 +08:00
|
|
|
struct notifier_block oom_notifier;
|
2016-04-04 21:46:43 +08:00
|
|
|
struct notifier_block vmap_notifier;
|
2014-03-25 21:23:04 +08:00
|
|
|
struct shrinker shrinker;
|
2012-11-15 00:14:03 +08:00
|
|
|
bool shrinker_no_lock_stealing;
|
|
|
|
|
|
|
|
/** LRU list of objects with fence regs on them. */
|
|
|
|
struct list_head fence_list;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Are we in a non-interruptible section of code like
|
|
|
|
* modesetting?
|
|
|
|
*/
|
|
|
|
bool interruptible;
|
|
|
|
|
2014-05-21 23:37:52 +08:00
|
|
|
/* the indicator for dispatch video commands on two BSD rings */
|
2016-01-15 23:12:50 +08:00
|
|
|
unsigned int bsd_ring_dispatch_index;
|
2014-05-21 23:37:52 +08:00
|
|
|
|
2012-11-15 00:14:03 +08:00
|
|
|
/** Bit 6 swizzling required for X tiling */
|
|
|
|
uint32_t bit_6_swizzle_x;
|
|
|
|
/** Bit 6 swizzling required for Y tiling */
|
|
|
|
uint32_t bit_6_swizzle_y;
|
|
|
|
|
|
|
|
/* accounting, useful for userland debugging */
|
2013-07-25 04:40:23 +08:00
|
|
|
spinlock_t object_stat_lock;
|
2012-11-15 00:14:03 +08:00
|
|
|
size_t object_memory;
|
|
|
|
u32 object_count;
|
|
|
|
};
|
|
|
|
|
2013-05-23 18:55:35 +08:00
|
|
|
struct drm_i915_error_state_buf {
|
2014-08-22 21:41:39 +08:00
|
|
|
struct drm_i915_private *i915;
|
2013-05-23 18:55:35 +08:00
|
|
|
unsigned bytes;
|
|
|
|
unsigned size;
|
|
|
|
int err;
|
|
|
|
u8 *buf;
|
|
|
|
loff_t start;
|
|
|
|
loff_t pos;
|
|
|
|
};
|
|
|
|
|
2013-06-06 20:18:39 +08:00
|
|
|
struct i915_error_state_file_priv {
|
|
|
|
struct drm_device *dev;
|
|
|
|
struct drm_i915_error_state *error;
|
|
|
|
};
|
|
|
|
|
2012-11-15 00:14:04 +08:00
|
|
|
struct i915_gpu_error {
|
|
|
|
/* For hangcheck timer */
|
|
|
|
#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
|
|
|
|
#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
|
2013-08-30 21:19:28 +08:00
|
|
|
/* Hang gpu twice in this window and your context gets banned */
|
|
|
|
#define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
|
|
|
|
|
2015-01-27 00:03:03 +08:00
|
|
|
struct delayed_work hangcheck_work;
|
2012-11-15 00:14:04 +08:00
|
|
|
|
|
|
|
/* For reset and error_state handling. */
|
|
|
|
spinlock_t lock;
|
|
|
|
/* Protected by the above dev->gpu_error.lock. */
|
|
|
|
struct drm_i915_error_state *first_error;
|
2013-09-26 00:34:55 +08:00
|
|
|
|
|
|
|
unsigned long missed_irq_rings;
|
|
|
|
|
2012-11-16 00:17:22 +08:00
|
|
|
/**
|
2013-11-12 20:44:19 +08:00
|
|
|
* State variable controlling the reset flow and count
|
2012-11-16 00:17:22 +08:00
|
|
|
*
|
2013-11-12 20:44:19 +08:00
|
|
|
* This is a counter which gets incremented when reset is triggered,
|
|
|
|
* and again when reset has been handled. So odd values (lowest bit set)
|
|
|
|
* means that reset is in progress and even values that
|
|
|
|
* (reset_counter >> 1):th reset was successfully completed.
|
|
|
|
*
|
|
|
|
* If reset is not completed succesfully, the I915_WEDGE bit is
|
|
|
|
* set meaning that hardware is terminally sour and there is no
|
|
|
|
* recovery. All waiters on the reset_queue will be woken when
|
|
|
|
* that happens.
|
|
|
|
*
|
|
|
|
* This counter is used by the wait_seqno code to notice that reset
|
|
|
|
* event happened and it needs to restart the entire ioctl (since most
|
|
|
|
* likely the seqno it waited for won't ever signal anytime soon).
|
drm/i915: create a race-free reset detection
With the previous patch the state transition handling of the reset
code itself is now (hopefully) race free and solid. But that still
leaves out everyone else - with the various lock-free wait paths
we have there's the possibility that the reset happens between the
point where we read the seqno we should wait on and the actual wait.
And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
happily wait for a seqno which will in all likelyhood never signal.
In practice this is not a big problem since the X server gets
constantly interrupted, and can then submit more work (hopefully) to
unblock everyone else: As soon as a new seqno write lands, all waiters
will unblock. But running the i-g-t reset testcase ZZ_hangman can
expose this race, especially on slower hw with fewer cpu cores.
Now looking forward to ARB_robustness and friends that's not the best
possible behaviour, hence this patch adds a reset_counter to be able
to detect any reset, even if a given thread never observed the
in-progress state.
The important part is to correctly order things:
- The write side needs to increment the counter after any seqno gets
reset. Hence we need to do that at the end of the reset work, and
again wake everyone up. We also need to place a barrier in between
any possible seqno changes and the counter increment, since any
unlock operations only guarantee that nothing leaks out, but not
that at later load operation gets moved ahead.
- On the read side we need to ensure that no reset can sneak in and
invalidate the seqno. In all cases we can use the one-sided barrier
that unlock operations guarantee (of the lock protecting the
respective seqno/ring pair) to ensure correct ordering. Hence it is
sufficient to place the atomic read before the mutex/spin_unlock and
no additional barriers are required.
The end-result of all this is that we need to wake up everyone twice
in a reset operation:
- First, before the reset starts, to get any lockholders of the locks,
so that the reset can proceed.
- Second, after the reset is completed, to allow waiters to properly
and reliably detect the reset condition and bail out.
I admit that this entire reset_counter thing smells a bit like
overkill, but I think it's justified since it makes it really explicit
what the bail-out condition is. And we need a reset counter anyway to
implement ARB_robustness, and imo with finer-grained locking on the
horizont this is the most resilient scheme I could think of.
v2: Drop spurious change in the wait_for_error EXIT_COND - we only
need to wait until we leave the reset-in-progress wedged state.
v3: Don't play tricks with barriers in the throttle ioctl, the
spin_unlock is barrier enough.
I've also considered using a little helper to grab the current
reset_counter, but then decided that hiding the atomic_read isn't a
great idea, since having it explicitly show up in the code is a nice
remainder to reviews to check the memory barriers.
v4: Add a comment to explain why we need to fall through in
__wait_seqno in the end variable assignments.
v5: Review from Damien:
- s/smb/smp/ in a comment
- don't increment the reset counter after we've set it to WEDGED. Now
we (again) properly wedge the gpu when the reset fails.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-06 16:01:42 +08:00
|
|
|
*
|
|
|
|
* This is important for lock-free wait paths, where no contended lock
|
|
|
|
* naturally enforces the correct ordering between the bail-out of the
|
|
|
|
* waiter and the gpu reset work code.
|
2012-11-16 00:17:22 +08:00
|
|
|
*/
|
|
|
|
atomic_t reset_counter;
|
|
|
|
|
|
|
|
#define I915_RESET_IN_PROGRESS_FLAG 1
|
2013-11-12 20:44:19 +08:00
|
|
|
#define I915_WEDGED (1 << 31)
|
2012-11-16 00:17:22 +08:00
|
|
|
|
2016-07-02 00:23:14 +08:00
|
|
|
/**
|
|
|
|
* Waitqueue to signal when a hang is detected. Used to for waiters
|
|
|
|
* to release the struct_mutex for the reset to procede.
|
|
|
|
*/
|
|
|
|
wait_queue_head_t wait_queue;
|
|
|
|
|
2012-11-16 00:17:22 +08:00
|
|
|
/**
|
|
|
|
* Waitqueue to signal when the reset has completed. Used by clients
|
|
|
|
* that wait for dev_priv->mm.wedged to settle.
|
|
|
|
*/
|
|
|
|
wait_queue_head_t reset_queue;
|
2012-11-15 00:14:05 +08:00
|
|
|
|
2013-09-26 00:34:55 +08:00
|
|
|
/* For missed irq/seqno simulation. */
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
unsigned long test_irq_rings;
|
2012-11-15 00:14:04 +08:00
|
|
|
};
|
|
|
|
|
i915: ignore lid open event when resuming
i915 driver needs to do modeset when
1. system resumes from sleep
2. lid is opened
In PM_SUSPEND_MEM state, all the GPEs are cleared when system resumes,
thus it is the i915_resume code does the modeset rather than intel_lid_notify().
But in PM_SUSPEND_FREEZE state, this will be broken because
system is still responsive to the lid events.
1. When we close the lid in Freeze state, intel_lid_notify() sets modeset_on_lid.
2. When we reopen the lid, intel_lid_notify() will do a modeset,
before the system is resumed.
here is the error log,
[92146.548074] WARNING: at drivers/gpu/drm/i915/intel_display.c:1028 intel_wait_for_pipe_off+0x184/0x190 [i915]()
[92146.548076] Hardware name: VGN-Z540N
[92146.548078] pipe_off wait timed out
[92146.548167] Modules linked in: hid_generic usbhid hid snd_hda_codec_realtek snd_hda_intel snd_hda_codec parport_pc snd_hwdep ppdev snd_pcm_oss i915 snd_mixer_oss snd_pcm arc4 iwldvm snd_seq_dummy mac80211 snd_seq_oss snd_seq_midi fbcon tileblit font bitblit softcursor drm_kms_helper snd_rawmidi snd_seq_midi_event coretemp drm snd_seq kvm btusb bluetooth snd_timer iwlwifi pcmcia tpm_infineon i2c_algo_bit joydev snd_seq_device intel_agp cfg80211 snd intel_gtt yenta_socket pcmcia_rsrc sony_laptop agpgart microcode psmouse tpm_tis serio_raw mxm_wmi soundcore snd_page_alloc tpm acpi_cpufreq lpc_ich pcmcia_core tpm_bios mperf processor lp parport firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci thermal e1000e
[92146.548173] Pid: 4304, comm: kworker/0:0 Tainted: G W 3.8.0-rc3-s0i3-v3-test+ #9
[92146.548175] Call Trace:
[92146.548189] [<c10378e2>] warn_slowpath_common+0x72/0xa0
[92146.548227] [<f86398b4>] ? intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548263] [<f86398b4>] ? intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548270] [<c10379b3>] warn_slowpath_fmt+0x33/0x40
[92146.548307] [<f86398b4>] intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548344] [<f86399c2>] intel_disable_pipe+0x102/0x190 [i915]
[92146.548380] [<f8639ea4>] ? intel_disable_plane+0x64/0x80 [i915]
[92146.548417] [<f8639f7c>] i9xx_crtc_disable+0xbc/0x150 [i915]
[92146.548456] [<f863ebee>] intel_crtc_update_dpms+0x5e/0x90 [i915]
[92146.548493] [<f86437cf>] intel_modeset_setup_hw_state+0x42f/0x8f0 [i915]
[92146.548535] [<f8645b0b>] intel_lid_notify+0x9b/0xc0 [i915]
[92146.548543] [<c15610d3>] notifier_call_chain+0x43/0x60
[92146.548550] [<c105d1e1>] __blocking_notifier_call_chain+0x41/0x80
[92146.548556] [<c105d23f>] blocking_notifier_call_chain+0x1f/0x30
[92146.548563] [<c131a684>] acpi_lid_send_state+0x78/0xa4
[92146.548569] [<c131aa9e>] acpi_button_notify+0x3b/0xf1
[92146.548577] [<c12df56a>] ? acpi_os_execute+0x17/0x19
[92146.548582] [<c12e591a>] ? acpi_ec_sync_query+0xa5/0xbc
[92146.548589] [<c12e2b82>] acpi_device_notify+0x16/0x18
[92146.548595] [<c12f4904>] acpi_ev_notify_dispatch+0x38/0x4f
[92146.548600] [<c12df0e8>] acpi_os_execute_deferred+0x20/0x2b
[92146.548607] [<c1051208>] process_one_work+0x128/0x3f0
[92146.548613] [<c1564f73>] ? common_interrupt+0x33/0x38
[92146.548618] [<c104f8c0>] ? wake_up_worker+0x30/0x30
[92146.548624] [<c12df0c8>] ? acpi_os_wait_events_complete+0x1e/0x1e
[92146.548629] [<c10524f9>] worker_thread+0x119/0x3b0
[92146.548634] [<c10523e0>] ? manage_workers+0x240/0x240
[92146.548640] [<c1056e84>] kthread+0x94/0xa0
[92146.548647] [<c1060000>] ? ftrace_raw_output_sched_stat_runtime+0x70/0xf0
[92146.548652] [<c15649b7>] ret_from_kernel_thread+0x1b/0x28
[92146.548658] [<c1056df0>] ? kthread_create_on_node+0xc0/0xc0
three different modeset flags are introduced in this patch
MODESET_ON_LID_OPEN: do modeset on next lid open event
MODESET_DONE: modeset already done
MODESET_SUSPENDED: suspended, only do modeset when system is resumed
In this way,
1. when lid is closed, MODESET_ON_LID_OPEN is set so that
we'll do modeset on next lid open event.
2. when lid is opened, MODESET_DONE is set
so that duplicate lid open events will be ignored.
3. when system suspends, MODESET_SUSPENDED is set.
In this case, we will not do modeset on any lid events.
Plus, locking mechanism is also introduced to avoid racing.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-02-05 15:41:53 +08:00
|
|
|
enum modeset_restore {
|
|
|
|
MODESET_ON_LID_OPEN,
|
|
|
|
MODESET_DONE,
|
|
|
|
MODESET_SUSPENDED,
|
|
|
|
};
|
|
|
|
|
2015-08-08 08:01:16 +08:00
|
|
|
#define DP_AUX_A 0x40
|
|
|
|
#define DP_AUX_B 0x10
|
|
|
|
#define DP_AUX_C 0x20
|
|
|
|
#define DP_AUX_D 0x30
|
|
|
|
|
2015-08-17 16:04:04 +08:00
|
|
|
#define DDC_PIN_B 0x05
|
|
|
|
#define DDC_PIN_C 0x04
|
|
|
|
#define DDC_PIN_D 0x06
|
|
|
|
|
2013-09-13 04:06:24 +08:00
|
|
|
struct ddi_vbt_port_info {
|
2014-08-01 18:07:54 +08:00
|
|
|
/*
|
|
|
|
* This is an index in the HDMI/DVI DDI buffer translation table.
|
|
|
|
* The special value HDMI_LEVEL_SHIFT_UNKNOWN means the VBT didn't
|
|
|
|
* populate this field.
|
|
|
|
*/
|
|
|
|
#define HDMI_LEVEL_SHIFT_UNKNOWN 0xff
|
2013-09-13 04:06:24 +08:00
|
|
|
uint8_t hdmi_level_shift;
|
2013-09-13 04:12:18 +08:00
|
|
|
|
|
|
|
uint8_t supports_dvi:1;
|
|
|
|
uint8_t supports_hdmi:1;
|
|
|
|
uint8_t supports_dp:1;
|
2015-08-08 08:01:16 +08:00
|
|
|
|
|
|
|
uint8_t alternate_aux_channel;
|
2015-08-17 16:04:04 +08:00
|
|
|
uint8_t alternate_ddc_pin;
|
2015-07-10 19:10:55 +08:00
|
|
|
|
|
|
|
uint8_t dp_boost_level;
|
|
|
|
uint8_t hdmi_boost_level;
|
2013-09-13 04:06:24 +08:00
|
|
|
};
|
|
|
|
|
2014-11-15 00:52:30 +08:00
|
|
|
enum psr_lines_to_wait {
|
|
|
|
PSR_0_LINES_TO_WAIT = 0,
|
|
|
|
PSR_1_LINE_TO_WAIT,
|
|
|
|
PSR_4_LINES_TO_WAIT,
|
|
|
|
PSR_8_LINES_TO_WAIT
|
2014-03-28 12:44:57 +08:00
|
|
|
};
|
|
|
|
|
2013-05-10 07:03:18 +08:00
|
|
|
struct intel_vbt_data {
|
|
|
|
struct drm_display_mode *lfp_lvds_vbt_mode; /* if any */
|
|
|
|
struct drm_display_mode *sdvo_lvds_vbt_mode; /* if any */
|
|
|
|
|
|
|
|
/* Feature bits */
|
|
|
|
unsigned int int_tv_support:1;
|
|
|
|
unsigned int lvds_dither:1;
|
|
|
|
unsigned int lvds_vbt:1;
|
|
|
|
unsigned int int_crt_support:1;
|
|
|
|
unsigned int lvds_use_ssc:1;
|
|
|
|
unsigned int display_clock_mode:1;
|
|
|
|
unsigned int fdi_rx_polarity_inverted:1;
|
2016-04-08 21:28:12 +08:00
|
|
|
unsigned int panel_type:4;
|
2013-05-10 07:03:18 +08:00
|
|
|
int lvds_ssc_freq;
|
|
|
|
unsigned int bios_lvds_val; /* initial [PCH_]LVDS reg val in VBIOS */
|
|
|
|
|
2014-03-28 12:44:57 +08:00
|
|
|
enum drrs_support_type drrs_type;
|
|
|
|
|
2016-03-24 23:50:20 +08:00
|
|
|
struct {
|
|
|
|
int rate;
|
|
|
|
int lanes;
|
|
|
|
int preemphasis;
|
|
|
|
int vswing;
|
2016-03-24 23:50:21 +08:00
|
|
|
bool low_vswing;
|
2016-03-24 23:50:20 +08:00
|
|
|
bool initialized;
|
|
|
|
bool support;
|
|
|
|
int bpp;
|
|
|
|
struct edp_power_seq pps;
|
|
|
|
} edp;
|
2013-05-10 07:03:18 +08:00
|
|
|
|
2014-11-15 00:52:30 +08:00
|
|
|
struct {
|
|
|
|
bool full_link;
|
|
|
|
bool require_aux_wakeup;
|
|
|
|
int idle_frames;
|
|
|
|
enum psr_lines_to_wait lines_to_wait;
|
|
|
|
int tp1_wakeup_time;
|
|
|
|
int tp2_tp3_wakeup_time;
|
|
|
|
} psr;
|
|
|
|
|
2013-12-15 06:38:29 +08:00
|
|
|
struct {
|
|
|
|
u16 pwm_freq_hz;
|
2014-04-09 16:22:06 +08:00
|
|
|
bool present;
|
2013-12-15 06:38:29 +08:00
|
|
|
bool active_low_pwm;
|
2014-06-24 23:27:39 +08:00
|
|
|
u8 min_brightness; /* min_brightness/255 of max */
|
2016-04-26 21:14:24 +08:00
|
|
|
enum intel_backlight_type type;
|
2013-12-15 06:38:29 +08:00
|
|
|
} backlight;
|
|
|
|
|
2013-08-27 20:12:25 +08:00
|
|
|
/* MIPI DSI */
|
|
|
|
struct {
|
|
|
|
u16 panel_id;
|
2014-04-14 13:30:34 +08:00
|
|
|
struct mipi_config *config;
|
|
|
|
struct mipi_pps_data *pps;
|
|
|
|
u8 seq_version;
|
|
|
|
u32 size;
|
|
|
|
u8 *data;
|
2015-12-21 21:10:57 +08:00
|
|
|
const u8 *sequence[MIPI_SEQ_MAX];
|
2013-08-27 20:12:25 +08:00
|
|
|
} dsi;
|
|
|
|
|
2013-05-10 07:03:18 +08:00
|
|
|
int crt_ddc_pin;
|
|
|
|
|
|
|
|
int child_dev_num;
|
2013-09-12 05:02:47 +08:00
|
|
|
union child_device_config *child_dev;
|
2013-09-13 04:06:24 +08:00
|
|
|
|
|
|
|
struct ddi_vbt_port_info ddi_port_info[I915_MAX_PORTS];
|
2016-03-24 23:50:22 +08:00
|
|
|
struct sdvo_device_mapping sdvo_mappings[2];
|
2013-05-10 07:03:18 +08:00
|
|
|
};
|
|
|
|
|
2013-08-07 03:24:04 +08:00
|
|
|
enum intel_ddb_partitioning {
|
|
|
|
INTEL_DDB_PART_1_2,
|
|
|
|
INTEL_DDB_PART_5_6, /* IVB+ */
|
|
|
|
};
|
|
|
|
|
2013-08-07 03:24:05 +08:00
|
|
|
struct intel_wm_level {
|
|
|
|
bool enable;
|
|
|
|
uint32_t pri_val;
|
|
|
|
uint32_t spr_val;
|
|
|
|
uint32_t cur_val;
|
|
|
|
uint32_t fbc_val;
|
|
|
|
};
|
|
|
|
|
2013-12-17 20:46:36 +08:00
|
|
|
struct ilk_wm_values {
|
2013-10-10 00:18:03 +08:00
|
|
|
uint32_t wm_pipe[3];
|
|
|
|
uint32_t wm_lp[3];
|
|
|
|
uint32_t wm_lp_spr[3];
|
|
|
|
uint32_t wm_linetime[3];
|
|
|
|
bool enable_fbc_wm;
|
|
|
|
enum intel_ddb_partitioning partitioning;
|
|
|
|
};
|
|
|
|
|
2015-06-25 03:00:04 +08:00
|
|
|
struct vlv_pipe_wm {
|
|
|
|
uint16_t primary;
|
|
|
|
uint16_t sprite[2];
|
|
|
|
uint8_t cursor;
|
|
|
|
};
|
2015-03-06 03:19:49 +08:00
|
|
|
|
2015-06-25 03:00:04 +08:00
|
|
|
struct vlv_sr_wm {
|
|
|
|
uint16_t plane;
|
|
|
|
uint8_t cursor;
|
|
|
|
};
|
2015-03-06 03:19:49 +08:00
|
|
|
|
2015-06-25 03:00:04 +08:00
|
|
|
struct vlv_wm_values {
|
|
|
|
struct vlv_pipe_wm pipe[3];
|
|
|
|
struct vlv_sr_wm sr;
|
2015-03-06 03:19:45 +08:00
|
|
|
struct {
|
|
|
|
uint8_t cursor;
|
|
|
|
uint8_t sprite[2];
|
|
|
|
uint8_t primary;
|
|
|
|
} ddl[3];
|
2015-06-25 03:00:03 +08:00
|
|
|
uint8_t level;
|
|
|
|
bool cxsr;
|
2015-03-06 03:19:45 +08:00
|
|
|
};
|
|
|
|
|
2014-11-05 01:06:41 +08:00
|
|
|
struct skl_ddb_entry {
|
2014-11-05 01:06:53 +08:00
|
|
|
uint16_t start, end; /* in number of blocks, 'end' is exclusive */
|
2014-11-05 01:06:41 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static inline uint16_t skl_ddb_entry_size(const struct skl_ddb_entry *entry)
|
|
|
|
{
|
2014-11-05 01:06:53 +08:00
|
|
|
return entry->end - entry->start;
|
2014-11-05 01:06:41 +08:00
|
|
|
}
|
|
|
|
|
2014-11-05 01:06:52 +08:00
|
|
|
static inline bool skl_ddb_entry_equal(const struct skl_ddb_entry *e1,
|
|
|
|
const struct skl_ddb_entry *e2)
|
|
|
|
{
|
|
|
|
if (e1->start == e2->start && e1->end == e2->end)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2014-11-05 01:06:41 +08:00
|
|
|
struct skl_ddb_allocation {
|
2014-11-05 01:07:01 +08:00
|
|
|
struct skl_ddb_entry pipe[I915_MAX_PIPES];
|
2015-04-28 06:47:37 +08:00
|
|
|
struct skl_ddb_entry plane[I915_MAX_PIPES][I915_MAX_PLANES]; /* packed/uv */
|
2015-09-25 06:53:10 +08:00
|
|
|
struct skl_ddb_entry y_plane[I915_MAX_PIPES][I915_MAX_PLANES];
|
2014-11-05 01:06:41 +08:00
|
|
|
};
|
|
|
|
|
2014-11-05 01:06:40 +08:00
|
|
|
struct skl_wm_values {
|
2016-05-12 22:06:07 +08:00
|
|
|
unsigned dirty_pipes;
|
2014-11-05 01:06:41 +08:00
|
|
|
struct skl_ddb_allocation ddb;
|
2014-11-05 01:06:40 +08:00
|
|
|
uint32_t wm_linetime[I915_MAX_PIPES];
|
|
|
|
uint32_t plane[I915_MAX_PIPES][I915_MAX_PLANES][8];
|
|
|
|
uint32_t plane_trans[I915_MAX_PIPES][I915_MAX_PLANES];
|
|
|
|
};
|
|
|
|
|
|
|
|
struct skl_wm_level {
|
|
|
|
bool plane_en[I915_MAX_PLANES];
|
|
|
|
uint16_t plane_res_b[I915_MAX_PLANES];
|
|
|
|
uint8_t plane_res_l[I915_MAX_PLANES];
|
|
|
|
};
|
|
|
|
|
2013-08-20 00:18:09 +08:00
|
|
|
/*
|
2014-03-08 07:08:18 +08:00
|
|
|
* This struct helps tracking the state needed for runtime PM, which puts the
|
|
|
|
* device in PCI D3 state. Notice that when this happens, nothing on the
|
|
|
|
* graphics device works, even register access, so we don't get interrupts nor
|
|
|
|
* anything else.
|
2013-08-20 00:18:09 +08:00
|
|
|
*
|
2014-03-08 07:08:18 +08:00
|
|
|
* Every piece of our code that needs to actually touch the hardware needs to
|
|
|
|
* either call intel_runtime_pm_get or call intel_display_power_get with the
|
|
|
|
* appropriate power domain.
|
drm/i915: make PC8 be part of runtime PM suspend/resume
Currently, when our driver becomes idle for i915.pc8_timeout (default:
5s) we enable PC8, so we save some power, but not everything we can.
Then, while PC8 is enabled, if we stay idle for more
autosuspend_delay_ms (default: 10s) we'll enter runtime PM and put the
graphics device in D3 state, saving even more power. The two features
are separate things with increasing levels of power savings, but if we
disable PC8 we'll never get into D3.
While from the modularity point of view it would be nice to keep these
features as separate, we have reasons to merge them:
- We are not aware of anybody wanting a "PC8 without D3" environment.
- If we keep both features as separate, we'll have to to test both
PC8 and PC8+D3 code paths. We're already having a major pain to
make QA do automated testing of just one thing, testing both paths
will cost even more.
- Only Haswell+ supports PC8, so if we want to add runtime PM support
to, for example, IVB, we'll have to copy some code from the PC8
feature to runtime PM, so merging both features as a single thing
will make it easier for enabling runtime PM on other platforms.
This patch only does the very basic steps required to have PC8 and
runtime PM merged on a single feature: the next patches will take care
of cleaning up everything.
v2: - Rebase.
v3: - Rebase.
- Fully remove the deprecated i915 params since Daniel doesn't
consider them as part of the ABI.
v4: - Rebase.
- Fix typo in the commit message.
v5: - Rebase, again.
- Add a huge comment explaining the different forcewake usage
(Chris, Daniel).
- Use open-coded forcewake functions (Daniel).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-08 07:08:05 +08:00
|
|
|
*
|
2014-03-08 07:08:18 +08:00
|
|
|
* Our driver uses the autosuspend delay feature, which means we'll only really
|
|
|
|
* suspend if we stay with zero refcount for a certain amount of time. The
|
2014-09-30 16:56:39 +08:00
|
|
|
* default value is currently very conservative (see intel_runtime_pm_enable), but
|
2014-03-08 07:08:18 +08:00
|
|
|
* it can be changed with the standard runtime PM files from sysfs.
|
2013-08-20 00:18:09 +08:00
|
|
|
*
|
|
|
|
* The irqs_disabled variable becomes true exactly after we disable the IRQs and
|
|
|
|
* goes back to false exactly before we reenable the IRQs. We use this variable
|
|
|
|
* to check if someone is trying to enable/disable IRQs while they're supposed
|
|
|
|
* to be disabled. This shouldn't happen and we'll print some error messages in
|
2014-03-08 07:12:32 +08:00
|
|
|
* case it happens.
|
2013-08-20 00:18:09 +08:00
|
|
|
*
|
2014-03-08 07:08:18 +08:00
|
|
|
* For more, read the Documentation/power/runtime_pm.txt.
|
2013-08-20 00:18:09 +08:00
|
|
|
*/
|
2014-03-08 07:08:15 +08:00
|
|
|
struct i915_runtime_pm {
|
2015-12-16 08:52:19 +08:00
|
|
|
atomic_t wakeref_count;
|
2015-12-16 02:10:37 +08:00
|
|
|
atomic_t atomic_seq;
|
2014-03-08 07:08:15 +08:00
|
|
|
bool suspended;
|
2014-09-30 16:56:43 +08:00
|
|
|
bool irqs_enabled;
|
2013-08-20 00:18:09 +08:00
|
|
|
};
|
|
|
|
|
2013-10-16 19:30:34 +08:00
|
|
|
enum intel_pipe_crc_source {
|
|
|
|
INTEL_PIPE_CRC_SOURCE_NONE,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_PLANE1,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_PLANE2,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_PF,
|
2013-10-17 04:55:48 +08:00
|
|
|
INTEL_PIPE_CRC_SOURCE_PIPE,
|
2013-10-17 04:55:58 +08:00
|
|
|
/* TV/DP on pre-gen5/vlv can't use the pipe source. */
|
|
|
|
INTEL_PIPE_CRC_SOURCE_TV,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_DP_B,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_DP_C,
|
|
|
|
INTEL_PIPE_CRC_SOURCE_DP_D,
|
2013-11-01 17:50:20 +08:00
|
|
|
INTEL_PIPE_CRC_SOURCE_AUTO,
|
2013-10-16 19:30:34 +08:00
|
|
|
INTEL_PIPE_CRC_SOURCE_MAX,
|
|
|
|
};
|
|
|
|
|
2013-10-16 01:55:27 +08:00
|
|
|
struct intel_pipe_crc_entry {
|
2013-10-16 01:55:30 +08:00
|
|
|
uint32_t frame;
|
2013-10-16 01:55:27 +08:00
|
|
|
uint32_t crc[5];
|
|
|
|
};
|
|
|
|
|
2013-10-16 01:55:29 +08:00
|
|
|
#define INTEL_PIPE_CRC_ENTRIES_NR 128
|
2013-10-16 01:55:27 +08:00
|
|
|
struct intel_pipe_crc {
|
2013-10-21 21:29:30 +08:00
|
|
|
spinlock_t lock;
|
|
|
|
bool opened; /* exclusive access to the result file */
|
2013-10-16 01:55:34 +08:00
|
|
|
struct intel_pipe_crc_entry *entries;
|
2013-10-16 19:30:34 +08:00
|
|
|
enum intel_pipe_crc_source source;
|
2013-10-21 21:29:30 +08:00
|
|
|
int head, tail;
|
2013-10-16 01:55:40 +08:00
|
|
|
wait_queue_head_t wq;
|
2013-10-16 01:55:27 +08:00
|
|
|
};
|
|
|
|
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 22:01:59 +08:00
|
|
|
struct i915_frontbuffer_tracking {
|
|
|
|
struct mutex lock;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Tracking bits for delayed frontbuffer flushing du to gpu activity or
|
|
|
|
* scheduled flips.
|
|
|
|
*/
|
|
|
|
unsigned busy_bits;
|
|
|
|
unsigned flip_bits;
|
|
|
|
};
|
|
|
|
|
2014-10-07 22:21:26 +08:00
|
|
|
struct i915_wa_reg {
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t addr;
|
2014-10-07 22:21:26 +08:00
|
|
|
u32 value;
|
|
|
|
/* bitmask representing WA bits */
|
|
|
|
u32 mask;
|
|
|
|
};
|
|
|
|
|
2016-01-22 05:43:47 +08:00
|
|
|
/*
|
|
|
|
* RING_MAX_NONPRIV_SLOTS is per-engine but at this point we are only
|
|
|
|
* allowing it for RCS as we don't foresee any requirement of having
|
|
|
|
* a whitelist for other engines. When it is really required for
|
|
|
|
* other engines then the limit need to be increased.
|
|
|
|
*/
|
|
|
|
#define I915_MAX_WA_REGS (16 + RING_MAX_NONPRIV_SLOTS)
|
2014-10-07 22:21:26 +08:00
|
|
|
|
|
|
|
struct i915_workarounds {
|
|
|
|
struct i915_wa_reg reg[I915_MAX_WA_REGS];
|
|
|
|
u32 count;
|
2016-03-16 19:00:39 +08:00
|
|
|
u32 hw_whitelist_count[I915_NUM_ENGINES];
|
2014-10-07 22:21:26 +08:00
|
|
|
};
|
|
|
|
|
2015-02-10 19:05:47 +08:00
|
|
|
struct i915_virtual_gpu {
|
|
|
|
bool active;
|
|
|
|
};
|
|
|
|
|
2015-05-30 00:43:27 +08:00
|
|
|
struct i915_execbuffer_params {
|
|
|
|
struct drm_device *dev;
|
|
|
|
struct drm_file *file;
|
|
|
|
uint32_t dispatch_flags;
|
|
|
|
uint32_t args_batch_start_offset;
|
2015-07-30 00:23:59 +08:00
|
|
|
uint64_t batch_obj_vm_offset;
|
2016-03-16 19:00:38 +08:00
|
|
|
struct intel_engine_cs *engine;
|
2015-05-30 00:43:27 +08:00
|
|
|
struct drm_i915_gem_object *batch_obj;
|
2016-05-24 21:53:34 +08:00
|
|
|
struct i915_gem_context *ctx;
|
2015-05-30 00:43:30 +08:00
|
|
|
struct drm_i915_gem_request *request;
|
2015-05-30 00:43:27 +08:00
|
|
|
};
|
|
|
|
|
2015-09-25 06:53:18 +08:00
|
|
|
/* used in computing the new watermarks state */
|
|
|
|
struct intel_wm_config {
|
|
|
|
unsigned int num_pipes_active;
|
|
|
|
bool sprites_enabled;
|
|
|
|
bool sprites_scaled;
|
|
|
|
};
|
|
|
|
|
2014-03-31 19:27:22 +08:00
|
|
|
struct drm_i915_private {
|
2016-06-24 21:00:18 +08:00
|
|
|
struct drm_device drm;
|
|
|
|
|
2015-04-07 23:20:57 +08:00
|
|
|
struct kmem_cache *objects;
|
2015-04-07 23:20:58 +08:00
|
|
|
struct kmem_cache *vmas;
|
2015-04-07 23:20:57 +08:00
|
|
|
struct kmem_cache *requests;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2014-02-08 03:12:48 +08:00
|
|
|
const struct intel_device_info info;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
|
|
|
int relative_constants_mode;
|
|
|
|
|
|
|
|
void __iomem *regs;
|
|
|
|
|
2013-07-20 03:36:52 +08:00
|
|
|
struct intel_uncore uncore;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2015-02-10 19:05:47 +08:00
|
|
|
struct i915_virtual_gpu vgpu;
|
|
|
|
|
drm/i915: gvt: Introduce the basic architecture of GVT-g
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 20:07:00 +08:00
|
|
|
struct intel_gvt gvt;
|
|
|
|
|
2015-08-12 22:43:36 +08:00
|
|
|
struct intel_guc guc;
|
|
|
|
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
struct intel_csr csr;
|
|
|
|
|
2015-04-01 15:55:04 +08:00
|
|
|
struct intel_gmbus gmbus[GMBUS_NUM_PINS];
|
2012-12-01 20:53:45 +08:00
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
/** gmbus_mutex protects against concurrent usage of the single hw gmbus
|
|
|
|
* controller on different i2c buses. */
|
|
|
|
struct mutex gmbus_mutex;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Base address of the gmbus and gpio block.
|
|
|
|
*/
|
|
|
|
uint32_t gpio_mmio_base;
|
|
|
|
|
2014-05-19 23:24:03 +08:00
|
|
|
/* MMIO base address for MIPI regs */
|
|
|
|
uint32_t mipi_mmio_base;
|
|
|
|
|
2015-11-12 02:34:15 +08:00
|
|
|
uint32_t psr_mmio_base;
|
|
|
|
|
2012-12-01 20:53:45 +08:00
|
|
|
wait_queue_head_t gmbus_wait_queue;
|
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
struct pci_dev *bridge_dev;
|
2016-05-24 21:53:40 +08:00
|
|
|
struct i915_gem_context *kernel_context;
|
2016-03-16 19:00:39 +08:00
|
|
|
struct intel_engine_cs engine[I915_NUM_ENGINES];
|
2014-07-01 00:53:37 +08:00
|
|
|
struct drm_i915_gem_object *semaphore_obj;
|
2012-12-10 21:41:48 +08:00
|
|
|
uint32_t last_seqno, next_seqno;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2014-09-11 13:43:25 +08:00
|
|
|
struct drm_dma_handle *status_page_dmah;
|
2012-11-03 02:55:02 +08:00
|
|
|
struct resource mch_res;
|
|
|
|
|
|
|
|
/* protects the irq masks */
|
|
|
|
spinlock_t irq_lock;
|
|
|
|
|
drm/i915: Replaced Blitter ring based flips with MMIO flips
This patch enables the framework for using MMIO based flip calls,
in contrast with the CS based flip calls which are being used currently.
MMIO based flip calls can be enabled on architectures where
Render and Blitter engines reside in different power wells. The
decision to use MMIO flips can be made based on workloads to give
100% residency for Media power well.
v2: The MMIO flips now use the interrupt driven mechanism for issuing the
flips when target seqno is reached. (Incorporating Ville's idea)
v3: Rebasing on latest code. Code restructuring after incorporating
Damien's comments
v4: Addressing Ville's review comments
-general cleanup
-updating only base addr instead of calling update_primary_plane
-extending patch for gen5+ platforms
v5: Addressed Ville's review comments
-Making mmio flip vs cs flip selection based on module parameter
-Adding check for DRIVER_MODESET feature in notify_ring before calling
notify mmio flip.
-Other changes mostly in function arguments
v6: -Having a seperate function to check condition for using mmio flips (Ville)
-propogating error code from i915_gem_check_olr (Ville)
v7: -Adding __must_check with i915_gem_check_olr (Chris)
-Renaming mmio_flip_data to mmio_flip (Chris)
-Rebasing on latest nightly
v8: -Rebasing on latest code
-squash 3rd patch in series(mmio setbase vs page flip race) with this patch
-Added new tiling mode update in intel_do_mmio_flip (Chris)
v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
intel_postpone_flip, as this is a more restrictive condition (Chris)
v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
These patches make the selection of CS vs MMIO flip at the page flip time, and
make the module parameter for using mmio flips as tristate, the states being
'force CS flips', 'force mmio flips', 'driver discretion'.
Changed the logic for driver discretion (Chris)
v11: Minor code cleanup(better readability, fixing whitespace errors, using
lockdep to check mutex locked status in postpone_flip, removal of __must_check
in function definition) (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
[danvet: Fix up parameter alignement checkpatch spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-02 19:17:17 +08:00
|
|
|
/* protects the mmio flip data */
|
|
|
|
spinlock_t mmio_flip_lock;
|
|
|
|
|
2014-03-05 01:23:07 +08:00
|
|
|
bool display_irqs_enabled;
|
|
|
|
|
drm/i915: irq-drive the dp aux communication
At least on the platforms that have a dp aux irq and also have it
enabled - vlvhsw should have one, too. But I don't have a machine to
test this on. Judging from docs there's no dp aux interrupt for gm45.
Also, I only have an ivb cpu edp machine, so the dp aux A code for
snb/ilk is untested.
For dpcd probing when nothing is connected it slashes about 5ms of cpu
time (cpu time is now negligible), which agrees with 3 * 5 400 usec
timeouts.
A previous version of this patch increases the time required to go
through the dp_detect cycle (which includes reading the edid) from
around 33 ms to around 40 ms. Experiments indicated that this is
purely due to the irq latency - the hw doesn't allow us to queue up
dp aux transactions and hence irq latency directly affects throughput.
gmbus is much better, there we have a 8 byte buffer, and we get the
irq once another 4 bytes can be queued up.
But by using the pm_qos interface to request the lowest possible cpu
wake-up latency this slowdown completely disappeared.
Since all our output detection logic is single-threaded with the
mode_config mutex right now anyway, I've decide not ot play fancy and
to just reuse the gmbus wait queue. But this would definitely prep the
way to run dp detection on different ports in parallel
v2: Add a timeout for dp aux transfers when using interrupts - the hw
_does_ prevent this with the hw-based 400 usec timeout, but if the
irq somehow doesn't arrive we're screwed. Lesson learned while
developing this ;-)
v3: While at it also convert the busy-loop to wait_for_atomic, so that
we don't run the risk of an infinite loop any more.
v4: Ensure we have the smallest possible irq latency by using the
pm_qos interface.
v5: Add a comment to the code to explain why we frob pm_qos. Suggested
by Chris Wilson.
v6: Disable dp irq for vlv, that's easier than trying to get at docs
and hw.
v7: Squash in a fix for Haswell that Paulo Zanoni tracked down - the
dp aux registers aren't at a fixed offset any more, but can be on the
PCH while the DP port is on the cpu die.
Reviewed-by: Imre Deak <imre.deak@intel.com> (v6)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-01 20:53:48 +08:00
|
|
|
/* To control wakeup latency, e.g. for irq-driven dp aux transfers. */
|
|
|
|
struct pm_qos_request pm_qos;
|
|
|
|
|
2015-05-27 01:42:30 +08:00
|
|
|
/* Sideband mailbox protection */
|
|
|
|
struct mutex sb_lock;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
|
|
|
/** Cached value of IMR to avoid reads in updating the bitfield */
|
drm/i915/bdw: Implement interrupt changes
The interrupt handling implementation remains the same as previous
generations with the 4 types of registers, status, identity, mask, and
enable. However the layout of where the bits go have changed entirely.
To address these changes, all of the interrupt vfuncs needed special
gen8 code.
The way it works is there is a top level status register now which
informs the interrupt service routine which unit caused the interrupt,
and therefore which interrupt registers to read to process the
interrupt. For display the division is quite logical, a set of interrupt
registers for each pipe, and in addition to those, a set each for "misc"
and port.
For GT the things get a bit hairy, as seen by the code. Each of the GT
units has it's own bits defined. They all look *very similar* and
resides in 16 bits of a GT register. As an example, RCS and BCS share
register 0. To compact the code a bit, at a slight expense to
complexity, this is exactly how the code works as well. 2 structures are
added to the ring buffer so that our ring buffer interrupt handling code
knows which ring shares the interrupt registers, and a shift value (ie.
the top or bottom 16 bits of the register).
The above allows us to kept the interrupt register caching scheme, the
per interrupt enables, and the code to mask and unmask interrupts
relatively clean (again at the cost of some more complexity).
Most of the GT units mentioned above are command streamers, and so the
symmetry should work quite well for even the yet to be implemented rings
which Broadwell adds.
v2: Fixes up a couple of bugs, and is more verbose about errors in the
Broadwell interrupt handler.
v3: fix DE_MISC IER offset
v4: Simplify interrupts:
I totally misread the docs the first time I implemented interrupts, and
so this should greatly simplify the mess. Unlike GEN6, we never touch
the regular mask registers in irq_get/put.
v5: Rebased on to of recent pch hotplug setup changes.
v6: Fixup on top of moving num_pipes to intel_info.
v7: Rebased on top of Egbert Eich's hpd irq handling rework. Also
wired up ibx_hpd_irq_setup for gen8.
v8: Rebase on top of Jani's asle handling rework.
v9: Rebase on top of Ben's VECS enabling for Haswell, where he
unfortunately went OCD on the gt irq #defines. Not that they're still
not yet fully consistent:
- Used the GT_RENDER_ #defines + bdw shifts.
- Dropped the shift from the L3_PARITY stuff, seemed clearer.
- s/irq_refcount/irq_refcount.gt/
v10: Squash in VECS enabling patches and the gen8_gt_irq_handler
refactoring from Zhao Yakui <yakui.zhao@intel.com>
v11: Rebase on top of the interrupt cleanups in upstream.
v12: Rebase on top of Ben's DPF changes in upstream.
v13: Drop bdw from the HAS_L3_DPF feature flag for now, it's unclear what
exactly needs to be done. Requested by Ben.
v14: Fix the patch.
- Drop the mask of reserved bits and assorted logic, it doesn't match
the spec.
- Do the posting read inconditionally instead of commenting it out.
- Add a GEN8_MASTER_IRQ_CONTROL definition and use it.
- Fix up the GEN8_PIPE interrupt defines and give the GEN8_ prefixes -
we actually will need to use them.
- Enclose macros in do {} while (0) (checkpatch).
- Clear DE_MISC interrupt bits only after having processed them.
- Fix whitespace fail (checkpatch).
- Fix overtly long lines where appropriate (checkpatch).
- Don't use typedef'ed private_t (maintainer-scripts).
- Align the function parameter list correctly.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v4)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
bikeshed
2013-11-03 12:07:09 +08:00
|
|
|
union {
|
|
|
|
u32 irq_mask;
|
|
|
|
u32 de_irq_mask[I915_MAX_PIPES];
|
|
|
|
};
|
2012-11-03 02:55:02 +08:00
|
|
|
u32 gt_irq_mask;
|
2013-08-07 05:57:15 +08:00
|
|
|
u32 pm_irq_mask;
|
2014-03-15 22:53:22 +08:00
|
|
|
u32 pm_rps_events;
|
2014-02-11 00:42:49 +08:00
|
|
|
u32 pipestat_irq_mask[I915_MAX_PIPES];
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2015-05-27 20:03:42 +08:00
|
|
|
struct i915_hotplug hotplug;
|
2016-01-12 03:44:36 +08:00
|
|
|
struct intel_fbc fbc;
|
2014-04-05 14:43:28 +08:00
|
|
|
struct i915_drrs drrs;
|
2012-11-03 02:55:02 +08:00
|
|
|
struct intel_opregion opregion;
|
2013-05-10 07:03:18 +08:00
|
|
|
struct intel_vbt_data vbt;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2014-10-10 03:57:43 +08:00
|
|
|
bool preserve_bios_swizzle;
|
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
/* overlay */
|
|
|
|
struct intel_overlay *overlay;
|
|
|
|
|
2013-11-08 22:48:54 +08:00
|
|
|
/* backlight registers and fields in struct intel_panel */
|
2014-09-15 20:35:09 +08:00
|
|
|
struct mutex backlight_lock;
|
2013-04-02 20:48:09 +08:00
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
/* LVDS info */
|
|
|
|
bool no_aux_handshake;
|
|
|
|
|
2014-09-04 19:53:14 +08:00
|
|
|
/* protects panel power sequencer state */
|
|
|
|
struct mutex pps_mutex;
|
|
|
|
|
2012-11-03 02:55:02 +08:00
|
|
|
struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
|
|
|
|
int num_fence_regs; /* 8 on pre-965, 16 otherwise */
|
|
|
|
|
|
|
|
unsigned int fsb_freq, mem_freq, is_ddr3;
|
2016-05-14 04:41:27 +08:00
|
|
|
unsigned int skl_preferred_vco_freq;
|
2015-12-03 21:31:06 +08:00
|
|
|
unsigned int cdclk_freq, max_cdclk_freq, atomic_cdclk_freq;
|
2015-08-18 19:36:59 +08:00
|
|
|
unsigned int max_dotclk_freq;
|
2016-03-02 23:22:13 +08:00
|
|
|
unsigned int rawclk_freq;
|
2014-10-07 22:41:22 +08:00
|
|
|
unsigned int hpll_freq;
|
2015-09-25 04:29:18 +08:00
|
|
|
unsigned int czclk_freq;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2016-05-14 04:41:32 +08:00
|
|
|
struct {
|
2016-05-14 04:41:33 +08:00
|
|
|
unsigned int vco, ref;
|
2016-05-14 04:41:32 +08:00
|
|
|
} cdclk_pll;
|
|
|
|
|
2013-09-02 22:22:25 +08:00
|
|
|
/**
|
|
|
|
* wq - Driver workqueue for GEM.
|
|
|
|
*
|
|
|
|
* NOTE: Work items scheduled here are not allowed to grab any modeset
|
|
|
|
* locks, for otherwise the flushing done in the pageflip code will
|
|
|
|
* result in deadlocks.
|
|
|
|
*/
|
2012-11-03 02:55:02 +08:00
|
|
|
struct workqueue_struct *wq;
|
|
|
|
|
|
|
|
/* Display functions */
|
|
|
|
struct drm_i915_display_funcs display;
|
|
|
|
|
|
|
|
/* PCH chipset type */
|
|
|
|
enum intel_pch pch_type;
|
2012-11-21 01:12:07 +08:00
|
|
|
unsigned short pch_id;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
|
|
|
unsigned long quirks;
|
|
|
|
|
i915: ignore lid open event when resuming
i915 driver needs to do modeset when
1. system resumes from sleep
2. lid is opened
In PM_SUSPEND_MEM state, all the GPEs are cleared when system resumes,
thus it is the i915_resume code does the modeset rather than intel_lid_notify().
But in PM_SUSPEND_FREEZE state, this will be broken because
system is still responsive to the lid events.
1. When we close the lid in Freeze state, intel_lid_notify() sets modeset_on_lid.
2. When we reopen the lid, intel_lid_notify() will do a modeset,
before the system is resumed.
here is the error log,
[92146.548074] WARNING: at drivers/gpu/drm/i915/intel_display.c:1028 intel_wait_for_pipe_off+0x184/0x190 [i915]()
[92146.548076] Hardware name: VGN-Z540N
[92146.548078] pipe_off wait timed out
[92146.548167] Modules linked in: hid_generic usbhid hid snd_hda_codec_realtek snd_hda_intel snd_hda_codec parport_pc snd_hwdep ppdev snd_pcm_oss i915 snd_mixer_oss snd_pcm arc4 iwldvm snd_seq_dummy mac80211 snd_seq_oss snd_seq_midi fbcon tileblit font bitblit softcursor drm_kms_helper snd_rawmidi snd_seq_midi_event coretemp drm snd_seq kvm btusb bluetooth snd_timer iwlwifi pcmcia tpm_infineon i2c_algo_bit joydev snd_seq_device intel_agp cfg80211 snd intel_gtt yenta_socket pcmcia_rsrc sony_laptop agpgart microcode psmouse tpm_tis serio_raw mxm_wmi soundcore snd_page_alloc tpm acpi_cpufreq lpc_ich pcmcia_core tpm_bios mperf processor lp parport firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci thermal e1000e
[92146.548173] Pid: 4304, comm: kworker/0:0 Tainted: G W 3.8.0-rc3-s0i3-v3-test+ #9
[92146.548175] Call Trace:
[92146.548189] [<c10378e2>] warn_slowpath_common+0x72/0xa0
[92146.548227] [<f86398b4>] ? intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548263] [<f86398b4>] ? intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548270] [<c10379b3>] warn_slowpath_fmt+0x33/0x40
[92146.548307] [<f86398b4>] intel_wait_for_pipe_off+0x184/0x190 [i915]
[92146.548344] [<f86399c2>] intel_disable_pipe+0x102/0x190 [i915]
[92146.548380] [<f8639ea4>] ? intel_disable_plane+0x64/0x80 [i915]
[92146.548417] [<f8639f7c>] i9xx_crtc_disable+0xbc/0x150 [i915]
[92146.548456] [<f863ebee>] intel_crtc_update_dpms+0x5e/0x90 [i915]
[92146.548493] [<f86437cf>] intel_modeset_setup_hw_state+0x42f/0x8f0 [i915]
[92146.548535] [<f8645b0b>] intel_lid_notify+0x9b/0xc0 [i915]
[92146.548543] [<c15610d3>] notifier_call_chain+0x43/0x60
[92146.548550] [<c105d1e1>] __blocking_notifier_call_chain+0x41/0x80
[92146.548556] [<c105d23f>] blocking_notifier_call_chain+0x1f/0x30
[92146.548563] [<c131a684>] acpi_lid_send_state+0x78/0xa4
[92146.548569] [<c131aa9e>] acpi_button_notify+0x3b/0xf1
[92146.548577] [<c12df56a>] ? acpi_os_execute+0x17/0x19
[92146.548582] [<c12e591a>] ? acpi_ec_sync_query+0xa5/0xbc
[92146.548589] [<c12e2b82>] acpi_device_notify+0x16/0x18
[92146.548595] [<c12f4904>] acpi_ev_notify_dispatch+0x38/0x4f
[92146.548600] [<c12df0e8>] acpi_os_execute_deferred+0x20/0x2b
[92146.548607] [<c1051208>] process_one_work+0x128/0x3f0
[92146.548613] [<c1564f73>] ? common_interrupt+0x33/0x38
[92146.548618] [<c104f8c0>] ? wake_up_worker+0x30/0x30
[92146.548624] [<c12df0c8>] ? acpi_os_wait_events_complete+0x1e/0x1e
[92146.548629] [<c10524f9>] worker_thread+0x119/0x3b0
[92146.548634] [<c10523e0>] ? manage_workers+0x240/0x240
[92146.548640] [<c1056e84>] kthread+0x94/0xa0
[92146.548647] [<c1060000>] ? ftrace_raw_output_sched_stat_runtime+0x70/0xf0
[92146.548652] [<c15649b7>] ret_from_kernel_thread+0x1b/0x28
[92146.548658] [<c1056df0>] ? kthread_create_on_node+0xc0/0xc0
three different modeset flags are introduced in this patch
MODESET_ON_LID_OPEN: do modeset on next lid open event
MODESET_DONE: modeset already done
MODESET_SUSPENDED: suspended, only do modeset when system is resumed
In this way,
1. when lid is closed, MODESET_ON_LID_OPEN is set so that
we'll do modeset on next lid open event.
2. when lid is opened, MODESET_DONE is set
so that duplicate lid open events will be ignored.
3. when system suspends, MODESET_SUSPENDED is set.
In this case, we will not do modeset on any lid events.
Plus, locking mechanism is also introduced to avoid racing.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-02-05 15:41:53 +08:00
|
|
|
enum modeset_restore modeset_restore;
|
|
|
|
struct mutex modeset_restore_lock;
|
2016-02-16 17:06:14 +08:00
|
|
|
struct drm_atomic_state *modeset_restore_state;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-07-17 07:50:07 +08:00
|
|
|
struct list_head vm_list; /* Global list of all address spaces */
|
2016-03-18 16:42:57 +08:00
|
|
|
struct i915_ggtt ggtt; /* VM representing the global address space */
|
2013-01-18 04:45:15 +08:00
|
|
|
|
2012-11-15 00:14:03 +08:00
|
|
|
struct i915_gem_mm mm;
|
2014-08-07 21:20:40 +08:00
|
|
|
DECLARE_HASHTABLE(mm_structs, 7);
|
|
|
|
struct mutex mm_lock;
|
2012-05-02 17:49:32 +08:00
|
|
|
|
2016-04-28 16:56:51 +08:00
|
|
|
/* The hw wants to have a stable context identifier for the lifetime
|
|
|
|
* of the context (for OA, PASID, faults, etc). This is limited
|
|
|
|
* in execlists to 21 bits.
|
|
|
|
*/
|
|
|
|
struct ida context_hw_ida;
|
|
|
|
#define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
|
|
|
|
|
2012-05-02 17:49:32 +08:00
|
|
|
/* Kernel Modesetting */
|
|
|
|
|
2014-02-08 03:12:52 +08:00
|
|
|
struct drm_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
|
|
|
|
struct drm_crtc *pipe_to_crtc_mapping[I915_MAX_PIPES];
|
2009-11-19 00:25:18 +08:00
|
|
|
wait_queue_head_t pending_flip_queue;
|
|
|
|
|
2013-10-22 03:04:07 +08:00
|
|
|
#ifdef CONFIG_DEBUG_FS
|
|
|
|
struct intel_pipe_crc pipe_crc[I915_MAX_PIPES];
|
|
|
|
#endif
|
|
|
|
|
2015-12-10 19:33:57 +08:00
|
|
|
/* dpll and cdclk state is protected by connection_mutex */
|
2013-06-05 19:34:06 +08:00
|
|
|
int num_shared_dpll;
|
|
|
|
struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
|
2016-03-08 23:46:22 +08:00
|
|
|
const struct intel_dpll_mgr *dpll_mgr;
|
2015-12-10 19:33:57 +08:00
|
|
|
|
2016-03-23 21:51:12 +08:00
|
|
|
/*
|
|
|
|
* dpll_lock serializes intel_{prepare,enable,disable}_shared_dpll.
|
|
|
|
* Must be global rather than per dpll, because on some platforms
|
|
|
|
* plls share registers.
|
|
|
|
*/
|
|
|
|
struct mutex dpll_lock;
|
|
|
|
|
2015-12-10 19:33:57 +08:00
|
|
|
unsigned int active_crtcs;
|
|
|
|
unsigned int min_pixclk[I915_MAX_PIPES];
|
|
|
|
|
2013-11-06 14:36:35 +08:00
|
|
|
int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];
|
2012-04-21 00:11:53 +08:00
|
|
|
|
2014-10-07 22:21:26 +08:00
|
|
|
struct i915_workarounds workarounds;
|
2014-08-26 21:44:51 +08:00
|
|
|
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 22:01:59 +08:00
|
|
|
struct i915_frontbuffer_tracking fb_tracking;
|
|
|
|
|
2009-08-18 04:31:43 +08:00
|
|
|
u16 orig_clock;
|
2010-01-30 03:27:07 +08:00
|
|
|
|
2009-12-17 14:48:43 +08:00
|
|
|
bool mchbar_need_disable;
|
2010-01-30 03:27:07 +08:00
|
|
|
|
2012-11-03 02:55:07 +08:00
|
|
|
struct intel_l3_parity l3_parity;
|
|
|
|
|
2013-07-05 02:02:05 +08:00
|
|
|
/* Cannot be determined by PCIID. You must always read a register. */
|
2016-04-13 22:26:43 +08:00
|
|
|
u32 edram_cap;
|
2013-07-05 02:02:05 +08:00
|
|
|
|
2012-08-09 05:35:35 +08:00
|
|
|
/* gen6+ rps state */
|
2012-11-03 02:55:03 +08:00
|
|
|
struct intel_gen6_power_mgmt rps;
|
2012-08-09 05:35:35 +08:00
|
|
|
|
2012-08-09 05:35:39 +08:00
|
|
|
/* ilk-only ips/rps state. Everything in here is protected by the global
|
|
|
|
* mchdev_lock in intel_pm.c */
|
2012-11-03 02:55:03 +08:00
|
|
|
struct intel_ilk_power_mgmt ips;
|
2010-02-06 04:42:41 +08:00
|
|
|
|
2013-10-25 22:36:47 +08:00
|
|
|
struct i915_power_domains power_domains;
|
2013-05-30 22:07:11 +08:00
|
|
|
|
2013-10-04 03:15:06 +08:00
|
|
|
struct i915_psr psr;
|
2013-07-12 05:45:00 +08:00
|
|
|
|
2012-11-15 00:14:04 +08:00
|
|
|
struct i915_gpu_error gpu_error;
|
2010-10-01 21:57:56 +08:00
|
|
|
|
2013-05-09 01:45:13 +08:00
|
|
|
struct drm_i915_gem_object *vlv_pctx;
|
|
|
|
|
2015-08-10 19:34:08 +08:00
|
|
|
#ifdef CONFIG_DRM_FBDEV_EMULATION
|
2010-03-30 13:34:14 +08:00
|
|
|
/* list of fbdev register on this device */
|
|
|
|
struct intel_fbdev *fbdev;
|
2014-08-13 20:09:46 +08:00
|
|
|
struct work_struct fbdev_suspend_work;
|
2013-10-09 15:18:51 +08:00
|
|
|
#endif
|
2011-02-22 06:23:52 +08:00
|
|
|
|
|
|
|
struct drm_property *broadcast_rgb_property;
|
2011-05-13 05:17:24 +08:00
|
|
|
struct drm_property *force_audio_property;
|
2012-05-26 07:56:22 +08:00
|
|
|
|
2015-01-08 23:54:14 +08:00
|
|
|
/* hda/i915 audio component */
|
2015-08-19 16:48:56 +08:00
|
|
|
struct i915_audio_component *audio_component;
|
2015-01-08 23:54:14 +08:00
|
|
|
bool audio_component_registered;
|
2015-09-02 14:11:39 +08:00
|
|
|
/**
|
|
|
|
* av_mutex - mutex for audio/video sync
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
struct mutex av_mutex;
|
2015-01-08 23:54:14 +08:00
|
|
|
|
drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-05 05:42:42 +08:00
|
|
|
uint32_t hw_context_size;
|
2013-09-18 12:12:45 +08:00
|
|
|
struct list_head context_list;
|
2012-11-03 02:55:02 +08:00
|
|
|
|
2012-12-12 02:48:29 +08:00
|
|
|
u32 fdi_rx_config;
|
2012-12-01 22:04:26 +08:00
|
|
|
|
2016-03-15 22:39:56 +08:00
|
|
|
/* Shadow for DISPLAY_PHY_CONTROL which can't be safely read */
|
2015-04-10 23:21:28 +08:00
|
|
|
u32 chv_phy_control;
|
2016-03-15 22:39:56 +08:00
|
|
|
/*
|
|
|
|
* Shadows for CHV DPLL_MD regs to keep the state
|
|
|
|
* checker somewhat working in the presence hardware
|
|
|
|
* crappiness (can't read out DPLL_MD for pipes B & C).
|
|
|
|
*/
|
|
|
|
u32 chv_dpll_md[I915_MAX_PIPES];
|
2016-04-04 22:27:10 +08:00
|
|
|
u32 bxt_phy_grc;
|
2015-04-10 23:21:28 +08:00
|
|
|
|
2014-03-10 17:01:44 +08:00
|
|
|
u32 suspend_count;
|
2015-11-18 23:32:30 +08:00
|
|
|
bool suspended_to_idle;
|
2012-11-03 02:55:02 +08:00
|
|
|
struct i915_suspend_saved_registers regfile;
|
2014-05-05 20:19:56 +08:00
|
|
|
struct vlv_s0ix_state vlv_s0ix_state;
|
2012-11-03 02:55:05 +08:00
|
|
|
|
2013-08-01 21:18:50 +08:00
|
|
|
struct {
|
|
|
|
/*
|
|
|
|
* Raw watermark latency values:
|
|
|
|
* in 0.1us units for WM0,
|
|
|
|
* in 0.5us units for WM1+.
|
|
|
|
*/
|
|
|
|
/* primary */
|
|
|
|
uint16_t pri_latency[5];
|
|
|
|
/* sprite */
|
|
|
|
uint16_t spr_latency[5];
|
|
|
|
/* cursor */
|
|
|
|
uint16_t cur_latency[5];
|
2014-11-05 01:06:38 +08:00
|
|
|
/*
|
|
|
|
* Raw watermark memory latency values
|
|
|
|
* for SKL for all 8 levels
|
|
|
|
* in 1us units.
|
|
|
|
*/
|
|
|
|
uint16_t skl_latency[8];
|
2013-10-10 00:18:03 +08:00
|
|
|
|
2014-11-05 01:06:42 +08:00
|
|
|
/*
|
|
|
|
* The skl_wm_values structure is a bit too big for stack
|
|
|
|
* allocation, so we keep the staging struct where we store
|
|
|
|
* intermediate results here instead.
|
|
|
|
*/
|
|
|
|
struct skl_wm_values skl_results;
|
|
|
|
|
2013-10-10 00:18:03 +08:00
|
|
|
/* current hardware state */
|
2014-11-05 01:06:42 +08:00
|
|
|
union {
|
|
|
|
struct ilk_wm_values hw;
|
|
|
|
struct skl_wm_values skl_hw;
|
2015-03-06 03:19:45 +08:00
|
|
|
struct vlv_wm_values vlv;
|
2014-11-05 01:06:42 +08:00
|
|
|
};
|
2015-09-09 02:05:12 +08:00
|
|
|
|
|
|
|
uint8_t max_level;
|
drm/i915: Add two-stage ILK-style watermark programming (v11)
In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time. These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank). Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
v2: Significant rebasing/rewriting.
v3:
- Move 'need_postvbl_update' flag to CRTC state (Daniel)
- Don't forget to check intermediate watermark values for validity
(Maarten)
- Don't due async watermark optimization; just do it at the end of the
atomic transaction, after waiting for vblanks. We do want it to be
async eventually, but adding that now will cause more trouble for
Maarten's in-progress work. (Maarten)
- Don't allocate space in crtc_state for intermediate watermarks on
platforms that don't need it (gen9+).
- Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit
now that ilk_update_wm is gone.
v4:
- Add a wm_mutex to cover updates to intel_crtc->active and the
need_postvbl_update flag. Since we don't have async yet it isn't
terribly important yet, but might as well add it now.
- Change interface to program watermarks. Platforms will now expose
.initial_watermarks() and .optimize_watermarks() functions to do
watermark programming. These should lock wm_mutex, copy the
appropriate state values into intel_crtc->active, and then call
the internal program watermarks function.
v5:
- Skip intermediate watermark calculation/check during initial hardware
readout since we don't trust the existing HW values (and don't have
valid values of our own yet).
- Don't try to call .optimize_watermarks() on platforms that don't have
atomic watermarks yet. (Maarten)
v6:
- Rebase
v7:
- Further rebase
v8:
- A few minor indentation and line length fixes
v9:
- Yet another rebase since Maarten's patches reworked a bunch of the
code (wm_pre, wm_post, etc.) that this was previously based on.
v10:
- Move wm_mutex to dev_priv to protect against racing commits against
disjoint CRTC sets. (Maarten)
- Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten)
v11:
- Now that we've moved to atomic watermark updates, make sure we call
the proper function to program watermarks in
{ironlake,haswell}_crtc_enable(); the failure to do so on the
previous patch iteration led to us not actually programming the
watermarks before turning on the CRTC, which was the cause of the
underruns that the CI system was seeing.
- Fix inverted logic for determining when to optimize watermarks. We
were needlessly optimizing when the intermediate/optimal values were
the same (harmless), but not actually optimizing when they differed
(also harmless, but wasteful from a power/bandwidth perspective).
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
2016-02-24 09:20:13 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Should be held around atomic WM register writing; also
|
|
|
|
* protects * intel_crtc->wm.active and
|
|
|
|
* cstate->wm.need_postvbl_update.
|
|
|
|
*/
|
|
|
|
struct mutex wm_mutex;
|
2016-05-12 22:06:02 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set during HW readout of watermarks/DDB. Some platforms
|
|
|
|
* need to know when we're still using BIOS-provided values
|
|
|
|
* (which we don't fully trust).
|
|
|
|
*/
|
|
|
|
bool distrust_bios_wm;
|
2013-08-01 21:18:50 +08:00
|
|
|
} wm;
|
|
|
|
|
2013-12-07 06:32:13 +08:00
|
|
|
struct i915_runtime_pm pm;
|
|
|
|
|
2014-07-25 00:04:21 +08:00
|
|
|
/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
|
|
|
|
struct {
|
2015-05-30 00:43:27 +08:00
|
|
|
int (*execbuf_submit)(struct i915_execbuffer_params *params,
|
2015-03-19 20:30:06 +08:00
|
|
|
struct drm_i915_gem_execbuffer2 *args,
|
2015-05-30 00:43:27 +08:00
|
|
|
struct list_head *vmas);
|
2016-03-16 19:00:40 +08:00
|
|
|
int (*init_engines)(struct drm_device *dev);
|
|
|
|
void (*cleanup_engine)(struct intel_engine_cs *engine);
|
|
|
|
void (*stop_engine)(struct intel_engine_cs *engine);
|
2016-07-04 15:08:31 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Is the GPU currently considered idle, or busy executing
|
|
|
|
* userspace requests? Whilst idle, we allow runtime power
|
|
|
|
* management to power down the hardware and display clocks.
|
|
|
|
* In order to reduce the effect on performance, there
|
|
|
|
* is a slight delay before we do so.
|
|
|
|
*/
|
|
|
|
unsigned int active_engines;
|
|
|
|
bool awake;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* We leave the user IRQ off as much as possible,
|
|
|
|
* but this means that requests will finish and never
|
|
|
|
* be retired once the system goes idle. Set a timer to
|
|
|
|
* fire periodically while the ring is running. When it
|
|
|
|
* fires, go retire requests.
|
|
|
|
*/
|
|
|
|
struct delayed_work retire_work;
|
|
|
|
|
|
|
|
/**
|
|
|
|
* When we detect an idle GPU, we want to turn on
|
|
|
|
* powersaving features. So once we see that there
|
|
|
|
* are no more requests outstanding and no more
|
|
|
|
* arrive within a small period of time, we fire
|
|
|
|
* off the idle_work.
|
|
|
|
*/
|
|
|
|
struct delayed_work idle_work;
|
2014-07-25 00:04:21 +08:00
|
|
|
} gt;
|
|
|
|
|
2015-09-08 23:05:45 +08:00
|
|
|
/* perform PHY state sanity checks? */
|
|
|
|
bool chv_phy_assert[2];
|
|
|
|
|
2015-12-01 01:19:39 +08:00
|
|
|
struct intel_encoder *dig_port_map[I915_MAX_PORTS];
|
|
|
|
|
2014-05-21 23:37:52 +08:00
|
|
|
/*
|
|
|
|
* NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
|
|
|
|
* will be rejected. Instead look for a better place.
|
|
|
|
*/
|
2014-03-31 19:27:22 +08:00
|
|
|
};
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2013-08-02 01:39:55 +08:00
|
|
|
static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
|
|
|
|
{
|
2016-06-24 21:00:21 +08:00
|
|
|
return container_of(dev, struct drm_i915_private, drm);
|
2013-08-02 01:39:55 +08:00
|
|
|
}
|
|
|
|
|
2015-01-08 23:54:13 +08:00
|
|
|
static inline struct drm_i915_private *dev_to_i915(struct device *dev)
|
|
|
|
{
|
|
|
|
return to_i915(dev_get_drvdata(dev));
|
|
|
|
}
|
|
|
|
|
2015-08-12 22:43:36 +08:00
|
|
|
static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
|
|
|
|
{
|
|
|
|
return container_of(guc, struct drm_i915_private, guc);
|
|
|
|
}
|
|
|
|
|
2016-03-24 19:20:38 +08:00
|
|
|
/* Simple iterator over all initialised engines */
|
|
|
|
#define for_each_engine(engine__, dev_priv__) \
|
|
|
|
for ((engine__) = &(dev_priv__)->engine[0]; \
|
|
|
|
(engine__) < &(dev_priv__)->engine[I915_NUM_ENGINES]; \
|
|
|
|
(engine__)++) \
|
|
|
|
for_each_if (intel_engine_initialized(engine__))
|
2012-05-11 21:29:30 +08:00
|
|
|
|
2016-03-24 02:19:53 +08:00
|
|
|
/* Iterator with engine_id */
|
|
|
|
#define for_each_engine_id(engine__, dev_priv__, id__) \
|
|
|
|
for ((engine__) = &(dev_priv__)->engine[0], (id__) = 0; \
|
|
|
|
(engine__) < &(dev_priv__)->engine[I915_NUM_ENGINES]; \
|
|
|
|
(engine__)++) \
|
|
|
|
for_each_if (((id__) = (engine__)->id, \
|
|
|
|
intel_engine_initialized(engine__)))
|
|
|
|
|
|
|
|
/* Iterator over subset of engines selected by mask */
|
2016-03-16 23:54:00 +08:00
|
|
|
#define for_each_engine_masked(engine__, dev_priv__, mask__) \
|
2016-03-24 19:20:38 +08:00
|
|
|
for ((engine__) = &(dev_priv__)->engine[0]; \
|
|
|
|
(engine__) < &(dev_priv__)->engine[I915_NUM_ENGINES]; \
|
|
|
|
(engine__)++) \
|
|
|
|
for_each_if (((mask__) & intel_engine_flag(engine__)) && \
|
|
|
|
intel_engine_initialized(engine__))
|
2016-03-16 23:54:00 +08:00
|
|
|
|
2012-02-14 11:45:36 +08:00
|
|
|
enum hdmi_force_audio {
|
|
|
|
HDMI_AUDIO_OFF_DVI = -2, /* no aux data for HDMI-DVI converter */
|
|
|
|
HDMI_AUDIO_OFF, /* force turn off HDMI audio */
|
|
|
|
HDMI_AUDIO_AUTO, /* trust EDID */
|
|
|
|
HDMI_AUDIO_ON, /* force turn on HDMI audio */
|
|
|
|
};
|
|
|
|
|
2013-07-04 19:06:28 +08:00
|
|
|
#define I915_GTT_OFFSET_NONE ((u32)-1)
|
2012-11-15 19:32:19 +08:00
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
struct drm_i915_gem_object_ops {
|
2016-01-23 02:32:31 +08:00
|
|
|
unsigned int flags;
|
|
|
|
#define I915_GEM_OBJECT_HAS_STRUCT_PAGE 0x1
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
/* Interface between the GEM object and its backing storage.
|
|
|
|
* get_pages() is called once prior to the use of the associated set
|
|
|
|
* of pages before to binding them into the GTT, and put_pages() is
|
|
|
|
* called after we no longer need them. As we expect there to be
|
|
|
|
* associated cost with migrating pages between the backing storage
|
|
|
|
* and making them available for the GPU (e.g. clflush), we may hold
|
|
|
|
* onto the pages after they are no longer referenced by the GPU
|
|
|
|
* in case they may be used again shortly (for example migrating the
|
|
|
|
* pages to a different memory domain within the GTT). put_pages()
|
|
|
|
* will therefore most likely be called when the object itself is
|
|
|
|
* being released or under memory pressure (where we attempt to
|
|
|
|
* reap pages for the shrinker).
|
|
|
|
*/
|
|
|
|
int (*get_pages)(struct drm_i915_gem_object *);
|
|
|
|
void (*put_pages)(struct drm_i915_gem_object *);
|
2016-01-23 02:32:31 +08:00
|
|
|
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
int (*dmabuf_export)(struct drm_i915_gem_object *);
|
|
|
|
void (*release)(struct drm_i915_gem_object *);
|
2012-06-07 22:38:42 +08:00
|
|
|
};
|
|
|
|
|
2014-06-19 05:28:09 +08:00
|
|
|
/*
|
|
|
|
* Frontbuffer tracking bits. Set in obj->frontbuffer_bits while a gem bo is
|
2015-09-15 00:05:42 +08:00
|
|
|
* considered to be the frontbuffer for the given plane interface-wise. This
|
2014-06-19 05:28:09 +08:00
|
|
|
* doesn't mean that the hw necessarily already scans it out, but that any
|
|
|
|
* rendering (by the cpu or gpu) will land in the frontbuffer eventually.
|
|
|
|
*
|
|
|
|
* We have one bit per pipe and per scanout plane type.
|
|
|
|
*/
|
2015-09-15 00:05:42 +08:00
|
|
|
#define INTEL_MAX_SPRITE_BITS_PER_PIPE 5
|
|
|
|
#define INTEL_FRONTBUFFER_BITS_PER_PIPE 8
|
2014-06-19 05:28:09 +08:00
|
|
|
#define INTEL_FRONTBUFFER_BITS \
|
|
|
|
(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES)
|
|
|
|
#define INTEL_FRONTBUFFER_PRIMARY(pipe) \
|
|
|
|
(1 << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
|
|
|
|
#define INTEL_FRONTBUFFER_CURSOR(pipe) \
|
2015-09-15 00:05:42 +08:00
|
|
|
(1 << (1 + (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe))))
|
|
|
|
#define INTEL_FRONTBUFFER_SPRITE(pipe, plane) \
|
|
|
|
(1 << (2 + plane + (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe))))
|
2014-06-19 05:28:09 +08:00
|
|
|
#define INTEL_FRONTBUFFER_OVERLAY(pipe) \
|
2015-09-15 00:05:42 +08:00
|
|
|
(1 << (2 + INTEL_MAX_SPRITE_BITS_PER_PIPE + (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe))))
|
2014-06-18 19:59:13 +08:00
|
|
|
#define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
|
2015-09-15 00:05:42 +08:00
|
|
|
(0xff << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
|
2014-06-19 05:28:09 +08:00
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
struct drm_i915_gem_object {
|
2010-04-10 03:05:07 +08:00
|
|
|
struct drm_gem_object base;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
const struct drm_i915_gem_object_ops *ops;
|
|
|
|
|
2013-07-18 03:19:03 +08:00
|
|
|
/** List of VMAs backed by this object */
|
|
|
|
struct list_head vma_list;
|
|
|
|
|
2012-11-15 19:32:21 +08:00
|
|
|
/** Stolen memory for this object, instead of being backed by shmem. */
|
|
|
|
struct drm_mm_node *stolen;
|
2013-06-01 02:28:48 +08:00
|
|
|
struct list_head global_list;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-03-16 19:00:40 +08:00
|
|
|
struct list_head engine_list[I915_NUM_ENGINES];
|
2013-08-14 17:38:33 +08:00
|
|
|
/** Used in execbuf to temporarily hold a ref */
|
|
|
|
struct list_head obj_exec_link;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2015-04-07 23:20:38 +08:00
|
|
|
struct list_head batch_pool_link;
|
2014-12-12 04:13:08 +08:00
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
2012-07-20 19:41:02 +08:00
|
|
|
* This is set if the object is on the active lists (has pending
|
|
|
|
* rendering and so a non-zero seqno), and is not set if it i s on
|
|
|
|
* inactive (ready to be unbound) list.
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
2016-03-16 19:00:39 +08:00
|
|
|
unsigned int active:I915_NUM_ENGINES;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* This is set if the object has been written to since last bound
|
|
|
|
* to the GTT
|
|
|
|
*/
|
2011-08-17 03:34:10 +08:00
|
|
|
unsigned int dirty:1;
|
2010-05-13 17:49:44 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Fence register bits (if any) for this object. Will be set
|
|
|
|
* as needed when mapped into the GTT.
|
|
|
|
* Protected by dev->struct_mutex.
|
|
|
|
*/
|
2011-10-10 03:52:02 +08:00
|
|
|
signed int fence_reg:I915_MAX_NUM_FENCE_BITS;
|
2010-05-13 17:49:44 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Advice: are the backing pages purgeable?
|
|
|
|
*/
|
2011-08-17 03:34:10 +08:00
|
|
|
unsigned int madv:2;
|
2010-05-13 17:49:44 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Current tiling mode for the object.
|
|
|
|
*/
|
2011-08-17 03:34:10 +08:00
|
|
|
unsigned int tiling_mode:2;
|
2012-04-21 23:23:23 +08:00
|
|
|
/**
|
|
|
|
* Whether the tiling parameters for the currently associated fence
|
|
|
|
* register have changed. Note that for the purposes of tracking
|
|
|
|
* tiling changes we also treat the unfenced register, the register
|
|
|
|
* slot that the object occupies whilst it executes a fenced
|
|
|
|
* command (such as BLT on gen2/3), as a "fence".
|
|
|
|
*/
|
|
|
|
unsigned int fence_dirty:1;
|
2010-05-13 17:49:44 +08:00
|
|
|
|
2010-11-05 00:11:09 +08:00
|
|
|
/**
|
|
|
|
* Is the object at the current location in the gtt mappable and
|
|
|
|
* fenceable? Used to avoid costly recalculations.
|
|
|
|
*/
|
2011-08-17 03:34:10 +08:00
|
|
|
unsigned int map_and_fenceable:1;
|
2010-11-05 00:11:09 +08:00
|
|
|
|
2010-10-02 04:05:20 +08:00
|
|
|
/**
|
|
|
|
* Whether the current gtt mapping needs to be mappable (and isn't just
|
|
|
|
* mappable by accident). Track pin and fault separate for a more
|
|
|
|
* accurate mappable working set.
|
|
|
|
*/
|
2011-08-17 03:34:10 +08:00
|
|
|
unsigned int fault_mappable:1;
|
2010-10-02 04:05:20 +08:00
|
|
|
|
2014-06-17 13:29:42 +08:00
|
|
|
/*
|
|
|
|
* Is the object to be mapped as read-only to the GPU
|
|
|
|
* Only honoured if hardware has relevant pte bit
|
|
|
|
*/
|
|
|
|
unsigned long gt_ro:1;
|
2013-08-08 21:41:10 +08:00
|
|
|
unsigned int cache_level:3;
|
2015-01-13 21:32:52 +08:00
|
|
|
unsigned int cache_dirty:1;
|
2011-03-30 07:59:50 +08:00
|
|
|
|
2014-06-19 05:28:09 +08:00
|
|
|
unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
|
|
|
|
|
2016-06-18 01:46:39 +08:00
|
|
|
unsigned int has_wc_mmap;
|
2015-04-13 18:50:09 +08:00
|
|
|
unsigned int pin_display;
|
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
struct sg_table *pages;
|
2012-09-05 04:02:54 +08:00
|
|
|
int pages_pin_count;
|
2015-04-07 23:20:25 +08:00
|
|
|
struct get_page {
|
|
|
|
struct scatterlist *sg;
|
|
|
|
int last;
|
|
|
|
} get_page;
|
2016-04-08 19:11:11 +08:00
|
|
|
void *mapping;
|
2012-05-22 20:09:21 +08:00
|
|
|
|
2015-04-27 20:41:17 +08:00
|
|
|
/** Breadcrumb of last rendering to the buffer.
|
|
|
|
* There can only be one writer, but we allow for multiple readers.
|
|
|
|
* If there is a writer that necessarily implies that all other
|
|
|
|
* read requests are complete - but we may only be lazily clearing
|
|
|
|
* the read requests. A read request is naturally the most recent
|
|
|
|
* request on a ring, so we may have two different write and read
|
|
|
|
* requests on one ring where the write request is older than the
|
|
|
|
* read request. This allows for the CPU to read from an active
|
|
|
|
* buffer by only waiting for the write to complete.
|
|
|
|
* */
|
2016-03-16 19:00:39 +08:00
|
|
|
struct drm_i915_gem_request *last_read_req[I915_NUM_ENGINES];
|
2014-11-25 02:49:26 +08:00
|
|
|
struct drm_i915_gem_request *last_write_req;
|
2010-11-12 21:53:37 +08:00
|
|
|
/** Breadcrumb of last fenced GPU access to the buffer. */
|
2014-11-25 02:49:26 +08:00
|
|
|
struct drm_i915_gem_request *last_fenced_req;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-05-13 17:49:44 +08:00
|
|
|
/** Current tiling stride for the object, if it's tiled. */
|
2008-11-13 02:03:55 +08:00
|
|
|
uint32_t stride;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-10-10 03:23:52 +08:00
|
|
|
/** References from framebuffers, locks out tiling changes. */
|
|
|
|
unsigned long framebuffer_references;
|
|
|
|
|
2009-03-13 07:56:27 +08:00
|
|
|
/** Record of address bit 17 of each page at last unbind. */
|
2010-06-06 22:40:22 +08:00
|
|
|
unsigned long *bit_17;
|
2009-03-13 07:56:27 +08:00
|
|
|
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
union {
|
2014-11-04 20:51:40 +08:00
|
|
|
/** for phy allocated objects */
|
|
|
|
struct drm_dma_handle *phys_handle;
|
|
|
|
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
struct i915_gem_userptr {
|
|
|
|
uintptr_t ptr;
|
|
|
|
unsigned read_only :1;
|
|
|
|
unsigned workers :4;
|
|
|
|
#define I915_GEM_USERPTR_MAX_WORKERS 15
|
|
|
|
|
2014-08-07 21:20:40 +08:00
|
|
|
struct i915_mm_struct *mm;
|
|
|
|
struct i915_mmu_object *mmu_object;
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
struct work_struct *work;
|
|
|
|
} userptr;
|
|
|
|
};
|
|
|
|
};
|
2010-04-10 03:05:08 +08:00
|
|
|
#define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
|
2010-03-08 20:35:02 +08:00
|
|
|
|
2016-06-20 22:05:51 +08:00
|
|
|
static inline bool
|
|
|
|
i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE;
|
|
|
|
}
|
|
|
|
|
2016-05-20 18:54:06 +08:00
|
|
|
/*
|
|
|
|
* Optimised SGL iterator for GEM objects
|
|
|
|
*/
|
|
|
|
static __always_inline struct sgt_iter {
|
|
|
|
struct scatterlist *sgp;
|
|
|
|
union {
|
|
|
|
unsigned long pfn;
|
|
|
|
dma_addr_t dma;
|
|
|
|
};
|
|
|
|
unsigned int curr;
|
|
|
|
unsigned int max;
|
|
|
|
} __sgt_iter(struct scatterlist *sgl, bool dma) {
|
|
|
|
struct sgt_iter s = { .sgp = sgl };
|
|
|
|
|
|
|
|
if (s.sgp) {
|
|
|
|
s.max = s.curr = s.sgp->offset;
|
|
|
|
s.max += s.sgp->length;
|
|
|
|
if (dma)
|
|
|
|
s.dma = sg_dma_address(s.sgp);
|
|
|
|
else
|
|
|
|
s.pfn = page_to_pfn(sg_page(s.sgp));
|
|
|
|
}
|
|
|
|
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
2016-05-20 18:54:07 +08:00
|
|
|
/**
|
|
|
|
* __sg_next - return the next scatterlist entry in a list
|
|
|
|
* @sg: The current sg entry
|
|
|
|
*
|
|
|
|
* Description:
|
|
|
|
* If the entry is the last, return NULL; otherwise, step to the next
|
|
|
|
* element in the array (@sg@+1). If that's a chain pointer, follow it;
|
|
|
|
* otherwise just return the pointer to the current element.
|
|
|
|
**/
|
|
|
|
static inline struct scatterlist *__sg_next(struct scatterlist *sg)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_DEBUG_SG
|
|
|
|
BUG_ON(sg->sg_magic != SG_MAGIC);
|
|
|
|
#endif
|
|
|
|
return sg_is_last(sg) ? NULL :
|
|
|
|
likely(!sg_is_chain(++sg)) ? sg :
|
|
|
|
sg_chain_ptr(sg);
|
|
|
|
}
|
|
|
|
|
2016-05-20 18:54:06 +08:00
|
|
|
/**
|
|
|
|
* for_each_sgt_dma - iterate over the DMA addresses of the given sg_table
|
|
|
|
* @__dmap: DMA address (output)
|
|
|
|
* @__iter: 'struct sgt_iter' (iterator state, internal)
|
|
|
|
* @__sgt: sg_table to iterate over (input)
|
|
|
|
*/
|
|
|
|
#define for_each_sgt_dma(__dmap, __iter, __sgt) \
|
|
|
|
for ((__iter) = __sgt_iter((__sgt)->sgl, true); \
|
|
|
|
((__dmap) = (__iter).dma + (__iter).curr); \
|
|
|
|
(((__iter).curr += PAGE_SIZE) < (__iter).max) || \
|
2016-05-20 18:54:07 +08:00
|
|
|
((__iter) = __sgt_iter(__sg_next((__iter).sgp), true), 0))
|
2016-05-20 18:54:06 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* for_each_sgt_page - iterate over the pages of the given sg_table
|
|
|
|
* @__pp: page pointer (output)
|
|
|
|
* @__iter: 'struct sgt_iter' (iterator state, internal)
|
|
|
|
* @__sgt: sg_table to iterate over (input)
|
|
|
|
*/
|
|
|
|
#define for_each_sgt_page(__pp, __iter, __sgt) \
|
|
|
|
for ((__iter) = __sgt_iter((__sgt)->sgl, false); \
|
|
|
|
((__pp) = (__iter).pfn == 0 ? NULL : \
|
|
|
|
pfn_to_page((__iter).pfn + ((__iter).curr >> PAGE_SHIFT))); \
|
|
|
|
(((__iter).curr += PAGE_SIZE) < (__iter).max) || \
|
2016-05-20 18:54:07 +08:00
|
|
|
((__iter) = __sgt_iter(__sg_next((__iter).sgp), false), 0))
|
2014-06-19 05:28:09 +08:00
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
|
|
|
* Request queue structure.
|
|
|
|
*
|
|
|
|
* The request queue allows us to note sequence numbers that have been emitted
|
|
|
|
* and may be associated with active buffers to be retired.
|
|
|
|
*
|
2014-11-25 02:49:26 +08:00
|
|
|
* By keeping this list, we can avoid having to do questionable sequence
|
|
|
|
* number comparisons on buffer last_read|write_seqno. It also allows an
|
|
|
|
* emission time to be associated with the request for tracking how far ahead
|
|
|
|
* of the GPU the submission is.
|
2015-02-20 00:30:47 +08:00
|
|
|
*
|
|
|
|
* The requests are reference counted, so upon creation they should have an
|
|
|
|
* initial reference taken using kref_init
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
struct drm_i915_gem_request {
|
2014-11-25 02:49:24 +08:00
|
|
|
struct kref ref;
|
|
|
|
|
2010-05-21 09:08:56 +08:00
|
|
|
/** On Which ring this request was generated */
|
2015-04-07 23:20:57 +08:00
|
|
|
struct drm_i915_private *i915;
|
2016-03-16 19:00:38 +08:00
|
|
|
struct intel_engine_cs *engine;
|
2016-07-02 00:23:26 +08:00
|
|
|
struct intel_signal_node signaling;
|
2010-05-21 09:08:56 +08:00
|
|
|
|
2015-12-11 19:32:59 +08:00
|
|
|
/** GEM sequence number associated with the previous request,
|
|
|
|
* when the HWS breadcrumb is equal to this the GPU is processing
|
|
|
|
* this request.
|
|
|
|
*/
|
|
|
|
u32 previous_seqno;
|
|
|
|
|
|
|
|
/** GEM sequence number associated with this request,
|
|
|
|
* when the HWS breadcrumb is equal or greater than this the GPU
|
|
|
|
* has finished processing this request.
|
|
|
|
*/
|
|
|
|
u32 seqno;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-06-12 20:01:39 +08:00
|
|
|
/** Position in the ringbuffer of the start of the request */
|
|
|
|
u32 head;
|
|
|
|
|
2015-01-15 21:10:37 +08:00
|
|
|
/**
|
|
|
|
* Position in the ringbuffer of the start of the postfix.
|
|
|
|
* This is required to calculate the maximum available ringbuffer
|
|
|
|
* space without overwriting the postfix.
|
|
|
|
*/
|
|
|
|
u32 postfix;
|
|
|
|
|
|
|
|
/** Position in the ringbuffer of the end of the whole request */
|
2012-02-15 19:25:36 +08:00
|
|
|
u32 tail;
|
|
|
|
|
2016-04-28 16:56:47 +08:00
|
|
|
/** Preallocate space in the ringbuffer for the emitting the request */
|
|
|
|
u32 reserved_space;
|
|
|
|
|
2015-02-20 00:30:47 +08:00
|
|
|
/**
|
2015-03-09 17:58:30 +08:00
|
|
|
* Context and ring buffer related to this request
|
2015-02-20 00:30:47 +08:00
|
|
|
* Contexts are refcounted, so when this request is associated with a
|
|
|
|
* context, we must increment the context's refcount, to guarantee that
|
|
|
|
* it persists while any request is linked to it. Requests themselves
|
|
|
|
* are also refcounted, so the request will only be freed when the last
|
|
|
|
* reference to it is dismissed, and the code in
|
|
|
|
* i915_gem_request_free() will then decrement the refcount on the
|
|
|
|
* context.
|
|
|
|
*/
|
2016-05-24 21:53:34 +08:00
|
|
|
struct i915_gem_context *ctx;
|
2015-02-13 19:48:12 +08:00
|
|
|
struct intel_ringbuffer *ringbuf;
|
2013-05-02 21:48:08 +08:00
|
|
|
|
2016-04-28 16:56:56 +08:00
|
|
|
/**
|
|
|
|
* Context related to the previous request.
|
|
|
|
* As the contexts are accessed by the hardware until the switch is
|
|
|
|
* completed to a new context, the hardware may still be writing
|
|
|
|
* to the context object after the breadcrumb is visible. We must
|
|
|
|
* not unpin/unbind/prune that object whilst still active and so
|
|
|
|
* we keep the previous context pinned until the following (this)
|
|
|
|
* request is retired.
|
|
|
|
*/
|
2016-05-24 21:53:34 +08:00
|
|
|
struct i915_gem_context *previous_context;
|
2016-04-28 16:56:56 +08:00
|
|
|
|
2015-05-30 00:43:39 +08:00
|
|
|
/** Batch buffer related to this request if any (used for
|
|
|
|
error state dump only) */
|
2013-06-12 20:01:39 +08:00
|
|
|
struct drm_i915_gem_object *batch_obj;
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/** Time at which this request was emitted, in jiffies. */
|
|
|
|
unsigned long emitted_jiffies;
|
|
|
|
|
2009-06-03 15:27:35 +08:00
|
|
|
/** global list entry for this request */
|
2008-07-31 03:06:12 +08:00
|
|
|
struct list_head list;
|
2009-06-03 15:27:35 +08:00
|
|
|
|
2010-09-24 23:02:42 +08:00
|
|
|
struct drm_i915_file_private *file_priv;
|
2009-06-03 15:27:35 +08:00
|
|
|
/** file_priv list entry for this request */
|
|
|
|
struct list_head client_list;
|
2014-12-05 21:49:35 +08:00
|
|
|
|
2015-02-12 16:26:02 +08:00
|
|
|
/** process identifier submitting this request */
|
|
|
|
struct pid *pid;
|
|
|
|
|
2015-01-15 21:10:39 +08:00
|
|
|
/**
|
|
|
|
* The ELSP only accepts two elements at a time, so we queue
|
|
|
|
* context/tail pairs on a given queue (ring->execlist_queue) until the
|
|
|
|
* hardware is available. The queue serves a double purpose: we also use
|
|
|
|
* it to keep track of the up to 2 contexts currently in the hardware
|
|
|
|
* (usually one in execution and the other queued up by the GPU): We
|
|
|
|
* only remove elements from the head of the queue when the hardware
|
|
|
|
* informs us that an element has been completed.
|
|
|
|
*
|
|
|
|
* All accesses to the queue are mediated by a spinlock
|
|
|
|
* (ring->execlist_lock).
|
|
|
|
*/
|
|
|
|
|
|
|
|
/** Execlist link in the submission queue.*/
|
|
|
|
struct list_head execlist_link;
|
|
|
|
|
|
|
|
/** Execlists no. of times this request has been sent to the ELSP */
|
|
|
|
int elsp_submitted;
|
|
|
|
|
2016-04-28 16:56:57 +08:00
|
|
|
/** Execlists context hardware id. */
|
|
|
|
unsigned ctx_hw_id;
|
2008-07-31 03:06:12 +08:00
|
|
|
};
|
|
|
|
|
drm/i915: simplify allocation of driver-internal requests
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).
So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...
v4: Rebased
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-20 03:02:53 +08:00
|
|
|
struct drm_i915_gem_request * __must_check
|
|
|
|
i915_gem_request_alloc(struct intel_engine_cs *engine,
|
2016-05-24 21:53:34 +08:00
|
|
|
struct i915_gem_context *ctx);
|
2014-11-25 02:49:24 +08:00
|
|
|
void i915_gem_request_free(struct kref *req_ref);
|
2015-05-30 00:44:12 +08:00
|
|
|
int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
|
|
|
|
struct drm_file *file);
|
2014-11-25 02:49:24 +08:00
|
|
|
|
2014-11-25 02:49:25 +08:00
|
|
|
static inline uint32_t
|
|
|
|
i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
|
|
|
|
{
|
|
|
|
return req ? req->seqno : 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct intel_engine_cs *
|
2016-03-16 19:00:39 +08:00
|
|
|
i915_gem_request_get_engine(struct drm_i915_gem_request *req)
|
2014-11-25 02:49:25 +08:00
|
|
|
{
|
2016-03-16 19:00:38 +08:00
|
|
|
return req ? req->engine : NULL;
|
2014-11-25 02:49:25 +08:00
|
|
|
}
|
|
|
|
|
2015-04-27 20:41:16 +08:00
|
|
|
static inline struct drm_i915_gem_request *
|
2014-11-25 02:49:24 +08:00
|
|
|
i915_gem_request_reference(struct drm_i915_gem_request *req)
|
|
|
|
{
|
2015-04-27 20:41:16 +08:00
|
|
|
if (req)
|
|
|
|
kref_get(&req->ref);
|
|
|
|
return req;
|
2014-11-25 02:49:24 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void
|
|
|
|
i915_gem_request_unreference(struct drm_i915_gem_request *req)
|
|
|
|
{
|
|
|
|
kref_put(&req->ref, i915_gem_request_free);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
|
|
|
|
struct drm_i915_gem_request *src)
|
|
|
|
{
|
|
|
|
if (src)
|
|
|
|
i915_gem_request_reference(src);
|
|
|
|
|
|
|
|
if (*pdst)
|
|
|
|
i915_gem_request_unreference(*pdst);
|
|
|
|
|
|
|
|
*pdst = src;
|
|
|
|
}
|
|
|
|
|
2014-11-25 02:49:42 +08:00
|
|
|
/*
|
|
|
|
* XXX: i915_gem_request_completed should be here but currently needs the
|
|
|
|
* definition of i915_seqno_passed() which is below. It will be moved in
|
|
|
|
* a later patch when the call to i915_seqno_passed() is obsoleted...
|
|
|
|
*/
|
|
|
|
|
2014-02-19 02:15:46 +08:00
|
|
|
/*
|
|
|
|
* A command that requires special handling by the command parser.
|
|
|
|
*/
|
|
|
|
struct drm_i915_cmd_descriptor {
|
|
|
|
/*
|
|
|
|
* Flags describing how the command parser processes the command.
|
|
|
|
*
|
|
|
|
* CMD_DESC_FIXED: The command has a fixed length if this is set,
|
|
|
|
* a length mask if not set
|
|
|
|
* CMD_DESC_SKIP: The command is allowed but does not follow the
|
|
|
|
* standard length encoding for the opcode range in
|
|
|
|
* which it falls
|
|
|
|
* CMD_DESC_REJECT: The command is never allowed
|
|
|
|
* CMD_DESC_REGISTER: The command should be checked against the
|
|
|
|
* register whitelist for the appropriate ring
|
|
|
|
* CMD_DESC_MASTER: The command is allowed if the submitting process
|
|
|
|
* is the DRM master
|
|
|
|
*/
|
|
|
|
u32 flags;
|
|
|
|
#define CMD_DESC_FIXED (1<<0)
|
|
|
|
#define CMD_DESC_SKIP (1<<1)
|
|
|
|
#define CMD_DESC_REJECT (1<<2)
|
|
|
|
#define CMD_DESC_REGISTER (1<<3)
|
|
|
|
#define CMD_DESC_BITMASK (1<<4)
|
|
|
|
#define CMD_DESC_MASTER (1<<5)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The command's unique identification bits and the bitmask to get them.
|
|
|
|
* This isn't strictly the opcode field as defined in the spec and may
|
|
|
|
* also include type, subtype, and/or subop fields.
|
|
|
|
*/
|
|
|
|
struct {
|
|
|
|
u32 value;
|
|
|
|
u32 mask;
|
|
|
|
} cmd;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The command's length. The command is either fixed length (i.e. does
|
|
|
|
* not include a length field) or has a length field mask. The flag
|
|
|
|
* CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
|
|
|
|
* a length mask. All command entries in a command table must include
|
|
|
|
* length information.
|
|
|
|
*/
|
|
|
|
union {
|
|
|
|
u32 fixed;
|
|
|
|
u32 mask;
|
|
|
|
} length;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Describes where to find a register address in the command to check
|
|
|
|
* against the ring's register whitelist. Only valid if flags has the
|
|
|
|
* CMD_DESC_REGISTER bit set.
|
2015-05-29 21:44:13 +08:00
|
|
|
*
|
|
|
|
* A non-zero step value implies that the command may access multiple
|
|
|
|
* registers in sequence (e.g. LRI), in that case step gives the
|
|
|
|
* distance in dwords between individual offset fields.
|
2014-02-19 02:15:46 +08:00
|
|
|
*/
|
|
|
|
struct {
|
|
|
|
u32 offset;
|
|
|
|
u32 mask;
|
2015-05-29 21:44:13 +08:00
|
|
|
u32 step;
|
2014-02-19 02:15:46 +08:00
|
|
|
} reg;
|
|
|
|
|
|
|
|
#define MAX_CMD_DESC_BITMASKS 3
|
|
|
|
/*
|
|
|
|
* Describes command checks where a particular dword is masked and
|
|
|
|
* compared against an expected value. If the command does not match
|
|
|
|
* the expected value, the parser rejects it. Only valid if flags has
|
|
|
|
* the CMD_DESC_BITMASK bit set. Only entries where mask is non-zero
|
|
|
|
* are valid.
|
2014-02-19 02:15:54 +08:00
|
|
|
*
|
|
|
|
* If the check specifies a non-zero condition_mask then the parser
|
|
|
|
* only performs the check when the bits specified by condition_mask
|
|
|
|
* are non-zero.
|
2014-02-19 02:15:46 +08:00
|
|
|
*/
|
|
|
|
struct {
|
|
|
|
u32 offset;
|
|
|
|
u32 mask;
|
|
|
|
u32 expected;
|
2014-02-19 02:15:54 +08:00
|
|
|
u32 condition_offset;
|
|
|
|
u32 condition_mask;
|
2014-02-19 02:15:46 +08:00
|
|
|
} bits[MAX_CMD_DESC_BITMASKS];
|
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* A table of commands requiring special handling by the command parser.
|
|
|
|
*
|
|
|
|
* Each ring has an array of tables. Each table consists of an array of command
|
|
|
|
* descriptors, which must be sorted with command opcodes in ascending order.
|
|
|
|
*/
|
|
|
|
struct drm_i915_cmd_table {
|
|
|
|
const struct drm_i915_cmd_descriptor *table;
|
|
|
|
int count;
|
|
|
|
};
|
|
|
|
|
2014-08-10 02:18:43 +08:00
|
|
|
/* Note that the (struct drm_i915_private *) cast is just to shut up gcc. */
|
2014-08-13 19:14:12 +08:00
|
|
|
#define __I915__(p) ({ \
|
|
|
|
struct drm_i915_private *__p; \
|
|
|
|
if (__builtin_types_compatible_p(typeof(*p), struct drm_i915_private)) \
|
|
|
|
__p = (struct drm_i915_private *)p; \
|
|
|
|
else if (__builtin_types_compatible_p(typeof(*p), struct drm_device)) \
|
|
|
|
__p = to_i915((struct drm_device *)p); \
|
|
|
|
else \
|
|
|
|
BUILD_BUG(); \
|
|
|
|
__p; \
|
|
|
|
})
|
2014-08-10 02:18:43 +08:00
|
|
|
#define INTEL_INFO(p) (&__I915__(p)->info)
|
2016-04-07 17:48:17 +08:00
|
|
|
#define INTEL_GEN(p) (INTEL_INFO(p)->gen)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define INTEL_DEVID(p) (INTEL_INFO(p)->device_id)
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2015-10-20 20:22:02 +08:00
|
|
|
#define REVID_FOREVER 0xff
|
2016-06-24 21:00:21 +08:00
|
|
|
#define INTEL_REVID(p) (__I915__(p)->drm.pdev->revision)
|
2016-05-10 17:57:08 +08:00
|
|
|
|
|
|
|
#define GEN_FOREVER (0)
|
|
|
|
/*
|
|
|
|
* Returns true if Gen is in inclusive range [Start, End].
|
|
|
|
*
|
|
|
|
* Use GEN_FOREVER for unbound start and or end.
|
|
|
|
*/
|
|
|
|
#define IS_GEN(p, s, e) ({ \
|
|
|
|
unsigned int __s = (s), __e = (e); \
|
|
|
|
BUILD_BUG_ON(!__builtin_constant_p(s)); \
|
|
|
|
BUILD_BUG_ON(!__builtin_constant_p(e)); \
|
|
|
|
if ((__s) != GEN_FOREVER) \
|
|
|
|
__s = (s) - 1; \
|
|
|
|
if ((__e) == GEN_FOREVER) \
|
|
|
|
__e = BITS_PER_LONG - 1; \
|
|
|
|
else \
|
|
|
|
__e = (e) - 1; \
|
|
|
|
!!(INTEL_INFO(p)->gen_mask & GENMASK((__e), (__s))); \
|
|
|
|
})
|
|
|
|
|
2015-10-20 20:22:02 +08:00
|
|
|
/*
|
|
|
|
* Return true if revision is in range [since,until] inclusive.
|
|
|
|
*
|
|
|
|
* Use 0 for open-ended since, and REVID_FOREVER for open-ended until.
|
|
|
|
*/
|
|
|
|
#define IS_REVID(p, since, until) \
|
|
|
|
(INTEL_REVID(p) >= (since) && INTEL_REVID(p) <= (until))
|
|
|
|
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_I830(dev) (INTEL_DEVID(dev) == 0x3577)
|
|
|
|
#define IS_845G(dev) (INTEL_DEVID(dev) == 0x2562)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_I85X(dev) (INTEL_INFO(dev)->is_i85x)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_I865G(dev) (INTEL_DEVID(dev) == 0x2572)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_I915G(dev) (INTEL_INFO(dev)->is_i915g)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_I915GM(dev) (INTEL_DEVID(dev) == 0x2592)
|
|
|
|
#define IS_I945G(dev) (INTEL_DEVID(dev) == 0x2772)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_I945GM(dev) (INTEL_INFO(dev)->is_i945gm)
|
|
|
|
#define IS_BROADWATER(dev) (INTEL_INFO(dev)->is_broadwater)
|
|
|
|
#define IS_CRESTLINE(dev) (INTEL_INFO(dev)->is_crestline)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_GM45(dev) (INTEL_DEVID(dev) == 0x2A42)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_G4X(dev) (INTEL_INFO(dev)->is_g4x)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_PINEVIEW_G(dev) (INTEL_DEVID(dev) == 0xa001)
|
|
|
|
#define IS_PINEVIEW_M(dev) (INTEL_DEVID(dev) == 0xa011)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_PINEVIEW(dev) (INTEL_INFO(dev)->is_pineview)
|
|
|
|
#define IS_G33(dev) (INTEL_INFO(dev)->is_g33)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_IRONLAKE_M(dev) (INTEL_DEVID(dev) == 0x0046)
|
2011-04-29 05:33:09 +08:00
|
|
|
#define IS_IVYBRIDGE(dev) (INTEL_INFO(dev)->is_ivybridge)
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_IVB_GT1(dev) (INTEL_DEVID(dev) == 0x0156 || \
|
|
|
|
INTEL_DEVID(dev) == 0x0152 || \
|
|
|
|
INTEL_DEVID(dev) == 0x015a)
|
2012-03-29 04:39:21 +08:00
|
|
|
#define IS_VALLEYVIEW(dev) (INTEL_INFO(dev)->is_valleyview)
|
2015-12-10 04:29:35 +08:00
|
|
|
#define IS_CHERRYVIEW(dev) (INTEL_INFO(dev)->is_cherryview)
|
2012-03-29 23:32:18 +08:00
|
|
|
#define IS_HASWELL(dev) (INTEL_INFO(dev)->is_haswell)
|
2016-05-10 17:57:05 +08:00
|
|
|
#define IS_BROADWELL(dev) (INTEL_INFO(dev)->is_broadwell)
|
2014-04-02 13:54:50 +08:00
|
|
|
#define IS_SKYLAKE(dev) (INTEL_INFO(dev)->is_skylake)
|
2015-10-28 01:14:54 +08:00
|
|
|
#define IS_BROXTON(dev) (INTEL_INFO(dev)->is_broxton)
|
2015-10-28 19:16:45 +08:00
|
|
|
#define IS_KABYLAKE(dev) (INTEL_INFO(dev)->is_kabylake)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define IS_MOBILE(dev) (INTEL_INFO(dev)->is_mobile)
|
2013-08-13 01:34:08 +08:00
|
|
|
#define IS_HSW_EARLY_SDV(dev) (IS_HASWELL(dev) && \
|
2014-08-10 02:18:42 +08:00
|
|
|
(INTEL_DEVID(dev) & 0xFF00) == 0x0C00)
|
2013-11-09 02:20:06 +08:00
|
|
|
#define IS_BDW_ULT(dev) (IS_BROADWELL(dev) && \
|
2015-01-20 08:16:15 +08:00
|
|
|
((INTEL_DEVID(dev) & 0xf) == 0x6 || \
|
2015-01-22 03:46:32 +08:00
|
|
|
(INTEL_DEVID(dev) & 0xf) == 0xb || \
|
2014-08-10 02:18:42 +08:00
|
|
|
(INTEL_DEVID(dev) & 0xf) == 0xe))
|
2015-06-03 20:45:12 +08:00
|
|
|
/* ULX machines are also considered ULT. */
|
|
|
|
#define IS_BDW_ULX(dev) (IS_BROADWELL(dev) && \
|
|
|
|
(INTEL_DEVID(dev) & 0xf) == 0xe)
|
2014-09-20 08:16:26 +08:00
|
|
|
#define IS_BDW_GT3(dev) (IS_BROADWELL(dev) && \
|
|
|
|
(INTEL_DEVID(dev) & 0x00F0) == 0x0020)
|
2013-11-09 02:20:06 +08:00
|
|
|
#define IS_HSW_ULT(dev) (IS_HASWELL(dev) && \
|
2014-08-10 02:18:42 +08:00
|
|
|
(INTEL_DEVID(dev) & 0xFF00) == 0x0A00)
|
2013-08-29 03:45:46 +08:00
|
|
|
#define IS_HSW_GT3(dev) (IS_HASWELL(dev) && \
|
2014-08-10 02:18:42 +08:00
|
|
|
(INTEL_DEVID(dev) & 0x00F0) == 0x0020)
|
2014-04-29 22:00:22 +08:00
|
|
|
/* ULX machines are also considered ULT. */
|
2014-08-10 02:18:42 +08:00
|
|
|
#define IS_HSW_ULX(dev) (INTEL_DEVID(dev) == 0x0A0E || \
|
|
|
|
INTEL_DEVID(dev) == 0x0A1E)
|
2015-06-25 16:11:03 +08:00
|
|
|
#define IS_SKL_ULT(dev) (INTEL_DEVID(dev) == 0x1906 || \
|
|
|
|
INTEL_DEVID(dev) == 0x1913 || \
|
|
|
|
INTEL_DEVID(dev) == 0x1916 || \
|
|
|
|
INTEL_DEVID(dev) == 0x1921 || \
|
|
|
|
INTEL_DEVID(dev) == 0x1926)
|
|
|
|
#define IS_SKL_ULX(dev) (INTEL_DEVID(dev) == 0x190E || \
|
|
|
|
INTEL_DEVID(dev) == 0x1915 || \
|
|
|
|
INTEL_DEVID(dev) == 0x191E)
|
2015-12-09 08:58:37 +08:00
|
|
|
#define IS_KBL_ULT(dev) (INTEL_DEVID(dev) == 0x5906 || \
|
|
|
|
INTEL_DEVID(dev) == 0x5913 || \
|
|
|
|
INTEL_DEVID(dev) == 0x5916 || \
|
|
|
|
INTEL_DEVID(dev) == 0x5921 || \
|
|
|
|
INTEL_DEVID(dev) == 0x5926)
|
|
|
|
#define IS_KBL_ULX(dev) (INTEL_DEVID(dev) == 0x590E || \
|
|
|
|
INTEL_DEVID(dev) == 0x5915 || \
|
|
|
|
INTEL_DEVID(dev) == 0x591E)
|
2015-09-12 12:47:50 +08:00
|
|
|
#define IS_SKL_GT3(dev) (IS_SKYLAKE(dev) && \
|
|
|
|
(INTEL_DEVID(dev) & 0x00F0) == 0x0020)
|
|
|
|
#define IS_SKL_GT4(dev) (IS_SKYLAKE(dev) && \
|
|
|
|
(INTEL_DEVID(dev) & 0x00F0) == 0x0030)
|
|
|
|
|
2013-08-24 07:00:07 +08:00
|
|
|
#define IS_PRELIMINARY_HW(intel_info) ((intel_info)->is_preliminary)
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2015-10-20 20:22:00 +08:00
|
|
|
#define SKL_REVID_A0 0x0
|
|
|
|
#define SKL_REVID_B0 0x1
|
|
|
|
#define SKL_REVID_C0 0x2
|
|
|
|
#define SKL_REVID_D0 0x3
|
|
|
|
#define SKL_REVID_E0 0x4
|
|
|
|
#define SKL_REVID_F0 0x5
|
|
|
|
|
2015-10-20 20:22:02 +08:00
|
|
|
#define IS_SKL_REVID(p, since, until) (IS_SKYLAKE(p) && IS_REVID(p, since, until))
|
|
|
|
|
2015-10-20 20:22:00 +08:00
|
|
|
#define BXT_REVID_A0 0x0
|
2015-10-20 20:22:01 +08:00
|
|
|
#define BXT_REVID_A1 0x1
|
2015-10-20 20:22:00 +08:00
|
|
|
#define BXT_REVID_B0 0x3
|
|
|
|
#define BXT_REVID_C0 0x9
|
2015-03-20 17:03:52 +08:00
|
|
|
|
2015-10-20 20:22:02 +08:00
|
|
|
#define IS_BXT_REVID(p, since, until) (IS_BROXTON(p) && IS_REVID(p, since, until))
|
|
|
|
|
2016-06-07 22:18:55 +08:00
|
|
|
#define KBL_REVID_A0 0x0
|
|
|
|
#define KBL_REVID_B0 0x1
|
2016-06-07 22:19:03 +08:00
|
|
|
#define KBL_REVID_C0 0x2
|
|
|
|
#define KBL_REVID_D0 0x3
|
|
|
|
#define KBL_REVID_E0 0x4
|
2016-06-07 22:18:55 +08:00
|
|
|
|
|
|
|
#define IS_KBL_REVID(p, since, until) \
|
|
|
|
(IS_KABYLAKE(p) && IS_REVID(p, since, until))
|
|
|
|
|
2011-04-07 03:11:14 +08:00
|
|
|
/*
|
|
|
|
* The genX designation typically refers to the render engine, so render
|
|
|
|
* capability related checks should use IS_GEN, while display and other checks
|
|
|
|
* have their own (e.g. HAS_PCH_SPLIT for ILK+ display, IS_foo for particular
|
|
|
|
* chips, etc.).
|
|
|
|
*/
|
2016-07-04 22:50:23 +08:00
|
|
|
#define IS_GEN2(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(1)))
|
|
|
|
#define IS_GEN3(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(2)))
|
|
|
|
#define IS_GEN4(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(3)))
|
|
|
|
#define IS_GEN5(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(4)))
|
|
|
|
#define IS_GEN6(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(5)))
|
|
|
|
#define IS_GEN7(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(6)))
|
|
|
|
#define IS_GEN8(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(7)))
|
|
|
|
#define IS_GEN9(dev) (!!(INTEL_INFO(dev)->gen_mask & BIT(8)))
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2016-06-23 21:52:41 +08:00
|
|
|
#define ENGINE_MASK(id) BIT(id)
|
|
|
|
#define RENDER_RING ENGINE_MASK(RCS)
|
|
|
|
#define BSD_RING ENGINE_MASK(VCS)
|
|
|
|
#define BLT_RING ENGINE_MASK(BCS)
|
|
|
|
#define VEBOX_RING ENGINE_MASK(VECS)
|
|
|
|
#define BSD2_RING ENGINE_MASK(VCS2)
|
|
|
|
#define ALL_ENGINES (~0)
|
|
|
|
|
|
|
|
#define HAS_ENGINE(dev_priv, id) \
|
2016-07-04 22:50:23 +08:00
|
|
|
(!!(INTEL_INFO(dev_priv)->ring_mask & ENGINE_MASK(id)))
|
2016-06-23 21:52:41 +08:00
|
|
|
|
|
|
|
#define HAS_BSD(dev_priv) HAS_ENGINE(dev_priv, VCS)
|
|
|
|
#define HAS_BSD2(dev_priv) HAS_ENGINE(dev_priv, VCS2)
|
|
|
|
#define HAS_BLT(dev_priv) HAS_ENGINE(dev_priv, BCS)
|
|
|
|
#define HAS_VEBOX(dev_priv) HAS_ENGINE(dev_priv, VECS)
|
|
|
|
|
2014-04-19 05:04:27 +08:00
|
|
|
#define HAS_LLC(dev) (INTEL_INFO(dev)->has_llc)
|
2016-03-02 20:10:31 +08:00
|
|
|
#define HAS_SNOOP(dev) (INTEL_INFO(dev)->has_snoop)
|
2016-07-04 22:50:23 +08:00
|
|
|
#define HAS_EDRAM(dev) (!!(__I915__(dev)->edram_cap & EDRAM_ENABLED))
|
2014-04-19 05:04:27 +08:00
|
|
|
#define HAS_WT(dev) ((IS_HASWELL(dev) || IS_BROADWELL(dev)) && \
|
2016-04-13 22:26:43 +08:00
|
|
|
HAS_EDRAM(dev))
|
2010-11-09 17:17:32 +08:00
|
|
|
#define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)->need_gfx_hws)
|
|
|
|
|
drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-05 05:42:42 +08:00
|
|
|
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
|
2014-07-25 00:04:49 +08:00
|
|
|
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
|
2014-08-05 22:51:18 +08:00
|
|
|
#define USES_PPGTT(dev) (i915.enable_ppgtt)
|
2015-08-03 16:52:01 +08:00
|
|
|
#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
|
|
|
|
#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
|
|
|
|
|
2012-12-17 23:21:27 +08:00
|
|
|
/* Early gen2 have a totally busted CS tlb and require pinned batches. */
|
|
|
|
#define HAS_BROKEN_CS_TLB(dev) (IS_I830(dev) || IS_845G(dev))
|
2015-12-17 01:18:37 +08:00
|
|
|
|
|
|
|
/* WaRsDisableCoarsePowerGating:skl,bxt */
|
2016-06-21 22:07:14 +08:00
|
|
|
#define NEEDS_WaRsDisableCoarsePowerGating(dev_priv) \
|
|
|
|
(IS_BXT_REVID(dev_priv, 0, BXT_REVID_A1) || \
|
|
|
|
IS_SKL_GT3(dev_priv) || \
|
|
|
|
IS_SKL_GT4(dev_priv))
|
2016-04-05 20:56:16 +08:00
|
|
|
|
2014-02-07 23:33:20 +08:00
|
|
|
/*
|
|
|
|
* dp aux and gmbus irq on gen4 seems to be able to generate legacy interrupts
|
|
|
|
* even when in MSI mode. This results in spurious interrupt warnings if the
|
|
|
|
* legacy irq no. is shared with another device. The kernel then disables that
|
|
|
|
* interrupt source and so prevents the other device from working properly.
|
|
|
|
*/
|
|
|
|
#define HAS_AUX_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
|
|
|
|
#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
|
2012-12-17 23:21:27 +08:00
|
|
|
|
2010-11-09 17:17:32 +08:00
|
|
|
/* With the 945 and later, Y tiling got adjusted so that it was 32 128-byte
|
|
|
|
* rows, which changed the alignment requirements and fence programming.
|
|
|
|
*/
|
|
|
|
#define HAS_128_BYTE_Y_TILING(dev) (!IS_GEN2(dev) && !(IS_I915G(dev) || \
|
|
|
|
IS_I915GM(dev)))
|
|
|
|
#define SUPPORTS_TV(dev) (INTEL_INFO(dev)->supports_tv)
|
|
|
|
#define I915_HAS_HOTPLUG(dev) (INTEL_INFO(dev)->has_hotplug)
|
|
|
|
|
|
|
|
#define HAS_FW_BLC(dev) (INTEL_INFO(dev)->gen > 2)
|
|
|
|
#define HAS_PIPE_CXSR(dev) (INTEL_INFO(dev)->has_pipe_cxsr)
|
2014-01-10 15:50:12 +08:00
|
|
|
#define HAS_FBC(dev) (INTEL_INFO(dev)->has_fbc)
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2014-10-02 03:04:14 +08:00
|
|
|
#define HAS_IPS(dev) (IS_HSW_ULT(dev) || IS_BROADWELL(dev))
|
2013-06-25 01:29:34 +08:00
|
|
|
|
2015-05-18 22:10:01 +08:00
|
|
|
#define HAS_DP_MST(dev) (IS_HASWELL(dev) || IS_BROADWELL(dev) || \
|
|
|
|
INTEL_INFO(dev)->gen >= 9)
|
|
|
|
|
2013-04-23 01:40:39 +08:00
|
|
|
#define HAS_DDI(dev) (INTEL_INFO(dev)->has_ddi)
|
2013-04-23 01:40:41 +08:00
|
|
|
#define HAS_FPGA_DBG_UNCLAIMED(dev) (INTEL_INFO(dev)->has_fpga_dbg)
|
2014-11-20 19:44:37 +08:00
|
|
|
#define HAS_PSR(dev) (IS_HASWELL(dev) || IS_BROADWELL(dev) || \
|
2015-01-22 17:00:54 +08:00
|
|
|
IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev) || \
|
2015-10-28 19:16:45 +08:00
|
|
|
IS_SKYLAKE(dev) || IS_KABYLAKE(dev))
|
2014-03-08 07:12:37 +08:00
|
|
|
#define HAS_RUNTIME_PM(dev) (IS_GEN6(dev) || IS_HASWELL(dev) || \
|
2015-04-16 16:52:14 +08:00
|
|
|
IS_BROADWELL(dev) || IS_VALLEYVIEW(dev) || \
|
2015-12-10 04:29:35 +08:00
|
|
|
IS_CHERRYVIEW(dev) || IS_SKYLAKE(dev) || \
|
2016-04-01 21:02:47 +08:00
|
|
|
IS_KABYLAKE(dev) || IS_BROXTON(dev))
|
2014-10-07 22:06:50 +08:00
|
|
|
#define HAS_RC6(dev) (INTEL_INFO(dev)->gen >= 6)
|
2016-05-10 17:57:06 +08:00
|
|
|
#define HAS_RC6p(dev) (IS_GEN6(dev) || IS_IVYBRIDGE(dev))
|
2012-11-24 01:30:39 +08:00
|
|
|
|
2015-08-05 00:32:42 +08:00
|
|
|
#define HAS_CSR(dev) (IS_GEN9(dev))
|
drm/i915/skl: Add support to load SKL CSR firmware.
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-04 20:58:44 +08:00
|
|
|
|
2016-05-13 22:36:30 +08:00
|
|
|
/*
|
|
|
|
* For now, anything with a GuC requires uCode loading, and then supports
|
|
|
|
* command submission once loaded. But these are logically independent
|
|
|
|
* properties, so we have separate macros to test them.
|
|
|
|
*/
|
2016-07-01 00:37:51 +08:00
|
|
|
#define HAS_GUC(dev) (IS_GEN9(dev))
|
2016-05-13 22:36:30 +08:00
|
|
|
#define HAS_GUC_UCODE(dev) (HAS_GUC(dev))
|
|
|
|
#define HAS_GUC_SCHED(dev) (HAS_GUC(dev))
|
2015-08-12 22:43:36 +08:00
|
|
|
|
2015-07-01 15:12:23 +08:00
|
|
|
#define HAS_RESOURCE_STREAMER(dev) (IS_HASWELL(dev) || \
|
|
|
|
INTEL_INFO(dev)->gen >= 8)
|
|
|
|
|
2015-06-29 17:20:23 +08:00
|
|
|
#define HAS_CORE_RING_FREQ(dev) (INTEL_INFO(dev)->gen >= 6 && \
|
2015-12-10 04:29:35 +08:00
|
|
|
!IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
|
|
|
|
!IS_BROXTON(dev))
|
2015-06-29 17:20:23 +08:00
|
|
|
|
2016-06-03 13:34:33 +08:00
|
|
|
#define HAS_POOLED_EU(dev) (INTEL_INFO(dev)->has_pooled_eu)
|
|
|
|
|
2012-11-21 01:12:07 +08:00
|
|
|
#define INTEL_PCH_DEVICE_ID_MASK 0xff00
|
|
|
|
#define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00
|
|
|
|
#define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00
|
|
|
|
#define INTEL_PCH_PPT_DEVICE_ID_TYPE 0x1e00
|
|
|
|
#define INTEL_PCH_LPT_DEVICE_ID_TYPE 0x8c00
|
|
|
|
#define INTEL_PCH_LPT_LP_DEVICE_ID_TYPE 0x9c00
|
2014-04-09 13:38:57 +08:00
|
|
|
#define INTEL_PCH_SPT_DEVICE_ID_TYPE 0xA100
|
|
|
|
#define INTEL_PCH_SPT_LP_DEVICE_ID_TYPE 0x9D00
|
2016-07-02 08:07:12 +08:00
|
|
|
#define INTEL_PCH_KBP_DEVICE_ID_TYPE 0xA200
|
2015-08-28 20:10:22 +08:00
|
|
|
#define INTEL_PCH_P2X_DEVICE_ID_TYPE 0x7100
|
2016-03-17 04:31:30 +08:00
|
|
|
#define INTEL_PCH_P3X_DEVICE_ID_TYPE 0x7000
|
2015-11-26 19:03:51 +08:00
|
|
|
#define INTEL_PCH_QEMU_DEVICE_ID_TYPE 0x2900 /* qemu q35 has 2918 */
|
2012-11-21 01:12:07 +08:00
|
|
|
|
2014-08-25 02:35:31 +08:00
|
|
|
#define INTEL_PCH_TYPE(dev) (__I915__(dev)->pch_type)
|
2016-07-02 08:07:12 +08:00
|
|
|
#define HAS_PCH_KBP(dev) (INTEL_PCH_TYPE(dev) == PCH_KBP)
|
2014-04-09 13:38:57 +08:00
|
|
|
#define HAS_PCH_SPT(dev) (INTEL_PCH_TYPE(dev) == PCH_SPT)
|
2012-03-29 23:32:20 +08:00
|
|
|
#define HAS_PCH_LPT(dev) (INTEL_PCH_TYPE(dev) == PCH_LPT)
|
2015-08-28 04:55:59 +08:00
|
|
|
#define HAS_PCH_LPT_LP(dev) (__I915__(dev)->pch_id == INTEL_PCH_LPT_LP_DEVICE_ID_TYPE)
|
2015-11-30 22:23:44 +08:00
|
|
|
#define HAS_PCH_LPT_H(dev) (__I915__(dev)->pch_id == INTEL_PCH_LPT_DEVICE_ID_TYPE)
|
2010-11-09 17:17:32 +08:00
|
|
|
#define HAS_PCH_CPT(dev) (INTEL_PCH_TYPE(dev) == PCH_CPT)
|
|
|
|
#define HAS_PCH_IBX(dev) (INTEL_PCH_TYPE(dev) == PCH_IBX)
|
2013-04-06 04:12:40 +08:00
|
|
|
#define HAS_PCH_NOP(dev) (INTEL_PCH_TYPE(dev) == PCH_NOP)
|
2012-07-04 02:57:32 +08:00
|
|
|
#define HAS_PCH_SPLIT(dev) (INTEL_PCH_TYPE(dev) != PCH_NONE)
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2015-12-10 04:29:35 +08:00
|
|
|
#define HAS_GMCH_DISPLAY(dev) (INTEL_INFO(dev)->gen < 5 || \
|
|
|
|
IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
|
2014-07-21 17:53:38 +08:00
|
|
|
|
2013-09-20 02:01:40 +08:00
|
|
|
/* DPF == dynamic parity feature */
|
|
|
|
#define HAS_L3_DPF(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
|
|
|
|
#define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_DPF(dev))
|
2012-07-25 11:47:31 +08:00
|
|
|
|
2012-09-08 10:43:39 +08:00
|
|
|
#define GT_FREQUENCY_MULTIPLIER 50
|
2015-03-06 13:37:14 +08:00
|
|
|
#define GEN9_FREQ_SCALER 3
|
2012-09-08 10:43:39 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
#include "i915_trace.h"
|
|
|
|
|
2016-06-24 21:07:14 +08:00
|
|
|
static inline bool intel_scanout_needs_vtd_wa(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_INTEL_IOMMU
|
|
|
|
if (INTEL_GEN(dev_priv) >= 6 && intel_iommu_gfx_mapped)
|
|
|
|
return true;
|
|
|
|
#endif
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2015-08-27 21:15:15 +08:00
|
|
|
extern int i915_suspend_switcheroo(struct drm_device *dev, pm_message_t state);
|
|
|
|
extern int i915_resume_switcheroo(struct drm_device *dev);
|
2008-11-28 12:22:24 +08:00
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
|
|
|
|
int enable_ppgtt);
|
2016-04-29 20:18:22 +08:00
|
|
|
|
2016-06-24 21:00:22 +08:00
|
|
|
/* i915_drv.c */
|
2016-03-18 16:46:10 +08:00
|
|
|
void __printf(3, 4)
|
|
|
|
__i915_printk(struct drm_i915_private *dev_priv, const char *level,
|
|
|
|
const char *fmt, ...);
|
|
|
|
|
|
|
|
#define i915_report_error(dev_priv, fmt, ...) \
|
|
|
|
__i915_printk(dev_priv, KERN_ERR, fmt, ##__VA_ARGS__)
|
|
|
|
|
2012-04-17 05:07:40 +08:00
|
|
|
#ifdef CONFIG_COMPAT
|
2006-01-02 17:14:23 +08:00
|
|
|
extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
|
|
|
|
unsigned long arg);
|
2012-04-17 05:07:40 +08:00
|
|
|
#endif
|
2016-05-10 21:10:04 +08:00
|
|
|
extern int intel_gpu_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
|
|
|
|
extern bool intel_has_gpu_reset(struct drm_i915_private *dev_priv);
|
2016-05-06 22:40:21 +08:00
|
|
|
extern int i915_reset(struct drm_i915_private *dev_priv);
|
2016-04-05 01:50:56 +08:00
|
|
|
extern int intel_guc_reset(struct drm_i915_private *dev_priv);
|
2016-03-22 00:26:59 +08:00
|
|
|
extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
|
2010-05-21 05:28:11 +08:00
|
|
|
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
|
|
|
|
extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
|
|
|
|
extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
|
|
|
|
extern void i915_update_gfx_val(struct drm_i915_private *dev_priv);
|
2014-04-18 21:35:02 +08:00
|
|
|
int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
|
2010-05-21 05:28:11 +08:00
|
|
|
|
2015-06-18 18:06:16 +08:00
|
|
|
/* intel_hotplug.c */
|
2016-05-06 21:48:28 +08:00
|
|
|
void intel_hpd_irq_handler(struct drm_i915_private *dev_priv,
|
|
|
|
u32 pin_mask, u32 long_mask);
|
2015-06-18 18:06:16 +08:00
|
|
|
void intel_hpd_init(struct drm_i915_private *dev_priv);
|
|
|
|
void intel_hpd_init_work(struct drm_i915_private *dev_priv);
|
|
|
|
void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
|
2015-07-22 06:32:45 +08:00
|
|
|
bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
|
2015-06-18 18:06:16 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/* i915_irq.c */
|
2016-07-02 00:23:13 +08:00
|
|
|
static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
unsigned long delay;
|
|
|
|
|
|
|
|
if (unlikely(!i915.enable_hangcheck))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* Don't continually defer the hangcheck so that it is always run at
|
|
|
|
* least once after work has been scheduled on any ring. Otherwise,
|
|
|
|
* we will ignore a hung ring if a second ring is kept busy.
|
|
|
|
*/
|
|
|
|
|
|
|
|
delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
|
|
|
|
queue_delayed_work(system_long_wq,
|
|
|
|
&dev_priv->gpu_error.hangcheck_work, delay);
|
|
|
|
}
|
|
|
|
|
2014-02-25 23:11:26 +08:00
|
|
|
__printf(3, 4)
|
2016-05-06 22:40:21 +08:00
|
|
|
void i915_handle_error(struct drm_i915_private *dev_priv,
|
|
|
|
u32 engine_mask,
|
2014-02-25 23:11:26 +08:00
|
|
|
const char *fmt, ...);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2014-09-30 16:56:44 +08:00
|
|
|
extern void intel_irq_init(struct drm_i915_private *dev_priv);
|
2014-09-30 16:56:43 +08:00
|
|
|
int intel_irq_install(struct drm_i915_private *dev_priv);
|
|
|
|
void intel_irq_uninstall(struct drm_i915_private *dev_priv);
|
2013-07-20 03:36:52 +08:00
|
|
|
|
2016-05-10 21:10:04 +08:00
|
|
|
extern void intel_uncore_sanitize(struct drm_i915_private *dev_priv);
|
|
|
|
extern void intel_uncore_early_sanitize(struct drm_i915_private *dev_priv,
|
2014-06-06 17:59:39 +08:00
|
|
|
bool restore_forcewake);
|
2016-05-10 21:10:04 +08:00
|
|
|
extern void intel_uncore_init(struct drm_i915_private *dev_priv);
|
2015-12-15 22:25:07 +08:00
|
|
|
extern bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv);
|
2016-01-08 21:51:20 +08:00
|
|
|
extern bool intel_uncore_arm_unclaimed_mmio_detection(struct drm_i915_private *dev_priv);
|
2016-05-10 21:10:04 +08:00
|
|
|
extern void intel_uncore_fini(struct drm_i915_private *dev_priv);
|
|
|
|
extern void intel_uncore_forcewake_reset(struct drm_i915_private *dev_priv,
|
|
|
|
bool restore);
|
2015-01-16 17:34:41 +08:00
|
|
|
const char *intel_uncore_forcewake_domain_to_str(const enum forcewake_domain_id id);
|
2015-01-16 17:34:40 +08:00
|
|
|
void intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domains domains);
|
2015-01-16 17:34:40 +08:00
|
|
|
void intel_uncore_forcewake_put(struct drm_i915_private *dev_priv,
|
2015-01-16 17:34:41 +08:00
|
|
|
enum forcewake_domains domains);
|
2015-04-07 23:21:02 +08:00
|
|
|
/* Like above but the caller must manage the uncore.lock itself.
|
|
|
|
* Must be used with I915_READ_FW and friends.
|
|
|
|
*/
|
|
|
|
void intel_uncore_forcewake_get__locked(struct drm_i915_private *dev_priv,
|
|
|
|
enum forcewake_domains domains);
|
|
|
|
void intel_uncore_forcewake_put__locked(struct drm_i915_private *dev_priv,
|
|
|
|
enum forcewake_domains domains);
|
2016-04-13 22:26:43 +08:00
|
|
|
u64 intel_uncore_edram_size(struct drm_i915_private *dev_priv);
|
|
|
|
|
2015-01-16 17:34:40 +08:00
|
|
|
void assert_forcewakes_inactive(struct drm_i915_private *dev_priv);
|
drm/i915: gvt: Introduce the basic architecture of GVT-g
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 20:07:00 +08:00
|
|
|
|
2016-06-30 22:32:44 +08:00
|
|
|
int intel_wait_for_register(struct drm_i915_private *dev_priv,
|
|
|
|
i915_reg_t reg,
|
|
|
|
const u32 mask,
|
|
|
|
const u32 value,
|
|
|
|
const unsigned long timeout_ms);
|
|
|
|
int intel_wait_for_register_fw(struct drm_i915_private *dev_priv,
|
|
|
|
i915_reg_t reg,
|
|
|
|
const u32 mask,
|
|
|
|
const u32 value,
|
|
|
|
const unsigned long timeout_ms);
|
|
|
|
|
drm/i915: gvt: Introduce the basic architecture of GVT-g
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 20:07:00 +08:00
|
|
|
static inline bool intel_gvt_active(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
return dev_priv->gvt.initialized;
|
|
|
|
}
|
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
static inline bool intel_vgpu_active(struct drm_i915_private *dev_priv)
|
2015-02-10 19:05:47 +08:00
|
|
|
{
|
2016-05-06 22:40:21 +08:00
|
|
|
return dev_priv->vgpu.active;
|
2015-02-10 19:05:47 +08:00
|
|
|
}
|
2011-04-07 03:13:38 +08:00
|
|
|
|
2008-11-04 18:03:27 +08:00
|
|
|
void
|
2014-03-31 19:27:21 +08:00
|
|
|
i915_enable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
|
2014-02-11 00:42:47 +08:00
|
|
|
u32 status_mask);
|
2008-11-04 18:03:27 +08:00
|
|
|
|
|
|
|
void
|
2014-03-31 19:27:21 +08:00
|
|
|
i915_disable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
|
2014-02-11 00:42:47 +08:00
|
|
|
u32 status_mask);
|
2008-11-04 18:03:27 +08:00
|
|
|
|
2014-03-05 01:23:07 +08:00
|
|
|
void valleyview_enable_display_irqs(struct drm_i915_private *dev_priv);
|
|
|
|
void valleyview_disable_display_irqs(struct drm_i915_private *dev_priv);
|
2015-09-23 22:15:27 +08:00
|
|
|
void i915_hotplug_interrupt_update(struct drm_i915_private *dev_priv,
|
|
|
|
uint32_t mask,
|
|
|
|
uint32_t bits);
|
2015-11-24 00:06:16 +08:00
|
|
|
void ilk_update_display_irq(struct drm_i915_private *dev_priv,
|
|
|
|
uint32_t interrupt_mask,
|
|
|
|
uint32_t enabled_irq_mask);
|
|
|
|
static inline void
|
|
|
|
ilk_enable_display_irq(struct drm_i915_private *dev_priv, uint32_t bits)
|
|
|
|
{
|
|
|
|
ilk_update_display_irq(dev_priv, bits, bits);
|
|
|
|
}
|
|
|
|
static inline void
|
|
|
|
ilk_disable_display_irq(struct drm_i915_private *dev_priv, uint32_t bits)
|
|
|
|
{
|
|
|
|
ilk_update_display_irq(dev_priv, bits, 0);
|
|
|
|
}
|
2015-11-24 00:06:17 +08:00
|
|
|
void bdw_update_pipe_irq(struct drm_i915_private *dev_priv,
|
|
|
|
enum pipe pipe,
|
|
|
|
uint32_t interrupt_mask,
|
|
|
|
uint32_t enabled_irq_mask);
|
|
|
|
static inline void bdw_enable_pipe_irq(struct drm_i915_private *dev_priv,
|
|
|
|
enum pipe pipe, uint32_t bits)
|
|
|
|
{
|
|
|
|
bdw_update_pipe_irq(dev_priv, pipe, bits, bits);
|
|
|
|
}
|
|
|
|
static inline void bdw_disable_pipe_irq(struct drm_i915_private *dev_priv,
|
|
|
|
enum pipe pipe, uint32_t bits)
|
|
|
|
{
|
|
|
|
bdw_update_pipe_irq(dev_priv, pipe, bits, 0);
|
|
|
|
}
|
2014-09-30 16:56:46 +08:00
|
|
|
void ibx_display_interrupt_update(struct drm_i915_private *dev_priv,
|
|
|
|
uint32_t interrupt_mask,
|
|
|
|
uint32_t enabled_irq_mask);
|
2015-11-24 00:06:15 +08:00
|
|
|
static inline void
|
|
|
|
ibx_enable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
|
|
|
|
{
|
|
|
|
ibx_display_interrupt_update(dev_priv, bits, bits);
|
|
|
|
}
|
|
|
|
static inline void
|
|
|
|
ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
|
|
|
|
{
|
|
|
|
ibx_display_interrupt_update(dev_priv, bits, 0);
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* i915_gem.c */
|
|
|
|
int i915_gem_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_pread_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2008-11-13 02:03:55 +08:00
|
|
|
int i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2014-07-25 00:04:33 +08:00
|
|
|
void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
|
2015-05-30 00:43:33 +08:00
|
|
|
struct drm_i915_gem_request *req);
|
2015-05-30 00:43:27 +08:00
|
|
|
int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
|
2014-07-25 00:04:21 +08:00
|
|
|
struct drm_i915_gem_execbuffer2 *args,
|
2015-05-30 00:43:27 +08:00
|
|
|
struct list_head *vmas);
|
2008-07-31 03:06:12 +08:00
|
|
|
int i915_gem_execbuffer(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2009-12-18 11:05:42 +08:00
|
|
|
int i915_gem_execbuffer2(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2012-09-22 08:01:20 +08:00
|
|
|
int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
|
|
|
int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
2008-07-31 03:06:12 +08:00
|
|
|
int i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2009-09-14 23:50:29 +08:00
|
|
|
int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
int i915_gem_set_tiling(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_get_tiling(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2016-05-19 23:17:16 +08:00
|
|
|
void i915_gem_init_userptr(struct drm_i915_private *dev_priv);
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
2008-10-23 12:40:13 +08:00
|
|
|
int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2012-05-25 06:03:10 +08:00
|
|
|
int i915_gem_wait_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2016-01-19 21:26:29 +08:00
|
|
|
void i915_gem_load_init(struct drm_device *dev);
|
|
|
|
void i915_gem_load_cleanup(struct drm_device *dev);
|
2016-03-16 20:54:03 +08:00
|
|
|
void i915_gem_load_init_fences(struct drm_i915_private *dev_priv);
|
2016-05-14 14:26:33 +08:00
|
|
|
int i915_gem_freeze_late(struct drm_i915_private *dev_priv);
|
|
|
|
|
2012-11-15 19:32:30 +08:00
|
|
|
void *i915_gem_object_alloc(struct drm_device *dev);
|
|
|
|
void i915_gem_object_free(struct drm_i915_gem_object *obj);
|
2012-06-07 22:38:42 +08:00
|
|
|
void i915_gem_object_init(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_object_ops *ops);
|
2016-04-23 02:14:32 +08:00
|
|
|
struct drm_i915_gem_object *i915_gem_object_create(struct drm_device *dev,
|
2010-11-09 03:18:58 +08:00
|
|
|
size_t size);
|
2015-07-10 02:29:02 +08:00
|
|
|
struct drm_i915_gem_object *i915_gem_object_create_from_data(
|
|
|
|
struct drm_device *dev, const void *data, size_t size);
|
2008-07-31 03:06:12 +08:00
|
|
|
void i915_gem_free_object(struct drm_gem_object *obj);
|
2013-07-18 03:19:03 +08:00
|
|
|
void i915_gem_vma_destroy(struct i915_vma *vma);
|
2012-11-15 19:32:30 +08:00
|
|
|
|
2015-04-21 00:04:05 +08:00
|
|
|
/* Flags used by pin/bind&friends. */
|
|
|
|
#define PIN_MAPPABLE (1<<0)
|
|
|
|
#define PIN_NONBLOCK (1<<1)
|
|
|
|
#define PIN_GLOBAL (1<<2)
|
|
|
|
#define PIN_OFFSET_BIAS (1<<3)
|
|
|
|
#define PIN_USER (1<<4)
|
|
|
|
#define PIN_UPDATE (1<<5)
|
2015-10-01 20:33:57 +08:00
|
|
|
#define PIN_ZONE_4G (1<<6)
|
|
|
|
#define PIN_HIGH (1<<7)
|
2015-12-08 19:55:07 +08:00
|
|
|
#define PIN_OFFSET_FIXED (1<<8)
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
#define PIN_OFFSET_MASK (~4095)
|
2015-03-16 20:11:13 +08:00
|
|
|
int __must_check
|
|
|
|
i915_gem_object_pin(struct drm_i915_gem_object *obj,
|
|
|
|
struct i915_address_space *vm,
|
|
|
|
uint32_t alignment,
|
|
|
|
uint64_t flags);
|
|
|
|
int __must_check
|
|
|
|
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view,
|
|
|
|
uint32_t alignment,
|
|
|
|
uint64_t flags);
|
2014-12-11 01:27:58 +08:00
|
|
|
|
|
|
|
int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
|
|
|
|
u32 flags);
|
2015-11-20 22:16:39 +08:00
|
|
|
void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
|
drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).
This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.
> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-01 08:00:10 +08:00
|
|
|
int __must_check i915_vma_unbind(struct i915_vma *vma);
|
2015-10-05 20:26:36 +08:00
|
|
|
/*
|
|
|
|
* BEWARE: Do not use the function below unless you can _absolutely_
|
|
|
|
* _guarantee_ VMA in question is _not in use_ anywhere.
|
|
|
|
*/
|
|
|
|
int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
|
2013-01-15 20:39:35 +08:00
|
|
|
int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
|
2013-12-14 01:22:31 +08:00
|
|
|
void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
|
2010-11-09 03:18:58 +08:00
|
|
|
void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
|
2010-09-24 23:02:42 +08:00
|
|
|
|
2014-02-19 02:15:45 +08:00
|
|
|
int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
|
|
|
|
int *needs_clflush);
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
|
2015-04-07 23:20:25 +08:00
|
|
|
|
|
|
|
static inline int __sg_page_count(struct scatterlist *sg)
|
2012-06-01 22:20:22 +08:00
|
|
|
{
|
2015-04-07 23:20:25 +08:00
|
|
|
return sg->length >> PAGE_SHIFT;
|
|
|
|
}
|
2013-02-19 01:28:02 +08:00
|
|
|
|
2015-12-11 02:51:23 +08:00
|
|
|
struct page *
|
|
|
|
i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n);
|
|
|
|
|
2016-06-10 16:53:00 +08:00
|
|
|
static inline dma_addr_t
|
|
|
|
i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj, int n)
|
|
|
|
{
|
|
|
|
if (n < obj->get_page.last) {
|
|
|
|
obj->get_page.sg = obj->pages->sgl;
|
|
|
|
obj->get_page.last = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
|
|
|
|
obj->get_page.last += __sg_page_count(obj->get_page.sg++);
|
|
|
|
if (unlikely(sg_is_chain(obj->get_page.sg)))
|
|
|
|
obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
|
|
|
|
}
|
|
|
|
|
|
|
|
return sg_dma_address(obj->get_page.sg) + ((n - obj->get_page.last) << PAGE_SHIFT);
|
|
|
|
}
|
|
|
|
|
2015-04-07 23:20:25 +08:00
|
|
|
static inline struct page *
|
|
|
|
i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
|
2012-06-01 22:20:22 +08:00
|
|
|
{
|
2015-04-07 23:20:25 +08:00
|
|
|
if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
|
|
|
|
return NULL;
|
2013-02-19 01:28:02 +08:00
|
|
|
|
2015-04-07 23:20:25 +08:00
|
|
|
if (n < obj->get_page.last) {
|
|
|
|
obj->get_page.sg = obj->pages->sgl;
|
|
|
|
obj->get_page.last = 0;
|
|
|
|
}
|
2013-02-19 01:28:02 +08:00
|
|
|
|
2015-04-07 23:20:25 +08:00
|
|
|
while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
|
|
|
|
obj->get_page.last += __sg_page_count(obj->get_page.sg++);
|
|
|
|
if (unlikely(sg_is_chain(obj->get_page.sg)))
|
|
|
|
obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
|
|
|
|
}
|
2013-02-19 01:28:02 +08:00
|
|
|
|
2015-04-07 23:20:25 +08:00
|
|
|
return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
|
2012-06-01 22:20:22 +08:00
|
|
|
}
|
2015-04-07 23:20:25 +08:00
|
|
|
|
2012-09-05 04:02:54 +08:00
|
|
|
static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
BUG_ON(obj->pages == NULL);
|
|
|
|
obj->pages_pin_count++;
|
|
|
|
}
|
2016-04-08 19:11:11 +08:00
|
|
|
|
2012-09-05 04:02:54 +08:00
|
|
|
static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
BUG_ON(obj->pages_pin_count == 0);
|
|
|
|
obj->pages_pin_count--;
|
|
|
|
}
|
|
|
|
|
2016-04-08 19:11:11 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_object_pin_map - return a contiguous mapping of the entire object
|
|
|
|
* @obj - the object to map into kernel address space
|
|
|
|
*
|
|
|
|
* Calls i915_gem_object_pin_pages() to prevent reaping of the object's
|
|
|
|
* pages and then returns a contiguous mapping of the backing storage into
|
|
|
|
* the kernel address space.
|
|
|
|
*
|
2016-04-12 21:46:16 +08:00
|
|
|
* The caller must hold the struct_mutex, and is responsible for calling
|
|
|
|
* i915_gem_object_unpin_map() when the mapping is no longer required.
|
2016-04-08 19:11:11 +08:00
|
|
|
*
|
2016-04-12 21:46:16 +08:00
|
|
|
* Returns the pointer through which to access the mapped object, or an
|
|
|
|
* ERR_PTR() on error.
|
2016-04-08 19:11:11 +08:00
|
|
|
*/
|
|
|
|
void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* i915_gem_object_unpin_map - releases an earlier mapping
|
|
|
|
* @obj - the object to unmap
|
|
|
|
*
|
|
|
|
* After pinning the object and mapping its pages, once you are finished
|
|
|
|
* with your access, call i915_gem_object_unpin_map() to release the pin
|
|
|
|
* upon the mapping. Once the pin count reaches zero, that mapping may be
|
|
|
|
* removed.
|
|
|
|
*
|
|
|
|
* The caller must hold the struct_mutex.
|
|
|
|
*/
|
|
|
|
static inline void i915_gem_object_unpin_map(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
}
|
|
|
|
|
2010-11-26 02:00:26 +08:00
|
|
|
int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
|
2012-04-06 05:47:36 +08:00
|
|
|
int i915_gem_object_sync(struct drm_i915_gem_object *obj,
|
2015-06-18 20:14:56 +08:00
|
|
|
struct intel_engine_cs *to,
|
|
|
|
struct drm_i915_gem_request **to_req);
|
2013-09-25 00:57:58 +08:00
|
|
|
void i915_vma_move_to_active(struct i915_vma *vma,
|
2015-05-30 00:43:50 +08:00
|
|
|
struct drm_i915_gem_request *req);
|
2011-02-07 10:16:14 +08:00
|
|
|
int i915_gem_dumb_create(struct drm_file *file_priv,
|
|
|
|
struct drm_device *dev,
|
|
|
|
struct drm_mode_create_dumb *args);
|
2014-12-24 11:11:17 +08:00
|
|
|
int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
|
|
|
|
uint32_t handle, uint64_t *offset);
|
2016-05-20 18:54:06 +08:00
|
|
|
|
|
|
|
void i915_gem_track_fb(struct drm_i915_gem_object *old,
|
|
|
|
struct drm_i915_gem_object *new,
|
|
|
|
unsigned frontbuffer_bits);
|
|
|
|
|
2010-09-24 23:02:42 +08:00
|
|
|
/**
|
|
|
|
* Returns true if seq1 is later than seq2.
|
|
|
|
*/
|
|
|
|
static inline bool
|
|
|
|
i915_seqno_passed(uint32_t seq1, uint32_t seq2)
|
|
|
|
{
|
|
|
|
return (int32_t)(seq1 - seq2) >= 0;
|
|
|
|
}
|
|
|
|
|
2016-07-02 00:23:16 +08:00
|
|
|
static inline bool i915_gem_request_started(const struct drm_i915_gem_request *req)
|
2015-12-11 19:32:59 +08:00
|
|
|
{
|
2016-07-02 00:23:17 +08:00
|
|
|
return i915_seqno_passed(intel_engine_get_seqno(req->engine),
|
2016-04-09 17:57:54 +08:00
|
|
|
req->previous_seqno);
|
2015-12-11 19:32:59 +08:00
|
|
|
}
|
|
|
|
|
2016-07-02 00:23:16 +08:00
|
|
|
static inline bool i915_gem_request_completed(const struct drm_i915_gem_request *req)
|
2014-11-25 02:49:42 +08:00
|
|
|
{
|
2016-07-02 00:23:17 +08:00
|
|
|
return i915_seqno_passed(intel_engine_get_seqno(req->engine),
|
2016-04-09 17:57:54 +08:00
|
|
|
req->seqno);
|
2014-11-25 02:49:42 +08:00
|
|
|
}
|
|
|
|
|
2016-07-02 00:23:16 +08:00
|
|
|
bool __i915_spin_request(const struct drm_i915_gem_request *request,
|
|
|
|
int state, unsigned long timeout_us);
|
|
|
|
static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
|
|
|
|
int state, unsigned long timeout_us)
|
|
|
|
{
|
|
|
|
return (i915_gem_request_started(request) &&
|
|
|
|
__i915_spin_request(request, state, timeout_us));
|
|
|
|
}
|
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
|
2012-12-19 17:13:08 +08:00
|
|
|
int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
|
2011-12-14 20:57:08 +08:00
|
|
|
|
2014-02-25 23:11:23 +08:00
|
|
|
struct drm_i915_gem_request *
|
2016-03-16 19:00:37 +08:00
|
|
|
i915_gem_find_active_request(struct intel_engine_cs *engine);
|
2014-02-25 23:11:23 +08:00
|
|
|
|
2016-07-04 15:08:31 +08:00
|
|
|
void i915_gem_retire_requests(struct drm_i915_private *dev_priv);
|
2016-03-16 19:00:37 +08:00
|
|
|
void i915_gem_retire_requests_ring(struct intel_engine_cs *engine);
|
drm/i915: Replaced Blitter ring based flips with MMIO flips
This patch enables the framework for using MMIO based flip calls,
in contrast with the CS based flip calls which are being used currently.
MMIO based flip calls can be enabled on architectures where
Render and Blitter engines reside in different power wells. The
decision to use MMIO flips can be made based on workloads to give
100% residency for Media power well.
v2: The MMIO flips now use the interrupt driven mechanism for issuing the
flips when target seqno is reached. (Incorporating Ville's idea)
v3: Rebasing on latest code. Code restructuring after incorporating
Damien's comments
v4: Addressing Ville's review comments
-general cleanup
-updating only base addr instead of calling update_primary_plane
-extending patch for gen5+ platforms
v5: Addressed Ville's review comments
-Making mmio flip vs cs flip selection based on module parameter
-Adding check for DRIVER_MODESET feature in notify_ring before calling
notify mmio flip.
-Other changes mostly in function arguments
v6: -Having a seperate function to check condition for using mmio flips (Ville)
-propogating error code from i915_gem_check_olr (Ville)
v7: -Adding __must_check with i915_gem_check_olr (Chris)
-Renaming mmio_flip_data to mmio_flip (Chris)
-Rebasing on latest nightly
v8: -Rebasing on latest code
-squash 3rd patch in series(mmio setbase vs page flip race) with this patch
-Added new tiling mode update in intel_do_mmio_flip (Chris)
v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
intel_postpone_flip, as this is a more restrictive condition (Chris)
v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
These patches make the selection of CS vs MMIO flip at the page flip time, and
make the module parameter for using mmio flips as tristate, the states being
'force CS flips', 'force mmio flips', 'driver discretion'.
Changed the logic for driver discretion (Chris)
v11: Minor code cleanup(better readability, fixing whitespace errors, using
lockdep to check mutex locked status in postpone_flip, removal of __must_check
in function definition) (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
[danvet: Fix up parameter alignement checkpatch spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-02 19:17:17 +08:00
|
|
|
|
2016-04-14 00:35:03 +08:00
|
|
|
static inline u32 i915_reset_counter(struct i915_gpu_error *error)
|
|
|
|
{
|
|
|
|
return atomic_read(&error->reset_counter);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool __i915_reset_in_progress(u32 reset)
|
|
|
|
{
|
|
|
|
return unlikely(reset & I915_RESET_IN_PROGRESS_FLAG);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool __i915_reset_in_progress_or_wedged(u32 reset)
|
|
|
|
{
|
|
|
|
return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool __i915_terminally_wedged(u32 reset)
|
|
|
|
{
|
|
|
|
return unlikely(reset & I915_WEDGED);
|
|
|
|
}
|
|
|
|
|
2012-11-16 00:17:22 +08:00
|
|
|
static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
|
|
|
|
{
|
2016-04-14 00:35:03 +08:00
|
|
|
return __i915_reset_in_progress(i915_reset_counter(error));
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool i915_reset_in_progress_or_wedged(struct i915_gpu_error *error)
|
|
|
|
{
|
|
|
|
return __i915_reset_in_progress_or_wedged(i915_reset_counter(error));
|
2012-11-16 00:17:22 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
|
|
|
|
{
|
2016-04-14 00:35:03 +08:00
|
|
|
return __i915_terminally_wedged(i915_reset_counter(error));
|
2013-11-12 20:44:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline u32 i915_reset_count(struct i915_gpu_error *error)
|
|
|
|
{
|
2016-04-14 00:35:03 +08:00
|
|
|
return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
|
2012-11-16 00:17:22 +08:00
|
|
|
}
|
2012-02-15 19:25:36 +08:00
|
|
|
|
2010-09-30 23:53:18 +08:00
|
|
|
void i915_gem_reset(struct drm_device *dev);
|
2013-08-08 21:41:09 +08:00
|
|
|
bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
|
2012-04-24 22:47:41 +08:00
|
|
|
int __must_check i915_gem_init(struct drm_device *dev);
|
2016-03-16 19:00:40 +08:00
|
|
|
int i915_gem_init_engines(struct drm_device *dev);
|
2012-02-02 16:58:12 +08:00
|
|
|
int __must_check i915_gem_init_hw(struct drm_device *dev);
|
|
|
|
void i915_gem_init_swizzling(struct drm_device *dev);
|
2016-03-16 19:00:40 +08:00
|
|
|
void i915_gem_cleanup_engines(struct drm_device *dev);
|
2016-06-24 21:55:57 +08:00
|
|
|
int __must_check i915_gem_wait_for_idle(struct drm_i915_private *dev_priv);
|
2013-10-16 18:50:01 +08:00
|
|
|
int __must_check i915_gem_suspend(struct drm_device *dev);
|
2015-05-30 00:43:49 +08:00
|
|
|
void __i915_add_request(struct drm_i915_gem_request *req,
|
2015-05-30 00:43:34 +08:00
|
|
|
struct drm_i915_gem_object *batch_obj,
|
|
|
|
bool flush_caches);
|
2015-05-30 00:43:49 +08:00
|
|
|
#define i915_add_request(req) \
|
2015-05-30 00:44:12 +08:00
|
|
|
__i915_add_request(req, NULL, true)
|
2015-05-30 00:43:49 +08:00
|
|
|
#define i915_add_request_no_flush(req) \
|
2015-05-30 00:44:12 +08:00
|
|
|
__i915_add_request(req, NULL, false)
|
2014-11-25 02:49:35 +08:00
|
|
|
int __i915_wait_request(struct drm_i915_gem_request *req,
|
2014-11-06 15:26:38 +08:00
|
|
|
bool interruptible,
|
|
|
|
s64 *timeout,
|
2015-04-27 20:41:22 +08:00
|
|
|
struct intel_rps_client *rps);
|
2014-11-26 21:17:05 +08:00
|
|
|
int __must_check i915_wait_request(struct drm_i915_gem_request *req);
|
2008-11-13 02:03:55 +08:00
|
|
|
int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
|
2010-11-23 23:26:33 +08:00
|
|
|
int __must_check
|
2015-04-27 20:41:14 +08:00
|
|
|
i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
|
|
|
|
bool readonly);
|
|
|
|
int __must_check
|
2010-11-23 23:26:33 +08:00
|
|
|
i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
|
|
|
|
bool write);
|
|
|
|
int __must_check
|
2012-03-26 16:10:27 +08:00
|
|
|
i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
|
|
|
|
int __must_check
|
2011-04-14 16:41:17 +08:00
|
|
|
i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
|
|
|
|
u32 alignment,
|
2015-03-23 19:10:33 +08:00
|
|
|
const struct i915_ggtt_view *view);
|
|
|
|
void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view);
|
2014-05-21 19:42:56 +08:00
|
|
|
int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
|
2010-08-07 18:01:39 +08:00
|
|
|
int align);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
int i915_gem_open(struct drm_device *dev, struct drm_file *file);
|
2010-11-09 03:18:58 +08:00
|
|
|
void i915_gem_release(struct drm_device *dev, struct drm_file *file);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-01-08 03:47:35 +08:00
|
|
|
uint32_t
|
|
|
|
i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode);
|
2011-03-07 18:42:03 +08:00
|
|
|
uint32_t
|
2013-01-08 03:47:33 +08:00
|
|
|
i915_gem_get_gtt_alignment(struct drm_device *dev, uint32_t size,
|
|
|
|
int tiling_mode, bool fenced);
|
2011-03-07 18:42:03 +08:00
|
|
|
|
2011-04-04 16:44:39 +08:00
|
|
|
int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_cache_level cache_level);
|
|
|
|
|
2012-05-10 21:25:09 +08:00
|
|
|
struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
|
|
|
|
struct dma_buf *dma_buf);
|
|
|
|
|
|
|
|
struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
|
|
|
|
struct drm_gem_object *gem_obj, int flags);
|
|
|
|
|
2015-08-08 00:40:17 +08:00
|
|
|
u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
|
|
|
|
const struct i915_ggtt_view *view);
|
|
|
|
u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
|
|
|
|
struct i915_address_space *vm);
|
|
|
|
static inline u64
|
2015-03-16 20:11:13 +08:00
|
|
|
i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *o)
|
2014-12-11 01:27:58 +08:00
|
|
|
{
|
2015-03-27 19:09:22 +08:00
|
|
|
return i915_gem_obj_ggtt_offset_view(o, &i915_ggtt_view_normal);
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2013-08-01 07:59:56 +08:00
|
|
|
bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o);
|
2015-03-16 20:11:13 +08:00
|
|
|
bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
|
2015-03-27 19:09:22 +08:00
|
|
|
const struct i915_ggtt_view *view);
|
2013-08-01 07:59:56 +08:00
|
|
|
bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
|
2015-03-16 20:11:13 +08:00
|
|
|
struct i915_address_space *vm);
|
2014-12-11 01:27:58 +08:00
|
|
|
|
|
|
|
struct i915_vma *
|
2015-03-16 20:11:13 +08:00
|
|
|
i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
|
|
|
|
struct i915_address_space *vm);
|
|
|
|
struct i915_vma *
|
|
|
|
i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view);
|
2014-12-11 01:27:58 +08:00
|
|
|
|
2013-08-14 17:38:35 +08:00
|
|
|
struct i915_vma *
|
|
|
|
i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
|
2015-03-16 20:11:13 +08:00
|
|
|
struct i915_address_space *vm);
|
|
|
|
struct i915_vma *
|
|
|
|
i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view);
|
2013-09-25 00:57:57 +08:00
|
|
|
|
2015-03-16 20:11:13 +08:00
|
|
|
static inline struct i915_vma *
|
|
|
|
i915_gem_obj_to_ggtt(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
return i915_gem_obj_to_ggtt_view(obj, &i915_ggtt_view_normal);
|
2013-12-07 06:10:55 +08:00
|
|
|
}
|
2015-03-16 20:11:13 +08:00
|
|
|
bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
|
2013-09-25 00:57:57 +08:00
|
|
|
|
2013-08-01 07:59:56 +08:00
|
|
|
/* Some GGTT VM helpers */
|
2014-08-06 21:04:48 +08:00
|
|
|
static inline struct i915_hw_ppgtt *
|
|
|
|
i915_vm_to_ppgtt(struct i915_address_space *vm)
|
|
|
|
{
|
|
|
|
return container_of(vm, struct i915_hw_ppgtt, base);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2013-08-01 07:59:56 +08:00
|
|
|
static inline bool i915_gem_obj_ggtt_bound(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2015-03-27 19:09:22 +08:00
|
|
|
return i915_gem_obj_ggtt_bound_view(obj, &i915_ggtt_view_normal);
|
2013-08-01 07:59:56 +08:00
|
|
|
}
|
|
|
|
|
2016-04-21 20:04:43 +08:00
|
|
|
unsigned long
|
|
|
|
i915_gem_obj_ggtt_size(struct drm_i915_gem_object *obj);
|
2013-08-01 07:59:58 +08:00
|
|
|
|
|
|
|
static inline int __must_check
|
|
|
|
i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
uint32_t alignment,
|
2014-02-14 21:01:11 +08:00
|
|
|
unsigned flags)
|
2013-08-01 07:59:58 +08:00
|
|
|
{
|
2016-03-30 21:57:10 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
|
|
|
|
|
|
|
return i915_gem_object_pin(obj, &ggtt->base,
|
2014-08-06 21:04:49 +08:00
|
|
|
alignment, flags | PIN_GLOBAL);
|
2013-08-01 07:59:58 +08:00
|
|
|
}
|
2013-08-01 07:59:56 +08:00
|
|
|
|
2015-03-23 19:10:33 +08:00
|
|
|
void i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view);
|
|
|
|
static inline void
|
|
|
|
i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
|
|
|
|
}
|
2014-02-14 21:01:19 +08:00
|
|
|
|
2015-07-24 19:55:11 +08:00
|
|
|
/* i915_gem_fence.c */
|
|
|
|
int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
|
|
|
|
int __must_check i915_gem_object_put_fence(struct drm_i915_gem_object *obj);
|
|
|
|
|
|
|
|
bool i915_gem_object_pin_fence(struct drm_i915_gem_object *obj);
|
|
|
|
void i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj);
|
|
|
|
|
|
|
|
void i915_gem_restore_fences(struct drm_device *dev);
|
|
|
|
|
2015-07-24 23:40:14 +08:00
|
|
|
void i915_gem_detect_bit_6_swizzle(struct drm_device *dev);
|
|
|
|
void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj);
|
|
|
|
void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj);
|
|
|
|
|
drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-05 05:42:42 +08:00
|
|
|
/* i915_gem_context.c */
|
2013-11-06 23:56:29 +08:00
|
|
|
int __must_check i915_gem_context_init(struct drm_device *dev);
|
2016-04-28 16:56:41 +08:00
|
|
|
void i915_gem_context_lost(struct drm_i915_private *dev_priv);
|
drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-05 05:42:42 +08:00
|
|
|
void i915_gem_context_fini(struct drm_device *dev);
|
2013-12-07 06:11:03 +08:00
|
|
|
void i915_gem_context_reset(struct drm_device *dev);
|
2013-12-07 06:10:58 +08:00
|
|
|
int i915_gem_context_open(struct drm_device *dev, struct drm_file *file);
|
drm/i915: preliminary context support
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-05 05:42:42 +08:00
|
|
|
void i915_gem_context_close(struct drm_device *dev, struct drm_file *file);
|
2015-05-30 00:43:41 +08:00
|
|
|
int i915_switch_context(struct drm_i915_gem_request *req);
|
2013-04-30 18:30:33 +08:00
|
|
|
void i915_gem_context_free(struct kref *ctx_ref);
|
drm/i915/bdw: A bit more advanced LR context alloc/free
Now that we have the ability to allocate our own context backing objects
and we have multiplexed one of them per engine inside the context structs,
we can finally allocate and free them correctly.
Regarding the context size, reading the register to calculate the sizes
can work, I think, however the docs are very clear about the actual
context sizes on GEN8, so just hardcode that and use it.
v2: Rebased on top of the Full PPGTT series. It is important to notice
that at this point we have one global default context per engine, all
of them using the aliasing PPGTT (as opposed to the single global
default context we have with legacy HW contexts).
v3:
- Go back to one single global default context, this time with multiple
backing objects inside.
- Use different context sizes for non-render engines, as suggested by
Damien (still hardcoded, since the information about the context size
registers in the BSpec is, well, *lacking*).
- Render ctx size is 20 (or 19) pages, but not 21 (caught by Damien).
- Move default context backing object creation to intel_init_ring (so
that we don't waste memory in rings that might not get initialized).
v4:
- Reuse the HW legacy context init/fini.
- Create a separate free function.
- Rename the functions with an intel_ preffix.
v5: Several rebases to account for the changes in the previous patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-25 00:04:14 +08:00
|
|
|
struct drm_i915_gem_object *
|
|
|
|
i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
|
2016-06-16 20:07:05 +08:00
|
|
|
struct i915_gem_context *
|
|
|
|
i915_gem_context_create_gvt(struct drm_device *dev);
|
2016-05-24 21:53:36 +08:00
|
|
|
|
|
|
|
static inline struct i915_gem_context *
|
|
|
|
i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
|
|
|
|
{
|
|
|
|
struct i915_gem_context *ctx;
|
|
|
|
|
2016-06-24 21:00:21 +08:00
|
|
|
lockdep_assert_held(&file_priv->dev_priv->drm.struct_mutex);
|
2016-05-24 21:53:36 +08:00
|
|
|
|
|
|
|
ctx = idr_find(&file_priv->context_idr, id);
|
|
|
|
if (!ctx)
|
|
|
|
return ERR_PTR(-ENOENT);
|
|
|
|
|
|
|
|
return ctx;
|
|
|
|
}
|
|
|
|
|
2016-05-24 21:53:34 +08:00
|
|
|
static inline void i915_gem_context_reference(struct i915_gem_context *ctx)
|
2013-04-30 18:30:33 +08:00
|
|
|
{
|
2014-04-09 16:07:36 +08:00
|
|
|
kref_get(&ctx->ref);
|
2013-04-30 18:30:33 +08:00
|
|
|
}
|
|
|
|
|
2016-05-24 21:53:34 +08:00
|
|
|
static inline void i915_gem_context_unreference(struct i915_gem_context *ctx)
|
2013-04-30 18:30:33 +08:00
|
|
|
{
|
2016-06-24 21:00:21 +08:00
|
|
|
lockdep_assert_held(&ctx->i915->drm.struct_mutex);
|
2014-04-09 16:07:36 +08:00
|
|
|
kref_put(&ctx->ref, i915_gem_context_free);
|
2013-04-30 18:30:33 +08:00
|
|
|
}
|
|
|
|
|
2016-05-24 21:53:34 +08:00
|
|
|
static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
|
2014-01-30 22:05:48 +08:00
|
|
|
{
|
drm/i915: Emphasize that ctx->id is merely a user handle
This is an Execlists preparatory patch, since they make context ID become an
overloaded term:
- In the software, it was used to distinguish which context userspace was
trying to use.
- In the BSpec, the term is used to describe the 20-bits long field the
hardware uses to it to discriminate the contexts that are submitted to
the ELSP and inform the driver about their current status (via Context
Switch Interrupts and Context Status Buffers).
Initially, I tried to make the different meanings converge, but it proved
impossible:
- The software ctx->id is per-filp, while the hardware one needs to be
globally unique.
- Also, we multiplex several backing states objects per intel_context,
and all of them need unique HW IDs.
- I tried adding a per-filp ID and then composing the HW context ID as:
ctx->id + file_priv->id + ring->id, but the fact that the hardware only
uses 20-bits means we have to artificially limit the number of filps or
contexts the userspace can create.
The ctx->user_handle renaming bits are done with this Cocci patch (plus
manual frobbing of the struct declaration):
@@
struct intel_context c;
@@
- (c).id
+ c.user_handle
@@
struct intel_context *c;
@@
- (c)->id
+ c->user_handle
Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
change the type to unsigned 32 bits.
v2: s/handle/user_handle and change the type to uint32_t as suggested by
Chris Wilson.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-03 23:28:00 +08:00
|
|
|
return c->user_handle == DEFAULT_CONTEXT_HANDLE;
|
2014-01-30 22:05:48 +08:00
|
|
|
}
|
|
|
|
|
2012-06-05 05:42:54 +08:00
|
|
|
int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
|
|
|
int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
2014-12-25 00:13:40 +08:00
|
|
|
int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
|
|
|
int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv);
|
2016-05-13 18:57:19 +08:00
|
|
|
int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
2012-05-10 21:25:09 +08:00
|
|
|
|
2013-12-07 06:11:23 +08:00
|
|
|
/* i915_gem_evict.c */
|
|
|
|
int __must_check i915_gem_evict_something(struct drm_device *dev,
|
|
|
|
struct i915_address_space *vm,
|
|
|
|
int min_size,
|
|
|
|
unsigned alignment,
|
|
|
|
unsigned cache_level,
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
unsigned long start,
|
|
|
|
unsigned long end,
|
2014-02-14 21:01:11 +08:00
|
|
|
unsigned flags);
|
2015-12-08 19:55:07 +08:00
|
|
|
int __must_check i915_gem_evict_for_vma(struct i915_vma *target);
|
2013-12-07 06:11:23 +08:00
|
|
|
int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
|
2012-02-10 00:15:46 +08:00
|
|
|
|
drm/i915: Split out GTT specific header file
This file contains all necessary defines, prototypes and typesdefs for
manipulating GEN graphics address translation (this does not include the
legacy AGP driver)
Reiterating the comment in the header,
"Please try to maintain the following order within this file unless it
makes sense to do otherwise. From top to bottom:
1. typedefs
2. #defines, and macros
3. structure definitions
4. function prototypes
Within each section, please try to order by generation in ascending
order, from top to bottom (ie. GEN6 on the top, GEN8 on the bottom)."
I've made some minor cleanups, and fixed a couple of typos while here -
but there should be no functional changes.
The purpose of the patch is to reduce clutter in our main header file,
making room for new growth, and make documentation of our interfaces
easier by splitting things out.
With a little more work, like making i915_gtt a pointer, we could
potentially completely isolate this header from i915_drv.h. At the
moment however, I don't think it's worth the effort.
Personally, I would have liked to put the PTE encoding functions in this
file too, but I didn't want to rock the boat too much.
A similar patch has been in use on my machine for some time. This exact
patch though has only been compile tested.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-23 13:47:21 +08:00
|
|
|
/* belongs in i915_gem_gtt.h */
|
2016-05-06 22:40:21 +08:00
|
|
|
static inline void i915_gem_chipset_flush(struct drm_i915_private *dev_priv)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
2016-05-06 22:40:21 +08:00
|
|
|
if (INTEL_GEN(dev_priv) < 6)
|
2012-11-05 01:21:27 +08:00
|
|
|
intel_gtt_chipset_flush();
|
|
|
|
}
|
2013-12-07 06:11:14 +08:00
|
|
|
|
2012-04-24 22:47:39 +08:00
|
|
|
/* i915_gem_stolen.c */
|
2015-07-03 06:25:07 +08:00
|
|
|
int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
|
|
|
|
struct drm_mm_node *node, u64 size,
|
|
|
|
unsigned alignment);
|
2015-09-15 02:19:57 +08:00
|
|
|
int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv,
|
|
|
|
struct drm_mm_node *node, u64 size,
|
|
|
|
unsigned alignment, u64 start,
|
|
|
|
u64 end);
|
2015-07-03 06:25:07 +08:00
|
|
|
void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
|
|
|
|
struct drm_mm_node *node);
|
2012-04-24 22:47:39 +08:00
|
|
|
int i915_gem_init_stolen(struct drm_device *dev);
|
|
|
|
void i915_gem_cleanup_stolen(struct drm_device *dev);
|
2012-11-15 19:32:26 +08:00
|
|
|
struct drm_i915_gem_object *
|
|
|
|
i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
|
2013-02-20 05:31:37 +08:00
|
|
|
struct drm_i915_gem_object *
|
|
|
|
i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
|
|
|
|
u32 stolen_offset,
|
|
|
|
u32 gtt_offset,
|
|
|
|
u32 size);
|
2012-04-24 22:47:39 +08:00
|
|
|
|
2015-03-18 17:46:04 +08:00
|
|
|
/* i915_gem_shrinker.c */
|
|
|
|
unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
|
2015-10-01 19:18:25 +08:00
|
|
|
unsigned long target,
|
2015-03-18 17:46:04 +08:00
|
|
|
unsigned flags);
|
|
|
|
#define I915_SHRINK_PURGEABLE 0x1
|
|
|
|
#define I915_SHRINK_UNBOUND 0x2
|
|
|
|
#define I915_SHRINK_BOUND 0x4
|
2015-10-01 19:18:29 +08:00
|
|
|
#define I915_SHRINK_ACTIVE 0x8
|
2016-04-08 19:11:12 +08:00
|
|
|
#define I915_SHRINK_VMAPS 0x10
|
2015-03-18 17:46:04 +08:00
|
|
|
unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv);
|
|
|
|
void i915_gem_shrinker_init(struct drm_i915_private *dev_priv);
|
2016-01-19 21:26:28 +08:00
|
|
|
void i915_gem_shrinker_cleanup(struct drm_i915_private *dev_priv);
|
2015-03-18 17:46:04 +08:00
|
|
|
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* i915_gem_tiling.c */
|
2013-08-02 01:39:55 +08:00
|
|
|
static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
|
2012-12-04 05:03:14 +08:00
|
|
|
{
|
2016-06-24 21:00:21 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2012-12-04 05:03:14 +08:00
|
|
|
|
|
|
|
return dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
|
|
|
|
obj->tiling_mode != I915_TILING_NONE;
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* i915_gem_debug.c */
|
2010-09-29 23:10:57 +08:00
|
|
|
#if WATCH_LISTS
|
|
|
|
int i915_verify_lists(struct drm_device *dev);
|
2008-07-31 03:06:12 +08:00
|
|
|
#else
|
2010-09-29 23:10:57 +08:00
|
|
|
#define i915_verify_lists(dev) 0
|
2008-07-31 03:06:12 +08:00
|
|
|
#endif
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-02-18 09:08:50 +08:00
|
|
|
/* i915_debugfs.c */
|
2013-10-16 17:49:58 +08:00
|
|
|
#ifdef CONFIG_DEBUG_FS
|
2016-06-24 21:00:17 +08:00
|
|
|
int i915_debugfs_register(struct drm_i915_private *dev_priv);
|
|
|
|
void i915_debugfs_unregister(struct drm_i915_private *dev_priv);
|
2015-04-10 21:59:32 +08:00
|
|
|
int i915_debugfs_connector_add(struct drm_connector *connector);
|
2013-10-16 01:55:40 +08:00
|
|
|
void intel_display_crc_init(struct drm_device *dev);
|
|
|
|
#else
|
2016-06-24 21:00:17 +08:00
|
|
|
static inline int i915_debugfs_register(struct drm_i915_private *) {return 0;}
|
|
|
|
static inline void i915_debugfs_unregister(struct drm_i915_private *) {}
|
2015-07-13 15:23:19 +08:00
|
|
|
static inline int i915_debugfs_connector_add(struct drm_connector *connector)
|
|
|
|
{ return 0; }
|
2013-10-16 17:49:58 +08:00
|
|
|
static inline void intel_display_crc_init(struct drm_device *dev) {}
|
2013-10-16 01:55:40 +08:00
|
|
|
#endif
|
2013-07-12 21:50:57 +08:00
|
|
|
|
|
|
|
/* i915_gpu_error.c */
|
2013-05-23 18:55:35 +08:00
|
|
|
__printf(2, 3)
|
|
|
|
void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
|
2013-06-06 20:18:39 +08:00
|
|
|
int i915_error_state_to_str(struct drm_i915_error_state_buf *estr,
|
|
|
|
const struct i915_error_state_file_priv *error);
|
2013-06-06 20:18:41 +08:00
|
|
|
int i915_error_state_buf_init(struct drm_i915_error_state_buf *eb,
|
2014-08-22 21:41:39 +08:00
|
|
|
struct drm_i915_private *i915,
|
2013-06-06 20:18:41 +08:00
|
|
|
size_t count, loff_t pos);
|
|
|
|
static inline void i915_error_state_buf_release(
|
|
|
|
struct drm_i915_error_state_buf *eb)
|
|
|
|
{
|
|
|
|
kfree(eb->buf);
|
|
|
|
}
|
2016-05-06 22:40:21 +08:00
|
|
|
void i915_capture_error_state(struct drm_i915_private *dev_priv,
|
|
|
|
u32 engine_mask,
|
2014-02-25 23:11:26 +08:00
|
|
|
const char *error_msg);
|
2013-07-12 21:50:57 +08:00
|
|
|
void i915_error_state_get(struct drm_device *dev,
|
|
|
|
struct i915_error_state_file_priv *error_priv);
|
|
|
|
void i915_error_state_put(struct i915_error_state_file_priv *error_priv);
|
|
|
|
void i915_destroy_error_state(struct drm_device *dev);
|
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
void i915_get_extra_instdone(struct drm_i915_private *dev_priv, uint32_t *instdone);
|
2014-08-22 21:41:39 +08:00
|
|
|
const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
|
2009-02-18 09:08:50 +08:00
|
|
|
|
2014-02-19 02:15:46 +08:00
|
|
|
/* i915_cmd_parser.c */
|
2016-05-04 21:25:36 +08:00
|
|
|
int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv);
|
2016-03-16 19:00:37 +08:00
|
|
|
int i915_cmd_parser_init_ring(struct intel_engine_cs *engine);
|
|
|
|
void i915_cmd_parser_fini_ring(struct intel_engine_cs *engine);
|
|
|
|
bool i915_needs_cmd_parser(struct intel_engine_cs *engine);
|
|
|
|
int i915_parse_cmds(struct intel_engine_cs *engine,
|
2014-02-19 02:15:46 +08:00
|
|
|
struct drm_i915_gem_object *batch_obj,
|
2014-12-12 04:13:09 +08:00
|
|
|
struct drm_i915_gem_object *shadow_batch_obj,
|
2014-02-19 02:15:46 +08:00
|
|
|
u32 batch_start_offset,
|
2014-12-12 04:13:10 +08:00
|
|
|
u32 batch_len,
|
2014-02-19 02:15:46 +08:00
|
|
|
bool is_master);
|
|
|
|
|
2008-08-26 06:11:06 +08:00
|
|
|
/* i915_suspend.c */
|
|
|
|
extern int i915_save_state(struct drm_device *dev);
|
|
|
|
extern int i915_restore_state(struct drm_device *dev);
|
2008-10-01 03:14:26 +08:00
|
|
|
|
2012-04-11 12:17:01 +08:00
|
|
|
/* i915_sysfs.c */
|
|
|
|
void i915_setup_sysfs(struct drm_device *dev_priv);
|
|
|
|
void i915_teardown_sysfs(struct drm_device *dev_priv);
|
|
|
|
|
2010-07-21 06:44:45 +08:00
|
|
|
/* intel_i2c.c */
|
|
|
|
extern int intel_setup_gmbus(struct drm_device *dev);
|
|
|
|
extern void intel_teardown_gmbus(struct drm_device *dev);
|
2015-03-27 06:20:22 +08:00
|
|
|
extern bool intel_gmbus_is_valid_pin(struct drm_i915_private *dev_priv,
|
|
|
|
unsigned int pin);
|
2012-03-28 02:36:14 +08:00
|
|
|
|
2015-03-27 06:20:20 +08:00
|
|
|
extern struct i2c_adapter *
|
|
|
|
intel_gmbus_get_adapter(struct drm_i915_private *dev_priv, unsigned int pin);
|
2010-09-24 19:52:03 +08:00
|
|
|
extern void intel_gmbus_set_speed(struct i2c_adapter *adapter, int speed);
|
|
|
|
extern void intel_gmbus_force_bit(struct i2c_adapter *adapter, bool force_bit);
|
2013-05-06 20:52:08 +08:00
|
|
|
static inline bool intel_gmbus_is_forced_bit(struct i2c_adapter *adapter)
|
2010-09-28 23:41:32 +08:00
|
|
|
{
|
|
|
|
return container_of(adapter, struct intel_gmbus, adapter)->force_bit;
|
|
|
|
}
|
2010-07-21 06:44:45 +08:00
|
|
|
extern void intel_i2c_reset(struct drm_device *dev);
|
|
|
|
|
2015-12-14 18:50:49 +08:00
|
|
|
/* intel_bios.c */
|
2015-12-16 21:04:20 +08:00
|
|
|
int intel_bios_init(struct drm_i915_private *dev_priv);
|
2015-12-15 19:16:15 +08:00
|
|
|
bool intel_bios_is_valid_vbt(const void *buf, size_t size);
|
2016-03-16 18:43:29 +08:00
|
|
|
bool intel_bios_is_tv_present(struct drm_i915_private *dev_priv);
|
2016-03-16 18:43:30 +08:00
|
|
|
bool intel_bios_is_lvds_present(struct drm_i915_private *dev_priv, u8 *i2c_pin);
|
2016-06-03 17:17:43 +08:00
|
|
|
bool intel_bios_is_port_present(struct drm_i915_private *dev_priv, enum port port);
|
2016-03-16 18:43:31 +08:00
|
|
|
bool intel_bios_is_port_edp(struct drm_i915_private *dev_priv, enum port port);
|
2016-05-04 19:45:22 +08:00
|
|
|
bool intel_bios_is_port_dp_dual_mode(struct drm_i915_private *dev_priv, enum port port);
|
2016-03-16 18:43:32 +08:00
|
|
|
bool intel_bios_is_dsi_present(struct drm_i915_private *dev_priv, enum port *port);
|
2016-03-31 18:41:47 +08:00
|
|
|
bool intel_bios_is_port_hpd_inverted(struct drm_i915_private *dev_priv,
|
|
|
|
enum port port);
|
2015-12-14 18:50:49 +08:00
|
|
|
|
2010-08-24 16:02:58 +08:00
|
|
|
/* intel_opregion.c */
|
2010-08-19 23:09:23 +08:00
|
|
|
#ifdef CONFIG_ACPI
|
2016-05-23 22:08:09 +08:00
|
|
|
extern int intel_opregion_setup(struct drm_i915_private *dev_priv);
|
2016-05-23 22:08:10 +08:00
|
|
|
extern void intel_opregion_register(struct drm_i915_private *dev_priv);
|
|
|
|
extern void intel_opregion_unregister(struct drm_i915_private *dev_priv);
|
2016-05-06 21:48:28 +08:00
|
|
|
extern void intel_opregion_asle_intr(struct drm_i915_private *dev_priv);
|
2013-08-31 00:40:30 +08:00
|
|
|
extern int intel_opregion_notify_encoder(struct intel_encoder *intel_encoder,
|
|
|
|
bool enable);
|
2016-05-23 22:08:09 +08:00
|
|
|
extern int intel_opregion_notify_adapter(struct drm_i915_private *dev_priv,
|
2013-08-31 00:40:31 +08:00
|
|
|
pci_power_t state);
|
2016-05-23 22:08:09 +08:00
|
|
|
extern int intel_opregion_get_panel_type(struct drm_i915_private *dev_priv);
|
2008-10-25 05:18:10 +08:00
|
|
|
#else
|
2016-05-23 22:08:09 +08:00
|
|
|
static inline int intel_opregion_setup(struct drm_i915_private *dev) { return 0; }
|
2016-06-27 19:53:19 +08:00
|
|
|
static inline void intel_opregion_register(struct drm_i915_private *dev_priv) { }
|
|
|
|
static inline void intel_opregion_unregister(struct drm_i915_private *dev_priv) { }
|
2016-05-06 21:48:28 +08:00
|
|
|
static inline void intel_opregion_asle_intr(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
}
|
2013-08-31 00:40:30 +08:00
|
|
|
static inline int
|
|
|
|
intel_opregion_notify_encoder(struct intel_encoder *intel_encoder, bool enable)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2013-08-31 00:40:31 +08:00
|
|
|
static inline int
|
2016-05-23 22:08:09 +08:00
|
|
|
intel_opregion_notify_adapter(struct drm_i915_private *dev, pci_power_t state)
|
2013-08-31 00:40:31 +08:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2016-05-23 22:08:09 +08:00
|
|
|
static inline int intel_opregion_get_panel_type(struct drm_i915_private *dev)
|
2016-04-11 15:23:51 +08:00
|
|
|
{
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
2008-10-25 05:18:10 +08:00
|
|
|
#endif
|
2008-08-06 02:37:25 +08:00
|
|
|
|
2010-10-08 07:01:13 +08:00
|
|
|
/* intel_acpi.c */
|
|
|
|
#ifdef CONFIG_ACPI
|
|
|
|
extern void intel_register_dsm_handler(void);
|
|
|
|
extern void intel_unregister_dsm_handler(void);
|
|
|
|
#else
|
|
|
|
static inline void intel_register_dsm_handler(void) { return; }
|
|
|
|
static inline void intel_unregister_dsm_handler(void) { return; }
|
|
|
|
#endif /* CONFIG_ACPI */
|
|
|
|
|
2016-07-05 17:40:20 +08:00
|
|
|
/* intel_device_info.c */
|
|
|
|
static inline struct intel_device_info *
|
|
|
|
mkwrite_device_info(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
return (struct intel_device_info *)&dev_priv->info;
|
|
|
|
}
|
|
|
|
|
|
|
|
void intel_device_info_runtime_init(struct drm_i915_private *dev_priv);
|
|
|
|
void intel_device_info_dump(struct drm_i915_private *dev_priv);
|
|
|
|
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
/* modesetting */
|
2012-04-10 21:50:11 +08:00
|
|
|
extern void intel_modeset_init_hw(struct drm_device *dev);
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
extern void intel_modeset_init(struct drm_device *dev);
|
2011-03-29 17:40:27 +08:00
|
|
|
extern void intel_modeset_gem_init(struct drm_device *dev);
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
extern void intel_modeset_cleanup(struct drm_device *dev);
|
2016-06-24 21:00:15 +08:00
|
|
|
extern int intel_connector_register(struct drm_connector *);
|
2016-06-17 18:40:33 +08:00
|
|
|
extern void intel_connector_unregister(struct drm_connector *);
|
2009-09-21 12:33:58 +08:00
|
|
|
extern int intel_modeset_vga_set_state(struct drm_device *dev, bool state);
|
2015-07-13 22:30:25 +08:00
|
|
|
extern void intel_display_resume(struct drm_device *dev);
|
2013-01-26 00:53:21 +08:00
|
|
|
extern void i915_redisable_vga(struct drm_device *dev);
|
2014-02-18 06:02:16 +08:00
|
|
|
extern void i915_redisable_vga_power_on(struct drm_device *dev);
|
2016-05-06 21:48:28 +08:00
|
|
|
extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
|
2012-12-01 22:04:25 +08:00
|
|
|
extern void intel_init_pch_refclk(struct drm_device *dev);
|
2016-05-10 21:10:04 +08:00
|
|
|
extern void intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
|
2014-07-01 17:36:17 +08:00
|
|
|
extern void intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
|
|
|
|
bool enable);
|
2010-04-07 16:15:53 +08:00
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
extern bool i915_semaphore_is_enabled(struct drm_i915_private *dev_priv);
|
2012-07-13 02:01:05 +08:00
|
|
|
int i915_reg_read_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file);
|
2012-03-29 04:39:37 +08:00
|
|
|
|
2010-08-05 03:26:07 +08:00
|
|
|
/* overlay */
|
2016-05-06 22:40:21 +08:00
|
|
|
extern struct intel_overlay_error_state *
|
|
|
|
intel_overlay_capture_error_state(struct drm_i915_private *dev_priv);
|
2013-05-23 18:55:35 +08:00
|
|
|
extern void intel_overlay_print_error_state(struct drm_i915_error_state_buf *e,
|
|
|
|
struct intel_overlay_error_state *error);
|
2010-11-21 21:12:35 +08:00
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
extern struct intel_display_error_state *
|
|
|
|
intel_display_capture_error_state(struct drm_i915_private *dev_priv);
|
2013-05-23 18:55:35 +08:00
|
|
|
extern void intel_display_print_error_state(struct drm_i915_error_state_buf *e,
|
2010-11-21 21:12:35 +08:00
|
|
|
struct drm_device *dev,
|
|
|
|
struct intel_display_error_state *error);
|
2010-08-05 03:26:07 +08:00
|
|
|
|
2014-11-14 10:50:10 +08:00
|
|
|
int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val);
|
|
|
|
int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u32 mbox, u32 val);
|
2013-05-22 20:36:16 +08:00
|
|
|
|
|
|
|
/* intel_sideband.c */
|
2015-01-16 23:12:17 +08:00
|
|
|
u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr);
|
|
|
|
void vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val);
|
2013-05-22 20:36:20 +08:00
|
|
|
u32 vlv_nc_read(struct drm_i915_private *dev_priv, u8 addr);
|
2016-02-05 00:55:15 +08:00
|
|
|
u32 vlv_iosf_sb_read(struct drm_i915_private *dev_priv, u8 port, u32 reg);
|
|
|
|
void vlv_iosf_sb_write(struct drm_i915_private *dev_priv, u8 port, u32 reg, u32 val);
|
2013-08-27 20:12:14 +08:00
|
|
|
u32 vlv_cck_read(struct drm_i915_private *dev_priv, u32 reg);
|
|
|
|
void vlv_cck_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
|
|
|
|
u32 vlv_ccu_read(struct drm_i915_private *dev_priv, u32 reg);
|
|
|
|
void vlv_ccu_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
|
2013-11-05 03:52:44 +08:00
|
|
|
u32 vlv_bunit_read(struct drm_i915_private *dev_priv, u32 reg);
|
|
|
|
void vlv_bunit_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
|
2013-09-05 20:41:49 +08:00
|
|
|
u32 vlv_dpio_read(struct drm_i915_private *dev_priv, enum pipe pipe, int reg);
|
|
|
|
void vlv_dpio_write(struct drm_i915_private *dev_priv, enum pipe pipe, int reg, u32 val);
|
2013-05-22 20:36:16 +08:00
|
|
|
u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
|
|
|
|
enum intel_sbi_destination destination);
|
|
|
|
void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
|
|
|
|
enum intel_sbi_destination destination);
|
2013-12-10 14:44:55 +08:00
|
|
|
u32 vlv_flisdsi_read(struct drm_i915_private *dev_priv, u32 reg);
|
|
|
|
void vlv_flisdsi_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
|
2013-04-18 06:54:58 +08:00
|
|
|
|
2016-04-27 20:44:17 +08:00
|
|
|
/* intel_dpio_phy.c */
|
|
|
|
void chv_set_phy_signal_level(struct intel_encoder *encoder,
|
|
|
|
u32 deemph_reg_value, u32 margin_reg_value,
|
|
|
|
bool uniq_trans_scale);
|
2016-04-27 20:44:18 +08:00
|
|
|
void chv_data_lane_soft_reset(struct intel_encoder *encoder,
|
|
|
|
bool reset);
|
2016-04-27 20:44:19 +08:00
|
|
|
void chv_phy_pre_pll_enable(struct intel_encoder *encoder);
|
2016-04-27 20:44:20 +08:00
|
|
|
void chv_phy_pre_encoder_enable(struct intel_encoder *encoder);
|
|
|
|
void chv_phy_release_cl2_override(struct intel_encoder *encoder);
|
2016-04-27 20:44:21 +08:00
|
|
|
void chv_phy_post_pll_disable(struct intel_encoder *encoder);
|
2016-04-27 20:44:17 +08:00
|
|
|
|
2016-04-27 20:44:22 +08:00
|
|
|
void vlv_set_phy_signal_level(struct intel_encoder *encoder,
|
|
|
|
u32 demph_reg_value, u32 preemph_reg_value,
|
|
|
|
u32 uniqtranscale_reg_value, u32 tx3_demph);
|
2016-04-27 20:44:23 +08:00
|
|
|
void vlv_phy_pre_pll_enable(struct intel_encoder *encoder);
|
2016-04-27 20:44:24 +08:00
|
|
|
void vlv_phy_pre_encoder_enable(struct intel_encoder *encoder);
|
2016-04-27 20:44:25 +08:00
|
|
|
void vlv_phy_reset_lanes(struct intel_encoder *encoder);
|
2016-04-27 20:44:22 +08:00
|
|
|
|
2015-01-24 03:04:25 +08:00
|
|
|
int intel_gpu_freq(struct drm_i915_private *dev_priv, int val);
|
|
|
|
int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
|
2013-11-23 17:25:42 +08:00
|
|
|
|
2013-10-05 12:22:51 +08:00
|
|
|
#define I915_READ8(reg) dev_priv->uncore.funcs.mmio_readb(dev_priv, (reg), true)
|
|
|
|
#define I915_WRITE8(reg, val) dev_priv->uncore.funcs.mmio_writeb(dev_priv, (reg), (val), true)
|
|
|
|
|
|
|
|
#define I915_READ16(reg) dev_priv->uncore.funcs.mmio_readw(dev_priv, (reg), true)
|
|
|
|
#define I915_WRITE16(reg, val) dev_priv->uncore.funcs.mmio_writew(dev_priv, (reg), (val), true)
|
|
|
|
#define I915_READ16_NOTRACE(reg) dev_priv->uncore.funcs.mmio_readw(dev_priv, (reg), false)
|
|
|
|
#define I915_WRITE16_NOTRACE(reg, val) dev_priv->uncore.funcs.mmio_writew(dev_priv, (reg), (val), false)
|
|
|
|
|
|
|
|
#define I915_READ(reg) dev_priv->uncore.funcs.mmio_readl(dev_priv, (reg), true)
|
|
|
|
#define I915_WRITE(reg, val) dev_priv->uncore.funcs.mmio_writel(dev_priv, (reg), (val), true)
|
|
|
|
#define I915_READ_NOTRACE(reg) dev_priv->uncore.funcs.mmio_readl(dev_priv, (reg), false)
|
|
|
|
#define I915_WRITE_NOTRACE(reg, val) dev_priv->uncore.funcs.mmio_writel(dev_priv, (reg), (val), false)
|
|
|
|
|
2014-03-21 21:16:43 +08:00
|
|
|
/* Be very careful with read/write 64-bit values. On 32-bit machines, they
|
|
|
|
* will be implemented using 2 32-bit writes in an arbitrary order with
|
|
|
|
* an arbitrary delay between them. This can cause the hardware to
|
|
|
|
* act upon the intermediate value, possibly leading to corruption and
|
|
|
|
* machine death. You have been warned.
|
|
|
|
*/
|
2013-10-05 12:22:51 +08:00
|
|
|
#define I915_WRITE64(reg, val) dev_priv->uncore.funcs.mmio_writeq(dev_priv, (reg), (val), true)
|
|
|
|
#define I915_READ64(reg) dev_priv->uncore.funcs.mmio_readq(dev_priv, (reg), true)
|
2010-11-09 17:17:32 +08:00
|
|
|
|
2014-03-21 20:41:53 +08:00
|
|
|
#define I915_READ64_2x32(lower_reg, upper_reg) ({ \
|
2015-09-08 21:17:13 +08:00
|
|
|
u32 upper, lower, old_upper, loop = 0; \
|
|
|
|
upper = I915_READ(upper_reg); \
|
2015-07-15 16:50:42 +08:00
|
|
|
do { \
|
2015-09-08 21:17:13 +08:00
|
|
|
old_upper = upper; \
|
2015-07-15 16:50:42 +08:00
|
|
|
lower = I915_READ(lower_reg); \
|
2015-09-08 21:17:13 +08:00
|
|
|
upper = I915_READ(upper_reg); \
|
|
|
|
} while (upper != old_upper && loop++ < 2); \
|
2015-07-15 16:50:42 +08:00
|
|
|
(u64)upper << 32 | lower; })
|
2014-03-21 20:41:53 +08:00
|
|
|
|
2010-11-09 17:17:32 +08:00
|
|
|
#define POSTING_READ(reg) (void)I915_READ_NOTRACE(reg)
|
|
|
|
#define POSTING_READ16(reg) (void)I915_READ16_NOTRACE(reg)
|
|
|
|
|
2015-10-22 20:34:56 +08:00
|
|
|
#define __raw_read(x, s) \
|
|
|
|
static inline uint##x##_t __raw_i915_read##x(struct drm_i915_private *dev_priv, \
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t reg) \
|
2015-10-22 20:34:56 +08:00
|
|
|
{ \
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
return read##s(dev_priv->regs + i915_mmio_reg_offset(reg)); \
|
2015-10-22 20:34:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
#define __raw_write(x, s) \
|
|
|
|
static inline void __raw_i915_write##x(struct drm_i915_private *dev_priv, \
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
i915_reg_t reg, uint##x##_t val) \
|
2015-10-22 20:34:56 +08:00
|
|
|
{ \
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
write##s(val, dev_priv->regs + i915_mmio_reg_offset(reg)); \
|
2015-10-22 20:34:56 +08:00
|
|
|
}
|
|
|
|
__raw_read(8, b)
|
|
|
|
__raw_read(16, w)
|
|
|
|
__raw_read(32, l)
|
|
|
|
__raw_read(64, q)
|
|
|
|
|
|
|
|
__raw_write(8, b)
|
|
|
|
__raw_write(16, w)
|
|
|
|
__raw_write(32, l)
|
|
|
|
__raw_write(64, q)
|
|
|
|
|
|
|
|
#undef __raw_read
|
|
|
|
#undef __raw_write
|
|
|
|
|
2015-04-07 23:21:02 +08:00
|
|
|
/* These are untraced mmio-accessors that are only valid to be used inside
|
|
|
|
* criticial sections inside IRQ handlers where forcewake is explicitly
|
|
|
|
* controlled.
|
|
|
|
* Think twice, and think again, before using these.
|
|
|
|
* Note: Should only be used between intel_uncore_forcewake_irqlock() and
|
|
|
|
* intel_uncore_forcewake_irqunlock().
|
|
|
|
*/
|
2015-10-22 20:34:56 +08:00
|
|
|
#define I915_READ_FW(reg__) __raw_i915_read32(dev_priv, (reg__))
|
|
|
|
#define I915_WRITE_FW(reg__, val__) __raw_i915_write32(dev_priv, (reg__), (val__))
|
2016-06-30 22:33:45 +08:00
|
|
|
#define I915_WRITE64_FW(reg__, val__) __raw_i915_write64(dev_priv, (reg__), (val__))
|
2015-04-07 23:21:02 +08:00
|
|
|
#define POSTING_READ_FW(reg__) (void)I915_READ_FW(reg__)
|
|
|
|
|
2013-01-17 22:31:29 +08:00
|
|
|
/* "Broadcast RGB" property */
|
|
|
|
#define INTEL_BROADCAST_RGB_AUTO 0
|
|
|
|
#define INTEL_BROADCAST_RGB_FULL 1
|
|
|
|
#define INTEL_BROADCAST_RGB_LIMITED 2
|
2010-11-08 17:09:41 +08:00
|
|
|
|
drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 21:33:26 +08:00
|
|
|
static inline i915_reg_t i915_vgacntrl_reg(struct drm_device *dev)
|
2013-01-26 03:44:46 +08:00
|
|
|
{
|
2015-12-10 04:29:35 +08:00
|
|
|
if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
|
2013-01-26 03:44:46 +08:00
|
|
|
return VLV_VGACNTRL;
|
2014-07-21 17:53:40 +08:00
|
|
|
else if (INTEL_INFO(dev)->gen >= 5)
|
|
|
|
return CPU_VGACNTRL;
|
2013-01-26 03:44:46 +08:00
|
|
|
else
|
|
|
|
return VGACNTRL;
|
|
|
|
}
|
|
|
|
|
2013-05-22 01:03:17 +08:00
|
|
|
static inline unsigned long msecs_to_jiffies_timeout(const unsigned int m)
|
|
|
|
{
|
|
|
|
unsigned long j = msecs_to_jiffies(m);
|
|
|
|
|
|
|
|
return min_t(unsigned long, MAX_JIFFY_OFFSET, j + 1);
|
|
|
|
}
|
|
|
|
|
2014-12-04 18:12:54 +08:00
|
|
|
static inline unsigned long nsecs_to_jiffies_timeout(const u64 n)
|
|
|
|
{
|
|
|
|
return min_t(u64, MAX_JIFFY_OFFSET, nsecs_to_jiffies64(n) + 1);
|
|
|
|
}
|
|
|
|
|
2013-05-22 01:03:17 +08:00
|
|
|
static inline unsigned long
|
|
|
|
timespec_to_jiffies_timeout(const struct timespec *value)
|
|
|
|
{
|
|
|
|
unsigned long j = timespec_to_jiffies(value);
|
|
|
|
|
|
|
|
return min_t(unsigned long, MAX_JIFFY_OFFSET, j + 1);
|
|
|
|
}
|
|
|
|
|
2013-12-20 00:29:40 +08:00
|
|
|
/*
|
|
|
|
* If you need to wait X milliseconds between events A and B, but event B
|
|
|
|
* doesn't happen exactly after event A, you record the timestamp (jiffies) of
|
|
|
|
* when event A happened, then just before event B you call this function and
|
|
|
|
* pass the timestamp as the first argument, and X as the second argument.
|
|
|
|
*/
|
|
|
|
static inline void
|
|
|
|
wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
|
|
|
|
{
|
2014-01-29 19:25:40 +08:00
|
|
|
unsigned long target_jiffies, tmp_jiffies, remaining_jiffies;
|
2013-12-20 00:29:40 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Don't re-read the value of "jiffies" every time since it may change
|
|
|
|
* behind our back and break the math.
|
|
|
|
*/
|
|
|
|
tmp_jiffies = jiffies;
|
|
|
|
target_jiffies = timestamp_jiffies +
|
|
|
|
msecs_to_jiffies_timeout(to_wait_ms);
|
|
|
|
|
|
|
|
if (time_after(target_jiffies, tmp_jiffies)) {
|
2014-01-29 19:25:40 +08:00
|
|
|
remaining_jiffies = target_jiffies - tmp_jiffies;
|
|
|
|
while (remaining_jiffies)
|
|
|
|
remaining_jiffies =
|
|
|
|
schedule_timeout_uninterruptible(remaining_jiffies);
|
2013-12-20 00:29:40 +08:00
|
|
|
}
|
|
|
|
}
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
|
|
|
|
{
|
2016-07-02 00:23:16 +08:00
|
|
|
struct intel_engine_cs *engine = req->engine;
|
|
|
|
|
2016-07-02 00:23:22 +08:00
|
|
|
/* Before we do the heavier coherent read of the seqno,
|
|
|
|
* check the value (hopefully) in the CPU cacheline.
|
|
|
|
*/
|
|
|
|
if (i915_gem_request_completed(req))
|
|
|
|
return true;
|
|
|
|
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
/* Ensure our read of the seqno is coherent so that we
|
|
|
|
* do not "miss an interrupt" (i.e. if this is the last
|
|
|
|
* request and the seqno write from the GPU is not visible
|
|
|
|
* by the time the interrupt fires, we will see that the
|
|
|
|
* request is incomplete and go back to sleep awaiting
|
|
|
|
* another interrupt that will never come.)
|
|
|
|
*
|
|
|
|
* Strictly, we only need to do this once after an interrupt,
|
|
|
|
* but it is easier and safer to do it every time the waiter
|
|
|
|
* is woken.
|
|
|
|
*/
|
2016-07-02 00:23:23 +08:00
|
|
|
if (engine->irq_seqno_barrier &&
|
2016-07-06 19:39:02 +08:00
|
|
|
READ_ONCE(engine->breadcrumbs.irq_seqno_bh) == current &&
|
|
|
|
cmpxchg_relaxed(&engine->breadcrumbs.irq_posted, 1, 0)) {
|
2016-07-06 19:39:01 +08:00
|
|
|
struct task_struct *tsk;
|
|
|
|
|
2016-07-02 00:23:23 +08:00
|
|
|
/* The ordering of irq_posted versus applying the barrier
|
|
|
|
* is crucial. The clearing of the current irq_posted must
|
|
|
|
* be visible before we perform the barrier operation,
|
|
|
|
* such that if a subsequent interrupt arrives, irq_posted
|
|
|
|
* is reasserted and our task rewoken (which causes us to
|
|
|
|
* do another __i915_request_irq_complete() immediately
|
|
|
|
* and reapply the barrier). Conversely, if the clear
|
|
|
|
* occurs after the barrier, then an interrupt that arrived
|
|
|
|
* whilst we waited on the barrier would not trigger a
|
|
|
|
* barrier on the next pass, and the read may not see the
|
|
|
|
* seqno update.
|
|
|
|
*/
|
2016-07-02 00:23:16 +08:00
|
|
|
engine->irq_seqno_barrier(engine);
|
2016-07-06 19:39:01 +08:00
|
|
|
|
|
|
|
/* If we consume the irq, but we are no longer the bottom-half,
|
|
|
|
* the real bottom-half may not have serialised their own
|
|
|
|
* seqno check with the irq-barrier (i.e. may have inspected
|
|
|
|
* the seqno before we believe it coherent since they see
|
|
|
|
* irq_posted == false but we are still running).
|
|
|
|
*/
|
|
|
|
rcu_read_lock();
|
2016-07-06 19:39:02 +08:00
|
|
|
tsk = READ_ONCE(engine->breadcrumbs.irq_seqno_bh);
|
2016-07-06 19:39:01 +08:00
|
|
|
if (tsk && tsk != current)
|
|
|
|
/* Note that if the bottom-half is changed as we
|
|
|
|
* are sending the wake-up, the new bottom-half will
|
|
|
|
* be woken by whomever made the change. We only have
|
|
|
|
* to worry about when we steal the irq-posted for
|
|
|
|
* ourself.
|
|
|
|
*/
|
|
|
|
wake_up_process(tsk);
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
2016-07-02 00:23:22 +08:00
|
|
|
if (i915_gem_request_completed(req))
|
|
|
|
return true;
|
|
|
|
}
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
|
|
|
|
/* We need to check whether any gpu reset happened in between
|
|
|
|
* the request being submitted and now. If a reset has occurred,
|
|
|
|
* the seqno will have been advance past ours and our request
|
|
|
|
* is complete. If we are in the process of handling a reset,
|
|
|
|
* the request is effectively complete as the rendering will
|
|
|
|
* be discarded, but we need to return in order to drop the
|
|
|
|
* struct_mutex.
|
|
|
|
*/
|
|
|
|
if (i915_reset_in_progress(&req->i915->gpu_error))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
#endif
|