Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.
v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).
Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
Let the listener know that the context we just scheduled out was not
complete, and will be scheduled back in at a later point.
v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to
CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference
later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-3-chris@chris-wilson.co.uk
Getting started with v4.15 features:
- Cannonlake workarounds (Rodrigo, Oscar)
- Infoframe refactoring and fixes to enable infoframes for DP (Ville)
- VBT definition updates (Jani)
- Sparse warning fixes (Ville, Chris)
- Crtc state usage fixes and cleanups (Ville)
- DP vswing, pre-emph and buffer translation refactoring and fixes (Rodrigo)
- Prevent IPS from interfering with CRC capture (Ville, Marta)
- Enable Mesa to advertise ARB_timer_query (Nanley)
- Refactor GT number into intel_device_info (Lionel)
- Avoid eDP DP AUX CH timeouts harder (Manasi)
- CDCLK check improvements (Ville)
- Restore GPU clock boost on missed pageflip vblanks (Chris)
- Fence register reservation API for vGPU (Changbin)
- First batch of CCS fixes (Ville)
- Finally, numerous GEM fixes, cleanups and improvements (Chris)
* tag 'drm-intel-next-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel: (100 commits)
drm/i915: Update DRIVER_DATE to 20170907
drm/i915/cnl: WaThrottleEUPerfToAvoidTDBackPressure:cnl(pre-prod)
drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
drm/i915/cnl: Allow the reg_read ioctl to read the RCS TIMESTAMP register
drm/i915: Move device_info.has_snoop into the static tables
drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
drm/i915: Re-enable GTT following a device reset
drm/i915/cnp: Wa 1181: Fix Backlight issue
drm/i915: Annotate user relocs with __user
drm/i915: Constify load detect mode
drm/i915/perf: Remove __user from u64 in drm_i915_perf_oa_config
drm/i915: Silence sparse by using gfp_t
drm/i915: io unmap functions want __iomem
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Wake up the device for the fbdev setup
drm/i915: Add interface to reserve fence registers for vGPU
drm/i915: Use correct path to trace include
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
...
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.
v5: pure rename
v6: fix
Credits-to: Coccinelle
@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
Looking at our virtual PCI device, we can see surprising Region 4 and Region 5.
00:10.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 06) (prog-if 00 [VGA controller])
....
Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
Region 4: Memory at <ignored> (32-bit, non-prefetchable)
Region 5: Memory at <ignored> (32-bit, non-prefetchable)
Expansion ROM at febd6000 [disabled] [size=2K]
The fact is that we only implemented BAR0 and BAR2. Surprising Region 4 and
Region 5 are shown because we report their size as 0xffffffff. They should
report size 0 instead.
BTW, the physical GPU has a PIO BAR. GVTg hasn't implemented PIO access, so
we ignored this BAR for vGPU device.
v2: fix BAR size value calculation.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1458032
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
(cherry picked from commit f1751362d6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Remove the "INDEX" suffix from PPAT marcos as they are bits actually, not
indexes.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1505392783-4084-2-git-send-email-zhi.a.wang@intel.com
Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:
| [guc] | [hwsp] [logical state] |
|<- our header ->|<- context image ->|
So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).
This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.
v2: Use PPHWSP index for hws again.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk
Looking at our virtual PCI device, we can see surprising Region 4 and Region 5.
00:10.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 06) (prog-if 00 [VGA controller])
....
Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
Region 4: Memory at <ignored> (32-bit, non-prefetchable)
Region 5: Memory at <ignored> (32-bit, non-prefetchable)
Expansion ROM at febd6000 [disabled] [size=2K]
The fact is that we only implemented BAR0 and BAR2. Surprising Region 4 and
Region 5 are shown because we report their size as 0xffffffff. They should
report size 0 instead.
BTW, the physical GPU has a PIO BAR. GVTg hasn't implemented PIO access, so
we ignored this BAR for vGPU device.
v2: fix BAR size value calculation.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1458032
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fix the wrong return type and return error once the unknown
command is scanned.
v2:
- separate this error handle from healthy rating code. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When an error occurs in dispatch_workload, this patch is to do the
proper cleanup and rollback to the original states before the workload
is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- original PTR_ERR(cs) is good and code cleanup. (Zhenyu)
v4:
- reuse the existing i915_add_request for error handling. (Zhenyu)
v5:
- remove the duplicate error handling release_shadow_wa_ctx and
move the engine->context_unpin upper. (Zhenyu)
v6:
- keep the old label "out". (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When it is failed in shadow_mm, the pin_count should rollback
to the original states before return.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
increase the pincount after shadow success. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
refine the error handling for prepare_execlist_workload to restore to the
original states once error occurs.
only release the shadowed batch buffer and wa ctx when the workload is
completed successfully.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- handle prepare batch buffer/wa ctx pin errors and
- emulate_schedule_in null issue. (Zhenyu)
v4:
- no need to handle emulate_schedule_in null issue. (Zhenyu)
v5:
- release the shadowed batch buffer and wa ctx only for the
successful workload. (Zhenyu)
v6:
- polish the return style. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When an error occurs after shadow_indirect_ctx, this patch is to do the
proper cleanup and rollback to the original states for shadowed indirect
context before the workload is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- no return check for clean up functions. (Changbin)
v4:
- expose and reuse the existing release_shadow_wa_ctx. (Zhenyu)
v5:
- move the release function to scheduler.c file. (Zhenyu)
v6:
- move error handling code of intel_gvt_scan_and_shadow_workload
to here. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently i915 request structure and shadow ring buffer are allocated
before command scan, so it will have to restore to previous states once
any error happens afterwards in the long dispatch_workload path.
This patch is to introduce a reserved ring buffer created at the beginning
of vGPU initialization. Workload will be coped to this reserved buffer and
be scanned first, the i915 request and shadow ring buffer are only
allocated after the result of scan is successful.
To balance the memory usage and buffer alloc time, the coming bigger ring
buffer will be reallocated and kept until more bigger buffer is coming.
v2:
- use kmalloc for the smaller ring buffer, realloc if required. (Zhenyu)
v3:
- remove the dynamically allocated ring buffer. (Zhenyu)
v4:
- code style polish.
- kfree previous allocated buffer once kmalloc failed. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
For vfio-pci, if the region support MMAP then it should support both
mmap and normal file access. The user-space is free to choose which is
being used. For qemu, we just need add 'x-no-mmap=on' for vfio-pci
option.
Currently GVTg only support MMAP for BAR2. So GVTg will not work when
user turn on x-no-mmap option.
This patch added file style access for BAR2, aka the GPU aperture. We
map the entire aperture partition of active vGPU to kernel space when
guest driver try to enable PCI Memory Space. Then we redirect the file
RW operation from kvmgt to this mapped area.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1458032
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
For PCI, 64bit bar consumes two BAR registers, but this doesn't mean
both of two BAR are valid. Actually the second BAR is regarded as
reserved in this case. So we shouldn't emulate the second BAR.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Pull i916 drm fixes from Rodrigo Vivi:
"Since Dave is on paternity leave we are sending drm/i915 fixes for
v4.14-rc1 directly to you as he had asked us to do.
The most critical ones are the GPU reset fix for gen2-4 and GVT fix
for a regression that is blocking gvt init to work on your tree.
The rest is general fixes for patches coming from drm-next"
Acked-by: Dave Airlie <airlied@redhat.com>
* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Re-enable GTT following a device reset
drm/i915: Annotate user relocs with __user
drm/i915: Silence sparse by using gfp_t
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915/gvt: Remove one duplicated MMIO
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
drm/i915: Make i2c lock ops static
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
drm/i915: Skip fence alignemnt check for the CCS plane
drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
drm/i915: Always wake the device to flush the GTT
drm/i915: Recreate vmapping even when the object is pinned
drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
Remove one duplicated MMIO GEN6_PCODE_MAILBOX. Duplicated MMIO will
cause host GVT-g initialization failure.
Fixes: 9c3a16c887 ("drm/i915/hsw+: Add support for multiple power well regs")
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
In the past, vGPU alloc fence registers by walking through mm.fence_list
to find fence which pin_count = 0 and vma is empty. vGPU may not find
enough fence registers this way. Because a fence can be bind to vma even
though it is not in using. We have found such failure many times these
days.
An option to resolve this issue is that we can force-remove fence from
vma in this case.
This patch added two new api to the fence management code:
- i915_reserve_fence() will try to find a free fence from fence_list
and force-remove vma if need.
- i915_unreserve_fence() reclaim a reserved fence after vGPU has
finished.
With this change, the fence management is more clear to work with vGPU.
GVTg do not need remove fence from fence_list in private.
v3: (Chris)
- Add struct_mutex lock assertion.
- Only count for unpinned fence.
v2: (Chris)
- Rename the new api for symmetry.
- Add safeguard to ensure at least 1 fence remained for host display.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1504512061-5892-1-git-send-email-changbin.du@intel.com
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZpRPIAAoJEAx081l5xIa+kCIP/2m2q0jBmCATvXXwrMBH0zNk
4lm9yIfl9pmluJP97aklvkeKF77chhost76+hv+0sQ9ZsJD8koHWv5WyTHEs7Cfn
NpmtGPqYlIZsWNSwW0OFF/XzllgLCVEWa+W/7ryYzPZrSEZr6Ge4HE0qS3LfuLJv
K89amZWHkP5ysPZ1uxRBzHtZfNAhdyjYVTUntCR7gj3DYv3yNdeZu+/epfcWK2w/
Q+ggoy644vX/yzy5L5zCGL/J1BjStDuec7sgAKTlNx4TwBUmp2wsfhEdovQBGFiu
t5PHMajvrBRqSJWDIAZSUfjQzIMSz517J9LWeChU7KtAClNJQJEabbu4CoX4aEmG
UbSzEe0IxnxQ4842jcqQXZ+mevlNIEIBVSNR7dXi17jL3Ts+APQgrYjRJYVk2ipg
uQ9TwkeVVu2WRGyU8iRQrXAZI7+O3p4UnbNPjeG2qACD2Ur7Z3n7b0mhNFPOLzO4
gbIv4D6CcUB/vltl+vhZTW3P50oMCVSq8ScCpY8CGo29mZ5vypj5PTS+W8FsyY3Z
ypyMqWg/DyxKlOoO+aK8EmXuZmgtDR4kb8asltH/S1A0NZkzjrFkKgs10Cp6EjJy
Zz1BWa1KKEpdN6yp+jrbJKjf9MJ7K2RPGv3bxWnCCdNv4j49rk4t3IHqvcihddsd
XXFQB5zE7Pz0ROi/VkXR
=5fxW
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.14 merge window.
I'm sending this early, as my continuing journey into fatherhood is
occurring really soon now, I'm going to be mostly useless for the next
couple of weeks, though I may be able to read email, I doubt I'll be
doing much patch applications or git sending. If anything urgent pops
up I've asked Daniel/Jani/Alex/Sean to try and direct stuff towards
you.
Outside drm changes:
Some rcar-du updates that touch the V4L tree, all acks should be in
place. It adds one export to the radix tree code for new i915 use
case. There are some minor AGP cleanups (don't see that too often).
Changes to the vbox driver in staging to avoid breaking compilation.
Summary:
core:
- Atomic helper fixes
- Atomic UAPI fixes
- Add YCBCR 4:2:0 support
- Drop set_busid hook
- Refactor fb_helper locking
- Remove a bunch of internal APIs
- Add a bunch of better default handlers
- Format modifier/blob plane property added
- More internal header refactoring
- Make more internal API names consistent
- Enhanced syncobj APIs (wait/signal/reset/create signalled)
bridge:
- Add Synopsys Designware MIPI DSI host bridge driver
tiny:
- Add Pervasive Displays RePaper displays
- Add support for LEGO MINDSTORMS EV3 LCD
i915:
- Lots of GEN10/CNL support patches
- drm syncobj support
- Skylake+ watermark refactoring
- GVT vGPU 48-bit ppgtt support
- GVT performance improvements
- NOA change ioctl
- CCS (color compression) scanout support
- GPU reset improvements
amdgpu:
- Initial hugepage support
- BO migration logic rework
- Vega10 improvements
- Powerplay fixes
- Stop reprogramming the MC
- Fixes for ACP audio on stoney
- SR-IOV fixes/improvements
- Command submission overhead improvements
amdkfd:
- Non-dGPU upstreaming patches
- Scratch VA ioctl
- Image tiling modes
- Update PM4 headers for new firmware
- Drop all BUG_ONs.
nouveau:
- GP108 modesetting support.
- Disable MSI on big endian.
vmwgfx:
- Add fence fd support.
msm:
- Runtime PM improvements
exynos:
- NV12MT support
- Refactor KMS drivers
imx-drm:
- Lock scanout channel to improve memory bw
- Cleanups
etnaviv:
- GEM object population fixes
tegra:
- Prep work for Tegra186 support
- PRIME mmap support
sunxi:
- HDMI support improvements
- HDMI CEC support
omapdrm:
- HDMI hotplug IRQ support
- Big driver cleanup
- OMAP5 DSI support
rcar-du:
- vblank fixes
- VSP1 updates
arcgpu:
- Minor fixes
stm:
- Add STM32 DSI controller driver
dw_hdmi:
- Add support for Rockchip RK3399
- HDMI CEC support
atmel-hlcdc:
- Add 8-bit color support
vc4:
- Atomic fixes
- New ioctl to attach a label to a buffer object
- HDMI CEC support
- Allow userspace to dictate rendering order on submit ioctl"
* tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux: (1074 commits)
drm/syncobj: Add a signal ioctl (v3)
drm/syncobj: Add a reset ioctl (v3)
drm/syncobj: Add a syncobj_array_find helper
drm/syncobj: Allow wait for submit and signal behavior (v5)
drm/syncobj: Add a CREATE_SIGNALED flag
drm/syncobj: Add a callback mechanism for replace_fence (v3)
drm/syncobj: add sync obj wait interface. (v8)
i915: Use drm_syncobj_fence_get
drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2)
drm/syncobj: Rename fence_get to find_fence
drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate
drm/vmwgfx: Bump the version for fence FD support
drm/vmwgfx: Add export fence to file descriptor support
drm/vmwgfx: Add support for imported Fence File Descriptor
drm/vmwgfx: Prepare to support fence fd
drm/vmwgfx: Fix incorrect command header offset at restart
drm/vmwgfx: Support the NOP_ERROR command
drm/vmwgfx: Restart command buffers after errors
drm/vmwgfx: Move irq bottom half processing to threads
drm/vmwgfx: Don't use drm_irq_[un]install
...
once error happens in shadow_indirect_ctx function, the variable
wa_ctx->indirect_ctx.obj is not initialized but accessed, so the
kernel null point panic occurs.
Fixes: 894cf7d156 ("drm/i915/gvt: i915_gem_object_create() returns an error pointer")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Final pile of features for 4.14
- New ioctl to change NOA configurations, plus prep (Lionel)
- CCS (color compression) scanout support, based on the fancy new
modifier additions (Ville&Ben)
- Document i915 register macro style (Jani)
- Many more gen10/cnl patches (Rodrigo, Pualo, ...)
- More gpu reset vs. modeset duct-tape to restore the old way.
- prep work for cnl: hpd_pin reorg (Rodrigo), support for more power
wells (Imre), i2c pin reorg (Anusha)
- drm_syncobj support (Jason Ekstrand)
- forcewake vs gpu reset fix (Chris)
- execbuf speedup for the no-relocs fastpath, anv/vk low-overhead ftw (Chris)
- switch to idr/radixtree instead of the resizing ht for execbuf id->vma
lookups (Chris)
gvt:
- MMIO save/restore optimization (Changbin)
- Split workload scan vs. dispatch for more parallel exec (Ping)
- vGPU full 48bit ppgtt support (Joonas, Tina)
- vGPU hw id expose for perf (Zhenyu)
Bunch of work all over to make the igt CI runs more complete/stable.
Watch https://intel-gfx-ci.01.org/tree/drm-tip/shards-all.html for
progress in getting this ready. Next week we're going into production
mode (i.e. will send results to intel-gfx) on hsw, more platforms to
come.
Also, a new maintainer tram, I'm stepping out. Huge thanks to Jani for
being an awesome co-maintainer the past few years, and all the best
for Jani, Joonas&Rodrigo as the new maintainers!
* tag 'drm-intel-next-2017-08-18' of git://anongit.freedesktop.org/git/drm-intel: (179 commits)
drm/i915: Update DRIVER_DATE to 20170818
drm/i915/bxt: use NULL for GPIO connection ID
drm/i915: Mark the GT as busy before idling the previous request
drm/i915: Trivial grammar fix s/opt of/opt out of/ in comment
drm/i915: Replace execbuf vma ht with an idr
drm/i915: Simplify eb_lookup_vmas()
drm/i915: Convert execbuf to use struct-of-array packing for critical fields
drm/i915: Check context status before looking up our obj/vma
drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcs
drm/i915: Stop touching forcewake following a gen6+ engine reset
MAINTAINERS: drm/i915 has a new maintainer team
drm/i915: Split pin mapping into per platform functions
drm/i915/opregion: let user specify override VBT via firmware load
drm/i915/cnl: Reuse skl_wm_get_hw_state on Cannonlake.
drm/i915/gen10: implement gen 10 watermarks calculations
drm/i915/cnl: Fix LSPCON support.
drm/i915/vbt: ignore extraneous child devices for a port
drm/i915/cnl: Setup PAT Index.
drm/i915/edp: Allow alternate fixed mode for eDP if available.
drm/i915: Add support for drm syncobjs
...
Future platforms increase the number of power wells which require
additional control registers. A convenient way to select the correct
register is to use the high bits of the power well ID as index. This
patch only prepares for this, while upcoming platform enabling patches
will add the actual new power well IDs and corresponding power well
control registers.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Rakshmi Bhatia <rakshmi.bhatia@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Reviewed-by: Rakshmi Bhatia <rakshmi.bhatia@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170814151530.24154-2-imre.deak@intel.com
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZkNpUAAoJEHm+PkMAQRiGr68H/2nr8kxpoUhZ7eA5C71waCjh
gnJSevkzJAp+fCb0KfQFAp1qvpmLLle4e6tAxYgTQZg4Z3W5cJJNfxu9TzY5sGuL
o9QUr43XzABepW4e4jhRtZv6dj3K6XruNeDQKXDZTDcc/S8zoiS/Pltq7VgPcAuM
kX+3qsNdUyknngD6b0z9NtJkb0mHKY6J8MpraWRO34egDwsaN/tuhRj0DRQpCoyQ
x/k+hMbc9MB9Dn8cfACo6Omb+r5Rfd7dTBUAju/TnIIgs//9voHba307N7XvLJZg
kWc8MqMQQZXfRZHB0atpDMHyZS/XQRlNPXj76j0+Ud/byODKTFkkazmgTpALvj8=
=CxeU
-----END PGP SIGNATURE-----
Backmerge tag 'v4.13-rc5' into drm-next
Linux 4.13-rc5
There's a really nasty nouveau collision, hopefully someone can take a look
once I pushed this out.
Guest i915 full ppgtt functionality was blocking by an issue, which would
lead to gpu hardware hang. Guest i915 driver may update the ppgtt table
just before this workload is going to be submitted to the hardware by
device model. This case wasn't handled well by device model before, due
to the small time window between removing old ppgtt entry and adding the
new one. Errors occur when the workload is executed by hardware during
that small time window. This patch is to remove this time window by adding
the new ppgtt entry first and then remove the old one.
Changes in v2:
- Move VGT_CAPS_FULL_PPGTT introduction to patch 2/4. (Joonas)
Changes since v2:
- Divide the whole patch set into two separate patch series, with one
patch in i915 side to check guest i915 full ppgtt capability and enable
it when this capability is supported by the device model, and the other
one in gvt side which fixs the blocking issue and enables the device
model to provide the capability to guest. And this patch focuses on gvt
side. (Joonas)
- Change the title from "reorder the shadow ppgtt update process by adding
entry first" to "Fix guest i915 full ppgtt blocking issue". (Tina)
Changes since v3:
- Rebase to the latest branch.
Changes since v4:
- Tested by Tina Zhang.
Changes since v5:
- Rebase to the latest branch.
v6:
- Update full 48bit ppgtt definition
Cc: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The current context logic only updates the descriptor of context when
it's being pinned to graphics memory space. But this cannot satisfy the
requirement of shadow context. The addressing mode of the pinned shadow
context descriptor may be changed according to the guest addressing mode.
And this won't be updated, as the already pinned shadow context has no
chance to update its descriptor. And this will lead to GPU hang issue,
as shadow context is used with wrong descriptor. This patch fixes this
issue by letting the pinned shadow context descriptor update its
addressing mode on demand.
This patch fixes GPU HANG issue which happends after changing the
grub parameter i915.enable_ppgtt form 0x01 to 0x03 or vice versa and
then rebooting the guest.
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Kechen Lu <kechen.lu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This exposes vGPU context hw id in mdev sysfs which is used to
do vGPU based profiling. Retrieved vGPU context hw id can be set
through i915 perf ioctl to set profiling for target vGPU.
Cc: Jiao Pengyuan <pengyuan.jiao@intel.com>
Cc: Niu Bing <bing.niu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When doing the VGPU reset, we don't need to do the gtt/ppgtt reset.
This will make the GVT to do the ppgtt shadow every time for
a workload and caused really bad performance after a VGPU reset.
This patch will make sure ppgtt clean only happen at device module
level reset to fix this.
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When debugging the gtt code, found the intel_vgpu_gma_to_gpa() can
translate any given GMA though the GMA is not valid. This because
the GTT ops suppress the possible errors, which may result in an
invalid PT entry is retrieved by upper caller.
This patch changed the prototype of pte ops to propagate status to
callers. Then we make sure the GTT walker stop as early as when
a error is detected to prevent undefined behavior.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Remove duplicated MMIO entries in the tracked MMIO list. -EEXIST
is returned if duplicated MMIO entries are found when new MMIO
entry is added.
v2:
- Use WARN(1, ...) for more verbose message. (Zhenyu)
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Need to take runtime pm when do early scan/shadow of workload
for request operations.
Fixes: 7fa56bd159bc ("drm/i915/gvt: Audit and shadow workload during ELSP writing")
Cc: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Use the exist function intel_gvt_ggtt_validate_range to replace
these duplicated code that do the same thing.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The function workload scan and shadow have to hold the drm.struct_mutex
before called. To avoid misusing of this function, add a lockdep assert
in it.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Let the workload audit and shadow ahead of vGPU scheduling, that
will eliminate GPU idle time and improve performance for multi-VM.
The performance of Heaven running simultaneously in 3VMs has
improved 20% after this patch.
v2:Remove condition current->vgpu==vgpu when shadow during ELSP
writing.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
To perform the workload scan and shadow in ELSP writing stage for
performance consideration, the workload scan and shadow stuffs
should be factored out from dispatch_workload().
v2:Put context pin before i915_add_request;
Refine the comments;
Rename some APIs;
v3:workload->status should set only when error happens.
v4:i915_add_request is must to have after i915_gem_request_alloc.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The I915_READ/WRITE is not only a mmio read/write, it also contains
debug checking and Forcewake domain lookup. This is too heavy for
GVT ring switch case which access batch of mmio registers on ring
switch. We can handle Forcewake manually and use the raw
i915_read/write instead. The benefit from this is 2x faster mmio
switch performance.
Before After
cycles ~550000 ~250000
v2: Use existing I915_READ_FW/I915_WRITE_FW macro. (zhenyu)
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
There are lots of POSTING_READ alongside each mmio write Op. While
actually this is not necessary. It just bring too much latency since
PCIe read Op is very slow which is of non-posted transaction.
For PCIe device, the mem transaction for strong ordering rules are:
o PCIe mmio write sequence is FIFO. Posted request cannot
pass previous posted request.
o PCIe mmio read will not go ahead of previous write.
Intel graphics doesn't support RO, so we can apply above rules. In
our case, we only need one POSTING_READ at last. This can remove
half of mmio read Op and then the average ring switch performance
is nearly doubled.
Before After
cycles ~970000 ~550000
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
It is better to use gvt_err when the gvt resource is not enough so
the user can be notified from the kernel dmesg. And this kind of
error message is gvt related.
Suggested-by: Bing Niu <bing.niu@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Bing Niu <bing.niu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When linux guest access mmio with __raw_i915_read64 or __raw_i915_write64,
its length is 8 bytes.
This fix the linux guest in xengt couldn't boot up as it fail in
reading pv_info->magic.
Fixes: 65f9f6febf ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
MMIO block with tracked mmio, is introduced for the sake of performance
of searching tracked mmio. All the tracked mmio needs to get the initial
value from the HW state during vGPU being created. This patch is to
initialize the tracked registers in MMIO block with the HW state.
v2: Add "Fixes:" line for this patch (Zhenyu)
Fixes: 65f9f6febf ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
If a workload caused a HW GPU hang or it is in the middle of
vGPU reset, the workload queue should be cleaned up to emulate
the hang state of the GPU.
v2:
- use ENGINE_MASK(ring_id) instead of (1 << ring_id). (Zhenyu)
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Use resetting_eng to identify which engine is resetting
so the rest ones' workload won't be impacted
v2:
- use ENGINE_MASK(ring_id) instead of (1 << ring_id). (Zhenyu)
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The pattern of a power well backing a set of fuses whose initialization
we need to wait for during power well enabling is common to all GEN9+
platforms. Adding support for this to the HSW power well enable helper
allows us to use the HSW/BDW power well code for GEN9+ as well in a
follow-up patch.
v2:
- Use an enum for power gates instead of raw numbers. (Ville)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170711204236.5618-6-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Although on HSW/BDW there is only a single display global power well,
it's programmed the same way as other GEN9+ power wells. This also
means we can get at the HSW/BDW request and status flags the same way
it's done on GEN9+ by assigning the corresponding HSW/BDW power well ID.
This ID was assigned in a recent patch, so we can now switch to using
the same macros everywhere on HSW+.
Updating the HSW power well control register with RMW is not strictly
necessary, but this will allow us to use the same code for GEN9+.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1499352040-8819-13-git-send-email-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2nd round of 4.14 features:
- prep for deferred fbdev setup
- refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar)
- more cnl paches (Rodrigo, Imre et al)
- tighten context cleanup and handling (Chris Wilson)
- fix interlaced handling on skl+ (Mahesh Kumar)
- small bits as usual
* tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits)
drm/i915: Update DRIVER_DATE to 20170717
drm/i915: Protect against deferred fbdev setup
drm/i915/fbdev: Always forward hotplug events
drm/i915/skl+: unify cpp value in WM calculation
drm/i915/skl+: WM calculation don't require height
drm/i915: Addition wrapper for fixed16.16 operation
drm/i915: cleanup fixed-point wrappers naming
drm/i915: Always perform internal fixed16 division in 64 bits
drm/i915: take-out common clamping code of fixed16 wrappers
drm/i915/cnl: Add missing type case.
drm/i915/cnl: Add max allowed Cannonlake DC.
drm/i915: Make DP-MST connector info work
drm/i915/cnl: Get DDI clock based on PLLs.
drm/i915/cnl: Inherit RPS stuff from previous platforms.
drm/i915/cnl: Gen10 render context size.
drm/i915/cnl: Don't trust VBT's alternate pin for port D for now.
drm/i915: Fix the kernel panic when using aliasing ppgtt
drm/i915/cnl: Cannonlake color init.
drm/i915/cnl: Add force wake for gen10+.
x86/gpu: CNL uses the same GMS values as SKL
...
Once the Windows guest is shutdown, the display pipe will be disabled
and intel_gvt_check_vblank_emulation will be called to check if the
vblank timer is turned off. Given the scenario of creating VM1 ,VM2,
destoying VM2 in current code, VM1 has pipe enabled and continues to
check VM2, the flag have_enabled_pipe is always false since all the VM2
pipes are disabled, so the vblank timer will be canceled and TDR happens
in Windows VM1 guest due to the vsync timeout.
In this patch the vblank timer will be never canceled once one pipe is
enabled.
v2:
- remove have_enabled_pipe flag and check pipe enabled directly. (Zhenyu)
Cc: Wang Hongbo <hongbo.wang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>