/*
 * Copyright © 2010 Daniel Vetter
 * Copyright © 2011-2014 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 *
 */

#include <linux/slab.h> /* fault-inject.h is not standalone! */

#include <linux/fault-inject.h>
#include <linux/log2.h>
#include <linux/random.h>
#include <linux/seq_file.h>
#include <linux/stop_machine.h>

#include <asm/set_memory.h>

#include <drm/drmP.h>
#include <drm/i915_drm.h>

#include "i915_drv.h"
#include "i915_vgpu.h"
#include "i915_trace.h"
#include "intel_drv.h"
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 20:58:44 +08:00
#include "intel_frontbuffer.h"

#define I915_GFP_DMA (GFP_KERNEL | __GFP_HIGHMEM)

/**
 * DOC: Global GTT views
 *
 * Background and previous state
 *
 * Historically objects could exist (be bound) in global GTT space only as
 * singular instances with a view representing all of the object's backing pages
 * in a linear fashion. This view will be called a normal view.
 *
 * To support multiple views of the same object, where the number of mapped
 * pages is not equal to the backing store, or where the layout of the pages
 * is not linear, the concept of a GGTT view was added.
 *
 * One example of an alternative view is a stereo display driven by a single
 * image. In this case we would have a framebuffer looking like this
 * (2x2 pages):
 *
 *    12
 *    34
 *
 * The above would represent a normal GGTT view as normally mapped for GPU or
 * CPU rendering. In contrast, the display engine would be fed an alternative
 * view which could look something like this:
 *
 *   1212
 *   3434
 *
 * In this example both the size and layout of pages in the alternative view
 * are different from the normal view.
 *
 * Implementation and usage
 *
 * GGTT views are implemented using VMAs and are distinguished via enum
 * i915_ggtt_view_type and struct i915_ggtt_view.
 *
 * A new flavour of core GEM functions which work with GGTT bound objects was
 * added with the _ggtt_ infix, and sometimes with _view postfix to avoid
 * renaming in large amounts of code. They take the struct i915_ggtt_view
 * parameter encapsulating all metadata required to implement a view.
 *
 * As a helper for callers which are only interested in the normal view,
 * a globally const i915_ggtt_view_normal singleton instance exists. All old
 * core GEM API functions, the ones not taking the view parameter, operate on,
 * or with, the normal GGTT view.
 *
 * Code wanting to add or use a new GGTT view needs to:
 *
 * 1. Add a new enum with a suitable name.
 * 2. Extend the metadata in the i915_ggtt_view structure if required.
 * 3. Add support to i915_get_ggtt_vma_pages().
 *
 * New views are required to build a scatter-gather table from within the
 * i915_get_ggtt_vma_pages function. This table is stored in the vma.ggtt_view
 * and exists for the lifetime of a VMA.
 *
 * Core API is designed to have copy semantics which means that passed in
 * struct i915_ggtt_view does not need to be persistent (left around after
 * calling the core API functions).
 *
 */
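As a rough illustration of the 2x2 stereo example in the comment above, the following standalone sketch derives the page order of the alternative view from the normal view. The helper name build_stereo_view and the flat integer arrays are hypothetical and not part of i915; the driver itself builds a scatter-gather table rather than an index array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an i915 function): given a normal view of
 * rows x cols pages stored row by row, fill 'out' with a view in which
 * every row appears twice side by side, as in the 1212/3434 example.
 * 'out' must have room for rows * cols * 2 entries.
 * Returns the number of entries written.
 */
static size_t build_stereo_view(const int *normal, size_t rows, size_t cols,
				int *out)
{
	size_t n = 0;

	assert(normal && out);

	for (size_t r = 0; r < rows; r++)
		for (size_t copy = 0; copy < 2; copy++)
			for (size_t c = 0; c < cols; c++)
				out[n++] = normal[r * cols + c];

	return n;
}
```

Note that the alternative view holds twice as many entries as the backing store has pages, which is the "number of mapped pages is not equal to the backing store" case the comment describes.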

static int
i915_get_ggtt_vma_pages(struct i915_vma *vma);

static void gen6_ggtt_invalidate(struct drm_i915_private *dev_priv)
{
	/* Note that as an uncached mmio write, this should flush the
	 * WCB of the writes into the GGTT before it triggers the invalidate.
	 */
	I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
}

static void guc_ggtt_invalidate(struct drm_i915_private *dev_priv)
{
	gen6_ggtt_invalidate(dev_priv);
	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
}

static void gmch_ggtt_invalidate(struct drm_i915_private *dev_priv)
{
	intel_gtt_chipset_flush();
}

static inline void i915_ggtt_invalidate(struct drm_i915_private *i915)
{
	i915->ggtt.invalidate(i915);
}
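The invalidate helpers above are selected per platform through a function pointer stored in the ggtt structure. A minimal userspace sketch of that dispatch pattern follows; every fake_* name and the counters are hypothetical stand-ins for illustration, with the mmio writes replaced by counter increments.

```c
#include <assert.h>

struct fake_device;

/* Hypothetical stand-in for the per-platform hook stored in i915->ggtt. */
struct fake_ggtt {
	void (*invalidate)(struct fake_device *dev);
};

struct fake_device {
	struct fake_ggtt ggtt;
	int base_flushes;	/* counts the basic flush "register write" */
	int guc_flushes;	/* counts the additional GuC invalidate */
};

static void fake_gen6_invalidate(struct fake_device *dev)
{
	dev->base_flushes++;
}

static void fake_guc_invalidate(struct fake_device *dev)
{
	/* GuC variant: perform the base flush first, then the extra step. */
	fake_gen6_invalidate(dev);
	dev->guc_flushes++;
}

/* Callers never pick a variant themselves; they go through the hook. */
static void fake_ggtt_invalidate(struct fake_device *dev)
{
	assert(dev->ggtt.invalidate);
	dev->ggtt.invalidate(dev);
}
```

Swapping the hook at run time, as the sketch allows, changes the behaviour of every caller without touching the call sites.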

int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
				int enable_ppgtt)
{
	bool has_aliasing_ppgtt;
	bool has_full_ppgtt;
	bool has_full_48bit_ppgtt;

	has_aliasing_ppgtt = dev_priv->info.has_aliasing_ppgtt;
	has_full_ppgtt = dev_priv->info.has_full_ppgtt;
	has_full_48bit_ppgtt = dev_priv->info.has_full_48bit_ppgtt;

	if (intel_vgpu_active(dev_priv)) {
		/* GVT-g has no support for 32bit ppgtt */
		has_full_ppgtt = false;
		has_full_48bit_ppgtt = intel_vgpu_has_full_48bit_ppgtt(dev_priv);
	}

	if (!has_aliasing_ppgtt)
		return 0;

	/*
	 * We don't allow disabling PPGTT for gen9+ as it's a requirement for
	 * execlists, the sole mechanism available to submit work.
	 */
	if (enable_ppgtt == 0 && INTEL_GEN(dev_priv) < 9)
		return 0;

	if (enable_ppgtt == 1)
		return 1;

	if (enable_ppgtt == 2 && has_full_ppgtt)
		return 2;

	if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
		return 3;

drm/i915: Disable full ppgtt by default
There are too many outstanding issues:
- Fence handling in the current code is broken. There's a patch series
from me, but it's blocked on an extended review (which includes
writing the testcases).
- IOMMU mapping handling is broken, we need to properly refcount it -
currently it gets destroyed when the first vma is unbound, so way
too early.
- There's a pending reset issue on snb. Since Mika's reset work and
full ppgtt have been pulled in in separate branches and ended up
intermittently breaking each other, it's unclear who's the exact
culprit here.
- We still have persistent evidence of crazy recursion bugs through
vma_unbind and ppgtt_release, e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=73383
This issue (and a few others meanwhile resolved) has blocked our
performance measuring/tuning group for 3 months.
- Secure batch dispatching is broken. This has been blocking Brad Volkin's
command checker work for 3 months.
All these issues are confirmed to only happen when full ppgtt is
enabled; falling back to aliasing ppgtt resolves them. But even
aliasing ppgtt itself still has a regression:
- We currently unconditionally bind objects into the aliasing ppgtt,
which means all privileged objects like ringbuffers are visible to
unprivileged access again. On top of that this also breaks the
command checker for aliasing ppgtt, since it can't hide the
validated batch any more.
Furthermore topic/full-ppgtt has never been reviewed:
- Lifetime rules around vma unbinding/release are unclear, resulting
in this awesome hack called ppgtt_release, which seems to take the
blame for most of the recursion fallout.
- Context/ring init works differently on gpu reset than anywhere else.
Such differences have in the past always led to really hard to
track down bugs.
- Aliasing ppgtt is treated in a bunch of places as a real address
space, but it isn't - the real address space is always the global
gtt in that case. This results in a bit of a mess between contexts and
ppgtt object, further complicating the context/ppgtt/vma lifetime
rules.
- We don't have any docs describing the overall concepts introduced
with full ppgtt. A short, concise overview describing vmas and some
of the strange bits around them (like the unbound vmas used by
execbuf, or the new binding rules) really is needed.
Note that a lot of the post topic/full-ppgtt merge fallout has already
been addressed; this entire list here of 10 issues really only contains
the still outstanding issues.
Finally the 3.15 merge window is approaching and I think we need to
use the remaining time to ensure that our fallback option of using
aliasing ppgtt is in solid shape. Hence I think it's time to throw the
switch. While at it demote the helper from static inline status
because really.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-06 16:40:43 +08:00
	/* Disable ppgtt on SNB if VT-d is on. */
	if (IS_GEN6(dev_priv) && intel_vtd_active()) {
		DRM_INFO("Disabling PPGTT because VT-d is on\n");
		return 0;
	}

	/* Early VLV doesn't have this */
	if (IS_VALLEYVIEW(dev_priv) && dev_priv->drm.pdev->revision < 0xb) {
		DRM_DEBUG_DRIVER("disabling PPGTT on pre-B3 step VLV\n");
		return 0;
	}

	if (INTEL_GEN(dev_priv) >= 8 && i915.enable_execlists) {
		if (has_full_48bit_ppgtt)
			return 3;

		if (has_full_ppgtt)
			return 2;
	}

	return has_aliasing_ppgtt ? 1 : 0;
}
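The decision order in intel_sanitize_enable_ppgtt() can be easier to follow in isolation. The sketch below mirrors it in plain C under simplifying assumptions: struct ppgtt_caps and sanitize_ppgtt are hypothetical names, the vGPU adjustment and the early-VLV revision check are left out, and the VT-d and execlists state are passed in as plain booleans instead of being queried from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability summary; the driver reads these from dev_priv. */
struct ppgtt_caps {
	int gen;
	bool has_aliasing;
	bool has_full;
	bool has_full_48bit;
	bool vtd_active;	/* stand-in for intel_vtd_active() */
	bool execlists;		/* stand-in for i915.enable_execlists */
};

/* Returns 0 (off), 1 (aliasing), 2 (full 32-bit) or 3 (full 48-bit). */
static int sanitize_ppgtt(const struct ppgtt_caps *c, int requested)
{
	assert(c);

	if (!c->has_aliasing)
		return 0;

	/* An explicit "off" is only honoured before gen9. */
	if (requested == 0 && c->gen < 9)
		return 0;

	/* Explicit requests win when the hardware supports them. */
	if (requested == 1)
		return 1;
	if (requested == 2 && c->has_full)
		return 2;
	if (requested == 3 && c->has_full_48bit)
		return 3;

	/* Otherwise fall back to the automatic choice. */
	if (c->gen == 6 && c->vtd_active)
		return 0;

	if (c->gen >= 8 && c->execlists) {
		if (c->has_full_48bit)
			return 3;
		if (c->has_full)
			return 2;
	}

	return 1;
}
```

One consequence worth noting: on a gen9+ device a request of 0 is not rejected outright but simply falls through to the automatic selection, which matches the comment about execlists requiring PPGTT.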

static int ppgtt_bind_vma(struct i915_vma *vma,
			  enum i915_cache_level cache_level,
			  u32 unused)
{
	u32 pte_flags;
	int ret;

	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
		ret = vma->vm->allocate_va_range(vma->vm, vma->node.start,
						 vma->size);
		if (ret)
			return ret;
	}

	vma->pages = vma->obj->mm.pages;

	/* Currently applicable only to VLV */
	pte_flags = 0;
	if (vma->obj->gt_ro)
		pte_flags |= PTE_READ_ONLY;

	vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);

	return 0;
}

static void ppgtt_unbind_vma(struct i915_vma *vma)
{
	vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
}
|
drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initially, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound BOs
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want to, remove clear_range; however, it's
not totally easy at this point, since it's still used in a couple of
places that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:10:56 +08:00

static gen8_pte_t gen8_pte_encode(dma_addr_t addr,
				  enum i915_cache_level level)
{
	gen8_pte_t pte = _PAGE_PRESENT | _PAGE_RW;
	pte |= addr;

	switch (level) {
	case I915_CACHE_NONE:
		pte |= PPAT_UNCACHED;
		break;
	case I915_CACHE_WT:
		pte |= PPAT_DISPLAY_ELLC;
		break;
	default:
		pte |= PPAT_CACHED;
		break;
	}

	return pte;
}
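To make the bit layout produced by gen8_pte_encode() concrete, here is a standalone sketch with the constants spelled out. The SK_* names are hypothetical. The page-flag bit positions follow the usual x86 page-table layout, and the PPAT selector values are assumptions made for illustration; the authoritative definitions live in the driver's headers and should be checked there.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 page-table flag bits (mirroring _PAGE_* on x86). */
#define SK_PAGE_PRESENT	(1ull << 0)
#define SK_PAGE_RW	(1ull << 1)
#define SK_PAGE_PWT	(1ull << 3)
#define SK_PAGE_PCD	(1ull << 4)
#define SK_PAGE_PAT	(1ull << 7)

/* Assumed PPAT selectors, for illustration only. */
#define SK_PPAT_UNCACHED	(SK_PAGE_PWT | SK_PAGE_PCD)
#define SK_PPAT_DISPLAY_ELLC	SK_PAGE_PCD
#define SK_PPAT_CACHED		SK_PAGE_PAT

enum sk_cache_level { SK_CACHE_NONE, SK_CACHE_LLC, SK_CACHE_WT };

static uint64_t sk_gen8_pte_encode(uint64_t addr, enum sk_cache_level level)
{
	uint64_t pte = SK_PAGE_PRESENT | SK_PAGE_RW;

	/* The low 12 bits carry flags, so the address must be page aligned. */
	assert((addr & 0xfffull) == 0);
	pte |= addr;

	switch (level) {
	case SK_CACHE_NONE:
		pte |= SK_PPAT_UNCACHED;
		break;
	case SK_CACHE_WT:
		pte |= SK_PPAT_DISPLAY_ELLC;
		break;
	default:
		pte |= SK_PPAT_CACHED;
		break;
	}

	return pte;
}
```

The alignment assertion is an addition of the sketch; the driver function performs no such check and relies on its callers passing page-aligned DMA addresses.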

static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
				  const enum i915_cache_level level)
{
	gen8_pde_t pde = _PAGE_PRESENT | _PAGE_RW;
	pde |= addr;
	if (level != I915_CACHE_NONE)
		pde |= PPAT_CACHED_PDE;
	else
		pde |= PPAT_UNCACHED;
	return pde;
}

#define gen8_pdpe_encode gen8_pde_encode
#define gen8_pml4e_encode gen8_pde_encode

static gen6_pte_t snb_pte_encode(dma_addr_t addr,
				 enum i915_cache_level level,
				 u32 unused)
{
	gen6_pte_t pte = GEN6_PTE_VALID;
	pte |= GEN6_PTE_ADDR_ENCODE(addr);

	switch (level) {
	case I915_CACHE_L3_LLC:
	case I915_CACHE_LLC:
		pte |= GEN6_PTE_CACHE_LLC;
		break;
	case I915_CACHE_NONE:
		pte |= GEN6_PTE_UNCACHED;
		break;
	default:
		MISSING_CASE(level);
	}

	return pte;
}

static gen6_pte_t ivb_pte_encode(dma_addr_t addr,
				 enum i915_cache_level level,
				 u32 unused)
{
	gen6_pte_t pte = GEN6_PTE_VALID;
	pte |= GEN6_PTE_ADDR_ENCODE(addr);

	switch (level) {
	case I915_CACHE_L3_LLC:
		pte |= GEN7_PTE_CACHE_L3_LLC;
		break;
	case I915_CACHE_LLC:
		pte |= GEN6_PTE_CACHE_LLC;
		break;
	case I915_CACHE_NONE:
		pte |= GEN6_PTE_UNCACHED;
		break;
	default:
		MISSING_CASE(level);
	}

	return pte;
}

static gen6_pte_t byt_pte_encode(dma_addr_t addr,
				 enum i915_cache_level level,
				 u32 flags)
{
	gen6_pte_t pte = GEN6_PTE_VALID;
	pte |= GEN6_PTE_ADDR_ENCODE(addr);

	if (!(flags & PTE_READ_ONLY))
		pte |= BYT_PTE_WRITEABLE;

	if (level != I915_CACHE_NONE)
		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;

	return pte;
}
|
|
|
|
|
2015-03-17 00:00:54 +08:00
|
|
|
static gen6_pte_t hsw_pte_encode(dma_addr_t addr,
|
|
|
|
enum i915_cache_level level,
|
2016-10-13 20:02:40 +08:00
|
|
|
u32 unused)
|
2013-04-22 15:53:51 +08:00
|
|
|
{
|
2016-10-13 20:02:40 +08:00
|
|
|
gen6_pte_t pte = GEN6_PTE_VALID;
|
2013-07-05 02:02:03 +08:00
|
|
|
pte |= HSW_PTE_ADDR_ENCODE(addr);
|
2013-04-22 15:53:51 +08:00
|
|
|
|
|
|
|
if (level != I915_CACHE_NONE)
|
2013-08-05 14:47:29 +08:00
|
|
|
pte |= HSW_WB_LLC_AGE3;
|
2013-04-22 15:53:51 +08:00
|
|
|
|
|
|
|
return pte;
|
|
|
|
}
|
|
|
|
|
2015-03-17 00:00:54 +08:00
|
|
|
static gen6_pte_t iris_pte_encode(dma_addr_t addr,
|
|
|
|
enum i915_cache_level level,
|
2016-10-13 20:02:40 +08:00
|
|
|
u32 unused)
|
2013-07-05 02:02:06 +08:00
|
|
|
{
|
2016-10-13 20:02:40 +08:00
|
|
|
gen6_pte_t pte = GEN6_PTE_VALID;
|
2013-07-05 02:02:06 +08:00
|
|
|
pte |= HSW_PTE_ADDR_ENCODE(addr);
|
|
|
|
|
2013-08-08 21:41:10 +08:00
|
|
|
switch (level) {
|
|
|
|
case I915_CACHE_NONE:
|
|
|
|
break;
|
|
|
|
case I915_CACHE_WT:
|
2013-11-22 18:37:53 +08:00
|
|
|
pte |= HSW_WT_ELLC_LLC_AGE3;
|
2013-08-08 21:41:10 +08:00
|
|
|
break;
|
|
|
|
default:
|
2013-11-22 18:37:53 +08:00
|
|
|
pte |= HSW_WB_ELLC_LLC_AGE3;
|
2013-08-08 21:41:10 +08:00
|
|
|
break;
|
|
|
|
}
|
2013-07-05 02:02:06 +08:00
|
|
|
|
|
|
|
return pte;
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static struct page *vm_alloc_page(struct i915_address_space *vm, gfp_t gfp)
|
2015-03-17 00:00:56 +08:00
|
|
|
{
|
2017-08-23 01:38:28 +08:00
|
|
|
struct pagevec *pvec = &vm->free_pages;
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
if (I915_SELFTEST_ONLY(should_fail(&vm->fault_attr, 1)))
|
|
|
|
i915_gem_shrink_all(vm->i915);
|
2017-02-14 01:15:44 +08:00
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
if (likely(pvec->nr))
|
|
|
|
return pvec->pages[--pvec->nr];
|
|
|
|
|
|
|
|
if (!vm->pt_kmap_wc)
|
|
|
|
return alloc_page(gfp);
|
|
|
|
|
|
|
|
/* A placeholder for a specific mutex to guard the WC stash */
|
|
|
|
lockdep_assert_held(&vm->i915->drm.struct_mutex);
|
|
|
|
|
|
|
|
/* Look in our global stash of WC pages... */
|
|
|
|
pvec = &vm->i915->mm.wc_stash;
|
|
|
|
if (likely(pvec->nr))
|
|
|
|
return pvec->pages[--pvec->nr];
|
|
|
|
|
|
|
|
/* Otherwise batch allocate pages to amortize cost of set_pages_wc. */
|
|
|
|
do {
|
|
|
|
struct page *page;
|
2017-02-15 16:43:40 +08:00
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
page = alloc_page(gfp);
|
|
|
|
if (unlikely(!page))
|
|
|
|
break;
|
|
|
|
|
|
|
|
pvec->pages[pvec->nr++] = page;
|
|
|
|
} while (pagevec_space(pvec));
|
|
|
|
|
|
|
|
if (unlikely(!pvec->nr))
|
2017-02-15 16:43:40 +08:00
|
|
|
return NULL;
|
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
set_pages_array_wc(pvec->pages, pvec->nr);
|
2017-02-15 16:43:40 +08:00
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
return pvec->pages[--pvec->nr];
|
2017-02-15 16:43:40 +08:00
|
|
|
}
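The allocation path above pops from a stash when it can and otherwise fills the stash in one go, so that the expensive attribute change is paid once per batch instead of once per page. A rough user-space sketch of that idea follows. `demo_stash`, `demo_convert_batch()` and the capacity of 14 are assumptions made for the example, with `malloc()` standing in for `alloc_page()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_STASH_SIZE 14 /* assumed capacity, stands in for PAGEVEC_SIZE */

struct demo_stash {
	unsigned int nr;
	void *pages[DEMO_STASH_SIZE];
};

/* Counts how many times the costly batch conversion step has run. */
static unsigned int demo_batch_conversions;

/* Hypothetical stand-in for a per-batch operation such as a
 * caching-attribute change; here it only bumps a counter. */
static void demo_convert_batch(void **pages, unsigned int nr)
{
	(void)pages;
	(void)nr;
	demo_batch_conversions++;
}

static void *demo_stash_alloc(struct demo_stash *stash, size_t size)
{
	assert(stash->nr <= DEMO_STASH_SIZE);

	/* Fast path: pop the most recently stashed page. */
	if (stash->nr)
		return stash->pages[--stash->nr];

	/* Slow path: fill the stash, then pay the conversion cost once. */
	do {
		void *page = malloc(size);

		if (!page)
			break;

		stash->pages[stash->nr++] = page;
	} while (stash->nr < DEMO_STASH_SIZE);

	if (!stash->nr)
		return NULL;

	demo_convert_batch(stash->pages, stash->nr);

	return stash->pages[--stash->nr];
}
```

In this sketch the first call on an empty stash triggers one batch conversion, and the calls that follow are served from the stash without another conversion until it runs dry.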
|
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
static void vm_free_pages_release(struct i915_address_space *vm,
|
|
|
|
bool immediate)
|
2017-02-15 16:43:40 +08:00
|
|
|
{
|
2017-08-23 01:38:28 +08:00
|
|
|
struct pagevec *pvec = &vm->free_pages;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!pagevec_count(pvec));
|
2017-02-15 16:43:40 +08:00
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
if (vm->pt_kmap_wc) {
|
|
|
|
struct pagevec *stash = &vm->i915->mm.wc_stash;
|
|
|
|
|
|
|
|
/* When we use WC, first fill up the global stash and then
|
|
|
|
* only if full immediately free the overflow.
|
|
|
|
*/
|
2017-02-15 16:43:40 +08:00
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
lockdep_assert_held(&vm->i915->drm.struct_mutex);
|
|
|
|
if (pagevec_space(stash)) {
|
|
|
|
do {
|
|
|
|
stash->pages[stash->nr++] =
|
|
|
|
pvec->pages[--pvec->nr];
|
|
|
|
if (!pvec->nr)
|
|
|
|
return;
|
|
|
|
} while (pagevec_space(stash));
|
|
|
|
|
|
|
|
/* As we have made some room in the VM's free_pages,
|
|
|
|
* we can wait for it to fill again. Unless we are
|
|
|
|
* inside i915_address_space_fini() and must
|
|
|
|
* immediately release the pages!
|
|
|
|
*/
|
|
|
|
if (!immediate)
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
set_pages_array_wb(pvec->pages, pvec->nr);
|
|
|
|
}
|
|
|
|
|
|
|
|
__pagevec_release(pvec);
|
2017-02-15 16:43:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void vm_free_page(struct i915_address_space *vm, struct page *page)
|
|
|
|
{
|
|
|
|
if (!pagevec_add(&vm->free_pages, page))
|
2017-08-23 01:38:28 +08:00
|
|
|
vm_free_pages_release(vm, false);
|
2017-02-15 16:43:40 +08:00
|
|
|
}
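The free path above queues pages in a per-VM vector and, once that vector is full, first tries to hand them to a shared stash and only releases what does not fit. The sketch below models that overflow policy in user space. The `demo_` names and the capacity of 4 are assumptions for illustration, and `free()` stands in for the final page release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_VEC_SIZE 4 /* assumed small capacity to keep the example short */

struct demo_vec {
	unsigned int nr;
	void *pages[DEMO_VEC_SIZE];
};

static unsigned int demo_vec_space(const struct demo_vec *v)
{
	return DEMO_VEC_SIZE - v->nr;
}

/* Drain @local into @stash; free whatever cannot be kept. */
static void demo_release(struct demo_vec *local, struct demo_vec *stash,
			 bool immediate)
{
	assert(local->nr);

	if (demo_vec_space(stash)) {
		do {
			stash->pages[stash->nr++] = local->pages[--local->nr];
			if (!local->nr)
				return;
		} while (demo_vec_space(stash));

		/* Some room was made in @local, so it may fill up again. */
		if (!immediate)
			return;
	}

	while (local->nr)
		free(local->pages[--local->nr]);
}

/* Queue @page for reuse; spill to the stash once @local is full. */
static void demo_free_page(struct demo_vec *local, struct demo_vec *stash,
			   void *page)
{
	assert(demo_vec_space(local));

	local->pages[local->nr++] = page;
	if (!demo_vec_space(local))
		demo_release(local, stash, false);
}
```

With both vectors empty, the fourth `demo_free_page()` call moves all four pages into the stash. Four further calls find the stash full, so those pages are freed outright.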
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static int __setup_page_dma(struct i915_address_space *vm,
|
|
|
|
struct i915_page_dma *p,
|
|
|
|
gfp_t gfp)
|
|
|
|
{
|
|
|
|
p->page = vm_alloc_page(vm, gfp | __GFP_NOWARN | __GFP_NORETRY);
|
|
|
|
if (unlikely(!p->page))
|
|
|
|
return -ENOMEM;
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
p->daddr = dma_map_page(vm->dma, p->page, 0, PAGE_SIZE,
|
|
|
|
PCI_DMA_BIDIRECTIONAL);
|
|
|
|
if (unlikely(dma_mapping_error(vm->dma, p->daddr))) {
|
|
|
|
vm_free_page(vm, p->page);
|
|
|
|
return -ENOMEM;
|
2015-06-25 23:35:07 +08:00
|
|
|
}
|
2015-03-25 01:06:33 +08:00
|
|
|
|
|
|
|
return 0;
|
2015-03-17 00:00:56 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static int setup_page_dma(struct i915_address_space *vm,
|
2016-11-16 16:55:34 +08:00
|
|
|
struct i915_page_dma *p)
|
2015-06-25 23:35:13 +08:00
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
return __setup_page_dma(vm, p, I915_GFP_DMA);
|
2015-06-25 23:35:13 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void cleanup_page_dma(struct i915_address_space *vm,
|
2016-11-16 16:55:34 +08:00
|
|
|
struct i915_page_dma *p)
|
drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 00:22:36 +08:00
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
dma_unmap_page(vm->dma, p->daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
|
|
|
|
vm_free_page(vm, p->page);
|
2015-06-25 23:35:07 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
#define kmap_atomic_px(px) kmap_atomic(px_base(px)->page)
|
2015-06-25 23:35:11 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
#define setup_px(vm, px) setup_page_dma((vm), px_base(px))
|
|
|
|
#define cleanup_px(vm, px) cleanup_page_dma((vm), px_base(px))
|
|
|
|
#define fill_px(vm, px, v) fill_page_dma((vm), px_base(px), (v))
|
|
|
|
#define fill32_px(vm, px, v) fill_page_dma_32((vm), px_base(px), (v))
|
2015-06-25 23:35:12 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void fill_page_dma(struct i915_address_space *vm,
|
|
|
|
struct i915_page_dma *p,
|
|
|
|
const u64 val)
|
2015-06-25 23:35:11 +08:00
|
|
|
{
|
2017-02-15 16:43:41 +08:00
|
|
|
u64 * const vaddr = kmap_atomic(p->page);
|
2015-06-25 23:35:11 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < 512; i++)
|
|
|
|
vaddr[i] = val;
|
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(vaddr);
|
2015-06-25 23:35:11 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void fill_page_dma_32(struct i915_address_space *vm,
|
|
|
|
struct i915_page_dma *p,
|
|
|
|
const u32 v)
|
2015-06-25 23:35:10 +08:00
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
fill_page_dma(vm, p, (u64)v << 32 | v);
|
2015-06-25 23:35:10 +08:00
|
|
|
}
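The 32-bit fill above reuses the 64-bit fill by duplicating the 32-bit pattern into both halves of a 64-bit word, so 512 stores of 8 bytes cover a 4096-byte page with 1024 copies of the pattern. A small user-space sketch of the same trick, assuming a 4096-byte page and using hypothetical `demo_` helpers in place of the kmap-based originals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size */

/* Store @val into every 64-bit slot of a page-sized buffer. */
static void demo_fill64(uint64_t *vaddr, uint64_t val)
{
	size_t i;

	assert(vaddr);

	for (i = 0; i < DEMO_PAGE_SIZE / sizeof(*vaddr); i++)
		vaddr[i] = val;
}

/* Fill with a 32-bit pattern by duplicating it into both halves. */
static void demo_fill32(uint64_t *vaddr, uint32_t v)
{
	demo_fill64(vaddr, (uint64_t)v << 32 | v);
}
```

Because both halves of each 64-bit word hold the same value, the result does not depend on byte order when the buffer is later read back as an array of 32-bit entries.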
|
|
|
|
|
2016-08-22 15:44:30 +08:00
|
|
|
static int
|
2017-02-15 16:43:40 +08:00
|
|
|
setup_scratch_page(struct i915_address_space *vm, gfp_t gfp)
|
2015-06-30 23:16:39 +08:00
|
|
|
{
|
2017-08-23 01:38:28 +08:00
|
|
|
struct page *page;
|
|
|
|
dma_addr_t addr;
|
|
|
|
|
|
|
|
page = alloc_page(gfp | __GFP_ZERO);
|
|
|
|
if (unlikely(!page))
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
addr = dma_map_page(vm->dma, page, 0, PAGE_SIZE,
|
|
|
|
PCI_DMA_BIDIRECTIONAL);
|
|
|
|
if (unlikely(dma_mapping_error(vm->dma, addr))) {
|
|
|
|
__free_page(page);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
vm->scratch_page.page = page;
|
|
|
|
vm->scratch_page.daddr = addr;
|
|
|
|
return 0;
|
2015-06-30 23:16:39 +08:00
|
|
|
}
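The scratch-page setup above follows an allocate, map, then publish order: each failure undoes only what has already succeeded, and the caller's structure is written only at the end. The sketch below shows that unwind order in user space. `demo_map()` and its failure switch are hypothetical stand-ins for a DMA mapping call, and `calloc()` stands in for a zeroed page allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAP_ERROR UINT64_MAX /* assumed sentinel for a failed mapping */

struct demo_page_dma {
	void *page;
	uint64_t daddr;
};

/* Hypothetical stand-in for a DMA mapping call; fails on request. */
static uint64_t demo_map(void *page, int should_fail)
{
	return should_fail ? DEMO_MAP_ERROR : (uint64_t)(uintptr_t)page;
}

static int demo_setup_page(struct demo_page_dma *p, int fail_map)
{
	void *page;
	uint64_t addr;

	assert(p);

	page = calloc(1, 4096);
	if (!page)
		return -ENOMEM;

	addr = demo_map(page, fail_map);
	if (addr == DEMO_MAP_ERROR) {
		/* Undo the allocation before reporting the failure. */
		free(page);
		return -ENOMEM;
	}

	/* Publish only after every step has succeeded. */
	p->page = page;
	p->daddr = addr;
	return 0;
}
```

A failed call leaves the caller's structure untouched, so the matching cleanup routine only ever has to handle fully initialized state.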
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void cleanup_scratch_page(struct i915_address_space *vm)
|
2015-06-30 23:16:39 +08:00
|
|
|
{
|
2017-08-23 01:38:28 +08:00
|
|
|
struct i915_page_dma *p = &vm->scratch_page;
|
|
|
|
|
|
|
|
dma_unmap_page(vm->dma, p->daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
|
|
|
|
__free_page(p->page);
|
2015-06-30 23:16:39 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static struct i915_page_table *alloc_pt(struct i915_address_space *vm)
|
2015-02-25 00:22:36 +08:00
|
|
|
{
|
2015-04-08 19:13:23 +08:00
|
|
|
struct i915_page_table *pt;
|
2015-02-25 00:22:36 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
pt = kmalloc(sizeof(*pt), GFP_KERNEL | __GFP_NOWARN);
|
|
|
|
if (unlikely(!pt))
|
2015-02-25 00:22:36 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
if (unlikely(setup_px(vm, pt))) {
|
|
|
|
kfree(pt);
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
}
|
2015-02-25 00:22:36 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
pt->used_ptes = 0;
|
2015-02-25 00:22:36 +08:00
|
|
|
return pt;
|
|
|
|
}
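The allocator above reports failure through the returned pointer itself rather than through NULL or an out-parameter, and it frees the partially built object before returning. The following user-space sketch imitates that error-pointer convention. The `demo_` helpers are simplified re-creations written for this example, not the kernel's `ERR_PTR()` family, and the 4095 limit is an assumption about the reserved errno range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095 /* assumed size of the reserved errno range */

/* Encode a negative errno value in the top of the address space. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

struct demo_pt {
	unsigned int used_ptes;
	void *backing;
};

static struct demo_pt *demo_alloc_pt(int fail_backing)
{
	struct demo_pt *pt = malloc(sizeof(*pt));

	if (!pt)
		return demo_err_ptr(-ENOMEM);

	pt->backing = fail_backing ? NULL : malloc(4096);
	if (!pt->backing) {
		/* Release the half-built object before reporting the error. */
		free(pt);
		return demo_err_ptr(-ENOMEM);
	}

	pt->used_ptes = 0;
	return pt;
}

static void demo_free_pt(struct demo_pt *pt)
{
	assert(!demo_is_err(pt));

	free(pt->backing);
	free(pt);
}
```

A caller checks the result with `demo_is_err()` before dereferencing it and recovers the error code with `demo_ptr_err()`.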
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void free_pt(struct i915_address_space *vm, struct i915_page_table *pt)
|
2015-02-25 00:22:36 +08:00
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
cleanup_px(vm, pt);
|
2015-06-30 23:16:37 +08:00
|
|
|
kfree(pt);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void gen8_initialize_pt(struct i915_address_space *vm,
|
|
|
|
struct i915_page_table *pt)
|
|
|
|
{
|
2017-02-15 16:43:46 +08:00
|
|
|
fill_px(vm, pt,
|
|
|
|
gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC));
|
2015-06-30 23:16:37 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void gen6_initialize_pt(struct i915_address_space *vm,
|
|
|
|
struct i915_page_table *pt)
|
|
|
|
{
|
2017-02-15 16:43:46 +08:00
|
|
|
fill32_px(vm, pt,
|
|
|
|
vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0));
|
2015-02-25 00:22:36 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
|
2015-02-25 00:22:36 +08:00
|
|
|
{
|
2015-04-08 19:13:23 +08:00
|
|
|
struct i915_page_directory *pd;
|
2015-02-25 00:22:36 +08:00
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
pd = kzalloc(sizeof(*pd), GFP_KERNEL | __GFP_NOWARN);
|
|
|
|
if (unlikely(!pd))
|
drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 00:22:36 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
if (unlikely(setup_px(vm, pd))) {
|
|
|
|
kfree(pd);
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
}
|
2015-04-08 19:13:32 +08:00
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
pd->used_pdes = 0;
|
drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 00:22:36 +08:00
|
|
|
return pd;
|
|
|
|
}
|
|
|
|
|
static void free_pd(struct i915_address_space *vm,
		    struct i915_page_directory *pd)
{
	cleanup_px(vm, pd);
	kfree(pd);
}

static void gen8_initialize_pd(struct i915_address_space *vm,
			       struct i915_page_directory *pd)
{
	unsigned int i;

	fill_px(vm, pd,
		gen8_pde_encode(px_dma(vm->scratch_pt), I915_CACHE_LLC));
	for (i = 0; i < I915_PDES; i++)
		pd->page_table[i] = vm->scratch_pt;
}

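The scratch-fill idea in gen8_initialize_pd() can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver's code: the `sketch_*` names and `SKETCH_PDES` are hypothetical stand-ins, and the hardware-visible fill done by fill_px() is left out. It only shows why pointing every slot at a shared scratch table lets later code treat "empty" as "still points at scratch" instead of checking for NULL.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PDES 512 /* hypothetical; plays the role of I915_PDES */

struct sketch_pt { unsigned int used_ptes; };

struct sketch_pd {
	struct sketch_pt *page_table[SKETCH_PDES];
	unsigned int used_pdes;
};

/* Point every slot at the shared scratch table so lookups never see NULL. */
static void sketch_initialize_pd(struct sketch_pd *pd, struct sketch_pt *scratch)
{
	size_t i;

	assert(pd && scratch);
	for (i = 0; i < SKETCH_PDES; i++)
		pd->page_table[i] = scratch;
	pd->used_pdes = 0;
}

/* A slot is "empty" exactly when it still points at scratch. */
static int sketch_pde_is_empty(const struct sketch_pd *pd,
			       const struct sketch_pt *scratch, size_t pde)
{
	assert(pde < SKETCH_PDES);
	return pd->page_table[pde] == scratch;
}
```
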
static int __pdp_init(struct i915_address_space *vm,
		      struct i915_page_directory_pointer *pdp)
{
	const unsigned int pdpes = i915_pdpes_per_pdp(vm);
	unsigned int i;

	pdp->page_directory = kmalloc_array(pdpes, sizeof(*pdp->page_directory),
					    GFP_KERNEL | __GFP_NOWARN);
	if (unlikely(!pdp->page_directory))
		return -ENOMEM;

	for (i = 0; i < pdpes; i++)
		pdp->page_directory[i] = vm->scratch_pd;

	return 0;
}

static void __pdp_fini(struct i915_page_directory_pointer *pdp)
{
	kfree(pdp->page_directory);
	pdp->page_directory = NULL;
}

static inline bool use_4lvl(const struct i915_address_space *vm)
{
	return i915_vm_is_48bit(vm);
}

static struct i915_page_directory_pointer *
alloc_pdp(struct i915_address_space *vm)
{
	struct i915_page_directory_pointer *pdp;
	int ret = -ENOMEM;

	WARN_ON(!use_4lvl(vm));

	pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
	if (!pdp)
		return ERR_PTR(-ENOMEM);

	ret = __pdp_init(vm, pdp);
	if (ret)
		goto fail_bitmap;

	ret = setup_px(vm, pdp);
	if (ret)
		goto fail_page_m;

	return pdp;

fail_page_m:
	__pdp_fini(pdp);
fail_bitmap:
	kfree(pdp);

	return ERR_PTR(ret);
}

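alloc_pdp() unwinds partial allocations through stacked goto labels, each label undoing one more step than the label below it. The following userspace sketch shows the same shape with malloc/free. It is an illustration only: `sketch_alloc_pdp`, the `backing` field, and the `fail_at` knob (used to force a failure at a chosen step) are hypothetical and do not exist in the driver, and the sketch returns an errno code where the driver returns ERR_PTR().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_pdp {
	void **page_directory; /* array of child pointers */
	void *backing;         /* stands in for the page set up by setup_px() */
};

/* Returns 0 and fills *out on success, or a negative errno.
 * fail_at (hypothetical knob): 0 = no failure, 1..3 = fail at that step. */
static int sketch_alloc_pdp(struct sketch_pdp **out, size_t entries, int fail_at)
{
	struct sketch_pdp *pdp;
	int ret = -ENOMEM;

	assert(out && entries > 0);

	pdp = fail_at == 1 ? NULL : calloc(1, sizeof(*pdp));
	if (!pdp)
		return ret;

	pdp->page_directory = fail_at == 2 ? NULL : calloc(entries, sizeof(void *));
	if (!pdp->page_directory)
		goto fail_struct;

	pdp->backing = fail_at == 3 ? NULL : malloc(4096);
	if (!pdp->backing)
		goto fail_array;

	*out = pdp;
	return 0;

	/* Labels run in reverse order of acquisition. */
fail_array:
	free(pdp->page_directory);
fail_struct:
	free(pdp);
	return ret;
}

static void sketch_free_pdp(struct sketch_pdp *pdp)
{
	free(pdp->backing);
	free(pdp->page_directory);
	free(pdp);
}
```

Because each label falls through to the ones below it, adding a new allocation step means adding one label and one free, without touching the earlier error paths.
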
static void free_pdp(struct i915_address_space *vm,
		     struct i915_page_directory_pointer *pdp)
{
	__pdp_fini(pdp);

	if (!use_4lvl(vm))
		return;

	cleanup_px(vm, pdp);
	kfree(pdp);
}

static void gen8_initialize_pdp(struct i915_address_space *vm,
				struct i915_page_directory_pointer *pdp)
{
	gen8_ppgtt_pdpe_t scratch_pdpe;

	scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);

	fill_px(vm, pdp, scratch_pdpe);
}

static void gen8_initialize_pml4(struct i915_address_space *vm,
				 struct i915_pml4 *pml4)
{
	unsigned int i;

	fill_px(vm, pml4,
		gen8_pml4e_encode(px_dma(vm->scratch_pdp), I915_CACHE_LLC));
	for (i = 0; i < GEN8_PML4ES_PER_PML4; i++)
		pml4->pdps[i] = vm->scratch_pdp;
}

/* Broadwell Page Directory Pointer Descriptors */
static int gen8_write_pdp(struct drm_i915_gem_request *req,
			  unsigned entry,
			  dma_addr_t addr)
{
	struct intel_engine_cs *engine = req->engine;
	u32 *cs;

	BUG_ON(entry >= 4);

	cs = intel_ring_begin(req, 6);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = MI_LOAD_REGISTER_IMM(1);
	*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, entry));
	*cs++ = upper_32_bits(addr);
	*cs++ = MI_LOAD_REGISTER_IMM(1);
	*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, entry));
	*cs++ = lower_32_bits(addr);
	intel_ring_advance(req, cs);

	return 0;
}

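gen8_write_pdp() reserves six dwords and emits two register loads, one for each 32-bit half of the 64-bit DMA address. A minimal sketch of that split, written against a plain array, might look as follows. The opcode and register offsets are placeholder constants chosen for the example, not the hardware values, and `sketch_emit_pdp` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder opcode and register offsets: NOT the hardware values. */
#define SKETCH_LOAD_REG_IMM 0x1000u
#define SKETCH_PDP_UDW(entry) (0x200u + (entry) * 8u + 4u)
#define SKETCH_PDP_LDW(entry) (0x200u + (entry) * 8u)

/* Write six dwords describing a 64-bit address as two 32-bit register
 * loads. Returns the advanced cursor, mirroring the `*cs++` idiom. */
static uint32_t *sketch_emit_pdp(uint32_t *cs, unsigned int entry, uint64_t addr)
{
	assert(entry < 4);

	*cs++ = SKETCH_LOAD_REG_IMM;
	*cs++ = SKETCH_PDP_UDW(entry);
	*cs++ = (uint32_t)(addr >> 32);          /* like upper_32_bits() */
	*cs++ = SKETCH_LOAD_REG_IMM;
	*cs++ = SKETCH_PDP_LDW(entry);
	*cs++ = (uint32_t)(addr & 0xffffffffu);  /* like lower_32_bits() */
	return cs;
}
```
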
static int gen8_mm_switch_3lvl(struct i915_hw_ppgtt *ppgtt,
			       struct drm_i915_gem_request *req)
{
	int i, ret;

	for (i = GEN8_3LVL_PDPES - 1; i >= 0; i--) {
		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);

		ret = gen8_write_pdp(req, i, pd_daddr);
		if (ret)
			return ret;
	}

	return 0;
}

static int gen8_mm_switch_4lvl(struct i915_hw_ppgtt *ppgtt,
			       struct drm_i915_gem_request *req)
{
	return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
}

/* PDE TLBs are a pain to invalidate on GEN8+. When we modify
 * the page table structures, we mark them dirty so that
 * context switching/execlist queuing code takes extra steps
 * to ensure that tlbs are flushed.
 */
static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
{
	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.i915)->ring_mask;
}

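The dirty-marking scheme described in the comment above amounts to a per-engine bitmask: a page table update sets every engine's bit, and each engine clears its own bit once it has reloaded. The sketch below models only that bookkeeping. The consumer side (`sketch_test_and_clear_dirty`) is an assumption about how a submission path could use the mask; it is not taken from this file.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_ppgtt {
	uint32_t ring_mask;      /* one bit per engine that exists */
	uint32_t pd_dirty_rings; /* engines that must reload page directories */
};

/* Page tables changed: every engine must reload before its next use. */
static void sketch_mark_tlbs_dirty(struct sketch_ppgtt *ppgtt)
{
	ppgtt->pd_dirty_rings = ppgtt->ring_mask;
}

/* Hypothetical consumer for one engine: returns 1 if a reload was
 * needed and clears that engine's bit, 0 otherwise. */
static int sketch_test_and_clear_dirty(struct sketch_ppgtt *ppgtt,
				       unsigned int engine)
{
	uint32_t bit;

	assert(engine < 32);
	bit = UINT32_C(1) << engine;
	if (!(ppgtt->pd_dirty_rings & bit))
		return 0;
	ppgtt->pd_dirty_rings &= ~bit;
	return 1;
}
```
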
/* Removes entries from a single page table, releasing it if it's empty.
 * Caller can use the return value to update higher-level entries.
 */
static bool gen8_ppgtt_clear_pt(struct i915_address_space *vm,
				struct i915_page_table *pt,
				u64 start, u64 length)
{
	unsigned int num_entries = gen8_pte_count(start, length);
	unsigned int pte = gen8_pte_index(start);
	unsigned int pte_end = pte + num_entries;
	const gen8_pte_t scratch_pte =
		gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC);
	gen8_pte_t *vaddr;

	GEM_BUG_ON(num_entries > pt->used_ptes);

	pt->used_ptes -= num_entries;
	if (!pt->used_ptes)
		return true;

	vaddr = kmap_atomic_px(pt);
	while (pte < pte_end)
		vaddr[pte++] = scratch_pte;
	kunmap_atomic(vaddr);

	return false;
}

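gen8_ppgtt_clear_pt() decrements the used-entry count first and only rewrites entries when the table will survive; an empty table is about to be freed by the caller, so scrubbing it would be wasted work. A minimal sketch of that ordering on a plain array, with hypothetical `sketch_*` names and no kmap step, could be:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PTES 512

struct sketch_pt {
	uint64_t entry[SKETCH_PTES];
	unsigned int used_ptes; /* live (non-scratch) entries */
};

/* Clear `count` entries starting at `first`. Returns true when the table
 * became empty; its entries are then left alone because the caller is
 * expected to free the table. */
static bool sketch_clear_pt(struct sketch_pt *pt, unsigned int first,
			    unsigned int count, uint64_t scratch)
{
	unsigned int i;

	assert(first + count <= SKETCH_PTES);
	assert(count <= pt->used_ptes);

	pt->used_ptes -= count;
	if (!pt->used_ptes)
		return true;

	for (i = first; i < first + count; i++)
		pt->entry[i] = scratch;
	return false;
}
```
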
static void gen8_ppgtt_set_pde(struct i915_address_space *vm,
			       struct i915_page_directory *pd,
			       struct i915_page_table *pt,
			       unsigned int pde)
{
	gen8_pde_t *vaddr;

	pd->page_table[pde] = pt;

	vaddr = kmap_atomic_px(pd);
	vaddr[pde] = gen8_pde_encode(px_dma(pt), I915_CACHE_LLC);
	kunmap_atomic(vaddr);
}

static bool gen8_ppgtt_clear_pd(struct i915_address_space *vm,
				struct i915_page_directory *pd,
				u64 start, u64 length)
{
	struct i915_page_table *pt;
	u32 pde;

	gen8_for_each_pde(pt, pd, start, length, pde) {
		GEM_BUG_ON(pt == vm->scratch_pt);

		if (!gen8_ppgtt_clear_pt(vm, pt, start, length))
			continue;

		gen8_ppgtt_set_pde(vm, pd, vm->scratch_pt, pde);
		GEM_BUG_ON(!pd->used_pdes);
		pd->used_pdes--;

		free_pt(vm, pt);
	}

	return !pd->used_pdes;
}

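gen8_ppgtt_clear_pd() shows how emptiness propagates upward: when a child table reports that it is empty, the parent repoints the slot at scratch, decrements its own count, frees the child, and finally reports whether the parent itself is now empty. The sketch below reduces this to a single slot per call instead of walking an address range; the `sketch_*` names and the four-slot directory are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PDES 4

struct sketch_pt { unsigned int used_ptes; };

struct sketch_pd {
	struct sketch_pt *page_table[SKETCH_PDES];
	unsigned int used_pdes;
};

/* Child level: drop `count` entries, report whether the table is empty. */
static bool sketch_clear_pt(struct sketch_pt *pt, unsigned int count)
{
	assert(count <= pt->used_ptes);
	pt->used_ptes -= count;
	return pt->used_ptes == 0;
}

/* Parent level: when a child empties, repoint its slot at scratch, drop
 * the parent's count, free the child, then report the parent's state. */
static bool sketch_clear_pd(struct sketch_pd *pd, struct sketch_pt *scratch,
			    unsigned int pde, unsigned int count)
{
	struct sketch_pt *pt;

	assert(pde < SKETCH_PDES);
	pt = pd->page_table[pde];
	assert(pt != scratch); /* clearing an unpopulated slot is a bug */

	if (sketch_clear_pt(pt, count)) {
		pd->page_table[pde] = scratch;
		assert(pd->used_pdes > 0);
		pd->used_pdes--;
		free(pt);
	}
	return pd->used_pdes == 0;
}
```

The same pattern repeats one level up in gen8_ppgtt_clear_pdp(), with page directories in place of page tables.
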
static void gen8_ppgtt_set_pdpe(struct i915_address_space *vm,
				struct i915_page_directory_pointer *pdp,
				struct i915_page_directory *pd,
				unsigned int pdpe)
{
	gen8_ppgtt_pdpe_t *vaddr;

	pdp->page_directory[pdpe] = pd;
	if (!use_4lvl(vm))
		return;

	vaddr = kmap_atomic_px(pdp);
	vaddr[pdpe] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
	kunmap_atomic(vaddr);
}

/* Removes entries from a single page dir pointer, releasing it if it's empty.
 * Caller can use the return value to update higher-level entries
 */
static bool gen8_ppgtt_clear_pdp(struct i915_address_space *vm,
				 struct i915_page_directory_pointer *pdp,
				 u64 start, u64 length)
{
	struct i915_page_directory *pd;
	unsigned int pdpe;

	gen8_for_each_pdpe(pd, pdp, start, length, pdpe) {
		GEM_BUG_ON(pd == vm->scratch_pd);

		if (!gen8_ppgtt_clear_pd(vm, pd, start, length))
			continue;

		gen8_ppgtt_set_pdpe(vm, pdp, vm->scratch_pd, pdpe);
		GEM_BUG_ON(!pdp->used_pdpes);
		pdp->used_pdpes--;

		free_pd(vm, pd);
	}

	return !pdp->used_pdpes;
}

static void gen8_ppgtt_clear_3lvl(struct i915_address_space *vm,
				  u64 start, u64 length)
{
	gen8_ppgtt_clear_pdp(vm, &i915_vm_to_ppgtt(vm)->pdp, start, length);
}

static void gen8_ppgtt_set_pml4e(struct i915_pml4 *pml4,
				 struct i915_page_directory_pointer *pdp,
				 unsigned int pml4e)
{
	gen8_ppgtt_pml4e_t *vaddr;

	pml4->pdps[pml4e] = pdp;

	vaddr = kmap_atomic_px(pml4);
	vaddr[pml4e] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
	kunmap_atomic(vaddr);
}

/* Removes entries from a single pml4.
 * This is the top-level structure in 4-level page tables used on gen8+.
 * Empty entries are always scratch pml4e.
 */
static void gen8_ppgtt_clear_4lvl(struct i915_address_space *vm,
				  u64 start, u64 length)
{
	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
	struct i915_pml4 *pml4 = &ppgtt->pml4;
	struct i915_page_directory_pointer *pdp;
	unsigned int pml4e;

	GEM_BUG_ON(!use_4lvl(vm));

	gen8_for_each_pml4e(pdp, pml4, start, length, pml4e) {
		GEM_BUG_ON(pdp == vm->scratch_pdp);

		if (!gen8_ppgtt_clear_pdp(vm, pdp, start, length))
			continue;

		gen8_ppgtt_set_pml4e(pml4, vm->scratch_pdp, pml4e);

		free_pdp(vm, pdp);
	}
}

struct sgt_dma {
	struct scatterlist *sg;
	dma_addr_t dma, max;
};

struct gen8_insert_pte {
	u16 pml4e;
	u16 pdpe;
	u16 pde;
	u16 pte;
};

static __always_inline struct gen8_insert_pte gen8_insert_pte(u64 start)
{
	return (struct gen8_insert_pte) {
		 gen8_pml4e_index(start),
		 gen8_pdpe_index(start),
		 gen8_pde_index(start),
		 gen8_pte_index(start),
	};
}

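gen8_insert_pte() splits a GPU virtual address into one index per page table level. The sketch below shows one plausible decomposition, assuming 4 KiB pages and nine index bits per level (512 entries per table) for a 48-bit address. The shift values are assumptions made for this example, since the real gen8_*_index() helpers are defined elsewhere, and the `sketch_*` names are hypothetical. The second function illustrates the carry from one level to the next when an index wraps.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_idx {
	uint16_t pml4e, pdpe, pde, pte;
};

/* Assumed layout: 12 offset bits, then four 9-bit indices (48 bits). */
static struct sketch_idx sketch_split(uint64_t addr)
{
	struct sketch_idx idx;

	assert(addr < (UINT64_C(1) << 48));
	idx.pte   = (addr >> 12) & 0x1ff;
	idx.pde   = (addr >> 21) & 0x1ff;
	idx.pdpe  = (addr >> 30) & 0x1ff;
	idx.pml4e = (addr >> 39) & 0x1ff;
	return idx;
}

/* Advance by one page, carrying into the next level when an index wraps. */
static void sketch_next(struct sketch_idx *idx)
{
	if (++idx->pte < 512)
		return;
	idx->pte = 0;
	if (++idx->pde < 512)
		return;
	idx->pde = 0;
	if (++idx->pdpe < 512)
		return;
	idx->pdpe = 0;
	idx->pml4e++;
}
```
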
static __always_inline bool
gen8_ppgtt_insert_pte_entries(struct i915_hw_ppgtt *ppgtt,
			      struct i915_page_directory_pointer *pdp,
			      struct sgt_dma *iter,
			      struct gen8_insert_pte *idx,
			      enum i915_cache_level cache_level)
{
	struct i915_page_directory *pd;
	const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level);
	gen8_pte_t *vaddr;
	bool ret;

	GEM_BUG_ON(idx->pdpe >= i915_pdpes_per_pdp(&ppgtt->base));
	pd = pdp->page_directory[idx->pdpe];
	vaddr = kmap_atomic_px(pd->page_table[idx->pde]);
	do {
		vaddr[idx->pte] = pte_encode | iter->dma;

		iter->dma += PAGE_SIZE;
		if (iter->dma >= iter->max) {
			iter->sg = __sg_next(iter->sg);
			if (!iter->sg) {
				ret = false;
				break;
			}
drm/i915/bdw: Reorganize PT allocations
The previous allocation mechanism would get 2 contiguous allocations,
one for the page directories, and one for the page tables. As each page
table is 1 page, and there are 512 of these per page directory, this
goes to 2MB. An unfriendly request at best. Worse still, our HW now
supports 4 page directories, and a 2MB allocation is not allowed.
In order to fix this, this patch attempts to split up each page table
allocation into a single, discrete allocation. There is nothing really
fancy about the patch itself, it just has to manage an extra pointer
indirection, and have a fancier bit of logic to free up the pages.
To accommodate some of the added complexity, two new helpers are
introduced to allocate, and free the page table pages.
NOTE: I really wanted to split the way we do allocations, and the way in
which we identify the page table/page directory being used. I found
splitting this functionality up to be too unwieldy. I apologize in
advance to the reviewer. I'd recommend looking at the result, rather
than the diff.
v2/NOTE2: This patch predated commit:
6f1cc993518462ccf039e195fabd47e7aa5bfd13
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 31 15:50:31 2013 +0000
drm/i915: Avoid dereference past end of page arr
It fixed the same issue as that patch, but because of the limbo state of
PPGTT, Chris's patch was merged instead. The excess churn is a result of
my using my original patch, which has my preferred naming. Primarily
act_* is changed to which_*, but it's mostly the same otherwise. I've
kept the convention Chris used for the pte wrap (I had something
slightly different, and broken - but fixable)
v3: Rename which_p[..]e to drop which_ (Chris)
Remove BUG_ON in inner loop (Chris)
Redo the pde/pdpe wrap logic (Chris)
v4: s/1MB/2MB in commit message (Imre)
Plug leaking gen8_pt_pages in both the error path, as well as general
free case (Imre)
v5: Rename leftover "which_" variables (Imre)
Add the pde = 0 wrap that was missed from v3 (Imre)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Squash in fixup from Ben.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-21 03:51:21 +08:00
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
iter->dma = sg_dma_address(iter->sg);
|
|
|
|
iter->max = iter->dma + iter->sg->length;
|
2015-02-25 00:22:34 +08:00
|
|
|
}
|
2013-11-03 12:07:24 +08:00
|
|
|
|
2017-02-26 02:11:22 +08:00
|
|
|
if (++idx->pte == GEN8_PTES) {
|
|
|
|
idx->pte = 0;
|
|
|
|
|
|
|
|
if (++idx->pde == I915_PDES) {
|
|
|
|
idx->pde = 0;
|
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
/* Limited by sg length for 3lvl */
|
2017-02-26 02:11:22 +08:00
|
|
|
if (++idx->pdpe == GEN8_PML4ES_PER_PML4) {
|
|
|
|
idx->pdpe = 0;
|
2017-02-15 16:43:37 +08:00
|
|
|
ret = true;
|
2015-08-03 16:53:27 +08:00
|
|
|
break;
|
2017-02-15 16:43:37 +08:00
|
|
|
}
|
|
|
|
|
2017-02-28 23:28:07 +08:00
|
|
|
GEM_BUG_ON(idx->pdpe >= i915_pdpes_per_pdp(&ppgtt->base));
|
2017-02-26 02:11:22 +08:00
|
|
|
pd = pdp->page_directory[idx->pdpe];
|
2014-02-21 03:51:21 +08:00
|
|
|
}
|
2017-02-15 16:43:37 +08:00
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(vaddr);
|
2017-02-26 02:11:22 +08:00
|
|
|
vaddr = kmap_atomic_px(pd->page_table[idx->pde]);
|
2013-11-03 12:07:24 +08:00
|
|
|
}
|
2017-02-15 16:43:37 +08:00
|
|
|
} while (1);
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(vaddr);
|
2015-06-25 23:35:11 +08:00
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
return ret;
|
2013-11-03 12:07:24 +08:00
|
|
|
}
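The index handling in the loop above (pte wraps into pde, pde wraps into pdpe) can be sketched in isolation. This is a minimal user-space illustration, not the driver's code: `sketch_idx`, `sketch_idx_advance`, the `SKETCH_*` constants and the `pdpes` parameter are hypothetical names, and the sizes of 512 entries per level are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 512 entries per page table and per page directory. */
enum { SKETCH_PTES = 512, SKETCH_PDES = 512 };

/* Hypothetical stand-in for struct gen8_insert_pte (pml4e omitted). */
struct sketch_idx {
	unsigned int pdpe, pde, pte;
};

/*
 * Step to the next PTE slot, carrying into pde and pdpe on overflow.
 * Returns true only when the pdpe index itself wraps, i.e. the caller
 * has exhausted this page directory pointer and must pick the next one.
 */
static bool sketch_idx_advance(struct sketch_idx *idx, unsigned int pdpes)
{
	/* Plays the role of the GEM_BUG_ON range checks in the original. */
	assert(idx->pte < SKETCH_PTES && idx->pde < SKETCH_PDES &&
	       idx->pdpe < pdpes);

	if (++idx->pte < SKETCH_PTES)
		return false;
	idx->pte = 0;

	if (++idx->pde < SKETCH_PDES)
		return false;
	idx->pde = 0;

	if (++idx->pdpe < pdpes)
		return false;
	idx->pdpe = 0;
	return true;
}
```

A caller would invoke `sketch_idx_advance(&idx, pdpes)` once per written entry. In the real function the same carry points are also where the mapped page table is swapped (`kunmap_atomic` followed by `kmap_atomic_px`), which the sketch leaves out.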
|
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
static void gen8_ppgtt_insert_3lvl(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:37 +08:00
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 unused)
|
2015-07-30 18:02:49 +08:00
|
|
|
{
|
2017-07-07 17:50:59 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2017-02-15 16:43:37 +08:00
|
|
|
struct sgt_dma iter = {
|
2017-06-22 17:58:36 +08:00
|
|
|
.sg = vma->pages->sgl,
|
2017-02-15 16:43:37 +08:00
|
|
|
.dma = sg_dma_address(iter.sg),
|
|
|
|
.max = iter.dma + iter.sg->length,
|
|
|
|
};
|
2017-06-22 17:58:36 +08:00
|
|
|
struct gen8_insert_pte idx = gen8_insert_pte(vma->node.start);
|
2015-07-30 18:02:49 +08:00
|
|
|
|
2017-02-26 02:11:22 +08:00
|
|
|
gen8_ppgtt_insert_pte_entries(ppgtt, &ppgtt->pdp, &iter, &idx,
|
|
|
|
cache_level);
|
2017-02-15 16:43:37 +08:00
|
|
|
}
|
2015-08-03 16:53:27 +08:00
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
static void gen8_ppgtt_insert_4lvl(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:37 +08:00
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 unused)
|
|
|
|
{
|
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
|
|
|
struct sgt_dma iter = {
|
2017-06-22 17:58:36 +08:00
|
|
|
.sg = vma->pages->sgl,
|
2017-02-15 16:43:37 +08:00
|
|
|
.dma = sg_dma_address(iter.sg),
|
|
|
|
.max = iter.dma + iter.sg->length,
|
|
|
|
};
|
|
|
|
struct i915_page_directory_pointer **pdps = ppgtt->pml4.pdps;
|
2017-06-22 17:58:36 +08:00
|
|
|
struct gen8_insert_pte idx = gen8_insert_pte(vma->node.start);
|
2015-08-03 16:53:27 +08:00
|
|
|
|
2017-02-26 02:11:22 +08:00
|
|
|
while (gen8_ppgtt_insert_pte_entries(ppgtt, pdps[idx.pml4e++], &iter,
|
|
|
|
&idx, cache_level))
|
|
|
|
GEM_BUG_ON(idx.pml4e >= GEN8_PML4ES_PER_PML4);
|
2015-07-30 18:02:49 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void gen8_free_page_tables(struct i915_address_space *vm,
|
2015-06-11 00:46:39 +08:00
|
|
|
struct i915_page_directory *pd)
|
2014-02-21 03:51:21 +08:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
2015-06-25 23:35:12 +08:00
|
|
|
if (!px_page(pd))
|
2014-02-21 03:51:21 +08:00
|
|
|
return;
|
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
for (i = 0; i < I915_PDES; i++) {
|
|
|
|
if (pd->page_table[i] != vm->scratch_pt)
|
|
|
|
free_pt(vm, pd->page_table[i]);
|
drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 00:22:36 +08:00
|
|
|
}
|
2015-02-25 00:22:34 +08:00
|
|
|
}
|
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
static int gen8_init_scratch(struct i915_address_space *vm)
|
|
|
|
{
|
2016-04-27 20:19:25 +08:00
|
|
|
int ret;
|
2015-06-30 23:16:40 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
ret = setup_scratch_page(vm, I915_GFP_DMA);
|
2016-08-22 15:44:30 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2015-06-30 23:16:40 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
vm->scratch_pt = alloc_pt(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
if (IS_ERR(vm->scratch_pt)) {
|
2016-04-27 20:19:25 +08:00
|
|
|
ret = PTR_ERR(vm->scratch_pt);
|
|
|
|
goto free_scratch_page;
|
2015-06-30 23:16:40 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
vm->scratch_pd = alloc_pd(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
if (IS_ERR(vm->scratch_pd)) {
|
2016-04-27 20:19:25 +08:00
|
|
|
ret = PTR_ERR(vm->scratch_pd);
|
|
|
|
goto free_pt;
|
2015-06-30 23:16:40 +08:00
|
|
|
}
|
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm)) {
|
2017-02-15 16:43:40 +08:00
|
|
|
vm->scratch_pdp = alloc_pdp(vm);
|
2015-07-30 00:23:55 +08:00
|
|
|
if (IS_ERR(vm->scratch_pdp)) {
|
2016-04-27 20:19:25 +08:00
|
|
|
ret = PTR_ERR(vm->scratch_pdp);
|
|
|
|
goto free_pd;
|
2015-07-30 00:23:55 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
gen8_initialize_pt(vm, vm->scratch_pt);
|
|
|
|
gen8_initialize_pd(vm, vm->scratch_pd);
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm))
|
2015-07-30 00:23:55 +08:00
|
|
|
gen8_initialize_pdp(vm, vm->scratch_pdp);
|
2015-06-30 23:16:40 +08:00
|
|
|
|
|
|
|
return 0;
|
2016-04-27 20:19:25 +08:00
|
|
|
|
|
|
|
free_pd:
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pd(vm, vm->scratch_pd);
|
2016-04-27 20:19:25 +08:00
|
|
|
free_pt:
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pt(vm, vm->scratch_pt);
|
2016-04-27 20:19:25 +08:00
|
|
|
free_scratch_page:
|
2017-02-15 16:43:40 +08:00
|
|
|
cleanup_scratch_page(vm);
|
2016-04-27 20:19:25 +08:00
|
|
|
|
|
|
|
return ret;
|
2015-06-30 23:16:40 +08:00
|
|
|
}
|
|
|
|
|
2015-08-28 15:41:18 +08:00
|
|
|
static int gen8_ppgtt_notify_vgt(struct i915_hw_ppgtt *ppgtt, bool create)
|
|
|
|
{
|
2017-02-28 23:28:09 +08:00
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
|
|
|
struct drm_i915_private *dev_priv = vm->i915;
|
2015-08-28 15:41:18 +08:00
|
|
|
enum vgt_g2v_type msg;
|
|
|
|
int i;
|
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm)) {
|
|
|
|
const u64 daddr = px_dma(&ppgtt->pml4);
|
2015-08-28 15:41:18 +08:00
|
|
|
|
2015-11-05 05:20:12 +08:00
|
|
|
I915_WRITE(vgtif_reg(pdp[0].lo), lower_32_bits(daddr));
|
|
|
|
I915_WRITE(vgtif_reg(pdp[0].hi), upper_32_bits(daddr));
|
2015-08-28 15:41:18 +08:00
|
|
|
|
|
|
|
msg = (create ? VGT_G2V_PPGTT_L4_PAGE_TABLE_CREATE :
|
|
|
|
VGT_G2V_PPGTT_L4_PAGE_TABLE_DESTROY);
|
|
|
|
} else {
|
2017-02-28 23:28:10 +08:00
|
|
|
for (i = 0; i < GEN8_3LVL_PDPES; i++) {
|
2017-02-28 23:28:09 +08:00
|
|
|
const u64 daddr = i915_page_dir_dma_addr(ppgtt, i);
|
2015-08-28 15:41:18 +08:00
|
|
|
|
2015-11-05 05:20:12 +08:00
|
|
|
I915_WRITE(vgtif_reg(pdp[i].lo), lower_32_bits(daddr));
|
|
|
|
I915_WRITE(vgtif_reg(pdp[i].hi), upper_32_bits(daddr));
|
2015-08-28 15:41:18 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
msg = (create ? VGT_G2V_PPGTT_L3_PAGE_TABLE_CREATE :
|
|
|
|
VGT_G2V_PPGTT_L3_PAGE_TABLE_DESTROY);
|
|
|
|
}
|
|
|
|
|
|
|
|
I915_WRITE(vgtif_reg(g2v_notify), msg);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
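The notification routine above writes each 64-bit page directory address as a low and a high 32-bit register value. A minimal sketch of that split, assuming nothing about the real register layout: `sketch_pdp_reg`, `sketch_split_daddr` and `sketch_join_daddr` are hypothetical names that only mirror what `lower_32_bits()` and `upper_32_bits()` are used for here.

```c
#include <stdint.h>

/* Hypothetical stand-in for one pdp[i] lo/hi register pair. */
struct sketch_pdp_reg {
	uint32_t lo;
	uint32_t hi;
};

/* Split a 64-bit DMA address into the two 32-bit halves of a register pair. */
static struct sketch_pdp_reg sketch_split_daddr(uint64_t daddr)
{
	struct sketch_pdp_reg reg = {
		.lo = (uint32_t)(daddr & 0xffffffffu),
		.hi = (uint32_t)(daddr >> 32),
	};
	return reg;
}

/* Inverse of the split, as the receiving side would reassemble the address. */
static uint64_t sketch_join_daddr(struct sketch_pdp_reg reg)
{
	return ((uint64_t)reg.hi << 32) | reg.lo;
}
```

In the 4-level case only one such pair (the PML4 address) is written, while the 3-level case writes one pair per page directory pointer entry.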
|
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
static void gen8_free_scratch(struct i915_address_space *vm)
|
|
|
|
{
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm))
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pdp(vm, vm->scratch_pdp);
|
|
|
|
free_pd(vm, vm->scratch_pd);
|
|
|
|
free_pt(vm, vm->scratch_pt);
|
|
|
|
cleanup_scratch_page(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void gen8_ppgtt_cleanup_3lvl(struct i915_address_space *vm,
|
2015-07-30 18:05:29 +08:00
|
|
|
struct i915_page_directory_pointer *pdp)
|
2014-02-13 06:28:44 +08:00
|
|
|
{
|
2017-02-28 23:28:07 +08:00
|
|
|
const unsigned int pdpes = i915_pdpes_per_pdp(vm);
|
2014-02-13 06:28:44 +08:00
|
|
|
int i;
|
|
|
|
|
2017-02-28 23:28:07 +08:00
|
|
|
for (i = 0; i < pdpes; i++) {
|
2017-02-15 16:43:47 +08:00
|
|
|
if (pdp->page_directory[i] == vm->scratch_pd)
|
2015-02-25 00:22:36 +08:00
|
|
|
continue;
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
gen8_free_page_tables(vm, pdp->page_directory[i]);
|
|
|
|
free_pd(vm, pdp->page_directory[i]);
|
2014-02-21 03:51:21 +08:00
|
|
|
}
|
2015-04-08 19:13:27 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pdp(vm, pdp);
|
2015-07-30 18:05:29 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
for (i = 0; i < GEN8_PML4ES_PER_PML4; i++) {
|
|
|
|
if (ppgtt->pml4.pdps[i] == ppgtt->base.scratch_pdp)
|
2015-07-30 18:05:29 +08:00
|
|
|
continue;
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
gen8_ppgtt_cleanup_3lvl(&ppgtt->base, ppgtt->pml4.pdps[i]);
|
2015-07-30 18:05:29 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
cleanup_px(&ppgtt->base, &ppgtt->pml4);
|
2015-07-30 18:05:29 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
|
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = vm->i915;
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2015-07-30 18:05:29 +08:00
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
if (intel_vgpu_active(dev_priv))
|
2015-08-28 15:41:18 +08:00
|
|
|
gen8_ppgtt_notify_vgt(ppgtt, false);
|
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm))
|
2015-07-30 18:05:29 +08:00
|
|
|
gen8_ppgtt_cleanup_4lvl(ppgtt);
|
2017-02-28 23:28:09 +08:00
|
|
|
else
|
|
|
|
gen8_ppgtt_cleanup_3lvl(&ppgtt->base, &ppgtt->pdp);
|
2015-07-30 18:02:03 +08:00
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
gen8_free_scratch(vm);
|
2014-02-13 06:28:44 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
static int gen8_ppgtt_alloc_pd(struct i915_address_space *vm,
|
|
|
|
struct i915_page_directory *pd,
|
|
|
|
u64 start, u64 length)
|
2014-02-20 14:05:43 +08:00
|
|
|
{
|
drm/i915/gen8: Dynamic page table allocations
This finishes off the dynamic page tables allocations, in the legacy 3
level style that already exists. Most everything has already been set up
to this point, the patch finishes off the enabling by setting the
appropriate function pointers.
In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.
Before submission, update the PDPs in the logic ring context as PDPs
have been allocated.
v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.
v3: Rebase.
v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).
v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.
v6: Update err_out case in gen8_alloc_va_range (missed from latest
rebase).
v7: Rebase after s/page_tables/page_table/.
v8: Updated scratch_pt check after scratch flag was removed in previous
patch.
v9: Note that lrc mode needs to be updated to support init state without
any PDP.
v10: Unmap correct page_table in gen8_alloc_va_range's error case, clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.
v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.
v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse inline functions. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-08 19:13:34 +08:00
|
|
|
struct i915_page_table *pt;
|
2017-02-15 16:43:46 +08:00
|
|
|
u64 from = start;
|
2017-02-15 16:43:47 +08:00
|
|
|
unsigned int pde;
|
2014-02-20 14:05:43 +08:00
|
|
|
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pde(pt, pd, start, length, pde) {
|
2017-09-09 02:16:22 +08:00
|
|
|
int count = gen8_pte_count(start, length);
|
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
if (pt == vm->scratch_pt) {
|
2017-02-15 16:43:46 +08:00
|
|
|
pt = alloc_pt(vm);
|
|
|
|
if (IS_ERR(pt))
|
|
|
|
goto unwind;
|
2015-04-08 19:13:28 +08:00
|
|
|
|
2017-09-09 02:16:22 +08:00
|
|
|
if (count < GEN8_PTES)
|
|
|
|
gen8_initialize_pt(vm, pt);
|
2017-02-15 16:43:47 +08:00
|
|
|
|
|
|
|
gen8_ppgtt_set_pde(vm, pd, pt, pde);
|
|
|
|
pd->used_pdes++;
|
2017-02-27 20:26:52 +08:00
|
|
|
GEM_BUG_ON(pd->used_pdes > I915_PDES);
|
2017-02-15 16:43:46 +08:00
|
|
|
}
|
2017-02-15 16:43:47 +08:00
|
|
|
|
2017-09-09 02:16:22 +08:00
|
|
|
pt->used_ptes += count;
|
2014-02-21 03:51:21 +08:00
|
|
|
}
|
2014-02-20 14:05:43 +08:00
|
|
|
return 0;
|
2014-02-21 03:51:21 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
unwind:
|
|
|
|
gen8_ppgtt_clear_pd(vm, pd, from, start - from);
|
2015-02-25 00:22:34 +08:00
|
|
|
return -ENOMEM;
|
2014-02-20 14:05:43 +08:00
|
|
|
}
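The `from` / `start - from` bookkeeping in the unwind path above can be shown with a small user-space sketch: remember where the walk began, and on failure release exactly the part already processed. Everything here is hypothetical (`sketch_alloc_range`, `sketch_clear_range`, the `fail_at` fault-injection parameter), and it assumes every slot in the requested range starts out empty, which is simpler than the driver's use counts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Release slots in [from, end); stands in for the clear_pd step. */
static void sketch_clear_range(void *slots[], size_t from, size_t end)
{
	assert(from <= end);
	for (size_t i = from; i < end; i++) {
		free(slots[i]);
		slots[i] = NULL;
	}
}

/*
 * Populate slots in [start, start + length).
 * fail_at is a hypothetical knob that forces a failure at that index.
 * Assumes all slots in the range are NULL on entry.
 */
static int sketch_alloc_range(void *slots[], size_t start, size_t length,
			      size_t fail_at)
{
	const size_t from = start;	/* where the walk began */
	const size_t end = start + length;

	for (; start < end; start++) {
		void *p = (start == fail_at) ? NULL : malloc(64);

		if (!p)
			goto unwind;
		slots[start] = p;
	}
	return 0;

unwind:
	/* Only [from, start) was touched, so only that part is undone. */
	sketch_clear_range(slots, from, start);
	return -ENOMEM;
}
```

The point of the pattern is that the caller sees either the whole range populated or none of it, so no partial state has to be tracked outside the function.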
|
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
static int gen8_ppgtt_alloc_pdp(struct i915_address_space *vm,
|
|
|
|
struct i915_page_directory_pointer *pdp,
|
|
|
|
u64 start, u64 length)
|
2014-02-20 14:05:43 +08:00
|
|
|
{
|
2015-04-08 19:13:28 +08:00
|
|
|
struct i915_page_directory *pd;
|
2017-02-15 16:43:48 +08:00
|
|
|
u64 from = start;
|
|
|
|
unsigned int pdpe;
|
2014-02-20 14:05:43 +08:00
|
|
|
int ret;
|
|
|
|
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pdpe(pd, pdp, start, length, pdpe) {
|
2017-02-15 16:43:48 +08:00
|
|
|
if (pd == vm->scratch_pd) {
|
|
|
|
pd = alloc_pd(vm);
|
|
|
|
if (IS_ERR(pd))
|
|
|
|
goto unwind;
|
2015-04-08 19:13:28 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
gen8_initialize_pd(vm, pd);
|
2017-02-15 16:43:47 +08:00
|
|
|
gen8_ppgtt_set_pdpe(vm, pdp, pd, pdpe);
|
2017-02-15 16:43:48 +08:00
|
|
|
pdp->used_pdpes++;
|
2017-02-28 23:28:07 +08:00
|
|
|
GEM_BUG_ON(pdp->used_pdpes > i915_pdpes_per_pdp(vm));
|
2017-02-15 16:43:51 +08:00
|
|
|
|
|
|
|
mark_tlbs_dirty(i915_vm_to_ppgtt(vm));
|
2017-02-15 16:43:48 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
ret = gen8_ppgtt_alloc_pd(vm, pd, start, length);
|
2017-02-27 20:26:52 +08:00
|
|
|
if (unlikely(ret))
|
|
|
|
goto unwind_pd;
|
2017-02-15 16:43:47 +08:00
|
|
|
}
|
2015-04-08 19:13:33 +08:00
|
|
|
|
2015-02-25 00:22:34 +08:00
|
|
|
return 0;
|
2014-02-20 14:05:43 +08:00
|
|
|
|
2017-02-27 20:26:52 +08:00
|
|
|
unwind_pd:
|
|
|
|
if (!pd->used_pdes) {
|
|
|
|
gen8_ppgtt_set_pdpe(vm, pdp, vm->scratch_pd, pdpe);
|
|
|
|
GEM_BUG_ON(!pdp->used_pdpes);
|
|
|
|
pdp->used_pdpes--;
|
|
|
|
free_pd(vm, pd);
|
|
|
|
}
|
2017-02-15 16:43:48 +08:00
|
|
|
unwind:
|
|
|
|
gen8_ppgtt_clear_pdp(vm, pdp, from, start - from);
|
|
|
|
return -ENOMEM;
|
2014-02-20 14:05:43 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
static int gen8_ppgtt_alloc_3lvl(struct i915_address_space *vm,
|
|
|
|
u64 start, u64 length)
|
2015-07-30 18:05:29 +08:00
|
|
|
{
|
2017-02-15 16:43:49 +08:00
|
|
|
return gen8_ppgtt_alloc_pdp(vm,
|
|
|
|
&i915_vm_to_ppgtt(vm)->pdp, start, length);
|
|
|
|
}
|
2015-07-30 18:05:29 +08:00
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
static int gen8_ppgtt_alloc_4lvl(struct i915_address_space *vm,
|
|
|
|
u64 start, u64 length)
|
|
|
|
{
|
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
|
|
|
struct i915_pml4 *pml4 = &ppgtt->pml4;
|
|
|
|
struct i915_page_directory_pointer *pdp;
|
|
|
|
u64 from = start;
|
|
|
|
u32 pml4e;
|
|
|
|
int ret;
|
2015-07-30 18:05:29 +08:00
|
|
|
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pml4e(pdp, pml4, start, length, pml4e) {
|
2017-02-15 16:43:49 +08:00
|
|
|
if (pml4->pdps[pml4e] == vm->scratch_pdp) {
|
|
|
|
pdp = alloc_pdp(vm);
|
|
|
|
if (IS_ERR(pdp))
|
|
|
|
goto unwind;
|
2015-07-30 18:05:29 +08:00
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
gen8_initialize_pdp(vm, pdp);
|
|
|
|
gen8_ppgtt_set_pml4e(pml4, pdp, pml4e);
|
|
|
|
}
|
2015-07-30 18:05:29 +08:00
|
|
|
|
2017-02-15 16:43:49 +08:00
|
|
|
ret = gen8_ppgtt_alloc_pdp(vm, pdp, start, length);
|
2017-02-27 20:26:52 +08:00
|
|
|
if (unlikely(ret))
|
|
|
|
goto unwind_pdp;
|
2015-07-30 18:05:29 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2017-02-27 20:26:52 +08:00
|
|
|
unwind_pdp:
|
|
|
|
if (!pdp->used_pdpes) {
|
|
|
|
gen8_ppgtt_set_pml4e(pml4, vm->scratch_pdp, pml4e);
|
|
|
|
free_pdp(vm, pdp);
|
|
|
|
}
|
2017-02-15 16:43:49 +08:00
|
|
|
unwind:
|
|
|
|
gen8_ppgtt_clear_4lvl(vm, from, start - from);
|
|
|
|
return -ENOMEM;
|
2015-07-30 18:05:29 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
static void gen8_dump_pdp(struct i915_hw_ppgtt *ppgtt,
|
|
|
|
struct i915_page_directory_pointer *pdp,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length,
|
2015-07-30 00:23:57 +08:00
|
|
|
gen8_pte_t scratch_pte,
|
|
|
|
struct seq_file *m)
|
|
|
|
{
|
2017-02-28 23:28:07 +08:00
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
2015-07-30 00:23:57 +08:00
|
|
|
struct i915_page_directory *pd;
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 pdpe;
|
2015-07-30 00:23:57 +08:00
|
|
|
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pdpe(pd, pdp, start, length, pdpe) {
|
2015-07-30 00:23:57 +08:00
|
|
|
struct i915_page_table *pt;
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 pd_len = length;
|
|
|
|
u64 pd_start = start;
|
|
|
|
u32 pde;
|
2015-07-30 00:23:57 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
if (pdp->page_directory[pdpe] == ppgtt->base.scratch_pd)
|
2015-07-30 00:23:57 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
seq_printf(m, "\tPDPE #%d\n", pdpe);
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pde(pt, pd, pd_start, pd_len, pde) {
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 pte;
|
2015-07-30 00:23:57 +08:00
|
|
|
gen8_pte_t *pt_vaddr;
|
|
|
|
|
2017-02-15 16:43:47 +08:00
|
|
|
if (pd->page_table[pde] == ppgtt->base.scratch_pt)
|
2015-07-30 00:23:57 +08:00
|
|
|
continue;
|
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
pt_vaddr = kmap_atomic_px(pt);
|
2015-07-30 00:23:57 +08:00
|
|
|
for (pte = 0; pte < GEN8_PTES; pte += 4) {
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 va = ((u64)pdpe << GEN8_PDPE_SHIFT |
|
|
|
|
pde << GEN8_PDE_SHIFT |
|
|
|
|
pte << GEN8_PTE_SHIFT);
|
2015-07-30 00:23:57 +08:00
|
|
|
int i;
|
|
|
|
bool found = false;
|
|
|
|
|
|
|
|
for (i = 0; i < 4; i++)
|
|
|
|
if (pt_vaddr[pte + i] != scratch_pte)
|
|
|
|
found = true;
|
|
|
|
if (!found)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
|
|
|
|
for (i = 0; i < 4; i++) {
|
|
|
|
if (pt_vaddr[pte + i] != scratch_pte)
|
|
|
|
seq_printf(m, " %llx", pt_vaddr[pte + i]);
|
|
|
|
else
|
|
|
|
seq_puts(m, " SCRATCH ");
|
|
|
|
}
|
|
|
|
seq_puts(m, "\n");
|
|
|
|
}
|
|
|
|
kunmap_atomic(pt_vaddr);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
|
|
|
|
{
|
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
2017-02-15 16:43:37 +08:00
|
|
|
const gen8_pte_t scratch_pte =
|
|
|
|
gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC);
|
2017-02-15 16:43:54 +08:00
|
|
|
u64 start = 0, length = ppgtt->base.total;
|
2015-07-30 00:23:57 +08:00
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm)) {
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 pml4e;
|
2015-07-30 00:23:57 +08:00
|
|
|
struct i915_pml4 *pml4 = &ppgtt->pml4;
|
|
|
|
struct i915_page_directory_pointer *pdp;
|
|
|
|
|
2015-12-08 21:30:51 +08:00
|
|
|
gen8_for_each_pml4e(pdp, pml4, start, length, pml4e) {
|
2017-02-15 16:43:49 +08:00
|
|
|
if (pml4->pdps[pml4e] == ppgtt->base.scratch_pdp)
|
2015-07-30 00:23:57 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
seq_printf(m, " PML4E #%llu\n", pml4e);
|
2017-02-15 16:43:40 +08:00
|
|
|
gen8_dump_pdp(ppgtt, pdp, start, length, scratch_pte, m);
|
2015-07-30 00:23:57 +08:00
|
|
|
}
|
2017-02-28 23:28:09 +08:00
|
|
|
} else {
|
|
|
|
gen8_dump_pdp(ppgtt, &ppgtt->pdp, start, length, scratch_pte, m);
|
2015-07-30 00:23:57 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
static int gen8_preallocate_top_level_pdp(struct i915_hw_ppgtt *ppgtt)
|
2015-08-28 15:41:14 +08:00
|
|
|
{
|
2017-02-15 16:43:48 +08:00
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
|
|
|
struct i915_page_directory_pointer *pdp = &ppgtt->pdp;
|
|
|
|
struct i915_page_directory *pd;
|
|
|
|
u64 start = 0, length = ppgtt->base.total;
|
|
|
|
u64 from = start;
|
|
|
|
unsigned int pdpe;
|
2015-08-28 15:41:14 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
gen8_for_each_pdpe(pd, pdp, start, length, pdpe) {
|
|
|
|
pd = alloc_pd(vm);
|
|
|
|
if (IS_ERR(pd))
|
|
|
|
goto unwind;
|
2015-08-28 15:41:14 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
gen8_initialize_pd(vm, pd);
|
|
|
|
gen8_ppgtt_set_pdpe(vm, pdp, pd, pdpe);
|
|
|
|
pdp->used_pdpes++;
|
|
|
|
}
|
2015-08-28 15:41:14 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
pdp->used_pdpes++; /* never remove */
|
|
|
|
return 0;
|
2015-08-28 15:41:14 +08:00
|
|
|
|
2017-02-15 16:43:48 +08:00
|
|
|
unwind:
|
|
|
|
start -= from;
|
|
|
|
gen8_for_each_pdpe(pd, pdp, from, start, pdpe) {
|
|
|
|
gen8_ppgtt_set_pdpe(vm, pdp, vm->scratch_pd, pdpe);
|
|
|
|
free_pd(vm, pd);
|
|
|
|
}
|
|
|
|
pdp->used_pdpes = 0;
|
|
|
|
return -ENOMEM;
|
2015-08-28 15:41:14 +08:00
|
|
|
}
|
|
|
|
|
2015-03-18 21:47:59 +08:00
|
|
|
/*
|
2014-02-20 14:05:42 +08:00
|
|
|
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
|
|
|
|
* with a net effect resembling a 2-level page table in normal x86 terms. Each
|
|
|
|
* PDP represents 1GB of memory: 4 * 512 * 512 * 4096 = 4GB legacy 32b address
|
|
|
|
* space.
|
2013-11-05 12:47:32 +08:00
|
|
|
*
|
2014-02-20 14:05:42 +08:00
|
|
|
*/
|
2015-04-14 23:35:14 +08:00
|
|
|
static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
|
2013-11-05 12:47:32 +08:00
|
|
|
{
|
2017-02-28 23:28:09 +08:00
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
|
|
|
struct drm_i915_private *dev_priv = vm->i915;
|
2015-06-30 23:16:40 +08:00
|
|
|
int ret;
|
2015-04-08 19:13:29 +08:00
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
ppgtt->base.total = USES_FULL_48BIT_PPGTT(dev_priv) ?
|
|
|
|
1ULL << 48 :
|
|
|
|
1ULL << 32;
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
/* There are only a few exceptions for gen >= 6: chv and bxt.
|
|
|
|
* And we are not sure about the latter so play safe for now.
|
|
|
|
*/
|
|
|
|
if (IS_CHERRYVIEW(dev_priv) || IS_BROXTON(dev_priv))
|
|
|
|
ppgtt->base.pt_kmap_wc = true;
|
|
|
|
|
2017-08-23 01:38:28 +08:00
|
|
|
ret = gen8_init_scratch(&ppgtt->base);
|
|
|
|
if (ret) {
|
|
|
|
ppgtt->base.total = 0;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-02-28 23:28:09 +08:00
|
|
|
if (use_4lvl(vm)) {
|
2017-02-15 16:43:40 +08:00
|
|
|
ret = setup_px(&ppgtt->base, &ppgtt->pml4);
|
2015-07-30 18:05:29 +08:00
|
|
|
if (ret)
|
|
|
|
goto free_scratch;
|
2015-07-30 00:23:46 +08:00
|
|
|
|
2015-07-30 00:23:55 +08:00
|
|
|
gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
|
|
|
|
|
2017-02-28 23:28:10 +08:00
|
|
|
ppgtt->switch_mm = gen8_mm_switch_4lvl;
|
2017-02-15 16:43:49 +08:00
|
|
|
ppgtt->base.allocate_va_range = gen8_ppgtt_alloc_4lvl;
|
2017-02-15 16:43:37 +08:00
|
|
|
ppgtt->base.insert_entries = gen8_ppgtt_insert_4lvl;
|
2017-02-15 16:43:47 +08:00
|
|
|
ppgtt->base.clear_range = gen8_ppgtt_clear_4lvl;
|
2015-07-30 18:05:29 +08:00
|
|
|
} else {
|
2017-02-15 16:43:47 +08:00
|
|
|
ret = __pdp_init(&ppgtt->base, &ppgtt->pdp);
|
2015-08-03 16:52:01 +08:00
|
|
|
if (ret)
|
|
|
|
goto free_scratch;
|
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
if (intel_vgpu_active(dev_priv)) {
|
2017-02-15 16:43:48 +08:00
|
|
|
ret = gen8_preallocate_top_level_pdp(ppgtt);
|
|
|
|
if (ret) {
|
|
|
|
__pdp_fini(&ppgtt->pdp);
|
2015-08-28 15:41:14 +08:00
|
|
|
goto free_scratch;
|
2017-02-15 16:43:48 +08:00
|
|
|
}
|
2015-08-28 15:41:14 +08:00
|
|
|
}
|
2017-02-15 16:43:37 +08:00
|
|
|
|
2017-02-28 23:28:10 +08:00
|
|
|
ppgtt->switch_mm = gen8_mm_switch_3lvl;
|
2017-02-15 16:43:49 +08:00
|
|
|
ppgtt->base.allocate_va_range = gen8_ppgtt_alloc_3lvl;
|
2017-02-15 16:43:37 +08:00
|
|
|
ppgtt->base.insert_entries = gen8_ppgtt_insert_3lvl;
|
2017-02-15 16:43:47 +08:00
|
|
|
ppgtt->base.clear_range = gen8_ppgtt_clear_3lvl;
|
2015-08-03 16:52:01 +08:00
|
|
|
}
|
2015-07-30 00:23:46 +08:00
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
if (intel_vgpu_active(dev_priv))
|
2015-08-28 15:41:18 +08:00
|
|
|
gen8_ppgtt_notify_vgt(ppgtt, true);
|
|
|
|
|
2017-02-28 23:28:11 +08:00
|
|
|
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
|
|
|
|
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
|
|
|
|
ppgtt->base.bind_vma = ppgtt_bind_vma;
|
|
|
|
ppgtt->debug_dump = gen8_dump_ppgtt;
|
|
|
|
|
drm/i915/gen8: Dynamic page table allocations
This finishes off the dynamic page tables allocations, in the legacy 3
level style that already exists. Most everything has already been setup
to this point, the patch finishes off the enabling by setting the
appropriate function pointers.
In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.
Before submission, update the PDPs in the logical ring context as PDPs
have been allocated.
v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.
v3: Rebase.
v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).
v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.
v6: Update err_out case in gen8_alloc_va_range (missed from latest
rebase).
v7: Rebase after s/page_tables/page_table/.
v8: Updated scratch_pt check after scratch flag was removed in previous
patch.
v9: Note that lrc mode needs to be updated to support init state without
any PDP.
v10: Unmap correct page_table in gen8_alloc_va_range's error case, clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.
v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.
v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse inline functions. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
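The scratch fallback described in this commit message (use the scratch page for PDPs that have not been allocated yet) can be sketched as below. This is a simplified illustration, not the LRC code: unallocated slots are modelled as NULL, and all `toy_` names and the address values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PDPES 4 /* legacy layout: at most four PDP entries */

/* Hypothetical page directory, reduced to its DMA address. */
struct toy_pd {
	uint64_t daddr;
};

/* Hypothetical PDP; a NULL slot means "not allocated yet". */
struct toy_pdp {
	struct toy_pd *page_directory[TOY_PDPES];
};

/* Address to program for slot n: the real directory if present, else scratch. */
static uint64_t toy_pdp_entry_addr(const struct toy_pdp *pdp, unsigned int n,
				   uint64_t scratch_daddr)
{
	assert(n < TOY_PDPES);
	return pdp->page_directory[n] ? pdp->page_directory[n]->daddr
				      : scratch_daddr;
}

/* Refresh every PDP value a context would carry before submission. */
static void toy_fill_ctx_pdps(const struct toy_pdp *pdp, uint64_t scratch_daddr,
			      uint64_t out[TOY_PDPES])
{
	unsigned int n;

	for (n = 0; n < TOY_PDPES; n++)
		out[n] = toy_pdp_entry_addr(pdp, n, scratch_daddr);
}
```

Re-running the fill step before each submission is what lets a context start with no PDPs at all and pick them up as they are allocated.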
2015-04-08 19:13:34 +08:00
|
|
|
return 0;
|
2015-07-30 00:23:46 +08:00
|
|
|
|
|
|
|
free_scratch:
|
|
|
|
gen8_free_scratch(&ppgtt->base);
|
|
|
|
return ret;
|
2015-04-08 19:13:34 +08:00
|
|
|
}
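The two address-space sizes chosen in gen8_ppgtt_init (1ULL << 32 and 1ULL << 48) follow from the table geometry given in the comment above the function. A minimal sketch of that arithmetic, assuming 512 entries per level and 4 KiB pages; the `toy_` helpers are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BYTES   4096ull
#define TOY_ENTRIES      512ull /* assumed entries per table level */
#define TOY_LEGACY_PDPES 4ull   /* legacy mode: at most four PDP entries */

/* Bytes covered by one PDP entry: 512 PDEs * 512 PTEs * 4 KiB = 1 GiB. */
static uint64_t toy_bytes_per_pdpe(void)
{
	return TOY_ENTRIES * TOY_ENTRIES * TOY_PAGE_BYTES;
}

/* Total address space for the 3-level (legacy) or 4-level layout. */
static uint64_t toy_total_bytes(unsigned int levels)
{
	assert(levels == 3 || levels == 4);
	return levels == 4 ? TOY_ENTRIES * TOY_ENTRIES * toy_bytes_per_pdpe()
			   : TOY_LEGACY_PDPES * toy_bytes_per_pdpe();
}
```

With four levels, 512 PML4 entries each reach a PDP with 512 entries, which gives 2^48 bytes; with three levels the four PDP entries give 2^32.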
|
|
|
|
|
2013-12-07 06:11:29 +08:00
|
|
|
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
|
|
|
|
{
|
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
2015-04-08 19:13:30 +08:00
|
|
|
struct i915_page_table *unused;
|
2015-03-17 00:00:54 +08:00
|
|
|
gen6_pte_t scratch_pte;
|
2017-02-15 16:43:54 +08:00
|
|
|
u32 pd_entry, pte, pde;
|
|
|
|
u32 start = 0, length = ppgtt->base.total;
|
2013-12-07 06:11:29 +08:00
|
|
|
|
2016-08-22 15:44:30 +08:00
|
|
|
scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
|
2016-10-13 20:02:40 +08:00
|
|
|
I915_CACHE_LLC, 0);
|
2013-12-07 06:11:29 +08:00
|
|
|
|
2016-06-25 02:37:46 +08:00
|
|
|
gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde) {
|
2013-12-07 06:11:29 +08:00
|
|
|
u32 expected;
|
2015-03-17 00:00:54 +08:00
|
|
|
gen6_pte_t *pt_vaddr;
|
2015-06-25 23:35:12 +08:00
|
|
|
const dma_addr_t pt_addr = px_dma(ppgtt->pd.page_table[pde]);
|
2015-04-08 19:13:30 +08:00
|
|
|
pd_entry = readl(ppgtt->pd_addr + pde);
|
2013-12-07 06:11:29 +08:00
|
|
|
expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
|
|
|
|
|
|
|
|
if (pd_entry != expected)
|
|
|
|
seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
|
|
|
|
pde,
|
|
|
|
pd_entry,
|
|
|
|
expected);
|
|
|
|
seq_printf(m, "\tPDE: %x\n", pd_entry);
|
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
pt_vaddr = kmap_atomic_px(ppgtt->pd.page_table[pde]);
|
2015-06-25 23:35:11 +08:00
|
|
|
|
2015-03-17 00:00:54 +08:00
|
|
|
for (pte = 0; pte < GEN6_PTES; pte+=4) {
|
2013-12-07 06:11:29 +08:00
|
|
|
unsigned long va =
|
2015-03-17 00:00:54 +08:00
|
|
|
(pde * PAGE_SIZE * GEN6_PTES) +
|
2013-12-07 06:11:29 +08:00
|
|
|
(pte * PAGE_SIZE);
|
|
|
|
int i;
|
|
|
|
bool found = false;
|
|
|
|
for (i = 0; i < 4; i++)
|
|
|
|
if (pt_vaddr[pte + i] != scratch_pte)
|
|
|
|
found = true;
|
|
|
|
if (!found)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
|
|
|
|
for (i = 0; i < 4; i++) {
|
|
|
|
if (pt_vaddr[pte + i] != scratch_pte)
|
|
|
|
seq_printf(m, " %08x", pt_vaddr[pte + i]);
|
|
|
|
else
|
|
|
|
seq_puts(m, " SCRATCH ");
|
|
|
|
}
|
|
|
|
seq_puts(m, "\n");
|
|
|
|
}
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(pt_vaddr);
|
2013-12-07 06:11:29 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-17 00:00:56 +08:00
|
|
|
/* Write the page directory entry at index @pde so that it points at page table @pt */
|
2017-02-15 16:43:45 +08:00
|
|
|
static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
|
|
|
|
const unsigned int pde,
|
|
|
|
const struct i915_page_table *pt)
|
2013-04-09 09:43:54 +08:00
|
|
|
{
|
2015-03-17 00:00:56 +08:00
|
|
|
/* Caller needs to make sure the write completes if necessary */
|
2017-02-15 16:43:45 +08:00
|
|
|
writel_relaxed(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
|
|
|
|
ppgtt->pd_addr + pde);
|
2015-03-17 00:00:56 +08:00
|
|
|
}
|
2013-04-09 09:43:54 +08:00
|
|
|
|
2015-03-17 00:00:56 +08:00
|
|
|
/* Write all the page tables found in the ppgtt structure to incrementing page
|
|
|
|
* directory entries. */
|
2017-02-15 16:43:45 +08:00
|
|
|
static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 start, u32 length)
|
2015-03-17 00:00:56 +08:00
|
|
|
{
|
2015-04-08 19:13:23 +08:00
|
|
|
struct i915_page_table *pt;
|
2017-02-15 16:43:45 +08:00
|
|
|
unsigned int pde;
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:45 +08:00
|
|
|
gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde)
|
|
|
|
gen6_write_pde(ppgtt, pde, pt);
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:45 +08:00
|
|
|
mark_tlbs_dirty(ppgtt);
|
2017-02-15 16:43:46 +08:00
|
|
|
wmb();
|
2013-04-24 14:15:32 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:57 +08:00
|
|
|
static inline u32 get_pd_offset(struct i915_hw_ppgtt *ppgtt)
|
2013-04-24 14:15:32 +08:00
|
|
|
{
|
2017-02-15 16:43:46 +08:00
|
|
|
GEM_BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
|
|
|
|
return ppgtt->pd.base.ggtt_offset << 10;
|
2013-12-07 06:11:09 +08:00
|
|
|
}
|
|
|
|
|
2013-12-07 06:11:12 +08:00
|
|
|
static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
|
2015-05-30 00:43:56 +08:00
|
|
|
struct drm_i915_gem_request *req)
|
2013-12-07 06:11:12 +08:00
|
|
|
{
|
2016-03-16 19:00:38 +08:00
|
|
|
struct intel_engine_cs *engine = req->engine;
|
2017-02-14 19:32:42 +08:00
|
|
|
u32 *cs;
|
2013-12-07 06:11:12 +08:00
|
|
|
|
|
|
|
/* NB: TLBs must be flushed and invalidated before a switch */
|
2017-02-14 19:32:42 +08:00
|
|
|
cs = intel_ring_begin(req, 6);
|
|
|
|
if (IS_ERR(cs))
|
|
|
|
return PTR_ERR(cs);
|
2013-12-07 06:11:12 +08:00
|
|
|
|
2017-02-14 19:32:42 +08:00
|
|
|
*cs++ = MI_LOAD_REGISTER_IMM(2);
|
|
|
|
*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
|
|
|
|
*cs++ = PP_DIR_DCLV_2G;
|
|
|
|
*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
|
|
|
|
*cs++ = get_pd_offset(ppgtt);
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
intel_ring_advance(req, cs);
|
2013-12-07 06:11:12 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-12-07 06:11:11 +08:00
|
|
|
static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
|
2015-05-30 00:43:56 +08:00
|
|
|
struct drm_i915_gem_request *req)
|
2013-12-07 06:11:11 +08:00
|
|
|
{
|
2016-03-16 19:00:38 +08:00
|
|
|
struct intel_engine_cs *engine = req->engine;
|
2017-02-14 19:32:42 +08:00
|
|
|
u32 *cs;
|
2013-12-07 06:11:11 +08:00
|
|
|
|
|
|
|
/* NB: TLBs must be flushed and invalidated before a switch */
|
2017-02-14 19:32:42 +08:00
|
|
|
cs = intel_ring_begin(req, 6);
|
|
|
|
if (IS_ERR(cs))
|
|
|
|
return PTR_ERR(cs);
|
|
|
|
|
|
|
|
*cs++ = MI_LOAD_REGISTER_IMM(2);
|
|
|
|
*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
|
|
|
|
*cs++ = PP_DIR_DCLV_2G;
|
|
|
|
*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
|
|
|
|
*cs++ = get_pd_offset(ppgtt);
|
|
|
|
*cs++ = MI_NOOP;
|
|
|
|
intel_ring_advance(req, cs);
|
2013-12-07 06:11:11 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-12-07 06:11:10 +08:00
|
|
|
static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
|
2015-05-30 00:43:56 +08:00
|
|
|
struct drm_i915_gem_request *req)
|
2013-12-07 06:11:10 +08:00
|
|
|
{
|
2016-03-16 19:00:38 +08:00
|
|
|
struct intel_engine_cs *engine = req->engine;
|
2016-07-04 15:48:31 +08:00
|
|
|
struct drm_i915_private *dev_priv = req->i915;
|
2013-12-07 06:11:11 +08:00
|
|
|
|
2016-03-16 19:00:36 +08:00
|
|
|
I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
|
|
|
|
I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(ppgtt));
|
2013-12-07 06:11:10 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
|
2013-12-07 06:11:10 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of many more rings being added in future, the
drm_i915_private structure could bloat, as an array of type
intel_engine_cs is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
This is still fine, as generally only a single instance of the
drm_i915_private structure is used, but not all of the possible rings would be
enabled or active on most platforms. Some memory can be saved by
allocating the intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for the i915.ko file the text size remains the same at 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
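The pointer-array layout this commit message describes can be sketched in a few lines. The sketch assumes nothing beyond what the message states (static engine IDs, NULL for disabled engines, a local iterator variable passed to the loop macro); the `toy_` types and the macro are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical static engine IDs, used directly as array indices. */
enum toy_engine_id { TOY_RCS, TOY_VCS, TOY_BCS, TOY_NUM_ENGINES };

struct toy_engine {
	enum toy_engine_id id;
	const char *name;
};

/* An array of pointers: a slot stays NULL when that engine is disabled. */
struct toy_device {
	struct toy_engine *engine[TOY_NUM_ENGINES];
};

/* Visit enabled engines only; the iterator is a caller-supplied local. */
#define toy_for_each_engine(engine__, dev__, id__) \
	for ((id__) = 0; (id__) < TOY_NUM_ENGINES; (id__)++) \
		if (!((engine__) = (dev__)->engine[(id__)])) {} else

/* Build a bitmask of the engines that are present. */
static unsigned int toy_engine_mask(struct toy_device *dev)
{
	struct toy_engine *engine;
	enum toy_engine_id id;
	unsigned int mask = 0;

	toy_for_each_engine(engine, dev, id) {
		assert(engine->id == id); /* slot index and engine ID agree */
		mask |= 1u << engine->id;
	}
	return mask;
}
```

Because the NULL check lives in the macro, callers need no per-engine "initialized" test, which matches the v2 note about dropping intel_engine_initialized().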
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2013-04-24 14:15:32 +08:00
|
|
|
|
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2016-11-16 16:55:31 +08:00
|
|
|
u32 four_level = USES_FULL_48BIT_PPGTT(dev_priv) ?
|
|
|
|
GEN8_GFX_PPGTT_48B : 0;
|
2016-03-16 19:00:36 +08:00
|
|
|
I915_WRITE(RING_MODE_GEN7(engine),
|
2015-07-30 18:06:23 +08:00
|
|
|
_MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
|
2013-12-07 06:11:10 +08:00
|
|
|
}
|
|
|
|
}
|
2013-04-09 09:43:54 +08:00
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
static void gen7_ppgtt_enable(struct drm_i915_private *dev_priv)
|
2013-04-24 14:15:32 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 ecochk, ecobits;
|
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2013-04-09 09:43:54 +08:00
|
|
|
|
2013-12-07 06:11:09 +08:00
|
|
|
ecobits = I915_READ(GAC_ECO_BITS);
|
|
|
|
I915_WRITE(GAC_ECO_BITS, ecobits | ECOBITS_PPGTT_CACHE64B);
|
2013-04-04 20:13:41 +08:00
|
|
|
|
2013-12-07 06:11:09 +08:00
|
|
|
ecochk = I915_READ(GAM_ECOCHK);
|
2016-10-13 18:03:01 +08:00
|
|
|
if (IS_HASWELL(dev_priv)) {
|
2013-12-07 06:11:09 +08:00
|
|
|
ecochk |= ECOCHK_PPGTT_WB_HSW;
|
|
|
|
} else {
|
|
|
|
ecochk |= ECOCHK_PPGTT_LLC_IVB;
|
|
|
|
ecochk &= ~ECOCHK_PPGTT_GFDT_IVB;
|
|
|
|
}
|
|
|
|
I915_WRITE(GAM_ECOCHK, ecochk);
|
2013-04-04 20:13:41 +08:00
|
|
|
|
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2013-04-09 09:43:54 +08:00
|
|
|
/* GFX_MODE is per-ring on gen7+ */
|
2016-03-16 19:00:36 +08:00
|
|
|
I915_WRITE(RING_MODE_GEN7(engine),
|
2013-12-07 06:11:09 +08:00
|
|
|
_MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
|
2013-04-09 09:43:54 +08:00
|
|
|
}
|
2013-12-07 06:11:09 +08:00
|
|
|
}
|
2013-04-09 09:43:54 +08:00
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
static void gen6_ppgtt_enable(struct drm_i915_private *dev_priv)
|
2013-12-07 06:11:09 +08:00
|
|
|
{
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 ecochk, gab_ctl, ecobits;
|
2013-04-04 20:13:41 +08:00
|
|
|
|
2013-12-07 06:11:09 +08:00
|
|
|
ecobits = I915_READ(GAC_ECO_BITS);
|
|
|
|
I915_WRITE(GAC_ECO_BITS, ecobits | ECOBITS_SNB_BIT |
|
|
|
|
ECOBITS_PPGTT_CACHE64B);
|
2013-04-09 09:43:54 +08:00
|
|
|
|
2013-12-07 06:11:09 +08:00
|
|
|
gab_ctl = I915_READ(GAB_CTL);
|
|
|
|
I915_WRITE(GAB_CTL, gab_ctl | GAB_CTL_CONT_AFTER_PAGEFAULT);
|
|
|
|
|
|
|
|
ecochk = I915_READ(GAM_ECOCHK);
|
|
|
|
I915_WRITE(GAM_ECOCHK, ecochk | ECOCHK_SNB_BIT | ECOCHK_PPGTT_CACHE64B);
|
|
|
|
|
|
|
|
I915_WRITE(GFX_MODE, _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
|
2013-04-09 09:43:54 +08:00
|
|
|
}
|
|
|
|
|
2012-02-10 00:15:46 +08:00
|
|
|
/* PPGTT support for Sandybridge/Gen6 and later */
|
2013-07-17 07:50:05 +08:00
|
|
|
static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
|
2017-02-15 16:43:46 +08:00
|
|
|
u64 start, u64 length)
|
2012-02-10 00:15:46 +08:00
|
|
|
{
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2017-02-15 16:43:46 +08:00
|
|
|
unsigned int first_entry = start >> PAGE_SHIFT;
|
|
|
|
unsigned int pde = first_entry / GEN6_PTES;
|
|
|
|
unsigned int pte = first_entry % GEN6_PTES;
|
|
|
|
unsigned int num_entries = length >> PAGE_SHIFT;
|
|
|
|
gen6_pte_t scratch_pte =
|
|
|
|
vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2012-02-10 00:15:47 +08:00
|
|
|
while (num_entries) {
|
2017-02-15 16:43:46 +08:00
|
|
|
struct i915_page_table *pt = ppgtt->pd.page_table[pde++];
|
|
|
|
unsigned int end = min(pte + num_entries, GEN6_PTES);
|
|
|
|
gen6_pte_t *vaddr;
|
2012-02-10 00:15:47 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
num_entries -= end - pte;
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
/* Note that the hw doesn't support removing PDE on the fly
|
|
|
|
* (they are cached inside the context with no means to
|
|
|
|
* invalidate the cache), so we can only reset the PTE
|
|
|
|
* entries back to scratch.
|
|
|
|
*/
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
vaddr = kmap_atomic_px(pt);
|
|
|
|
do {
|
|
|
|
vaddr[pte++] = scratch_pte;
|
|
|
|
} while (pte < end);
|
|
|
|
kunmap_atomic(vaddr);
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
pte = 0;
|
2012-02-10 00:15:47 +08:00
|
|
|
}
|
2012-02-10 00:15:46 +08:00
|
|
|
}
|
|
|
|
|
2013-07-17 07:50:05 +08:00
|
|
|
static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:57 +08:00
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 flags)
|
2013-01-25 06:44:56 +08:00
|
|
|
{
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2017-06-22 17:58:36 +08:00
|
|
|
unsigned first_entry = vma->node.start >> PAGE_SHIFT;
|
2015-03-17 00:00:54 +08:00
|
|
|
unsigned act_pt = first_entry / GEN6_PTES;
|
|
|
|
unsigned act_pte = first_entry % GEN6_PTES;
|
2017-02-15 16:43:36 +08:00
|
|
|
const u32 pte_encode = vm->pte_encode(0, cache_level, flags);
|
|
|
|
struct sgt_dma iter;
|
|
|
|
gen6_pte_t *vaddr;
|
|
|
|
|
2017-02-15 16:43:41 +08:00
|
|
|
vaddr = kmap_atomic_px(ppgtt->pd.page_table[act_pt]);
|
2017-06-22 17:58:36 +08:00
|
|
|
iter.sg = vma->pages->sgl;
|
2017-02-15 16:43:36 +08:00
|
|
|
iter.dma = sg_dma_address(iter.sg);
|
|
|
|
iter.max = iter.dma + iter.sg->length;
|
|
|
|
do {
|
|
|
|
vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
|
2013-02-19 01:28:04 +08:00
|
|
|
|
2017-02-15 16:43:36 +08:00
|
|
|
iter.dma += PAGE_SIZE;
|
|
|
|
if (iter.dma == iter.max) {
|
|
|
|
iter.sg = __sg_next(iter.sg);
|
|
|
|
if (!iter.sg)
|
|
|
|
break;
|
2013-02-19 01:28:04 +08:00
|
|
|
|
2017-02-15 16:43:36 +08:00
|
|
|
iter.dma = sg_dma_address(iter.sg);
|
|
|
|
iter.max = iter.dma + iter.sg->length;
|
|
|
|
}
|
2014-06-17 13:29:42 +08:00
|
|
|
|
2015-03-17 00:00:54 +08:00
|
|
|
if (++act_pte == GEN6_PTES) {
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(vaddr);
|
|
|
|
vaddr = kmap_atomic_px(ppgtt->pd.page_table[++act_pt]);
|
2013-02-19 01:28:04 +08:00
|
|
|
act_pte = 0;
|
2013-01-25 06:44:56 +08:00
|
|
|
}
|
2017-02-15 16:43:36 +08:00
|
|
|
} while (1);
|
2017-02-15 16:43:41 +08:00
|
|
|
kunmap_atomic(vaddr);
|
2013-01-25 06:44:56 +08:00
|
|
|
}
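The scatterlist walk in gen6_ppgtt_insert_entries() can be approximated with a plain array of DMA segments. This is a simplified sketch: struct sketch_seg and sketch_fill_ptes are hypothetical names, the PTEs are written into one flat array rather than remapped page tables, and the address is combined with a plain OR, whereas the driver uses GEN6_PTE_ADDR_ENCODE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration. */
#define SKETCH_PAGE_SIZE 4096u

/* Stand-in for one scatterlist entry: a DMA address and a byte length. */
struct sketch_seg {
	uint32_t dma;
	uint32_t length;
};

/*
 * Emit one PTE per page of every segment, in order. pte_bits plays the
 * role of the precomputed cache/flag bits. Returns the number of PTEs
 * written, at most max_ptes.
 */
static size_t sketch_fill_ptes(const struct sketch_seg *segs, size_t nsegs,
			       uint32_t pte_bits, uint32_t *ptes,
			       size_t max_ptes)
{
	size_t n = 0;

	for (size_t i = 0; i < nsegs; i++) {
		/* The sketch only handles page-multiple segments. */
		assert(segs[i].length % SKETCH_PAGE_SIZE == 0);

		for (uint32_t off = 0; off < segs[i].length;
		     off += SKETCH_PAGE_SIZE) {
			if (n == max_ptes)
				return n;
			ptes[n++] = pte_bits | (segs[i].dma + off);
		}
	}

	return n;
}
```

The driver version additionally unmaps the current page table and maps the next one whenever the PTE index reaches GEN6_PTES; the flat output array here sidesteps that step.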
|
|
|
|
|
2015-03-17 00:00:56 +08:00
|
|
|
static int gen6_alloc_va_range(struct i915_address_space *vm,
|
2017-02-15 16:43:46 +08:00
|
|
|
u64 start, u64 length)
|
2015-03-17 00:00:56 +08:00
|
|
|
{
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2015-04-08 19:13:23 +08:00
|
|
|
struct i915_page_table *pt;
|
2017-02-15 16:43:46 +08:00
|
|
|
u64 from = start;
|
|
|
|
unsigned int pde;
|
|
|
|
bool flush = false;
|
drm/i915: Finish gen6/7 dynamic page table allocation
This patch continues on the idea from "Track GEN6 page table usage".
From here on, in the steady state, PDEs are all pointing to the scratch
page table (as recommended in the spec). When an object is allocated in
the VA range, the code will determine if we need to allocate a page for
the page table. Similarly, when the object is destroyed, we will remove
and free the page table, pointing the PDE back to the scratch page.
Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time to
get to this point with as much common code as I do.
The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain, we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV. Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.
We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.
v3: Updated trace event to spit out a name
v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).
v6: Implement changes from code review (Daniel):
- allocate/teardown_va_range calls added.
- Add a scratch page allocation helper (only need the address).
- Move trace events to a new patch.
- Use updated mark_tlbs_dirty.
- Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.
v7: teardown_va_range removed (Daniel).
In init, gen6_ppgtt_clear_range call is only needed for aliasing ppgtt.
v8: Rebase after s/page_tables/page_table/.
v9: Remove unnecessary scratch flag in page_table struct, future patches
can just compare against ppgtt->scratch_pt, and alloc_pt_scratch becomes
redundant. Initialize scratch_pt and pt. (Mika)
v10: Clean up aliasing ppgtt init error path and prevent leaking the
ppgtt obj when init fails. (Mika)
Updated commit author. (Daniel)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-24 23:46:22 +08:00
|
|
|
|
2016-06-25 02:37:46 +08:00
|
|
|
gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde) {
|
2017-02-15 16:43:46 +08:00
|
|
|
if (pt == vm->scratch_pt) {
|
|
|
|
pt = alloc_pt(vm);
|
|
|
|
if (IS_ERR(pt))
|
|
|
|
goto unwind_out;
|
2015-03-24 23:46:22 +08:00
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
gen6_initialize_pt(vm, pt);
|
|
|
|
ppgtt->pd.page_table[pde] = pt;
|
|
|
|
gen6_write_pde(ppgtt, pde, pt);
|
|
|
|
flush = true;
|
2015-03-24 23:46:22 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:46 +08:00
|
|
|
if (flush) {
|
|
|
|
mark_tlbs_dirty(ppgtt);
|
|
|
|
wmb();
|
2015-03-17 00:00:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2015-03-24 23:46:22 +08:00
|
|
|
|
|
|
|
unwind_out:
|
2017-02-15 16:43:46 +08:00
|
|
|
gen6_ppgtt_clear_range(vm, from, start);
|
|
|
|
return -ENOMEM;
|
2015-03-17 00:00:56 +08:00
|
|
|
}
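The lazy allocation pattern in gen6_alloc_va_range(), where every directory slot starts out pointing at a shared scratch table and is replaced on first use, can be sketched in user space. The names below (sketch_pd, sketch_alloc_range and so on) are hypothetical, calloc() stands in for alloc_pt(), and the failure path simply returns -1 instead of resetting PTEs and returning -ENOMEM as the driver does.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed directory size for illustration only. */
#define SKETCH_PDES 8

struct sketch_pt {
	int unused; /* placeholder for real page table contents */
};

struct sketch_pd {
	struct sketch_pt *table[SKETCH_PDES];
	struct sketch_pt *scratch; /* shared sentinel, never freed here */
};

/* Steady state: every slot points at the scratch table. */
static void sketch_pd_init(struct sketch_pd *pd, struct sketch_pt *scratch)
{
	pd->scratch = scratch;
	for (unsigned int i = 0; i < SKETCH_PDES; i++)
		pd->table[i] = scratch;
}

/*
 * Give each slot in [first, first + count) a private table if it still
 * points at scratch. Returns the number of tables newly allocated, or -1
 * if an allocation fails (tables allocated so far stay installed).
 */
static int sketch_alloc_range(struct sketch_pd *pd, unsigned int first,
			      unsigned int count)
{
	int allocated = 0;

	assert(first + count <= SKETCH_PDES);

	for (unsigned int pde = first; pde < first + count; pde++) {
		struct sketch_pt *pt;

		if (pd->table[pde] != pd->scratch)
			continue; /* already backed by a real table */

		pt = calloc(1, sizeof(*pt));
		if (!pt)
			return -1;

		pd->table[pde] = pt;
		allocated++;
	}

	return allocated;
}

/* Free every non-scratch table and point its slot back at scratch. */
static void sketch_pd_fini(struct sketch_pd *pd)
{
	for (unsigned int i = 0; i < SKETCH_PDES; i++) {
		if (pd->table[i] != pd->scratch)
			free(pd->table[i]);
		pd->table[i] = pd->scratch;
	}
}
```

Comparing against the scratch pointer avoids a separate "allocated" flag per slot, which matches the v9 note in the commit message about dropping the scratch flag from the page table struct.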
|
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
static int gen6_init_scratch(struct i915_address_space *vm)
|
|
|
|
{
|
2016-08-22 15:44:30 +08:00
|
|
|
int ret;
|
2015-06-30 23:16:40 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
ret = setup_scratch_page(vm, I915_GFP_DMA);
|
2016-08-22 15:44:30 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2015-06-30 23:16:40 +08:00
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
vm->scratch_pt = alloc_pt(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
if (IS_ERR(vm->scratch_pt)) {
|
2017-02-15 16:43:40 +08:00
|
|
|
cleanup_scratch_page(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
return PTR_ERR(vm->scratch_pt);
|
|
|
|
}
|
|
|
|
|
|
|
|
gen6_initialize_pt(vm, vm->scratch_pt);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void gen6_free_scratch(struct i915_address_space *vm)
|
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pt(vm, vm->scratch_pt);
|
|
|
|
cleanup_scratch_page(vm);
|
2015-06-30 23:16:40 +08:00
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:13 +08:00
|
|
|
static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
|
2014-02-20 14:05:48 +08:00
|
|
|
{
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
|
2016-06-25 02:37:46 +08:00
|
|
|
struct i915_page_directory *pd = &ppgtt->pd;
|
2015-04-08 19:13:30 +08:00
|
|
|
struct i915_page_table *pt;
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 pde;
|
2015-03-24 23:46:22 +08:00
|
|
|
|
2015-04-14 23:35:13 +08:00
|
|
|
drm_mm_remove_node(&ppgtt->node);
|
|
|
|
|
2016-06-25 02:37:46 +08:00
|
|
|
gen6_for_all_pdes(pt, pd, pde)
|
2015-06-25 23:35:17 +08:00
|
|
|
if (pt != vm->scratch_pt)
|
2017-02-15 16:43:40 +08:00
|
|
|
free_pt(vm, pt);
|
drm/i915: Create page table allocators
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 00:22:36 +08:00
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
gen6_free_scratch(vm);
|
2013-01-25 05:49:56 +08:00
|
|
|
}
|
|
|
|
|
2014-02-20 14:05:49 +08:00
|
|
|
static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
|
2013-01-25 05:49:56 +08:00
|
|
|
{
|
2015-06-30 23:16:40 +08:00
|
|
|
struct i915_address_space *vm = &ppgtt->base;
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ppgtt->base.i915;
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2014-02-20 14:05:49 +08:00
|
|
|
int ret;
|
2012-02-10 00:15:46 +08:00
|
|
|
|
drm/i915: Use drm_mm for PPGTT PDEs
When PPGTT support was originally enabled, it was only designed to
support 1 PPGTT. It therefore made sense to simply hide the GGTT space
required to enable this from the drm_mm allocator.
Since we intend to support full PPGTT, which means more than 1, and they
can be created and destroyed ad hoc it will be required to use the
proper allocation techniques we already have.
The first step here is to make the existing single PPGTT use the
allocator.
The astute observer will notice that we are reserving space in the GGTT
for the PDEs for the lifetime of the address space, and would be right
to question whether or not this is a good idea. It does not make a
difference with this current patch only the aliasing PPGTT (indeed the
PDEs should still be hidden from the shrinker). For the future, we are
allocating from top to bottom to avoid using the precious "gtt
space". The GGTT space at that point should only be used for scanout, HW
contexts, ringbuffers, HWSP, PDEs, and a couple of other small buffers
(potentially) used by the kernel. Everything else should be mapped into
a PPGTT. To put the consumption in more tangible terms, it takes
approximately 4 sets of PDEs to equal one 19x10 framebuffer (with no
fancy stride or alignment constraints). 3/4 of the total [average] GGTT
can be used for PDEs, and hopefully never touch the 1/4 that the
framebuffer needs.
The astute, and persistent observer might ask about the page tables
which are also pinned for the address space. This waste is unfortunate.
We use 2MB of memory per address space. We leave wrapping the PDEs as a
real GEM object as a TODO.
v2: Align PDEs to 64b in GTT
Allocate the node dynamically so we can use drm_mm_put_block
Now tested on IGT
Allocate node at the top to avoid fragmentation (Chris)
v3: Use Chris' top down allocator
v4: Embed drm_mm_node into ppgtt struct (Jesse)
Remove hunks which didn't belong (Jesse)
v5: Don't subtract guard page since we now killed the guard page prior
to this patch. (Ben)
v6: Rebased and removed guard page stuff.
Added a chunk to the commit message
Allow adding a context to mappable region
v7: Undo v3, so we can make the drm patch last in the series
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v4)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
squash: drm/i915: allow PPGTT to use mappable
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:07 +08:00
|
|
|
/* PPGTT PDEs reside in the GGTT and consist of 512 entries. The
|
|
|
|
* allocator works in address space sizes, so it's multiplied by page
|
|
|
|
* size. We allocate at the top of the GTT to avoid fragmentation.
|
|
|
|
*/
|
2016-03-30 21:57:10 +08:00
|
|
|
BUG_ON(!drm_mm_initialized(&ggtt->base.mm));
|
drm/i915: Finish gen6/7 dynamic page table allocation
This patch continues on the idea from "Track GEN6 page table usage".
From here on, in the steady state, PDEs are all pointing to the scratch
page table (as recommended in the spec). When an object is allocated in
the VA range, the code will determine if we need to allocate a page for
the page table. Similarly, when the object is destroyed, we will remove
and free the page table, pointing the PDE back to the scratch page.
Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I have.
The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: the aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render rings is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV. Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.
We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.
v3: Updated trace event to spit out a name
v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).
v6: Implement changes from code review (Daniel):
- allocate/teardown_va_range calls added.
- Add a scratch page allocation helper (only need the address).
- Move trace events to a new patch.
- Use updated mark_tlbs_dirty.
- Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.
v7: teardown_va_range removed (Daniel).
In init, gen6_ppgtt_clear_range call is only needed for aliasing ppgtt.
v8: Rebase after s/page_tables/page_table/.
v9: Remove unnecessary scratch flag in page_table struct, future patches
can just compare against ppgtt->scratch_pt, and alloc_pt_scratch becomes
redundant. Initialize scratch_pt and pt. (Mika)
v10: Clean up aliasing ppgtt init error path and prevent leaking the
ppgtt obj when init fails. (Mika)
Updated commit author. (Daniel)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-24 23:46:22 +08:00
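The scheme described in the message above (PDEs point at a shared scratch page table in the steady state, get a real page table on demand, and return to scratch on teardown) can be sketched in plain userspace C. This is a minimal illustration, not the driver's implementation: the sketch_* types and functions are hypothetical, calloc/free stand in for page allocation, and no hardware PDE is written.

```c
#include <stdlib.h>

#define SKETCH_PD_ENTRIES 512	/* assumed number of PDEs per page directory */

struct sketch_pt {
	int used_ptes;		/* placeholder payload for a page table */
};

struct sketch_ppgtt {
	struct sketch_pt scratch_pt;	/* shared placeholder page table */
	struct sketch_pt *page_table[SKETCH_PD_ENTRIES];
};

/* Steady state: point every PDE in [first, first + count) at scratch. */
static void sketch_scratch_range(struct sketch_ppgtt *ppgtt,
				 unsigned int first, unsigned int count)
{
	unsigned int pde;

	for (pde = first; pde < first + count && pde < SKETCH_PD_ENTRIES; pde++)
		ppgtt->page_table[pde] = &ppgtt->scratch_pt;
}

/* Give each PDE in the range a real page table if it still points at
 * scratch. Returns 0 on success, -1 if an allocation fails; tables
 * allocated before the failure are left in place for the caller to free. */
static int sketch_alloc_range(struct sketch_ppgtt *ppgtt,
			      unsigned int first, unsigned int count)
{
	unsigned int pde;

	for (pde = first; pde < first + count && pde < SKETCH_PD_ENTRIES; pde++) {
		struct sketch_pt *pt;

		if (ppgtt->page_table[pde] != &ppgtt->scratch_pt)
			continue;	/* already backed by a real table */

		pt = calloc(1, sizeof(*pt));
		if (!pt)
			return -1;
		ppgtt->page_table[pde] = pt;
	}
	return 0;
}

/* Free real page tables in the range and point the PDEs back at scratch. */
static void sketch_free_range(struct sketch_ppgtt *ppgtt,
			      unsigned int first, unsigned int count)
{
	unsigned int pde;

	for (pde = first; pde < first + count && pde < SKETCH_PD_ENTRIES; pde++) {
		if (ppgtt->page_table[pde] == &ppgtt->scratch_pt)
			continue;
		free(ppgtt->page_table[pde]);
		ppgtt->page_table[pde] = &ppgtt->scratch_pt;
	}
}
```

Comparing against the address of the scratch table, rather than keeping a per-table flag, mirrors the v9 note above that later patches can simply compare against ppgtt->scratch_pt.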
|
|
|
|
2015-06-30 23:16:40 +08:00
|
|
|
ret = gen6_init_scratch(vm);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2015-03-24 23:46:22 +08:00
|
|
|
|
2017-01-11 19:23:10 +08:00
|
|
|
ret = i915_gem_gtt_insert(&ggtt->base, &ppgtt->node,
|
|
|
|
GEN6_PD_SIZE, GEN6_PD_ALIGN,
|
|
|
|
I915_COLOR_UNEVICTABLE,
|
|
|
|
0, ggtt->base.total,
|
|
|
|
PIN_HIGH);
|
2015-01-23 01:01:25 +08:00
|
|
|
if (ret)
|
2015-03-17 00:00:56 +08:00
|
|
|
goto err_out;
|
|
|
|
|
2016-03-30 21:57:10 +08:00
|
|
|
if (ppgtt->node.start < ggtt->mappable_end)
|
2013-12-07 06:11:07 +08:00
|
|
|
DRM_DEBUG("Forced to use aperture for PDEs\n");
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2017-02-15 16:43:43 +08:00
|
|
|
ppgtt->pd.base.ggtt_offset =
|
|
|
|
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
|
|
|
|
|
|
|
|
ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
|
|
|
|
ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
|
|
|
|
|
2015-01-23 01:01:25 +08:00
|
|
|
return 0;
|
2015-03-17 00:00:56 +08:00
|
|
|
|
|
|
|
err_out:
|
2015-06-30 23:16:40 +08:00
|
|
|
gen6_free_scratch(vm);
|
2015-03-17 00:00:56 +08:00
|
|
|
return ret;
|
2014-02-20 14:05:49 +08:00
|
|
|
}
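The offset arithmetic at the end of the function above can be illustrated in isolation. The sketch below assumes 4 KiB pages and a 32-bit PTE type; the sketch_* names are hypothetical, and a plain array stands in for the __iomem mapping of the GGTT's PTE array (ggtt->gsm).

```c
#include <stdint.h>

typedef uint32_t sketch_pte_t;		/* assumed 32-bit gen6 PTE */
enum { SKETCH_GTT_PAGE = 4096 };	/* assumed 4 KiB page size */

/* Byte offset, within the GGTT's PTE array, of the PTE that maps the
 * GGTT address node_start: one PTE per page of address space. */
static uint32_t sketch_pd_ggtt_offset(uint64_t node_start)
{
	return (uint32_t)(node_start / SKETCH_GTT_PAGE * sizeof(sketch_pte_t));
}

/* Pointer to that same entry, given the base of the PTE array. */
static sketch_pte_t *sketch_pd_addr(sketch_pte_t *gsm, uint32_t ggtt_offset)
{
	return gsm + ggtt_offset / sizeof(sketch_pte_t);
}
```

For a node starting at GGTT address 0x200000, the byte offset is 2048 and the resulting pointer is the 512th entry of the PTE array.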
|
|
|
|
|
|
|
|
static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
|
|
|
|
{
|
2015-03-27 19:26:35 +08:00
|
|
|
return gen6_ppgtt_allocate_page_directories(ppgtt);
|
2015-03-24 23:46:22 +08:00
|
|
|
}
|
2015-02-25 00:22:37 +08:00
|
|
|
|
2015-03-24 23:46:22 +08:00
|
|
|
static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length)
|
2015-03-24 23:46:22 +08:00
|
|
|
{
|
2015-04-08 19:13:23 +08:00
|
|
|
struct i915_page_table *unused;
|
2017-02-15 16:43:57 +08:00
|
|
|
u32 pde;
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2016-06-25 02:37:46 +08:00
|
|
|
gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde)
|
2015-06-25 23:35:17 +08:00
|
|
|
ppgtt->pd.page_table[pde] = ppgtt->base.scratch_pt;
|
2014-02-20 14:05:49 +08:00
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:14 +08:00
|
|
|
static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
|
2014-02-20 14:05:49 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ppgtt->base.i915;
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2014-02-20 14:05:49 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-03-30 21:57:10 +08:00
|
|
|
ppgtt->base.pte_encode = ggtt->base.pte_encode;
|
2016-10-13 18:03:10 +08:00
|
|
|
if (intel_vgpu_active(dev_priv) || IS_GEN6(dev_priv))
|
2014-02-20 14:05:49 +08:00
|
|
|
ppgtt->switch_mm = gen6_mm_switch;
|
2016-10-13 18:03:01 +08:00
|
|
|
else if (IS_HASWELL(dev_priv))
|
2014-02-20 14:05:49 +08:00
|
|
|
ppgtt->switch_mm = hsw_mm_switch;
|
2016-10-13 18:03:10 +08:00
|
|
|
else if (IS_GEN7(dev_priv))
|
2014-02-20 14:05:49 +08:00
|
|
|
ppgtt->switch_mm = gen7_mm_switch;
|
2016-07-04 15:48:31 +08:00
|
|
|
else
|
2014-02-20 14:05:49 +08:00
|
|
|
BUG();
|
|
|
|
|
|
|
|
ret = gen6_ppgtt_alloc(ppgtt);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2015-04-08 19:13:30 +08:00
|
|
|
ppgtt->base.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2015-04-14 23:35:14 +08:00
|
|
|
gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
|
2017-02-15 16:43:45 +08:00
|
|
|
gen6_write_page_range(ppgtt, 0, ppgtt->base.total);
|
2015-03-17 00:00:56 +08:00
|
|
|
|
2017-02-15 16:43:43 +08:00
|
|
|
ret = gen6_alloc_va_range(&ppgtt->base, 0, ppgtt->base.total);
|
|
|
|
if (ret) {
|
|
|
|
gen6_ppgtt_cleanup(&ppgtt->base);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-02-28 23:28:11 +08:00
|
|
|
ppgtt->base.clear_range = gen6_ppgtt_clear_range;
|
|
|
|
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
|
|
|
|
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
|
|
|
|
ppgtt->base.bind_vma = ppgtt_bind_vma;
|
|
|
|
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
|
|
|
|
ppgtt->debug_dump = gen6_dump_ppgtt;
|
|
|
|
|
2015-01-23 16:05:06 +08:00
|
|
|
DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
|
2014-02-20 14:05:49 +08:00
|
|
|
ppgtt->node.size >> 20,
|
|
|
|
ppgtt->node.start / PAGE_SIZE);
|
2013-01-25 05:49:56 +08:00
|
|
|
|
2017-02-15 16:43:43 +08:00
|
|
|
DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
|
|
|
|
ppgtt->pd.base.ggtt_offset << 10);
|
2014-08-07 02:19:54 +08:00
|
|
|
|
2014-02-20 14:05:49 +08:00
|
|
|
return 0;
|
2013-01-25 05:49:56 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:25 +08:00
|
|
|
static int __hw_ppgtt_init(struct i915_hw_ppgtt *ppgtt,
|
|
|
|
struct drm_i915_private *dev_priv)
|
2013-01-25 05:49:56 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
ppgtt->base.i915 = dev_priv;
|
2017-02-15 16:43:40 +08:00
|
|
|
ppgtt->base.dma = &dev_priv->drm.pdev->dev;
|
2013-01-25 05:49:56 +08:00
|
|
|
|
2016-08-04 14:52:25 +08:00
|
|
|
if (INTEL_INFO(dev_priv)->gen < 8)
|
2015-04-14 23:35:14 +08:00
|
|
|
return gen6_ppgtt_init(ppgtt);
|
2013-04-09 09:43:53 +08:00
|
|
|
else
|
drm/i915/gen8: Dynamic page table allocations
This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Almost everything has already been set up
to this point; the patch finishes off the enabling by setting the
appropriate function pointers.
In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.
Before submission, update the PDPs in the logical ring context as PDPs
have been allocated.
v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.
v3: Rebase.
v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).
v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.
v6: Update err_out case in gen8_alloc_va_range (missed from latest
rebase).
v7: Rebase after s/page_tables/page_table/.
v8: Updated scratch_pt check after scratch flag was removed in previous
patch.
v9: Note that lrc mode needs to be updated to support init state without
any PDP.
v10: Unmap correct page_table in gen8_alloc_va_range's error case, clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.
v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.
v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse inline functions. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-08 19:13:34 +08:00
|
|
|
return gen8_ppgtt_init(ppgtt);
|
2014-08-07 02:19:54 +08:00
|
|
|
}
|
2015-06-25 23:35:13 +08:00
|
|
|
|
2015-09-16 17:49:00 +08:00
|
|
|
static void i915_address_space_init(struct i915_address_space *vm,
|
2016-10-28 20:58:58 +08:00
|
|
|
struct drm_i915_private *dev_priv,
|
|
|
|
const char *name)
|
2015-09-16 17:49:00 +08:00
|
|
|
{
|
2016-10-28 20:58:58 +08:00
|
|
|
i915_gem_timeline_init(dev_priv, &vm->timeline, name);
|
2017-02-06 16:45:46 +08:00
|
|
|
|
2017-02-15 16:43:54 +08:00
|
|
|
drm_mm_init(&vm->mm, 0, vm->total);
|
2017-02-06 16:45:46 +08:00
|
|
|
vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
|
|
|
|
|
2015-09-16 17:49:00 +08:00
|
|
|
INIT_LIST_HEAD(&vm->active_list);
|
|
|
|
INIT_LIST_HEAD(&vm->inactive_list);
|
2016-08-04 14:52:46 +08:00
|
|
|
INIT_LIST_HEAD(&vm->unbound_list);
|
2017-02-06 16:45:46 +08:00
|
|
|
|
2015-09-16 17:49:00 +08:00
|
|
|
list_add_tail(&vm->global_link, &dev_priv->vm_list);
|
2017-02-15 16:43:40 +08:00
|
|
|
pagevec_init(&vm->free_pages, false);
|
2015-09-16 17:49:00 +08:00
|
|
|
}
|
|
|
|
|
2016-11-18 05:04:10 +08:00
|
|
|
static void i915_address_space_fini(struct i915_address_space *vm)
|
|
|
|
{
|
2017-02-15 16:43:40 +08:00
|
|
|
if (pagevec_count(&vm->free_pages))
|
2017-08-23 01:38:28 +08:00
|
|
|
vm_free_pages_release(vm, true);
|
2017-02-15 16:43:40 +08:00
|
|
|
|
2016-11-18 05:04:10 +08:00
|
|
|
i915_gem_timeline_fini(&vm->timeline);
|
|
|
|
drm_mm_takedown(&vm->mm);
|
|
|
|
list_del(&vm->global_link);
|
|
|
|
}
|
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
static void gtt_write_workarounds(struct drm_i915_private *dev_priv)
|
2016-02-04 19:49:34 +08:00
|
|
|
{
|
|
|
|
/* This function is for gtt related workarounds. This function is
|
|
|
|
* called on driver load and after a GPU reset, so you can place
|
|
|
|
* workarounds here even if they get overwritten by GPU reset.
|
|
|
|
*/
|
2017-08-16 07:16:48 +08:00
|
|
|
/* WaIncreaseDefaultTLBEntries:chv,bdw,skl,bxt,kbl,glk,cfl,cnl */
|
2016-10-13 18:03:00 +08:00
|
|
|
if (IS_BROADWELL(dev_priv))
|
2016-02-04 19:49:34 +08:00
|
|
|
I915_WRITE(GEN8_L3_LRA_1_GPGPU, GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW);
|
2016-10-14 17:13:44 +08:00
|
|
|
else if (IS_CHERRYVIEW(dev_priv))
|
2016-02-04 19:49:34 +08:00
|
|
|
I915_WRITE(GEN8_L3_LRA_1_GPGPU, GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_CHV);
|
2017-08-16 07:16:48 +08:00
|
|
|
else if (IS_GEN9_BC(dev_priv) || IS_GEN10(dev_priv))
|
2016-02-04 19:49:34 +08:00
|
|
|
I915_WRITE(GEN8_L3_LRA_1_GPGPU, GEN9_L3_LRA_1_GPGPU_DEFAULT_VALUE_SKL);
|
2017-01-26 17:16:58 +08:00
|
|
|
else if (IS_GEN9_LP(dev_priv))
|
2016-02-04 19:49:34 +08:00
|
|
|
I915_WRITE(GEN8_L3_LRA_1_GPGPU, GEN9_L3_LRA_1_GPGPU_DEFAULT_VALUE_BXT);
|
|
|
|
}
|
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv)
|
2014-08-07 02:19:53 +08:00
|
|
|
{
|
2016-11-16 16:55:31 +08:00
|
|
|
gtt_write_workarounds(dev_priv);
|
2016-02-04 19:49:34 +08:00
|
|
|
|
2014-08-20 23:24:50 +08:00
|
|
|
/* In the case of execlists, PPGTT is enabled by the context descriptor
|
|
|
|
* and the PDPs are contained within the context itself. We don't
|
|
|
|
* need to do anything here. */
|
|
|
|
if (i915.enable_execlists)
|
|
|
|
return 0;
|
|
|
|
|
2016-11-16 16:55:31 +08:00
|
|
|
if (!USES_PPGTT(dev_priv))
|
2014-08-07 02:19:53 +08:00
|
|
|
return 0;
|
|
|
|
|
2016-10-13 18:03:10 +08:00
|
|
|
if (IS_GEN6(dev_priv))
|
2016-11-16 16:55:31 +08:00
|
|
|
gen6_ppgtt_enable(dev_priv);
|
2016-10-13 18:03:10 +08:00
|
|
|
else if (IS_GEN7(dev_priv))
|
2016-11-16 16:55:31 +08:00
|
|
|
gen7_ppgtt_enable(dev_priv);
|
|
|
|
else if (INTEL_GEN(dev_priv) >= 8)
|
|
|
|
gen8_ppgtt_enable(dev_priv);
|
2014-08-07 02:19:53 +08:00
|
|
|
else
|
2016-11-16 16:55:31 +08:00
|
|
|
MISSING_CASE(INTEL_GEN(dev_priv));
|
2014-08-07 02:19:53 +08:00
|
|
|
|
2015-06-18 20:11:20 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2014-08-06 21:04:47 +08:00
|
|
|
struct i915_hw_ppgtt *
|
2016-08-04 14:52:25 +08:00
|
|
|
i915_ppgtt_create(struct drm_i915_private *dev_priv,
|
2016-10-28 20:58:58 +08:00
|
|
|
struct drm_i915_file_private *fpriv,
|
|
|
|
const char *name)
|
2014-08-06 21:04:47 +08:00
|
|
|
{
|
|
|
|
struct i915_hw_ppgtt *ppgtt;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
|
|
|
|
if (!ppgtt)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2017-02-15 16:43:38 +08:00
|
|
|
ret = __hw_ppgtt_init(ppgtt, dev_priv);
|
2014-08-06 21:04:47 +08:00
|
|
|
if (ret) {
|
|
|
|
kfree(ppgtt);
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:38 +08:00
|
|
|
kref_init(&ppgtt->ref);
|
|
|
|
i915_address_space_init(&ppgtt->base, dev_priv, name);
|
|
|
|
ppgtt->base.file = fpriv;
|
|
|
|
|
2014-11-10 21:44:31 +08:00
|
|
|
trace_i915_ppgtt_create(&ppgtt->base);
|
|
|
|
|
2014-08-06 21:04:47 +08:00
|
|
|
return ppgtt;
|
|
|
|
}
|
|
|
|
|
2017-01-12 05:09:25 +08:00
|
|
|
void i915_ppgtt_close(struct i915_address_space *vm)
|
|
|
|
{
|
|
|
|
struct list_head *phases[] = {
|
|
|
|
&vm->active_list,
|
|
|
|
&vm->inactive_list,
|
|
|
|
&vm->unbound_list,
|
|
|
|
NULL,
|
|
|
|
}, **phase;
|
|
|
|
|
|
|
|
GEM_BUG_ON(vm->closed);
|
|
|
|
vm->closed = true;
|
|
|
|
|
|
|
|
for (phase = phases; *phase; phase++) {
|
|
|
|
struct i915_vma *vma, *vn;
|
|
|
|
|
|
|
|
list_for_each_entry_safe(vma, vn, *phase, vm_link)
|
|
|
|
if (!i915_vma_is_closed(vma))
|
|
|
|
i915_vma_close(vma);
|
|
|
|
}
|
|
|
|
}
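The loop in i915_ppgtt_close() walks a NULL-terminated array of list heads and uses a "safe" iterator so that closing an entry cannot break the walk. A minimal userspace sketch of that pattern follows. It uses hypothetical sketch_* types with singly linked lists in place of the kernel's list_head, and closing is modelled as setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_vma {
	bool closed;
	struct sketch_vma *next;
};

struct sketch_list {
	struct sketch_vma *first;
};

struct sketch_vm {
	struct sketch_list active_list;
	struct sketch_list inactive_list;
	struct sketch_list unbound_list;
	bool closed;
};

/* Close every vma on every list of the vm that is not already closed.
 * Returns the number of vmas closed by this call. */
static int sketch_vm_close(struct sketch_vm *vm)
{
	struct sketch_list *phases[] = {
		&vm->active_list,
		&vm->inactive_list,
		&vm->unbound_list,
		NULL,			/* terminator for the loop below */
	}, **phase;
	int nclosed = 0;

	vm->closed = true;

	for (phase = phases; *phase; phase++) {
		struct sketch_vma *vma, *vn;

		for (vma = (*phase)->first; vma; vma = vn) {
			vn = vma->next;	/* fetch the successor before touching vma */
			if (!vma->closed) {
				vma->closed = true;
				nclosed++;
			}
		}
	}
	return nclosed;
}
```

Unlike the driver function, this sketch does not assert that the vm was open on entry.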
|
|
|
|
|
2016-11-18 05:04:10 +08:00
|
|
|
void i915_ppgtt_release(struct kref *kref)
|
2014-08-06 21:04:45 +08:00
|
|
|
{
|
|
|
|
struct i915_hw_ppgtt *ppgtt =
|
|
|
|
container_of(kref, struct i915_hw_ppgtt, ref);
|
|
|
|
|
2014-11-10 21:44:31 +08:00
|
|
|
trace_i915_ppgtt_release(&ppgtt->base);
|
|
|
|
|
2016-08-04 14:52:46 +08:00
|
|
|
/* vmas should already be unbound and destroyed */
|
2014-08-06 21:04:45 +08:00
|
|
|
WARN_ON(!list_empty(&ppgtt->base.active_list));
|
|
|
|
WARN_ON(!list_empty(&ppgtt->base.inactive_list));
|
2016-08-04 14:52:46 +08:00
|
|
|
WARN_ON(!list_empty(&ppgtt->base.unbound_list));
|
2014-08-06 21:04:45 +08:00
|
|
|
|
|
|
|
ppgtt->base.cleanup(&ppgtt->base);
|
2017-02-15 16:43:40 +08:00
|
|
|
i915_address_space_fini(&ppgtt->base);
|
2014-08-06 21:04:45 +08:00
|
|
|
kfree(ppgtt);
|
|
|
|
}
|
2012-02-10 00:15:46 +08:00
|
|
|
|
2013-01-19 04:30:31 +08:00
|
|
|
/* Certain Gen5 chipsets require idling the GPU before
|
|
|
|
* unmapping anything from the GTT when VT-d is enabled.
|
|
|
|
*/
|
2016-08-04 14:52:22 +08:00
|
|
|
static bool needs_idle_maps(struct drm_i915_private *dev_priv)
|
2013-01-19 04:30:31 +08:00
|
|
|
{
|
|
|
|
/* Query intel_iommu to see if we need the workaround. Presumably that
|
|
|
|
* was loaded first.
|
|
|
|
*/
|
2017-05-25 20:16:12 +08:00
|
|
|
return IS_GEN5(dev_priv) && IS_MOBILE(dev_priv) && intel_vtd_active();
|
2013-01-19 04:30:31 +08:00
|
|
|
}
|
|
|
|
|
2016-05-10 21:10:04 +08:00
|
|
|
void i915_check_and_clear_faults(struct drm_i915_private *dev_priv)
|
2013-10-17 00:21:30 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2013-10-17 00:21:30 +08:00
|
|
|
|
2016-05-10 21:10:04 +08:00
|
|
|
if (INTEL_INFO(dev_priv)->gen < 6)
|
2013-10-17 00:21:30 +08:00
|
|
|
return;
|
|
|
|
|
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2013-10-17 00:21:30 +08:00
|
|
|
u32 fault_reg;
|
2016-03-16 19:00:36 +08:00
|
|
|
fault_reg = I915_READ(RING_FAULT_REG(engine));
|
2013-10-17 00:21:30 +08:00
|
|
|
if (fault_reg & RING_FAULT_VALID) {
|
|
|
|
DRM_DEBUG_DRIVER("Unexpected fault\n"
|
2014-10-31 01:52:45 +08:00
|
|
|
"\tAddr: 0x%08lx\n"
|
2013-10-17 00:21:30 +08:00
|
|
|
"\tAddress space: %s\n"
|
|
|
|
"\tSource ID: %d\n"
|
|
|
|
"\tType: %d\n",
|
|
|
|
fault_reg & PAGE_MASK,
|
|
|
|
fault_reg & RING_FAULT_GTTSEL_MASK ? "GGTT" : "PPGTT",
|
|
|
|
RING_FAULT_SRCID(fault_reg),
|
|
|
|
RING_FAULT_FAULT_TYPE(fault_reg));
|
2016-03-16 19:00:36 +08:00
|
|
|
I915_WRITE(RING_FAULT_REG(engine),
|
2013-10-17 00:21:30 +08:00
|
|
|
fault_reg & ~RING_FAULT_VALID);
|
|
|
|
}
|
|
|
|
}
|
2016-10-14 01:14:48 +08:00
|
|
|
|
|
|
|
/* Engine specific init may not have been done till this point. */
|
|
|
|
if (dev_priv->engine[RCS])
|
|
|
|
POSTING_READ(RING_FAULT_REG(dev_priv->engine[RCS]));
|
2013-10-17 00:21:30 +08:00
|
|
|
}
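/*
 * Illustrative sketch only: the function above reads a fault register,
 * decodes its fields if the VALID bit is set, and writes the value back
 * with VALID cleared. The bit layout below is an assumption made for the
 * example; the authoritative masks are the RING_FAULT_* definitions in
 * the driver's register header. All sketch_* names are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define SKETCH_FAULT_VALID	(1u << 0)
#define SKETCH_FAULT_TYPE(x)	(((x) >> 1) & 0x3)
#define SKETCH_FAULT_SRCID(x)	(((x) >> 3) & 0xff)
#define SKETCH_FAULT_GTTSEL	(1u << 11)
#define SKETCH_PAGE_MASK	(~0xfffu)	/* assumes 4 KiB pages */

struct sketch_fault {
	uint32_t addr;
	bool ggtt;
	unsigned int srcid;
	unsigned int type;
};

/*
 * Decode *reg into *out if a fault is pending, then clear the VALID bit.
 * Returns true when a fault was pending.
 */
static bool sketch_check_and_clear(uint32_t *reg, struct sketch_fault *out)
{
	uint32_t v = *reg;

	if (!(v & SKETCH_FAULT_VALID))
		return false;

	out->addr = v & SKETCH_PAGE_MASK;
	out->ggtt = v & SKETCH_FAULT_GTTSEL;
	out->srcid = SKETCH_FAULT_SRCID(v);
	out->type = SKETCH_FAULT_TYPE(v);

	/* Leave the other fields intact, drop only VALID. */
	*reg = v & ~SKETCH_FAULT_VALID;
	return true;
}
```

/*
 * A plain uint32_t stands in for the MMIO register here, so the sketch has
 * no equivalent of the posting read that flushes the write to hardware.
 */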
|
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv)
|
2013-10-17 00:21:30 +08:00
|
|
|
{
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2013-10-17 00:21:30 +08:00
|
|
|
|
|
|
|
/* Don't bother messing with faults pre GEN6 as we have little
|
|
|
|
* documentation supporting that it's a good idea.
|
|
|
|
*/
|
2016-11-16 16:55:34 +08:00
|
|
|
if (INTEL_GEN(dev_priv) < 6)
|
2013-10-17 00:21:30 +08:00
|
|
|
return;
|
|
|
|
|
2016-05-10 21:10:04 +08:00
|
|
|
i915_check_and_clear_faults(dev_priv);
|
2013-10-17 00:21:30 +08:00
|
|
|
|
2017-02-15 16:43:54 +08:00
|
|
|
ggtt->base.clear_range(&ggtt->base, 0, ggtt->base.total);
|
2014-09-25 17:13:12 +08:00
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
i915_ggtt_invalidate(dev_priv);
|
2013-10-17 00:21:30 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 20:58:36 +08:00
|
|
|
int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
2010-11-06 17:10:47 +08:00
|
|
|
{
|
2017-01-06 23:22:39 +08:00
|
|
|
do {
|
|
|
|
if (dma_map_sg(&obj->base.dev->pdev->dev,
|
|
|
|
pages->sgl, pages->nents,
|
|
|
|
PCI_DMA_BIDIRECTIONAL))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* If the DMA remap fails, one cause can be that we have
|
|
|
|
* too many objects pinned in a small remapping table,
|
|
|
|
* such as swiotlb. Incrementally purge all other objects and
|
|
|
|
* try again - if there are no more pages to remove from
|
|
|
|
* the DMA remapper, i915_gem_shrink will return 0.
|
|
|
|
*/
|
|
|
|
GEM_BUG_ON(obj->mm.pages == pages);
|
|
|
|
} while (i915_gem_shrink(to_i915(obj->base.dev),
|
|
|
|
obj->base.size >> PAGE_SHIFT,
|
|
|
|
I915_SHRINK_BOUND |
|
|
|
|
I915_SHRINK_UNBOUND |
|
|
|
|
I915_SHRINK_ACTIVE));
|
2012-06-01 22:20:22 +08:00
|
|
|
|
2016-10-28 20:58:36 +08:00
|
|
|
return -ENOSPC;
|
2010-11-06 17:10:47 +08:00
|
|
|
}
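/*
 * Illustrative sketch only: the do/while above retries the DMA mapping for
 * as long as the shrinker still manages to free something, and gives up
 * with -ENOSPC once it returns 0. The toy model below reproduces that
 * control flow with a fixed-size table. Every sketch_* name is
 * hypothetical and stands in for dma_map_sg() and i915_gem_shrink().
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical remapping table with a fixed number of slots. */
struct sketch_iommu {
	unsigned int capacity;	/* total slots in the table */
	unsigned int used;	/* slots held by other objects */
	unsigned int purgeable;	/* of those, how many can be evicted */
};

/* Stand-in for dma_map_sg(): succeeds only if the request fits. */
static bool sketch_map(struct sketch_iommu *t, unsigned int pages)
{
	if (t->capacity - t->used < pages)
		return false;

	t->used += pages;
	return true;
}

/* Stand-in for i915_gem_shrink(): evict up to target, return the count. */
static unsigned int sketch_shrink(struct sketch_iommu *t, unsigned int target)
{
	unsigned int n = target < t->purgeable ? target : t->purgeable;

	t->purgeable -= n;
	t->used -= n;
	return n;
}

static int sketch_prepare_pages(struct sketch_iommu *t, unsigned int pages)
{
	do {
		if (sketch_map(t, pages))
			return 0;
		/* Mapping failed: evict and retry until nothing is left. */
	} while (sketch_shrink(t, pages));

	return -ENOSPC;
}
```

/*
 * Using the shrinker's return value as the loop condition guarantees
 * termination: each retry either frees something or ends the loop.
 */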
|
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
|
2013-11-03 12:07:18 +08:00
|
|
|
{
|
|
|
|
writeq(pte, addr);
|
|
|
|
}
|
|
|
|
|
2016-06-10 16:52:59 +08:00
|
|
|
static void gen8_ggtt_insert_page(struct i915_address_space *vm,
|
|
|
|
dma_addr_t addr,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 offset,
|
2016-06-10 16:52:59 +08:00
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 unused)
|
|
|
|
{
|
2017-01-12 19:00:49 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2016-06-10 16:52:59 +08:00
|
|
|
gen8_pte_t __iomem *pte =
|
2017-01-12 19:00:49 +08:00
|
|
|
(gen8_pte_t __iomem *)ggtt->gsm + (offset >> PAGE_SHIFT);
|
2016-06-10 16:52:59 +08:00
|
|
|
|
2016-10-13 20:02:40 +08:00
|
|
|
gen8_set_pte(pte, gen8_pte_encode(addr, level));
|
2016-06-10 16:52:59 +08:00
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate(vm->i915);
|
2016-06-10 16:52:59 +08:00
|
|
|
}
|
|
|
|
|
2013-11-03 12:07:18 +08:00
|
|
|
static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:57 +08:00
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 unused)
|
2013-11-03 12:07:18 +08:00
|
|
|
{
|
2016-04-28 16:56:38 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2016-05-20 18:54:06 +08:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
gen8_pte_t __iomem *gtt_entries;
|
2017-02-15 16:43:37 +08:00
|
|
|
const gen8_pte_t pte_encode = gen8_pte_encode(0, level);
|
2016-05-20 18:54:06 +08:00
|
|
|
dma_addr_t addr;
|
2015-12-16 02:10:38 +08:00
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm;
|
2017-06-22 17:58:36 +08:00
|
|
|
gtt_entries += vma->node.start >> PAGE_SHIFT;
|
|
|
|
for_each_sgt_dma(addr, sgt_iter, vma->pages)
|
2017-02-15 16:43:37 +08:00
|
|
|
gen8_set_pte(gtt_entries++, pte_encode | addr);
|
2016-05-20 18:54:06 +08:00
|
|
|
|
2017-02-15 16:43:37 +08:00
|
|
|
wmb();
|
2013-11-03 12:07:18 +08:00
|
|
|
|
|
|
|
/* This next bit makes the above write barrier even more important. We
|
|
|
|
* want to flush the TLBs only after we're certain all the PTE updates
|
|
|
|
* have finished.
|
|
|
|
*/
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate(vm->i915);
|
2013-11-03 12:07:18 +08:00
|
|
|
}
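/*
 * Illustrative sketch only: gen8_ggtt_insert_entries() encodes the cache
 * level once with a zero address and then ORs each DMA address into that
 * template, instead of re-encoding every entry. The sketch below shows the
 * same idea on a plain array. The bit positions and all sketch_* names are
 * assumptions made for the example, not the hardware PTE format.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTE_PRESENT	(1ull << 0)	/* illustrative bit positions */
#define SKETCH_PTE_RW		(1ull << 1)

/* Hypothetical encoder: page-aligned address plus attribute bits. */
static uint64_t sketch_pte_encode(uint64_t addr, uint64_t cache_bits)
{
	return addr | cache_bits | SKETCH_PTE_PRESENT | SKETCH_PTE_RW;
}

/* Write n consecutive PTEs starting at pte, one per address in addrs. */
static void sketch_insert_entries(uint64_t *pte, const uint64_t *addrs,
				  size_t n, uint64_t cache_bits)
{
	/* Encode the attribute bits once; the address is ORed in per entry. */
	const uint64_t tmpl = sketch_pte_encode(0, cache_bits);
	size_t i;

	for (i = 0; i < n; i++)
		*pte++ = tmpl | addrs[i];
}
```

/*
 * This only works because the encoder is a pure OR of disjoint bit fields,
 * so encode(0, bits) | addr equals encode(addr, bits) for aligned addresses.
 */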
|
|
|
|
|
2016-06-10 16:52:59 +08:00
|
|
|
static void gen6_ggtt_insert_page(struct i915_address_space *vm,
|
|
|
|
dma_addr_t addr,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 offset,
|
2016-06-10 16:52:59 +08:00
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 flags)
|
|
|
|
{
|
2017-01-12 19:00:49 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2016-06-10 16:52:59 +08:00
|
|
|
gen6_pte_t __iomem *pte =
|
2017-01-12 19:00:49 +08:00
|
|
|
(gen6_pte_t __iomem *)ggtt->gsm + (offset >> PAGE_SHIFT);
|
2016-06-10 16:52:59 +08:00
|
|
|
|
2016-10-13 20:02:40 +08:00
|
|
|
iowrite32(vm->pte_encode(addr, level, flags), pte);
|
2016-06-10 16:52:59 +08:00
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate(vm->i915);
|
2016-06-10 16:52:59 +08:00
|
|
|
}
|
|
|
|
|
2012-11-05 01:21:27 +08:00
|
|
|
/*
|
|
|
|
* Binds an object into the global gtt with the specified cache level. The object
|
|
|
|
* will be accessible to the GPU via commands whose operands reference offsets
|
|
|
|
* within the global GTT as well as accessible by the GPU through the GMADR
|
|
|
|
* mapped BAR (dev_priv->mm.gtt->gtt).
|
|
|
|
*/
|
2013-07-17 07:50:05 +08:00
|
|
|
static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:57 +08:00
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 flags)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
2016-04-28 16:56:38 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2017-02-15 16:43:36 +08:00
|
|
|
gen6_pte_t __iomem *entries = (gen6_pte_t __iomem *)ggtt->gsm;
|
2017-06-22 17:58:36 +08:00
|
|
|
unsigned int i = vma->node.start >> PAGE_SHIFT;
|
2017-02-15 16:43:36 +08:00
|
|
|
struct sgt_iter iter;
|
2016-05-20 18:54:06 +08:00
|
|
|
dma_addr_t addr;
|
2017-06-22 17:58:36 +08:00
|
|
|
for_each_sgt_dma(addr, iter, vma->pages)
|
2017-02-15 16:43:36 +08:00
|
|
|
iowrite32(vm->pte_encode(addr, level, flags), &entries[i++]);
|
|
|
|
wmb();
|
2012-11-05 01:21:30 +08:00
|
|
|
|
|
|
|
/* This next bit makes the above write barrier even more important. We
|
|
|
|
* want to flush the TLBs only after we're certain all the PTE updates
|
|
|
|
* have finished.
|
|
|
|
*/
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate(vm->i915);
|
2012-11-05 01:21:27 +08:00
|
|
|
}
|
|
|
|
|
2016-05-14 14:26:35 +08:00
|
|
|
static void nop_clear_range(struct i915_address_space *vm,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length)
|
2016-05-14 14:26:35 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2013-11-03 12:07:18 +08:00
|
|
|
static void gen8_ggtt_clear_range(struct i915_address_space *vm,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length)
|
2013-11-03 12:07:18 +08:00
|
|
|
{
|
2016-04-28 16:56:38 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2014-02-21 03:50:33 +08:00
|
|
|
unsigned first_entry = start >> PAGE_SHIFT;
|
|
|
|
unsigned num_entries = length >> PAGE_SHIFT;
|
2017-02-15 16:43:37 +08:00
|
|
|
const gen8_pte_t scratch_pte =
|
|
|
|
gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC);
|
|
|
|
gen8_pte_t __iomem *gtt_base =
|
2016-03-30 21:57:10 +08:00
|
|
|
(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
|
|
|
|
const int max_entries = ggtt_total_entries(ggtt) - first_entry;
|
2013-11-03 12:07:18 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
if (WARN(num_entries > max_entries,
|
|
|
|
"First entry = %d; Num entries = %d (max=%d)\n",
|
|
|
|
first_entry, num_entries, max_entries))
|
|
|
|
num_entries = max_entries;
|
|
|
|
|
|
|
|
for (i = 0; i < num_entries; i++)
|
|
|
|
gen8_set_pte(&gtt_base[i], scratch_pte);
|
|
|
|
}
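/*
 * Illustrative sketch only: clearing a GGTT range means converting the
 * byte range to PTE indices, clamping the count to the end of the table,
 * and pointing every entry at the scratch page. The version below does
 * this on a plain array. The 4 KiB page size, the extra bounds check on
 * the first index, and all sketch_* names are assumptions made for the
 * example.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/*
 * Point every PTE covering [start, start + length) at scratch_pte,
 * clamping the range to the table size. Returns the entries written.
 */
static unsigned int sketch_clear_range(uint64_t *table, unsigned int total,
				       uint64_t start, uint64_t length,
				       uint64_t scratch_pte)
{
	unsigned int first = start >> SKETCH_PAGE_SHIFT;
	unsigned int num = length >> SKETCH_PAGE_SHIFT;
	unsigned int max, i;

	/* Extra guard added for the sketch; the driver does not have it. */
	if (first >= total)
		return 0;

	max = total - first;
	if (num > max)
		num = max;	/* the driver additionally WARNs here */

	for (i = 0; i < num; i++)
		table[first + i] = scratch_pte;

	return num;
}
```

/*
 * Writing a scratch entry rather than zero keeps stray accesses landing on
 * a harmless page instead of an invalid mapping.
 */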
|
|
|
|
|
2017-05-24 23:54:11 +08:00
|
|
|
static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *dev_priv = vm->i915;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure the internal GAM fifo has been cleared of all GTT
|
|
|
|
* writes before exiting stop_machine(). This guarantees that
|
|
|
|
* any aperture accesses waiting to start in another process
|
|
|
|
* cannot back up behind the GTT writes causing a hang.
|
|
|
|
* The register can be any arbitrary GAM register.
|
|
|
|
*/
|
|
|
|
POSTING_READ(GFX_FLSH_CNTL_GEN6);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct insert_page {
|
|
|
|
struct i915_address_space *vm;
|
|
|
|
dma_addr_t addr;
|
|
|
|
u64 offset;
|
|
|
|
enum i915_cache_level level;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
|
|
|
|
{
|
|
|
|
struct insert_page *arg = _arg;
|
|
|
|
|
|
|
|
gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
|
|
|
|
bxt_vtd_ggtt_wa(arg->vm);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
|
|
|
|
dma_addr_t addr,
|
|
|
|
u64 offset,
|
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 unused)
|
|
|
|
{
|
|
|
|
struct insert_page arg = { vm, addr, offset, level };
|
|
|
|
|
|
|
|
stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct insert_entries {
|
|
|
|
struct i915_address_space *vm;
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma;
|
2017-05-24 23:54:11 +08:00
|
|
|
enum i915_cache_level level;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
|
|
|
|
{
|
|
|
|
struct insert_entries *arg = _arg;
|
|
|
|
|
2017-06-22 17:58:36 +08:00
|
|
|
gen8_ggtt_insert_entries(arg->vm, arg->vma, arg->level, 0);
|
2017-05-24 23:54:11 +08:00
|
|
|
bxt_vtd_ggtt_wa(arg->vm);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-05-24 23:54:11 +08:00
|
|
|
enum i915_cache_level level,
|
|
|
|
u32 unused)
|
|
|
|
{
|
2017-07-07 17:50:59 +08:00
|
|
|
struct insert_entries arg = { vm, vma, level };
|
2017-05-24 23:54:11 +08:00
|
|
|
|
|
|
|
stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct clear_range {
|
|
|
|
struct i915_address_space *vm;
|
|
|
|
u64 start;
|
|
|
|
u64 length;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int bxt_vtd_ggtt_clear_range__cb(void *_arg)
|
|
|
|
{
|
|
|
|
struct clear_range *arg = _arg;
|
|
|
|
|
|
|
|
gen8_ggtt_clear_range(arg->vm, arg->start, arg->length);
|
|
|
|
bxt_vtd_ggtt_wa(arg->vm);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bxt_vtd_ggtt_clear_range__BKL(struct i915_address_space *vm,
|
|
|
|
u64 start,
|
|
|
|
u64 length)
|
|
|
|
{
|
|
|
|
struct clear_range arg = { vm, start, length };
|
|
|
|
|
|
|
|
stop_machine(bxt_vtd_ggtt_clear_range__cb, &arg, NULL);
|
|
|
|
}
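/*
 * Illustrative sketch only: the three __BKL wrappers above share one
 * pattern. The arguments of the real call are bundled into a small struct
 * on the stack, a pointer to it is passed through a void * callback
 * interface, and the callback unpacks it and performs the call. The sketch
 * below shows that pattern without stop_machine(); sketch_run_exclusive()
 * is a hypothetical stand-in that simply invokes the callback, and the
 * "present" bit is illustrative.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for stop_machine(): just invokes the callback. */
static int sketch_run_exclusive(int (*fn)(void *), void *data)
{
	return fn(data);
}

static uint64_t sketch_table[16];	/* toy page table */

/* All arguments of the wrapped call, bundled for the void * callback. */
struct sketch_insert_page {
	uint64_t *table;
	uint64_t addr;
	unsigned int index;
};

static int sketch_insert_page__cb(void *_arg)
{
	struct sketch_insert_page *arg = _arg;

	/* Bit 0 marks the entry present; illustrative only. */
	arg->table[arg->index] = arg->addr | 1;
	return 0;
}

static void sketch_insert_page(uint64_t *table, uint64_t addr,
			       unsigned int index)
{
	struct sketch_insert_page arg = { table, addr, index };

	/* arg lives on this stack frame, so the call must be synchronous. */
	sketch_run_exclusive(sketch_insert_page__cb, &arg);
}
```

/*
 * Passing a stack-allocated struct is safe only because the runner does
 * not return until the callback has finished.
 */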
|
|
|
|
|
2013-07-17 07:50:05 +08:00
|
|
|
static void gen6_ggtt_clear_range(struct i915_address_space *vm,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length)
|
2013-01-25 06:44:55 +08:00
|
|
|
{
|
2016-04-28 16:56:38 +08:00
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
2014-02-21 03:50:33 +08:00
|
|
|
unsigned first_entry = start >> PAGE_SHIFT;
|
|
|
|
unsigned num_entries = length >> PAGE_SHIFT;
|
2015-03-17 00:00:54 +08:00
|
|
|
gen6_pte_t scratch_pte, __iomem *gtt_base =
|
2016-03-30 21:57:10 +08:00
|
|
|
(gen6_pte_t __iomem *)ggtt->gsm + first_entry;
|
|
|
|
const int max_entries = ggtt_total_entries(ggtt) - first_entry;
|
2013-01-25 06:44:55 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
if (WARN(num_entries > max_entries,
|
|
|
|
"First entry = %d; Num entries = %d (max=%d)\n",
|
|
|
|
first_entry, num_entries, max_entries))
|
|
|
|
num_entries = max_entries;
|
|
|
|
|
2016-08-22 15:44:30 +08:00
|
|
|
scratch_pte = vm->pte_encode(vm->scratch_page.daddr,
|
2016-10-13 20:02:40 +08:00
|
|
|
I915_CACHE_LLC, 0);
|
2013-10-17 00:21:30 +08:00
|
|
|
|
2013-01-25 06:44:55 +08:00
|
|
|
for (i = 0; i < num_entries; i++)
|
|
|
|
iowrite32(scratch_pte, &gtt_base[i]);
|
|
|
|
}
|
|
|
|
|
2016-06-10 16:52:59 +08:00
|
|
|
static void i915_ggtt_insert_page(struct i915_address_space *vm,
|
|
|
|
dma_addr_t addr,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 offset,
|
2016-06-10 16:52:59 +08:00
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 unused)
|
|
|
|
{
|
|
|
|
unsigned int flags = (cache_level == I915_CACHE_NONE) ?
|
|
|
|
AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
|
|
|
|
|
|
|
|
intel_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
|
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:25 +08:00
|
|
|
static void i915_ggtt_insert_entries(struct i915_address_space *vm,
|
2017-06-22 17:58:36 +08:00
|
|
|
struct i915_vma *vma,
|
2017-02-15 16:43:57 +08:00
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 unused)
|
2013-01-25 06:44:55 +08:00
|
|
|
{
|
|
|
|
unsigned int flags = (cache_level == I915_CACHE_NONE) ?
|
|
|
|
AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
|
|
|
|
|
2017-06-22 17:58:36 +08:00
|
|
|
intel_gtt_insert_sg_entries(vma->pages, vma->node.start >> PAGE_SHIFT,
|
|
|
|
flags);
|
2013-01-25 06:44:55 +08:00
|
|
|
}
|
|
|
|
|
2013-07-17 07:50:05 +08:00
|
|
|
static void i915_ggtt_clear_range(struct i915_address_space *vm,
|
2017-02-15 16:43:57 +08:00
|
|
|
u64 start, u64 length)
|
2013-01-25 06:44:55 +08:00
|
|
|
{
|
2016-10-24 20:42:17 +08:00
|
|
|
intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
|
2013-01-25 06:44:55 +08:00
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:27 +08:00
|
|
|
static int ggtt_bind_vma(struct i915_vma *vma,
|
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 flags)
|
2015-10-15 20:23:01 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *i915 = vma->vm->i915;
|
2015-10-15 20:23:01 +08:00
|
|
|
struct drm_i915_gem_object *obj = vma->obj;
|
2017-02-15 16:43:35 +08:00
|
|
|
u32 pte_flags;
|
2015-10-15 20:23:01 +08:00
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
if (unlikely(!vma->pages)) {
|
|
|
|
int ret = i915_get_ggtt_vma_pages(vma);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2015-10-15 20:23:01 +08:00
|
|
|
|
|
|
|
/* Currently applicable only to VLV */
|
2017-02-15 16:43:35 +08:00
|
|
|
pte_flags = 0;
|
2015-10-15 20:23:01 +08:00
|
|
|
if (obj->gt_ro)
|
|
|
|
pte_flags |= PTE_READ_ONLY;
|
|
|
|
|
2016-10-24 20:42:15 +08:00
|
|
|
intel_runtime_pm_get(i915);
|
2017-06-22 17:58:36 +08:00
|
|
|
vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
|
2016-10-24 20:42:15 +08:00
|
|
|
intel_runtime_pm_put(i915);
|
2015-10-15 20:23:01 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Without aliasing PPGTT there's no difference between
|
|
|
|
* GLOBAL/LOCAL_BIND, it's all the same ptes. Hence unconditionally
|
|
|
|
* upgrade to both bound if we bind either to avoid double-binding.
|
|
|
|
*/
|
2016-08-04 23:32:32 +08:00
|
|
|
vma->flags |= I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND;
|
2015-10-15 20:23:01 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:39 +08:00
|
|
|
static void ggtt_unbind_vma(struct i915_vma *vma)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915 = vma->vm->i915;
|
|
|
|
|
|
|
|
intel_runtime_pm_get(i915);
|
|
|
|
vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
|
|
|
|
intel_runtime_pm_put(i915);
|
|
|
|
}
|
|
|
|
|
2015-10-15 20:23:01 +08:00
|
|
|
static int aliasing_gtt_bind_vma(struct i915_vma *vma,
|
|
|
|
enum i915_cache_level cache_level,
|
|
|
|
u32 flags)
|
2011-04-14 13:48:26 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *i915 = vma->vm->i915;
|
2015-11-20 18:27:18 +08:00
|
|
|
u32 pte_flags;
|
2017-02-15 16:43:42 +08:00
|
|
|
int ret;
|
2015-04-14 23:35:27 +08:00
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
if (unlikely(!vma->pages)) {
|
2017-02-15 16:43:42 +08:00
|
|
|
ret = i915_get_ggtt_vma_pages(vma);
|
2017-02-15 16:43:35 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2013-01-25 06:44:55 +08:00
|
|
|
|
2014-06-17 13:29:42 +08:00
|
|
|
/* Currently applicable only to VLV */
|
2015-11-20 18:27:18 +08:00
|
|
|
pte_flags = 0;
|
|
|
|
if (vma->obj->gt_ro)
|
2015-04-14 23:35:15 +08:00
|
|
|
pte_flags |= PTE_READ_ONLY;
|
2014-06-17 13:29:42 +08:00
|
|
|
|
2017-02-15 16:43:42 +08:00
|
|
|
if (flags & I915_VMA_LOCAL_BIND) {
|
|
|
|
struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
|
|
|
|
|
2017-05-12 17:14:23 +08:00
|
|
|
if (!(vma->flags & I915_VMA_LOCAL_BIND) &&
|
|
|
|
appgtt->base.allocate_va_range) {
|
2017-02-15 16:43:42 +08:00
|
|
|
ret = appgtt->base.allocate_va_range(&appgtt->base,
|
|
|
|
vma->node.start,
|
2017-05-16 16:55:14 +08:00
|
|
|
vma->size);
|
2017-02-15 16:43:42 +08:00
|
|
|
if (ret)
|
2017-02-27 20:26:53 +08:00
|
|
|
goto err_pages;
|
2017-02-15 16:43:42 +08:00
|
|
|
}
|
|
|
|
|
2017-06-22 17:58:36 +08:00
|
|
|
appgtt->base.insert_entries(&appgtt->base, vma, cache_level,
|
|
|
|
pte_flags);
|
2017-02-15 16:43:42 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:32 +08:00
|
|
|
if (flags & I915_VMA_GLOBAL_BIND) {
|
2016-10-24 20:42:15 +08:00
|
|
|
intel_runtime_pm_get(i915);
|
2017-06-22 17:58:36 +08:00
|
|
|
vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags);
|
2016-10-24 20:42:15 +08:00
|
|
|
intel_runtime_pm_put(i915);
|
drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initially, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of places
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:10:56 +08:00
|
|
|
}
|
2011-04-14 13:48:26 +08:00
|
|
|
|
2015-04-14 23:35:27 +08:00
|
|
|
return 0;
|
2017-02-27 20:26:53 +08:00
|
|
|
|
|
|
|
err_pages:
|
|
|
|
if (!(vma->flags & (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND))) {
|
|
|
|
if (vma->pages != vma->obj->mm.pages) {
|
|
|
|
GEM_BUG_ON(!vma->pages);
|
|
|
|
sg_free_table(vma->pages);
|
|
|
|
kfree(vma->pages);
|
|
|
|
}
|
|
|
|
vma->pages = NULL;
|
|
|
|
}
|
|
|
|
return ret;
|
2011-04-14 13:48:26 +08:00
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:39 +08:00
|
|
|
static void aliasing_gtt_unbind_vma(struct i915_vma *vma)
|
2012-02-16 06:50:21 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *i915 = vma->vm->i915;
|
drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2013-12-07 06:10:56 +08:00
|
|
|
|
2016-10-24 20:42:15 +08:00
|
|
|
if (vma->flags & I915_VMA_GLOBAL_BIND) {
|
|
|
|
intel_runtime_pm_get(i915);
|
2017-02-15 16:43:39 +08:00
|
|
|
vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
|
2016-10-24 20:42:15 +08:00
|
|
|
intel_runtime_pm_put(i915);
|
|
|
|
}
|
2015-04-24 20:09:03 +08:00
|
|
|
|
2017-02-15 16:43:39 +08:00
|
|
|
if (vma->flags & I915_VMA_LOCAL_BIND) {
|
|
|
|
struct i915_address_space *vm = &i915->mm.aliasing_ppgtt->base;
|
|
|
|
|
|
|
|
vm->clear_range(vm, vma->node.start, vma->size);
|
|
|
|
}
|
2012-02-16 06:50:21 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 20:58:36 +08:00
|
|
|
void i915_gem_gtt_finish_pages(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
2010-11-06 17:10:47 +08:00
|
|
|
{
|
2016-08-22 18:32:44 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
|
|
|
struct device *kdev = &dev_priv->drm.pdev->dev;
|
2016-08-05 17:14:12 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2011-10-18 06:51:55 +08:00
|
|
|
|
2016-08-05 17:14:12 +08:00
|
|
|
if (unlikely(ggtt->do_idle_maps)) {
|
2017-03-30 16:53:41 +08:00
|
|
|
if (i915_gem_wait_for_idle(dev_priv, 0)) {
|
2016-08-05 17:14:12 +08:00
|
|
|
DRM_ERROR("Failed to wait for idle; VT-d may hang.\n");
|
|
|
|
/* Wait a bit, in hopes it avoids the hang */
|
|
|
|
udelay(10);
|
|
|
|
}
|
|
|
|
}
|
2011-10-18 06:51:55 +08:00
|
|
|
|
2016-10-28 20:58:36 +08:00
|
|
|
dma_unmap_sg(kdev, pages->sgl, pages->nents, PCI_DMA_BIDIRECTIONAL);
|
2010-11-06 17:10:47 +08:00
|
|
|
}
|
2012-03-26 15:45:40 +08:00
|
|
|
|
2016-12-16 15:46:42 +08:00
|
|
|
static void i915_gtt_color_adjust(const struct drm_mm_node *node,
|
2012-07-26 18:49:32 +08:00
|
|
|
unsigned long color,
|
2015-01-23 16:05:06 +08:00
|
|
|
u64 *start,
|
|
|
|
u64 *end)
|
2012-07-26 18:49:32 +08:00
|
|
|
{
|
2017-02-06 16:45:47 +08:00
|
|
|
if (node->allocated && node->color != color)
|
2017-01-10 22:47:34 +08:00
|
|
|
*start += I915_GTT_PAGE_SIZE;
|
2012-07-26 18:49:32 +08:00
|
|
|
|
2017-02-06 16:45:47 +08:00
|
|
|
/* Also leave a space between the unallocated reserved node after the
|
|
|
|
* GTT and any objects within the GTT, i.e. we use the color adjustment
|
|
|
|
* to insert a guard page to prevent prefetches crossing over the
|
|
|
|
* GTT boundary.
|
|
|
|
*/
|
2016-12-16 15:46:40 +08:00
|
|
|
node = list_next_entry(node, node_list);
|
2017-02-06 16:45:47 +08:00
|
|
|
if (node->color != color)
|
2017-01-10 22:47:34 +08:00
|
|
|
*end -= I915_GTT_PAGE_SIZE;
|
2012-07-26 18:49:32 +08:00
|
|
|
}
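The guard-page rule in i915_gtt_color_adjust() can be tried in isolation. What follows is a minimal sketch, not the driver's code: it assumes a 4 KiB GTT page, and `struct toy_node` and `toy_color_adjust` are hypothetical stand-ins for `drm_mm_node` and the list walk to the next node. It also assumes the hole is at least two pages long so the subtraction cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GTT_PAGE_SIZE 4096u /* assumed value of I915_GTT_PAGE_SIZE */

/* Hypothetical stand-in for the two drm_mm_node fields the callback reads. */
struct toy_node {
	bool allocated;
	unsigned long color;
};

/*
 * Shrink the hole [*start, *end) lying between prev and next so that a new
 * node of the given color keeps one guard page from any neighbour of a
 * different color.
 */
static void toy_color_adjust(const struct toy_node *prev,
			     const struct toy_node *next,
			     unsigned long color,
			     uint64_t *start, uint64_t *end)
{
	/* Sketch-only precondition: room for a guard page on both sides. */
	assert(*end - *start >= 2 * TOY_GTT_PAGE_SIZE);

	/* Only an allocated predecessor can force a leading guard page. */
	if (prev->allocated && prev->color != color)
		*start += TOY_GTT_PAGE_SIZE;

	/* The successor is checked even if unallocated (the end-of-GTT node). */
	if (next->color != color)
		*end -= TOY_GTT_PAGE_SIZE;
}
```

The asymmetry is deliberate in the original: the trailing check does not test `allocated`, which is what lets the unallocated reserved node after the GTT act as a guard.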
|
2013-11-05 11:56:49 +08:00
|
|
|
|
2017-02-14 01:15:50 +08:00
|
|
|
int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
|
|
|
struct i915_hw_ppgtt *ppgtt;
|
|
|
|
int err;
|
|
|
|
|
2017-02-15 16:43:56 +08:00
|
|
|
ppgtt = i915_ppgtt_create(i915, ERR_PTR(-EPERM), "[alias]");
|
2017-02-15 16:43:38 +08:00
|
|
|
if (IS_ERR(ppgtt))
|
|
|
|
return PTR_ERR(ppgtt);
|
2017-02-14 01:15:50 +08:00
|
|
|
|
2017-02-15 16:43:55 +08:00
|
|
|
if (WARN_ON(ppgtt->base.total < ggtt->base.total)) {
|
|
|
|
err = -ENODEV;
|
|
|
|
goto err_ppgtt;
|
|
|
|
}
|
|
|
|
|
2017-02-14 01:15:50 +08:00
|
|
|
if (ppgtt->base.allocate_va_range) {
|
2017-02-15 16:43:55 +08:00
|
|
|
/* Note we only pre-allocate as far as the end of the global
|
|
|
|
* GTT. On 48b / 4-level page-tables, the difference is very,
|
|
|
|
* very significant! We have to preallocate as GVT/vgpu does
|
|
|
|
* not like the page directory disappearing.
|
|
|
|
*/
|
2017-02-14 01:15:50 +08:00
|
|
|
err = ppgtt->base.allocate_va_range(&ppgtt->base,
|
2017-02-15 16:43:55 +08:00
|
|
|
0, ggtt->base.total);
|
2017-02-14 01:15:50 +08:00
|
|
|
if (err)
|
2017-02-15 16:43:38 +08:00
|
|
|
goto err_ppgtt;
|
2017-02-14 01:15:50 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
i915->mm.aliasing_ppgtt = ppgtt;
|
2017-02-15 16:43:39 +08:00
|
|
|
|
2017-02-14 01:15:50 +08:00
|
|
|
WARN_ON(ggtt->base.bind_vma != ggtt_bind_vma);
|
|
|
|
ggtt->base.bind_vma = aliasing_gtt_bind_vma;
|
|
|
|
|
2017-02-15 16:43:39 +08:00
|
|
|
WARN_ON(ggtt->base.unbind_vma != ggtt_unbind_vma);
|
|
|
|
ggtt->base.unbind_vma = aliasing_gtt_unbind_vma;
|
|
|
|
|
2017-02-14 01:15:50 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_ppgtt:
|
2017-02-15 16:43:38 +08:00
|
|
|
i915_ppgtt_put(ppgtt);
|
2017-02-14 01:15:50 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
|
|
|
struct i915_hw_ppgtt *ppgtt;
|
|
|
|
|
|
|
|
ppgtt = fetch_and_zero(&i915->mm.aliasing_ppgtt);
|
|
|
|
if (!ppgtt)
|
|
|
|
return;
|
|
|
|
|
2017-02-15 16:43:38 +08:00
|
|
|
i915_ppgtt_put(ppgtt);
|
2017-02-14 01:15:50 +08:00
|
|
|
|
|
|
|
ggtt->base.bind_vma = ggtt_bind_vma;
|
2017-02-15 16:43:39 +08:00
|
|
|
ggtt->base.unbind_vma = ggtt_unbind_vma;
|
2017-02-14 01:15:50 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:23 +08:00
|
|
|
int i915_gem_init_ggtt(struct drm_i915_private *dev_priv)
|
2012-03-26 15:45:40 +08:00
|
|
|
{
|
2013-01-26 08:41:04 +08:00
|
|
|
/* Let GEM Manage all of the aperture.
|
|
|
|
*
|
|
|
|
* However, leave one page at the end still bound to the scratch page.
|
|
|
|
* There are a number of places where the hardware apparently prefetches
|
|
|
|
* past the end of the object, and we've seen multiple hangs with the
|
|
|
|
* GPU head pointer stuck in a batchbuffer bound at the last page of the
|
|
|
|
* aperture. One page should be enough to keep any prefetching inside
|
|
|
|
* of the aperture.
|
|
|
|
*/
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2012-11-15 19:32:19 +08:00
|
|
|
unsigned long hole_start, hole_end;
|
2016-08-04 14:52:23 +08:00
|
|
|
struct drm_mm_node *entry;
|
2014-08-07 02:19:54 +08:00
|
|
|
int ret;
|
2012-03-26 15:45:40 +08:00
|
|
|
|
2016-06-16 20:06:59 +08:00
|
|
|
ret = intel_vgt_balloon(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2015-02-10 19:05:48 +08:00
|
|
|
|
2016-10-12 17:05:20 +08:00
|
|
|
/* Reserve a mappable slot for our lockless error capture */
|
2017-02-03 05:04:38 +08:00
|
|
|
ret = drm_mm_insert_node_in_range(&ggtt->base.mm, &ggtt->error_capture,
|
|
|
|
PAGE_SIZE, 0, I915_COLOR_UNEVICTABLE,
|
|
|
|
0, ggtt->mappable_end,
|
|
|
|
DRM_MM_INSERT_LOW);
|
2016-10-12 17:05:20 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2012-11-15 19:32:19 +08:00
|
|
|
/* Clear any non-preallocated blocks */
|
2016-03-30 21:57:10 +08:00
|
|
|
drm_mm_for_each_hole(entry, &ggtt->base.mm, hole_start, hole_end) {
|
2012-11-15 19:32:19 +08:00
|
|
|
DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
|
|
|
|
hole_start, hole_end);
|
2016-03-30 21:57:10 +08:00
|
|
|
ggtt->base.clear_range(&ggtt->base, hole_start,
|
2016-10-13 20:02:40 +08:00
|
|
|
hole_end - hole_start);
|
2012-11-15 19:32:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* And finally clear the reserved guard page */
|
2016-08-04 14:52:23 +08:00
|
|
|
ggtt->base.clear_range(&ggtt->base,
|
2016-10-13 20:02:40 +08:00
|
|
|
ggtt->base.total - PAGE_SIZE, PAGE_SIZE);
|
2014-08-06 21:04:50 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
if (USES_PPGTT(dev_priv) && !USES_FULL_PPGTT(dev_priv)) {
|
2017-02-14 01:15:50 +08:00
|
|
|
ret = i915_gem_init_aliasing_ppgtt(dev_priv);
|
2016-10-12 17:05:20 +08:00
|
|
|
if (ret)
|
2017-02-14 01:15:50 +08:00
|
|
|
goto err;
|
2014-08-07 02:19:54 +08:00
|
|
|
}
|
|
|
|
|
2014-08-06 21:04:50 +08:00
|
|
|
return 0;
|
2016-10-12 17:05:20 +08:00
|
|
|
|
|
|
|
err:
|
|
|
|
drm_mm_remove_node(&ggtt->error_capture);
|
|
|
|
return ret;
|
2012-11-05 01:21:27 +08:00
|
|
|
}
|
|
|
|
|
2016-03-24 22:47:46 +08:00
|
|
|
/**
|
|
|
|
* i915_ggtt_cleanup_hw - Clean up GGTT hardware initialization
|
2016-08-04 14:52:22 +08:00
|
|
|
* @dev_priv: i915 device
|
2016-03-24 22:47:46 +08:00
|
|
|
*/
|
2016-08-04 14:52:22 +08:00
|
|
|
void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
|
2014-08-06 21:04:56 +08:00
|
|
|
{
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2017-02-11 00:35:22 +08:00
|
|
|
struct i915_vma *vma, *vn;
|
2017-08-23 01:38:28 +08:00
|
|
|
struct pagevec *pvec;
|
2017-02-11 00:35:22 +08:00
|
|
|
|
|
|
|
ggtt->base.closed = true;
|
|
|
|
|
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
WARN_ON(!list_empty(&ggtt->base.active_list));
|
|
|
|
list_for_each_entry_safe(vma, vn, &ggtt->base.inactive_list, vm_link)
|
|
|
|
WARN_ON(i915_vma_unbind(vma));
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2014-08-06 21:04:56 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
i915_gem_cleanup_stolen(&dev_priv->drm);
|
2016-01-19 21:26:32 +08:00
|
|
|
|
2017-02-15 16:43:38 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_fini_aliasing_ppgtt(dev_priv);
|
|
|
|
|
2016-10-12 17:05:20 +08:00
|
|
|
if (drm_mm_node_allocated(&ggtt->error_capture))
|
|
|
|
drm_mm_remove_node(&ggtt->error_capture);
|
|
|
|
|
2016-03-30 21:57:10 +08:00
|
|
|
if (drm_mm_initialized(&ggtt->base.mm)) {
|
2016-06-16 20:06:59 +08:00
|
|
|
intel_vgt_deballoon(dev_priv);
|
2016-11-18 05:04:10 +08:00
|
|
|
i915_address_space_fini(&ggtt->base);
|
2014-08-06 21:04:56 +08:00
|
|
|
}
|
|
|
|
|
2016-03-30 21:57:10 +08:00
|
|
|
ggtt->base.cleanup(&ggtt->base);
|
2017-08-23 01:38:28 +08:00
|
|
|
|
|
|
|
pvec = &dev_priv->mm.wc_stash;
|
|
|
|
if (pvec->nr) {
|
|
|
|
set_pages_array_wb(pvec->pages, pvec->nr);
|
|
|
|
__pagevec_release(pvec);
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:38 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2016-08-04 14:52:23 +08:00
|
|
|
|
|
|
|
arch_phys_wc_del(ggtt->mtrr);
|
2016-08-19 23:54:27 +08:00
|
|
|
io_mapping_fini(&ggtt->mappable);
|
2014-08-06 21:04:56 +08:00
|
|
|
}
|
2014-08-06 21:04:57 +08:00
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
|
|
|
snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
|
|
|
|
snb_gmch_ctl &= SNB_GMCH_GGMS_MASK;
|
|
|
|
return snb_gmch_ctl << 20;
|
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
|
2013-11-04 08:53:55 +08:00
|
|
|
{
|
|
|
|
bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
|
|
|
|
bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
|
|
|
|
if (bdw_gmch_ctl)
|
|
|
|
bdw_gmch_ctl = 1 << bdw_gmch_ctl;
|
2014-05-28 07:53:08 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_X86_32
|
|
|
|
/* Limit 32b platforms to a 2GB GGTT: 4 << 20 / pte size * PAGE_SIZE */
|
|
|
|
if (bdw_gmch_ctl > 4)
|
|
|
|
bdw_gmch_ctl = 4;
|
|
|
|
#endif
|
|
|
|
|
2013-11-04 08:53:55 +08:00
|
|
|
return bdw_gmch_ctl << 20;
|
|
|
|
}
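To make the sizes in gen8_get_total_gtt_size() concrete, here is a small sketch of the same arithmetic outside the driver. It assumes the GGMS field has already been shifted and masked, and that PTEs are 8 bytes and pages 4 KiB (which is what the "2GB GGTT" comment implies); `toy_ggms_to_pte_bytes` and `toy_ggtt_span` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of PTE storage encoded by GGMS: 0 means none, n means 2^n MiB. */
static uint64_t toy_ggms_to_pte_bytes(unsigned int ggms)
{
	assert(ggms < 32); /* sketch-only bound to keep the shift defined */
	return ggms ? ((uint64_t)1 << ggms) << 20 : 0;
}

/* Address space covered by that much PTE storage (8-byte PTE, 4 KiB page). */
static uint64_t toy_ggtt_span(uint64_t pte_bytes)
{
	return pte_bytes / 8 * 4096;
}
```

With these assumptions, the 32-bit clamp to 4 MiB of PTEs corresponds to a 2 GiB GGTT, and 8 MiB of PTEs to 4 GiB.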
|
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static unsigned int chv_get_total_gtt_size(u16 gmch_ctrl)
|
2014-05-09 03:19:40 +08:00
|
|
|
{
|
|
|
|
gmch_ctrl >>= SNB_GMCH_GGMS_SHIFT;
|
|
|
|
gmch_ctrl &= SNB_GMCH_GGMS_MASK;
|
|
|
|
|
|
|
|
if (gmch_ctrl)
|
|
|
|
return 1 << (20 + gmch_ctrl);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static size_t gen6_get_stolen_size(u16 snb_gmch_ctl)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
|
|
|
snb_gmch_ctl >>= SNB_GMCH_GMS_SHIFT;
|
|
|
|
snb_gmch_ctl &= SNB_GMCH_GMS_MASK;
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)snb_gmch_ctl << 25; /* 32 MB units */
|
2012-11-05 01:21:27 +08:00
|
|
|
}
|
|
|
|
|
2015-04-14 23:35:26 +08:00
|
|
|
static size_t gen8_get_stolen_size(u16 bdw_gmch_ctl)
|
2013-11-04 08:53:55 +08:00
|
|
|
{
|
|
|
|
bdw_gmch_ctl >>= BDW_GMCH_GMS_SHIFT;
|
|
|
|
bdw_gmch_ctl &= BDW_GMCH_GMS_MASK;
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)bdw_gmch_ctl << 25; /* 32 MB units */
|
2013-11-04 08:53:55 +08:00
|
|
|
}
|
|
|
|
|
2014-05-09 03:19:40 +08:00
|
|
|
static size_t chv_get_stolen_size(u16 gmch_ctrl)
|
|
|
|
{
|
|
|
|
gmch_ctrl >>= SNB_GMCH_GMS_SHIFT;
|
|
|
|
gmch_ctrl &= SNB_GMCH_GMS_MASK;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* 0x0 to 0x10: 32MB increments starting at 0MB
|
|
|
|
* 0x11 to 0x16: 4MB increments starting at 8MB
|
|
|
|
* 0x17 to 0x1d: 4MB increments starting at 36MB
|
|
|
|
*/
|
|
|
|
if (gmch_ctrl < 0x11)
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)gmch_ctrl << 25;
|
2014-05-09 03:19:40 +08:00
|
|
|
else if (gmch_ctrl < 0x17)
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)(gmch_ctrl - 0x11 + 2) << 22;
|
2014-05-09 03:19:40 +08:00
|
|
|
else
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)(gmch_ctrl - 0x17 + 9) << 22;
|
2014-05-09 03:19:40 +08:00
|
|
|
}
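The three-range encoding documented in chv_get_stolen_size() is easy to check by hand. The sketch below repeats the decode as a stand-alone function; `toy_chv_gms_to_bytes` is a hypothetical name, and the argument is assumed to be the GMS field after shifting and masking.

```c
#include <assert.h>
#include <stdint.h>

/* Decode an already-extracted CHV GMS field into a stolen size in bytes. */
static uint64_t toy_chv_gms_to_bytes(unsigned int gms)
{
	assert(gms <= 0x1d); /* highest value the source comment documents */

	if (gms < 0x11)
		return (uint64_t)gms << 25;              /* 32MB steps from 0MB */
	else if (gms < 0x17)
		return (uint64_t)(gms - 0x11 + 2) << 22; /* 4MB steps from 8MB */
	else
		return (uint64_t)(gms - 0x17 + 9) << 22; /* 4MB steps from 36MB */
}
```

The offsets `+ 2` and `+ 9` are simply the starting sizes expressed in 4MB units (8MB / 4MB and 36MB / 4MB).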
|
|
|
|
|
2014-01-10 02:02:46 +08:00
|
|
|
static size_t gen9_get_stolen_size(u16 gen9_gmch_ctl)
|
|
|
|
{
|
|
|
|
gen9_gmch_ctl >>= BDW_GMCH_GMS_SHIFT;
|
|
|
|
gen9_gmch_ctl &= BDW_GMCH_GMS_MASK;
|
|
|
|
|
|
|
|
if (gen9_gmch_ctl < 0xf0)
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)gen9_gmch_ctl << 25; /* 32 MB units */
|
2014-01-10 02:02:46 +08:00
|
|
|
else
|
|
|
|
/* 4MB increments starting at 0xf0 for 4MB */
|
2017-05-10 17:21:52 +08:00
|
|
|
return (size_t)(gen9_gmch_ctl - 0xf0 + 1) << 22;
|
2014-01-10 02:02:46 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
|
2013-11-05 11:32:22 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ggtt->base.i915;
|
|
|
|
struct pci_dev *pdev = dev_priv->drm.pdev;
|
2016-08-04 14:52:24 +08:00
|
|
|
phys_addr_t phys_addr;
|
2016-08-22 15:44:30 +08:00
|
|
|
int ret;
|
2013-11-05 11:32:22 +08:00
|
|
|
|
|
|
|
/* For Modern GENs the PTEs and register space are split in the BAR */
|
2016-08-04 14:52:24 +08:00
|
|
|
phys_addr = pci_resource_start(pdev, 0) + pci_resource_len(pdev, 0) / 2;
|
2013-11-05 11:32:22 +08:00
|
|
|
|
2015-03-27 19:07:33 +08:00
|
|
|
/*
|
2017-08-30 07:09:07 +08:00
|
|
|
* On BXT+/CNL+ writes larger than 64 bit to the GTT pagetable range
|
|
|
|
* will be dropped. For WC mappings in general we have 64 byte burst
|
|
|
|
* writes when the WC buffer is flushed, so we can't use it, but have to
|
2015-03-27 19:07:33 +08:00
|
|
|
* resort to an uncached mapping. The WC issue is easily caught by the
|
|
|
|
* readback check when writing GTT PTE entries.
|
|
|
|
*/
|
2017-08-30 07:09:07 +08:00
|
|
|
if (IS_GEN9_LP(dev_priv) || INTEL_GEN(dev_priv) >= 10)
|
2016-08-04 14:52:24 +08:00
|
|
|
ggtt->gsm = ioremap_nocache(phys_addr, size);
|
2015-03-27 19:07:33 +08:00
|
|
|
else
|
2016-08-04 14:52:24 +08:00
|
|
|
ggtt->gsm = ioremap_wc(phys_addr, size);
|
2016-03-30 21:57:10 +08:00
|
|
|
if (!ggtt->gsm) {
|
2016-08-04 14:52:24 +08:00
|
|
|
DRM_ERROR("Failed to map the ggtt page table\n");
|
2013-11-05 11:32:22 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2017-02-15 16:43:40 +08:00
|
|
|
ret = setup_scratch_page(&ggtt->base, GFP_DMA32);
|
2016-08-22 15:44:30 +08:00
|
|
|
if (ret) {
|
2013-11-05 11:32:22 +08:00
|
|
|
DRM_ERROR("Scratch setup failed\n");
|
|
|
|
/* iounmap will also get called at remove, but meh */
|
2016-03-30 21:57:10 +08:00
|
|
|
iounmap(ggtt->gsm);
|
2016-08-22 15:44:30 +08:00
|
|
|
return ret;
|
2013-11-05 11:32:22 +08:00
|
|
|
}
|
|
|
|
|
2015-06-30 23:16:39 +08:00
|
|
|
return 0;
|
2013-11-05 11:32:22 +08:00
|
|
|
}
|
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
static struct intel_ppat_entry *
|
|
|
|
__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8 value)
|
2017-08-16 07:25:39 +08:00
|
|
|
{
|
2017-09-14 20:39:40 +08:00
|
|
|
struct intel_ppat_entry *entry = &ppat->entries[index];
|
|
|
|
|
|
|
|
GEM_BUG_ON(index >= ppat->max_entries);
|
|
|
|
GEM_BUG_ON(test_bit(index, ppat->used));
|
|
|
|
|
|
|
|
entry->ppat = ppat;
|
|
|
|
entry->value = value;
|
|
|
|
kref_init(&entry->ref);
|
|
|
|
set_bit(index, ppat->used);
|
|
|
|
set_bit(index, ppat->dirty);
|
|
|
|
|
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __free_ppat_entry(struct intel_ppat_entry *entry)
|
|
|
|
{
|
|
|
|
struct intel_ppat *ppat = entry->ppat;
|
|
|
|
unsigned int index = entry - ppat->entries;
|
|
|
|
|
|
|
|
GEM_BUG_ON(index >= ppat->max_entries);
|
|
|
|
GEM_BUG_ON(!test_bit(index, ppat->used));
|
|
|
|
|
|
|
|
entry->value = ppat->clear_value;
|
|
|
|
clear_bit(index, ppat->used);
|
|
|
|
set_bit(index, ppat->dirty);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* intel_ppat_get - get a usable PPAT entry
|
|
|
|
* @i915: i915 device instance
|
|
|
|
* @value: the PPAT value required by the caller
|
|
|
|
*
|
|
|
|
* The function tries to search if there is an existing PPAT entry which
|
|
|
|
* matches with the required value. If perfectly matched, the existing PPAT
|
|
|
|
* entry will be used. If only partially matched, it will try to check if
|
|
|
|
* there is any available PPAT index. If yes, it will allocate a new PPAT
|
|
|
|
* index for the required entry and update the HW. If not, the partially
|
|
|
|
* matched entry will be used.
|
|
|
|
*/
|
|
|
|
const struct intel_ppat_entry *
|
|
|
|
intel_ppat_get(struct drm_i915_private *i915, u8 value)
|
|
|
|
{
|
|
|
|
struct intel_ppat *ppat = &i915->ppat;
|
|
|
|
struct intel_ppat_entry *entry;
|
|
|
|
unsigned int scanned, best_score;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!ppat->max_entries);
|
|
|
|
|
|
|
|
scanned = best_score = 0;
|
|
|
|
for_each_set_bit(i, ppat->used, ppat->max_entries) {
|
|
|
|
unsigned int score;
|
|
|
|
|
|
|
|
score = ppat->match(ppat->entries[i].value, value);
|
|
|
|
if (score > best_score) {
|
|
|
|
entry = &ppat->entries[i];
|
|
|
|
if (score == INTEL_PPAT_PERFECT_MATCH) {
|
|
|
|
kref_get(&entry->ref);
|
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
best_score = score;
|
|
|
|
}
|
|
|
|
scanned++;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (scanned == ppat->max_entries) {
|
|
|
|
if (!best_score)
|
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
|
|
|
|
kref_get(&entry->ref);
|
|
|
|
return entry;
|
|
|
|
}
|
|
|
|
|
|
|
|
i = find_first_zero_bit(ppat->used, ppat->max_entries);
|
|
|
|
entry = __alloc_ppat_entry(ppat, i, value);
|
|
|
|
ppat->update_hw(i915);
|
|
|
|
return entry;
|
|
|
|
}
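The lookup order in intel_ppat_get() (perfect match first, then a free slot, then the best partial match only when the table is full) can be sketched with a toy table. Everything here is hypothetical and simplified: plain integers replace the kref and the bitmaps, the table has four slots, and `toy_match` is an invented scorer rather than a platform match function.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ENTRIES 4     /* hypothetical table size */
#define TOY_PERFECT (~0u) /* sentinel above any partial score */

struct toy_slot {
	uint8_t value;
	int refs;
	int used;
};

/* Invented scorer: equal values are perfect, equal low two bits is partial. */
static unsigned int toy_match(uint8_t have, uint8_t want)
{
	if (have == want)
		return TOY_PERFECT;
	return ((have & 3) == (want & 3)) ? 1 : 0;
}

/* Returns the index handed out, or -1 if the table is full with no match. */
static int toy_ppat_get(struct toy_slot *tbl, uint8_t want)
{
	unsigned int best = 0, scanned = 0;
	int best_idx = -1, i;

	for (i = 0; i < TOY_ENTRIES; i++) {
		unsigned int score;

		if (!tbl[i].used)
			continue;

		score = toy_match(tbl[i].value, want);
		if (score > best) {
			best_idx = i;
			if (score == TOY_PERFECT) {
				tbl[i].refs++; /* share the existing entry */
				return i;
			}
			best = score;
		}
		scanned++;
	}

	/* Table full: fall back to the best partial match, if any. */
	if (scanned == TOY_ENTRIES) {
		if (best_idx < 0)
			return -1;
		tbl[best_idx].refs++;
		return best_idx;
	}

	/* Otherwise prefer a fresh slot over a partial match. */
	for (i = 0; i < TOY_ENTRIES; i++)
		if (!tbl[i].used)
			break;
	tbl[i].used = 1;
	tbl[i].value = want;
	tbl[i].refs = 1;
	return i;
}
```

The point of the ordering is that a partial match is only a last resort: while a free index exists, the caller gets exactly the value it asked for.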
|
|
|
|
|
|
|
|
static void release_ppat(struct kref *kref)
|
|
|
|
{
|
|
|
|
struct intel_ppat_entry *entry =
|
|
|
|
container_of(kref, struct intel_ppat_entry, ref);
|
|
|
|
struct drm_i915_private *i915 = entry->ppat->i915;
|
|
|
|
|
|
|
|
__free_ppat_entry(entry);
|
|
|
|
entry->ppat->update_hw(i915);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* intel_ppat_put - put back the PPAT entry obtained from intel_ppat_get()
|
|
|
|
* @entry: an intel PPAT entry
|
|
|
|
*
|
|
|
|
* Put back the PPAT entry obtained from intel_ppat_get(). If the PPAT index of the
|
|
|
|
* entry is dynamically allocated, its reference count will be decreased. Once
|
|
|
|
* the reference count drops to zero, the PPAT index becomes free again.
|
|
|
|
*/
|
|
|
|
void intel_ppat_put(const struct intel_ppat_entry *entry)
|
|
|
|
{
|
|
|
|
struct intel_ppat *ppat = entry->ppat;
|
|
|
|
unsigned int index = entry - ppat->entries;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!ppat->max_entries);
|
|
|
|
|
|
|
|
kref_put(&ppat->entries[index].ref, release_ppat);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct intel_ppat *ppat = &dev_priv->ppat;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
|
|
|
|
I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
|
|
|
|
clear_bit(i, ppat->dirty);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct intel_ppat *ppat = &dev_priv->ppat;
|
|
|
|
u64 pat = 0;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < ppat->max_entries; i++)
|
|
|
|
pat |= GEN8_PPAT(i, ppat->entries[i].value);
|
|
|
|
|
|
|
|
bitmap_clear(ppat->dirty, 0, ppat->max_entries);
|
|
|
|
|
|
|
|
I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
|
|
|
|
I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
|
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned int bdw_private_pat_match(u8 src, u8 dst)
|
|
|
|
{
|
|
|
|
unsigned int score = 0;
|
|
|
|
enum {
|
|
|
|
AGE_MATCH = BIT(0),
|
|
|
|
TC_MATCH = BIT(1),
|
|
|
|
CA_MATCH = BIT(2),
|
|
|
|
};
|
|
|
|
|
|
|
|
/* Cache attribute has to be matched. */
|
|
|
|
if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
score |= CA_MATCH;
|
|
|
|
|
|
|
|
if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
|
|
|
|
score |= TC_MATCH;
|
|
|
|
|
|
|
|
if (GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
|
|
|
|
score |= AGE_MATCH;
|
|
|
|
|
|
|
|
if (score == (AGE_MATCH | TC_MATCH | CA_MATCH))
|
|
|
|
return INTEL_PPAT_PERFECT_MATCH;
|
|
|
|
|
|
|
|
return score;
|
|
|
|
}
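The scoring rule in bdw_private_pat_match() (the cache attribute must match, then target cache and age add to the score) can be exercised with a stand-alone sketch. The field layout used here (CA in bits 1:0, TC in bits 3:2, AGE in bits 5:4) and the `~0u` sentinel are assumptions made for illustration, and all `TOY_`/`toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a PPAT value: CA in bits 1:0, TC in 3:2, AGE in 5:4. */
#define TOY_CA(v)  ((v) & 0x03)
#define TOY_TC(v)  ((v) & 0x0c)
#define TOY_AGE(v) ((v) & 0x30)

#define TOY_PERFECT_MATCH (~0u) /* assumed sentinel above any partial score */

enum {
	TOY_AGE_MATCH = 1 << 0,
	TOY_TC_MATCH  = 1 << 1,
	TOY_CA_MATCH  = 1 << 2,
};

static unsigned int toy_pat_match(uint8_t src, uint8_t dst)
{
	unsigned int score;

	/* A differing cache attribute disqualifies the entry outright. */
	if (TOY_CA(src) != TOY_CA(dst))
		return 0;

	score = TOY_CA_MATCH;

	if (TOY_TC(src) == TOY_TC(dst))
		score |= TOY_TC_MATCH;

	if (TOY_AGE(src) == TOY_AGE(dst))
		score |= TOY_AGE_MATCH;

	if (score == (TOY_AGE_MATCH | TOY_TC_MATCH | TOY_CA_MATCH))
		return TOY_PERFECT_MATCH;

	return score;
}
```

Because the cache-attribute bit carries the highest weight and is mandatory, every non-zero score is at least 4, and a target-cache match always outranks an age-only match.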
|
|
|
|
|
|
|
|
static unsigned int chv_private_pat_match(u8 src, u8 dst)
|
|
|
|
{
|
|
|
|
return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
|
|
|
|
INTEL_PPAT_PERFECT_MATCH : 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void cnl_setup_private_ppat(struct intel_ppat *ppat)
|
|
|
|
{
|
|
|
|
ppat->max_entries = 8;
|
|
|
|
ppat->update_hw = cnl_private_pat_update_hw;
|
|
|
|
ppat->match = bdw_private_pat_match;
|
|
|
|
ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
|
|
|
|
|
2017-08-16 07:25:39 +08:00
|
|
|
/* XXX: spec is unclear if this is still needed for CNL+ */
|
2017-09-14 20:39:40 +08:00
|
|
|
if (!USES_PPGTT(ppat->i915)) {
|
|
|
|
__alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC);
|
2017-08-16 07:25:39 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
__alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC);
|
|
|
|
__alloc_ppat_entry(ppat, 1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
|
|
|
|
__alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
|
|
|
|
__alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC);
|
|
|
|
__alloc_ppat_entry(ppat, 4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
|
|
|
|
__alloc_ppat_entry(ppat, 5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
|
|
|
|
__alloc_ppat_entry(ppat, 6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
|
|
|
|
__alloc_ppat_entry(ppat, 7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
|
2017-08-16 07:25:39 +08:00
|
|
|
}
|
|
|
|
|
2013-11-05 11:56:49 +08:00
|
|
|
/* The GGTT and PPGTT need a private PPAT setup in order to handle cacheability
|
|
|
|
* bits. When using advanced contexts each context stores its own PAT, but
|
|
|
|
* writing this data shouldn't be harmful even in those cases. */
|
2017-09-14 20:39:40 +08:00
|
|
|
static void bdw_setup_private_ppat(struct intel_ppat *ppat)
|
2013-11-05 11:56:49 +08:00
|
|
|
{
|
2017-09-14 20:39:40 +08:00
|
|
|
ppat->max_entries = 8;
|
|
|
|
ppat->update_hw = bdw_private_pat_update_hw;
|
|
|
|
ppat->match = bdw_private_pat_match;
|
|
|
|
ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
|
2013-11-05 11:56:49 +08:00
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
if (!USES_PPGTT(ppat->i915)) {
|
2014-11-06 08:56:36 +08:00
|
|
|
/* Spec: "For GGTT, there is NO pat_sel[2:0] from the entry,
|
|
|
|
* so RTL will always use the value corresponding to
|
|
|
|
* pat_sel = 000".
|
|
|
|
* So let's disable cache for GGTT to avoid screen corruptions.
|
|
|
|
* MOCS still can be used though.
|
|
|
|
* - System agent ggtt writes (i.e. cpu gtt mmaps) already work
|
|
|
|
* before this patch, i.e. the same uncached + snooping access
|
|
|
|
* like on gen6/7 seems to be in effect.
|
|
|
|
* - So this just fixes blitter/render access. Again it looks
|
|
|
|
* like it's not just uncached access, but uncached + snooping.
|
|
|
|
* So we can still hold onto all our assumptions wrt cpu
|
|
|
|
* clflushing on LLC machines.
|
|
|
|
*/
|
2017-09-14 20:39:40 +08:00
|
|
|
__alloc_ppat_entry(ppat, 0, GEN8_PPAT_UC);
|
|
|
|
return;
|
|
|
|
}
|
2014-11-06 08:56:36 +08:00
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
__alloc_ppat_entry(ppat, 0, GEN8_PPAT_WB | GEN8_PPAT_LLC); /* for normal objects, no eLLC */
|
|
|
|
__alloc_ppat_entry(ppat, 1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC); /* for something pointing to ptes? */
|
|
|
|
__alloc_ppat_entry(ppat, 2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC); /* for scanout with eLLC */
|
|
|
|
__alloc_ppat_entry(ppat, 3, GEN8_PPAT_UC); /* Uncached objects, mostly for scanout */
|
|
|
|
__alloc_ppat_entry(ppat, 4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
|
|
|
|
__alloc_ppat_entry(ppat, 5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
|
|
|
|
__alloc_ppat_entry(ppat, 6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
|
|
|
|
__alloc_ppat_entry(ppat, 7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
|
2013-11-05 11:56:49 +08:00
|
|
|
}
|
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
static void chv_setup_private_ppat(struct intel_ppat *ppat)
|
2014-04-09 18:28:01 +08:00
|
|
|
{
|
2017-09-14 20:39:40 +08:00
|
|
|
ppat->max_entries = 8;
|
|
|
|
ppat->update_hw = bdw_private_pat_update_hw;
|
|
|
|
ppat->match = chv_private_pat_match;
|
|
|
|
ppat->clear_value = CHV_PPAT_SNOOP;
|
2014-04-09 18:28:01 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Map WB on BDW to snooped on CHV.
|
|
|
|
*
|
|
|
|
* Only the snoop bit has meaning for CHV, the rest is
|
|
|
|
* ignored.
|
|
|
|
*
|
2014-11-15 03:02:44 +08:00
|
|
|
* The hardware will never snoop for certain types of accesses:
|
|
|
|
* - CPU GTT (GMADR->GGTT->no snoop->memory)
|
|
|
|
* - PPGTT page tables
|
|
|
|
* - some other special cycles
|
|
|
|
*
|
|
|
|
* As with BDW, we also need to consider the following for GT accesses:
|
|
|
|
* "For GGTT, there is NO pat_sel[2:0] from the entry,
|
|
|
|
* so RTL will always use the value corresponding to
|
|
|
|
* pat_sel = 000".
|
|
|
|
* Which means we must set the snoop bit in PAT entry 0
|
|
|
|
* in order to keep the global status page working.
|
2014-04-09 18:28:01 +08:00
|
|
|
*/
|
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
__alloc_ppat_entry(ppat, 0, CHV_PPAT_SNOOP);
|
|
|
|
__alloc_ppat_entry(ppat, 1, 0);
|
|
|
|
__alloc_ppat_entry(ppat, 2, 0);
|
|
|
|
__alloc_ppat_entry(ppat, 3, 0);
|
|
|
|
__alloc_ppat_entry(ppat, 4, CHV_PPAT_SNOOP);
|
|
|
|
__alloc_ppat_entry(ppat, 5, CHV_PPAT_SNOOP);
|
|
|
|
__alloc_ppat_entry(ppat, 6, CHV_PPAT_SNOOP);
|
|
|
|
__alloc_ppat_entry(ppat, 7, CHV_PPAT_SNOOP);
|
2014-04-09 18:28:01 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
static void gen6_gmch_remove(struct i915_address_space *vm)
|
|
|
|
{
|
|
|
|
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
|
|
|
|
|
|
|
|
iounmap(ggtt->gsm);
|
2017-02-15 16:43:40 +08:00
|
|
|
cleanup_scratch_page(vm);
|
2016-08-04 14:52:24 +08:00
|
|
|
}
|
|
|
|
|
2017-09-12 15:42:24 +08:00
|
|
|
static void setup_private_pat(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2017-09-14 20:39:40 +08:00
|
|
|
struct intel_ppat *ppat = &dev_priv->ppat;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
ppat->i915 = dev_priv;
|
|
|
|
|
2017-09-12 15:42:24 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 10)
|
2017-09-14 20:39:40 +08:00
|
|
|
cnl_setup_private_ppat(ppat);
|
2017-09-12 15:42:24 +08:00
|
|
|
else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
|
2017-09-14 20:39:40 +08:00
|
|
|
chv_setup_private_ppat(ppat);
|
2017-09-12 15:42:24 +08:00
|
|
|
else
|
2017-09-14 20:39:40 +08:00
|
|
|
bdw_setup_private_ppat(ppat);
|
|
|
|
|
|
|
|
GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
|
|
|
|
|
|
|
|
for_each_clear_bit(i, ppat->used, ppat->max_entries) {
|
|
|
|
ppat->entries[i].value = ppat->clear_value;
|
|
|
|
ppat->entries[i].ppat = ppat;
|
|
|
|
set_bit(i, ppat->dirty);
|
|
|
|
}
|
|
|
|
|
|
|
|
ppat->update_hw(dev_priv);
|
2017-09-12 15:42:24 +08:00
|
|
|
}
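The loop in setup_private_pat() above can be sketched in isolation. This is a minimal model, not the driver code: a plain unsigned bitmask stands in for the kernel bitmap helpers, and the `sketch_` names and the 8-entry table size are assumptions made for illustration. Every slot the platform setup did not allocate gets the default value and is marked dirty so that a later hardware update writes it out.

```c
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 8 /* hypothetical table size */

struct sketch_ppat {
	uint8_t value[SKETCH_MAX_ENTRIES];
	unsigned int used;        /* bit i set: slot i was allocated by platform setup */
	unsigned int dirty;       /* bit i set: slot i must be written to hardware */
	unsigned int max_entries;
	uint8_t clear_value;      /* default for slots nobody allocated */
};

/* Give every unallocated slot the default value and flag it for write-out. */
static void sketch_fill_unused(struct sketch_ppat *p)
{
	unsigned int i;

	for (i = 0; i < p->max_entries; i++) {
		if (p->used & (1u << i))
			continue; /* allocated slots keep their value */
		p->value[i] = p->clear_value;
		p->dirty |= 1u << i;
	}
}
```

Allocated slots are left untouched here, which mirrors iterating over the clear bits of the used mask only.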
|
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
static int gen8_gmch_probe(struct i915_ggtt *ggtt)
|
2013-11-05 11:32:22 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ggtt->base.i915;
|
2016-08-04 14:52:22 +08:00
|
|
|
struct pci_dev *pdev = dev_priv->drm.pdev;
|
2016-08-04 14:52:24 +08:00
|
|
|
unsigned int size;
|
2013-11-05 11:32:22 +08:00
|
|
|
u16 snb_gmch_ctl;
|
2017-05-10 17:21:50 +08:00
|
|
|
int err;
|
2013-11-05 11:32:22 +08:00
|
|
|
|
|
|
|
/* TODO: We're not aware of mappable constraints on gen8 yet */
|
2016-08-04 14:52:22 +08:00
|
|
|
ggtt->mappable_base = pci_resource_start(pdev, 2);
|
|
|
|
ggtt->mappable_end = pci_resource_len(pdev, 2);
|
2013-11-05 11:32:22 +08:00
|
|
|
|
2017-05-10 17:21:50 +08:00
|
|
|
err = pci_set_dma_mask(pdev, DMA_BIT_MASK(39));
|
|
|
|
if (!err)
|
|
|
|
err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(39));
|
|
|
|
if (err)
|
|
|
|
DRM_ERROR("Can't set DMA mask/consistent mask (%d)\n", err);
|
2013-11-05 11:32:22 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
|
2013-11-05 11:32:22 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 9) {
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->stolen_size = gen9_get_stolen_size(snb_gmch_ctl);
|
2016-08-04 14:52:24 +08:00
|
|
|
size = gen8_get_total_gtt_size(snb_gmch_ctl);
|
2016-08-04 14:52:22 +08:00
|
|
|
} else if (IS_CHERRYVIEW(dev_priv)) {
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->stolen_size = chv_get_stolen_size(snb_gmch_ctl);
|
2016-08-04 14:52:24 +08:00
|
|
|
size = chv_get_total_gtt_size(snb_gmch_ctl);
|
2014-05-09 03:19:40 +08:00
|
|
|
} else {
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->stolen_size = gen8_get_stolen_size(snb_gmch_ctl);
|
2016-08-04 14:52:24 +08:00
|
|
|
size = gen8_get_total_gtt_size(snb_gmch_ctl);
|
2014-05-09 03:19:40 +08:00
|
|
|
}
|
2013-11-05 11:32:22 +08:00
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
ggtt->base.total = (size / sizeof(gen8_pte_t)) << PAGE_SHIFT;
|
|
|
|
ggtt->base.cleanup = gen6_gmch_remove;
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->base.bind_vma = ggtt_bind_vma;
|
|
|
|
ggtt->base.unbind_vma = ggtt_unbind_vma;
|
2016-06-10 16:52:59 +08:00
|
|
|
ggtt->base.insert_page = gen8_ggtt_insert_page;
|
2016-05-14 14:26:35 +08:00
|
|
|
ggtt->base.clear_range = nop_clear_range;
|
2016-06-24 21:07:14 +08:00
|
|
|
if (!USES_FULL_PPGTT(dev_priv) || intel_scanout_needs_vtd_wa(dev_priv))
|
2016-05-14 14:26:35 +08:00
|
|
|
ggtt->base.clear_range = gen8_ggtt_clear_range;
|
|
|
|
|
|
|
|
ggtt->base.insert_entries = gen8_ggtt_insert_entries;
|
|
|
|
|
2017-05-24 23:54:11 +08:00
|
|
|
/* Serialize GTT updates with aperture access on BXT if VT-d is on. */
|
|
|
|
if (intel_ggtt_update_needs_vtd_wa(dev_priv)) {
|
|
|
|
ggtt->base.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
|
|
|
|
ggtt->base.insert_page = bxt_vtd_ggtt_insert_page__BKL;
|
|
|
|
if (ggtt->base.clear_range != nop_clear_range)
|
|
|
|
ggtt->base.clear_range = bxt_vtd_ggtt_clear_range__BKL;
|
|
|
|
}
|
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate = gen6_ggtt_invalidate;
|
|
|
|
|
2017-09-12 15:42:24 +08:00
|
|
|
setup_private_pat(dev_priv);
|
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
return ggtt_probe_common(ggtt, size);
|
2013-11-05 11:32:22 +08:00
|
|
|
}
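The address-space size computed in gen8_gmch_probe() is plain arithmetic: the number of PTEs that fit in the page-table area, times the page size each PTE maps. A small worked sketch follows, assuming 4 KiB pages and taking the PTE width as a parameter (8 bytes is assumed for a gen8-style entry, 4 bytes for a gen6-style one); the `sketch_` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/*
 * table_bytes: size of the region holding the page-table entries
 * pte_bytes:   size of one entry
 * Returns the number of bytes of GPU address space the table can map.
 */
static uint64_t sketch_ggtt_total(uint64_t table_bytes, uint64_t pte_bytes)
{
	return (table_bytes / pte_bytes) << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions an 8 MiB table of 8-byte entries holds 1 Mi entries and therefore maps 4 GiB, while a 2 MiB table of 4-byte entries maps 2 GiB.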
|
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
static int gen6_gmch_probe(struct i915_ggtt *ggtt)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ggtt->base.i915;
|
2016-08-04 14:52:22 +08:00
|
|
|
struct pci_dev *pdev = dev_priv->drm.pdev;
|
2016-08-04 14:52:24 +08:00
|
|
|
unsigned int size;
|
2012-11-05 01:21:27 +08:00
|
|
|
u16 snb_gmch_ctl;
|
2017-05-10 17:21:50 +08:00
|
|
|
int err;
|
2012-11-05 01:21:27 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
ggtt->mappable_base = pci_resource_start(pdev, 2);
|
|
|
|
ggtt->mappable_end = pci_resource_len(pdev, 2);
|
2013-02-09 03:32:47 +08:00
|
|
|
|
2013-01-25 05:49:57 +08:00
|
|
|
/* 64/512MB is the current min/max we actually know of, but this is just
|
|
|
|
* a coarse sanity check.
|
2012-11-05 01:21:27 +08:00
|
|
|
*/
|
2016-08-04 14:52:24 +08:00
|
|
|
if (ggtt->mappable_end < (64<<20) || ggtt->mappable_end > (512<<20)) {
|
2016-03-18 16:42:58 +08:00
|
|
|
DRM_ERROR("Unknown GMADR size (%llx)\n", ggtt->mappable_end);
|
2013-01-25 05:49:57 +08:00
|
|
|
return -ENXIO;
|
2012-11-05 01:21:27 +08:00
|
|
|
}
|
|
|
|
|
2017-05-10 17:21:50 +08:00
|
|
|
err = pci_set_dma_mask(pdev, DMA_BIT_MASK(40));
|
|
|
|
if (!err)
|
|
|
|
err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(40));
|
|
|
|
if (err)
|
|
|
|
DRM_ERROR("Can't set DMA mask/consistent mask (%d)\n", err);
|
2016-08-04 14:52:22 +08:00
|
|
|
pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
|
2012-11-05 01:21:27 +08:00
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->stolen_size = gen6_get_stolen_size(snb_gmch_ctl);
|
2012-11-05 01:21:27 +08:00
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
size = gen6_get_total_gtt_size(snb_gmch_ctl);
|
|
|
|
ggtt->base.total = (size / sizeof(gen6_pte_t)) << PAGE_SHIFT;
|
2012-11-05 01:21:27 +08:00
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->base.clear_range = gen6_ggtt_clear_range;
|
2016-06-10 16:52:59 +08:00
|
|
|
ggtt->base.insert_page = gen6_ggtt_insert_page;
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->base.insert_entries = gen6_ggtt_insert_entries;
|
|
|
|
ggtt->base.bind_vma = ggtt_bind_vma;
|
|
|
|
ggtt->base.unbind_vma = ggtt_unbind_vma;
|
2016-08-04 14:52:24 +08:00
|
|
|
ggtt->base.cleanup = gen6_gmch_remove;
|
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate = gen6_ggtt_invalidate;
|
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
if (HAS_EDRAM(dev_priv))
|
|
|
|
ggtt->base.pte_encode = iris_pte_encode;
|
|
|
|
else if (IS_HASWELL(dev_priv))
|
|
|
|
ggtt->base.pte_encode = hsw_pte_encode;
|
|
|
|
else if (IS_VALLEYVIEW(dev_priv))
|
|
|
|
ggtt->base.pte_encode = byt_pte_encode;
|
|
|
|
else if (INTEL_GEN(dev_priv) >= 7)
|
|
|
|
ggtt->base.pte_encode = ivb_pte_encode;
|
|
|
|
else
|
|
|
|
ggtt->base.pte_encode = snb_pte_encode;
|
2013-01-25 06:44:55 +08:00
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
return ggtt_probe_common(ggtt, size);
|
2012-11-05 01:21:27 +08:00
|
|
|
}
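Two small checks in gen6_gmch_probe() are easy to reason about on their own: the coarse aperture sanity check (64 MiB to 512 MiB inclusive) and the width of the DMA mask. The sketch below is illustrative only; the `sketch_` helpers are hypothetical, and the mask helper simply builds an n-bit all-ones value in the way the kernel's DMA_BIT_MASK() macro is understood to.

```c
#include <stdint.h>

/* An n-bit address mask: the low n bits set (all 64 bits when n == 64). */
static uint64_t sketch_dma_bit_mask(unsigned int n)
{
	return n == 64 ? ~0ull : (1ull << n) - 1;
}

/* Coarse sanity check on the aperture size, mirroring the 64/512 MiB bounds. */
static int sketch_gmadr_size_ok(uint64_t size)
{
	return size >= (64ull << 20) && size <= (512ull << 20);
}
```

A 40-bit mask covers 1 TiB of bus addresses, and a 39-bit mask, as used in the gen8 probe, covers 512 GiB.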
|
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
static void i915_gmch_remove(struct i915_address_space *vm)
|
2012-11-05 01:21:27 +08:00
|
|
|
{
|
2016-08-04 14:52:24 +08:00
|
|
|
intel_gmch_remove();
|
2012-03-26 15:45:40 +08:00
|
|
|
}
|
2013-01-25 05:49:57 +08:00
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
static int i915_gmch_probe(struct i915_ggtt *ggtt)
|
2013-01-25 05:49:57 +08:00
|
|
|
{
|
2016-11-29 17:50:08 +08:00
|
|
|
struct drm_i915_private *dev_priv = ggtt->base.i915;
|
2013-01-25 05:49:57 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-07-05 17:40:23 +08:00
|
|
|
ret = intel_gmch_probe(dev_priv->bridge_dev, dev_priv->drm.pdev, NULL);
|
2013-01-25 05:49:57 +08:00
|
|
|
if (!ret) {
|
|
|
|
DRM_ERROR("failed to set up gmch\n");
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
2017-01-06 23:20:11 +08:00
|
|
|
intel_gtt_get(&ggtt->base.total,
|
|
|
|
&ggtt->stolen_size,
|
|
|
|
&ggtt->mappable_base,
|
|
|
|
&ggtt->mappable_end);
|
2013-01-25 05:49:57 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
ggtt->do_idle_maps = needs_idle_maps(dev_priv);
|
2016-06-10 16:52:59 +08:00
|
|
|
ggtt->base.insert_page = i915_ggtt_insert_page;
|
2016-03-18 16:42:58 +08:00
|
|
|
ggtt->base.insert_entries = i915_ggtt_insert_entries;
|
|
|
|
ggtt->base.clear_range = i915_ggtt_clear_range;
|
|
|
|
ggtt->base.bind_vma = ggtt_bind_vma;
|
|
|
|
ggtt->base.unbind_vma = ggtt_unbind_vma;
|
2016-08-04 14:52:24 +08:00
|
|
|
ggtt->base.cleanup = i915_gmch_remove;
|
2013-01-25 05:49:57 +08:00
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
ggtt->invalidate = gmch_ggtt_invalidate;
|
|
|
|
|
2016-03-18 16:42:58 +08:00
|
|
|
if (unlikely(ggtt->do_idle_maps))
|
2013-12-30 20:16:15 +08:00
|
|
|
DRM_INFO("applying Ironlake quirks for intel_iommu\n");
|
|
|
|
|
2013-01-25 05:49:57 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-03-24 22:47:46 +08:00
|
|
|
/**
|
2016-08-04 14:52:21 +08:00
|
|
|
* i915_ggtt_probe_hw - Probe GGTT hardware location
|
2016-08-04 14:52:22 +08:00
|
|
|
* @dev_priv: i915 device
|
2016-03-24 22:47:46 +08:00
|
|
|
*/
|
2016-08-04 14:52:22 +08:00
|
|
|
int i915_ggtt_probe_hw(struct drm_i915_private *dev_priv)
|
2013-01-25 05:49:57 +08:00
|
|
|
{
|
2016-03-18 16:42:57 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2013-01-25 05:49:57 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-11-29 17:50:08 +08:00
|
|
|
ggtt->base.i915 = dev_priv;
|
2017-02-15 16:43:40 +08:00
|
|
|
ggtt->base.dma = &dev_priv->drm.pdev->dev;
|
2015-06-25 23:35:13 +08:00
|
|
|
|
2016-08-04 14:52:24 +08:00
|
|
|
if (INTEL_GEN(dev_priv) <= 5)
|
|
|
|
ret = i915_gmch_probe(ggtt);
|
|
|
|
else if (INTEL_GEN(dev_priv) < 8)
|
|
|
|
ret = gen6_gmch_probe(ggtt);
|
|
|
|
else
|
|
|
|
ret = gen8_gmch_probe(ggtt);
|
2013-01-25 06:45:00 +08:00
|
|
|
if (ret)
|
2013-01-25 05:49:57 +08:00
|
|
|
return ret;
|
|
|
|
|
2017-01-05 23:30:23 +08:00
|
|
|
/* Trim the GGTT to fit the GuC mappable upper range (when enabled).
|
|
|
|
* This is easier than doing range restriction on the fly, as we
|
|
|
|
* currently don't have any bits spare to pass in this upper
|
|
|
|
* restriction!
|
|
|
|
*/
|
|
|
|
if (HAS_GUC(dev_priv) && i915.enable_guc_loading) {
|
|
|
|
ggtt->base.total = min_t(u64, ggtt->base.total, GUC_GGTT_TOP);
|
|
|
|
ggtt->mappable_end = min(ggtt->mappable_end, ggtt->base.total);
|
|
|
|
}
|
|
|
|
|
2016-03-18 16:42:59 +08:00
|
|
|
if ((ggtt->base.total - 1) >> 32) {
|
|
|
|
DRM_ERROR("We never expected a Global GTT with more than 32bits"
|
2016-08-04 14:52:23 +08:00
|
|
|
" of address space! Found %lldM!\n",
|
2016-03-18 16:42:59 +08:00
|
|
|
ggtt->base.total >> 20);
|
|
|
|
ggtt->base.total = 1ULL << 32;
|
|
|
|
ggtt->mappable_end = min(ggtt->mappable_end, ggtt->base.total);
|
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:23 +08:00
|
|
|
if (ggtt->mappable_end > ggtt->base.total) {
|
|
|
|
DRM_ERROR("mappable aperture extends past end of GGTT,"
|
|
|
|
" aperture=%llx, total=%llx\n",
|
|
|
|
ggtt->mappable_end, ggtt->base.total);
|
|
|
|
ggtt->mappable_end = ggtt->base.total;
|
|
|
|
}
|
|
|
|
|
2013-01-25 05:49:57 +08:00
|
|
|
/* GMADR is the PCI mmio aperture into the global GTT. */
|
2015-06-25 23:35:05 +08:00
|
|
|
DRM_INFO("Memory usable by graphics device = %lluM\n",
|
2016-03-18 16:42:57 +08:00
|
|
|
ggtt->base.total >> 20);
|
|
|
|
DRM_DEBUG_DRIVER("GMADR size = %lldM\n", ggtt->mappable_end >> 20);
|
2017-01-06 23:20:11 +08:00
|
|
|
DRM_DEBUG_DRIVER("GTT stolen size = %uM\n", ggtt->stolen_size >> 20);
|
2017-05-25 20:16:12 +08:00
|
|
|
if (intel_vtd_active())
|
2014-03-31 22:23:04 +08:00
|
|
|
DRM_INFO("VT-d active for gfx access\n");
|
2013-01-25 05:49:57 +08:00
|
|
|
|
|
|
|
return 0;
|
2016-08-04 14:52:21 +08:00
|
|
|
}
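The three trimming steps in i915_ggtt_probe_hw() compose in a fixed order, which the following sketch models with a reduced struct. It is a sketch under stated assumptions: `sketch_ggtt` carries only the two sizes involved, and `guc_top` is a hypothetical parameter standing in for GUC_GGTT_TOP (pass 0 to model GuC loading being disabled).

```c
#include <stdint.h>

struct sketch_ggtt {
	uint64_t total;        /* size of the global GTT address space, bytes */
	uint64_t mappable_end; /* size of the CPU-visible aperture, bytes */
};

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

static void sketch_trim_ggtt(struct sketch_ggtt *g, uint64_t guc_top)
{
	/* Step 1: when GuC is in use, cap the GGTT at its upper limit. */
	if (guc_top) {
		g->total = sketch_min_u64(g->total, guc_top);
		g->mappable_end = sketch_min_u64(g->mappable_end, g->total);
	}

	/* Step 2: more than 4 GiB is unexpected, so clamp to a 32-bit range. */
	if ((g->total - 1) >> 32) {
		g->total = 1ull << 32;
		g->mappable_end = sketch_min_u64(g->mappable_end, g->total);
	}

	/* Step 3: the aperture can never extend past the end of the GGTT. */
	if (g->mappable_end > g->total)
		g->mappable_end = g->total;
}
```

Testing `(total - 1) >> 32` rather than `total >> 32` lets a GGTT of exactly 4 GiB pass without being flagged.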
|
|
|
|
|
|
|
|
/**
|
|
|
|
* i915_ggtt_init_hw - Initialize GGTT hardware
|
2016-08-04 14:52:22 +08:00
|
|
|
* @dev_priv: i915 device
|
2016-08-04 14:52:21 +08:00
|
|
|
*/
|
2016-08-04 14:52:22 +08:00
|
|
|
int i915_ggtt_init_hw(struct drm_i915_private *dev_priv)
|
2016-08-04 14:52:21 +08:00
|
|
|
{
|
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
|
|
|
int ret;
|
|
|
|
|
2016-08-04 14:52:23 +08:00
|
|
|
INIT_LIST_HEAD(&dev_priv->vm_list);
|
|
|
|
|
2017-02-06 16:45:47 +08:00
|
|
|
/* Note that we use page colouring to enforce a guard page at the
|
|
|
|
* end of the address space. This is required as the CS may prefetch
|
|
|
|
* beyond the end of the batch buffer, across the page boundary,
|
|
|
|
* and beyond the end of the GTT if we do not provide a guard.
|
2016-08-04 14:52:23 +08:00
|
|
|
*/
|
2016-10-28 20:58:58 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_address_space_init(&ggtt->base, dev_priv, "[global]");
|
2017-02-06 16:45:47 +08:00
|
|
|
if (!HAS_LLC(dev_priv) && !USES_PPGTT(dev_priv))
|
2016-08-04 14:52:23 +08:00
|
|
|
ggtt->base.mm.color_adjust = i915_gtt_color_adjust;
|
2016-10-28 20:58:58 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2016-08-04 14:52:23 +08:00
|
|
|
|
2016-08-19 23:54:27 +08:00
|
|
|
if (!io_mapping_init_wc(&dev_priv->ggtt.mappable,
|
|
|
|
dev_priv->ggtt.mappable_base,
|
|
|
|
dev_priv->ggtt.mappable_end)) {
|
2016-08-04 14:52:23 +08:00
|
|
|
ret = -EIO;
|
|
|
|
goto out_gtt_cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
ggtt->mtrr = arch_phys_wc_add(ggtt->mappable_base, ggtt->mappable_end);
|
|
|
|
|
2016-08-04 14:52:21 +08:00
|
|
|
/*
|
|
|
|
* Initialise stolen early so that we may reserve preallocated
|
|
|
|
* objects for the BIOS to KMS transition.
|
|
|
|
*/
|
2016-11-16 16:55:35 +08:00
|
|
|
ret = i915_gem_init_stolen(dev_priv);
|
2016-08-04 14:52:21 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_gtt_cleanup;
|
|
|
|
|
|
|
|
return 0;
|
2016-01-19 21:26:32 +08:00
|
|
|
|
|
|
|
out_gtt_cleanup:
|
2016-03-30 21:57:10 +08:00
|
|
|
ggtt->base.cleanup(&ggtt->base);
|
2016-01-19 21:26:32 +08:00
|
|
|
return ret;
|
2013-01-25 05:49:57 +08:00
|
|
|
}
|
drm/i915: Create bind/unbind abstraction for VMAs
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initially, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want to, remove clear_range; however, it's
not totally easy at this point, since it's still used in a couple of
places that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
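The abstraction described in the commit message above, with bind and unbind as function pointers chosen by the address space, can be illustrated with a reduced model. Everything here is hypothetical (`sketch_vm`, `sketch_vma`, and the counters are invented for the example); it only shows how a caller can bind a VMA without knowing whether it targets the global GTT or a per-process one.

```c
struct sketch_vm;

struct sketch_vma {
	struct sketch_vm *vm; /* address space this mapping lives in */
	int bound;
};

struct sketch_vm {
	const char *name;
	int (*bind_vma)(struct sketch_vma *vma);
	void (*unbind_vma)(struct sketch_vma *vma);
	int ptes_written; /* stand-in for real page-table updates */
};

/* One possible backend: pretend to write PTEs for a global GTT. */
static int sketch_ggtt_bind(struct sketch_vma *vma)
{
	vma->vm->ptes_written++;
	vma->bound = 1;
	return 0;
}

static void sketch_ggtt_unbind(struct sketch_vma *vma)
{
	vma->vm->ptes_written--;
	vma->bound = 0;
}

/* Callers go through the vm's pointers and never name the backend. */
static int sketch_bind(struct sketch_vma *vma)
{
	return vma->vm->bind_vma(vma);
}

static void sketch_unbind(struct sketch_vma *vma)
{
	vma->vm->unbind_vma(vma);
}
```

A second backend with different page-table handling would plug in by filling the same two pointers, which is the point the message makes about not having to distinguish PPGTT from GGTT at the call site.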
2013-12-07 06:10:56 +08:00
|
|
|
|
2016-08-04 14:52:22 +08:00
|
|
|
int i915_ggtt_enable_hw(struct drm_i915_private *dev_priv)
|
2016-05-07 02:35:55 +08:00
|
|
|
{
|
2016-08-04 14:52:22 +08:00
|
|
|
if (INTEL_GEN(dev_priv) < 6 && !intel_enable_gtt())
|
2016-05-07 02:35:55 +08:00
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
void i915_ggtt_enable_guc(struct drm_i915_private *i915)
|
|
|
|
{
|
2017-06-01 17:04:46 +08:00
|
|
|
GEM_BUG_ON(i915->ggtt.invalidate != gen6_ggtt_invalidate);
|
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
i915->ggtt.invalidate = guc_ggtt_invalidate;
|
|
|
|
}
|
|
|
|
|
|
|
|
void i915_ggtt_disable_guc(struct drm_i915_private *i915)
|
|
|
|
{
|
2017-06-01 17:04:46 +08:00
|
|
|
/* We should only be called after i915_ggtt_enable_guc() */
|
|
|
|
GEM_BUG_ON(i915->ggtt.invalidate != guc_ggtt_invalidate);
|
|
|
|
|
|
|
|
i915->ggtt.invalidate = gen6_ggtt_invalidate;
|
2017-01-12 19:00:49 +08:00
|
|
|
}
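The enable/disable pair above swaps one function pointer and asserts the previous state, so that the calls can only happen in matched order. A stand-alone sketch of that pattern follows, using assert() in place of GEM_BUG_ON and hypothetical `sketch_` names; the assumption that the GuC variant performs the default flush plus one extra step is made for the example only.

```c
#include <assert.h>

struct sketch_ggtt;
typedef void (*sketch_invalidate_fn)(struct sketch_ggtt *g);

struct sketch_ggtt {
	sketch_invalidate_fn invalidate;
	int flushes;     /* counts default flushes */
	int guc_flushes; /* counts the extra GuC step */
};

static void sketch_default_invalidate(struct sketch_ggtt *g)
{
	g->flushes++;
}

/* Assumed behaviour: the default flush followed by an extra GuC step. */
static void sketch_guc_invalidate(struct sketch_ggtt *g)
{
	sketch_default_invalidate(g);
	g->guc_flushes++;
}

static void sketch_enable_guc(struct sketch_ggtt *g)
{
	/* Only valid when the default hook is currently installed. */
	assert(g->invalidate == sketch_default_invalidate);
	g->invalidate = sketch_guc_invalidate;
}

static void sketch_disable_guc(struct sketch_ggtt *g)
{
	/* Only valid after sketch_enable_guc(). */
	assert(g->invalidate == sketch_guc_invalidate);
	g->invalidate = sketch_default_invalidate;
}
```

The assertions turn a double enable or an unmatched disable into an immediate failure instead of a silently wrong hook.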
|
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
|
2015-04-14 23:35:23 +08:00
|
|
|
{
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2016-09-10 04:19:57 +08:00
|
|
|
struct drm_i915_gem_object *obj, *on;
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2016-05-10 21:10:04 +08:00
|
|
|
i915_check_and_clear_faults(dev_priv);
|
2015-04-14 23:35:23 +08:00
|
|
|
|
|
|
|
/* First fill our portion of the GTT with scratch pages */
|
2017-02-15 16:43:54 +08:00
|
|
|
ggtt->base.clear_range(&ggtt->base, 0, ggtt->base.total);
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2016-09-10 04:19:57 +08:00
|
|
|
ggtt->base.closed = true; /* skip rewriting PTE on VMA unbind */
|
|
|
|
|
|
|
|
/* clflush objects bound into the GGTT and rebind them. */
|
|
|
|
list_for_each_entry_safe(obj, on,
|
2016-11-02 18:16:04 +08:00
|
|
|
&dev_priv->mm.bound_list, global_link) {
|
2016-09-10 04:19:57 +08:00
|
|
|
bool ggtt_bound = false;
|
|
|
|
struct i915_vma *vma;
|
|
|
|
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
2016-03-30 21:57:10 +08:00
|
|
|
if (vma->vm != &ggtt->base)
|
2015-07-06 22:15:01 +08:00
|
|
|
continue;
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2016-09-10 04:19:57 +08:00
|
|
|
if (!i915_vma_unbind(vma))
|
|
|
|
continue;
|
|
|
|
|
2015-07-06 22:15:01 +08:00
|
|
|
WARN_ON(i915_vma_bind(vma, obj->cache_level,
|
|
|
|
PIN_UPDATE));
|
2016-09-10 04:19:57 +08:00
|
|
|
ggtt_bound = true;
|
2015-07-06 22:15:01 +08:00
|
|
|
}
|
|
|
|
|
2016-09-10 04:19:57 +08:00
|
|
|
if (ggtt_bound)
|
2016-05-14 14:26:34 +08:00
|
|
|
WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
|
2015-07-06 22:15:01 +08:00
|
|
|
}
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2016-09-10 04:19:57 +08:00
|
|
|
ggtt->base.closed = false;
|
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 8) {
|
2017-09-14 20:39:40 +08:00
|
|
|
struct intel_ppat *ppat = &dev_priv->ppat;
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2017-09-14 20:39:40 +08:00
|
|
|
bitmap_set(ppat->dirty, 0, ppat->max_entries);
|
|
|
|
dev_priv->ppat.update_hw(dev_priv);
|
2015-04-14 23:35:23 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2016-11-16 16:55:34 +08:00
|
|
|
if (USES_PPGTT(dev_priv)) {
|
2016-03-30 21:57:10 +08:00
|
|
|
struct i915_address_space *vm;
|
|
|
|
|
2015-04-14 23:35:23 +08:00
|
|
|
list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
|
2016-04-07 16:08:03 +08:00
|
|
|
struct i915_hw_ppgtt *ppgtt;
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2016-08-04 14:52:25 +08:00
|
|
|
if (i915_is_ggtt(vm))
|
2015-04-14 23:35:23 +08:00
|
|
|
ppgtt = dev_priv->mm.aliasing_ppgtt;
|
2016-04-07 16:08:03 +08:00
|
|
|
else
|
|
|
|
ppgtt = i915_vm_to_ppgtt(vm);
|
2015-04-14 23:35:23 +08:00
|
|
|
|
2017-02-15 16:43:45 +08:00
|
|
|
gen6_write_page_range(ppgtt, 0, ppgtt->base.total);
|
2015-04-14 23:35:23 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-01-12 19:00:49 +08:00
|
|
|
i915_ggtt_invalidate(dev_priv);
|
2015-04-14 23:35:23 +08:00
|
|
|
}
|
|
|
|
|
2015-09-21 17:45:33 +08:00
|
|
|
static struct scatterlist *
|
2016-01-14 21:22:11 +08:00
|
|
|
rotate_pages(const dma_addr_t *in, unsigned int offset,
|
2015-09-21 17:45:33 +08:00
|
|
|
unsigned int width, unsigned int height,
|
2016-01-21 03:05:23 +08:00
|
|
|
unsigned int stride,
|
2015-09-21 17:45:33 +08:00
|
|
|
struct sg_table *st, struct scatterlist *sg)
|
2015-03-23 19:10:36 +08:00
|
|
|
{
|
|
|
|
unsigned int column, row;
|
|
|
|
unsigned int src_idx;
|
|
|
|
|
|
|
|
for (column = 0; column < width; column++) {
|
2016-01-21 03:05:23 +08:00
|
|
|
src_idx = stride * (height - 1) + column;
|
2015-03-23 19:10:36 +08:00
|
|
|
for (row = 0; row < height; row++) {
|
|
|
|
st->nents++;
|
|
|
|
/* We don't need the pages, but need to initialize
|
|
|
|
* the entries so the sg list can be happily traversed.
|
|
|
|
* The only thing we need are DMA addresses.
|
|
|
|
*/
|
|
|
|
sg_set_page(sg, NULL, PAGE_SIZE, 0);
|
2015-09-21 17:45:33 +08:00
|
|
|
sg_dma_address(sg) = in[offset + src_idx];
|
2015-03-23 19:10:36 +08:00
|
|
|
sg_dma_len(sg) = PAGE_SIZE;
|
|
|
|
sg = sg_next(sg);
|
2016-01-21 03:05:23 +08:00
|
|
|
src_idx -= stride;
|
2015-03-23 19:10:36 +08:00
|
|
|
}
|
|
|
|
}
|
2015-09-21 17:45:33 +08:00
|
|
|
|
|
|
|
return sg;
|
2015-03-23 19:10:36 +08:00
|
|
|
}
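The index walk in rotate_pages() is easier to follow without the scatterlist bookkeeping. The sketch below keeps only the traversal: for each column it starts at the bottom row and steps upward by one stride, recording which source page would be emitted next. The `sketch_rotated_order` helper and its output array are hypothetical; the caller is assumed to provide room for width * height entries.

```c
#include <stddef.h>

/*
 * Write the source page indices, in rotated order, into out[] and
 * return how many were written (width * height).
 */
static size_t sketch_rotated_order(unsigned int offset,
				   unsigned int width, unsigned int height,
				   unsigned int stride, unsigned int *out)
{
	unsigned int column, row;
	size_t n = 0;

	for (column = 0; column < width; column++) {
		/* Start at the bottom row of this column. */
		unsigned int src_idx = stride * (height - 1) + column;

		for (row = 0; row < height; row++) {
			out[n++] = offset + src_idx;
			/* Step up one row; the final step is never read. */
			src_idx -= stride;
		}
	}

	return n;
}
```

For a 2-wide, 3-high view with a stride of 2 and no offset, the source pages are laid out as rows {0,1}, {2,3}, {4,5}, and the walk visits them as 4, 2, 0, 5, 3, 1.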
|
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
static noinline struct sg_table *
|
|
|
|
intel_rotate_pages(struct intel_rotation_info *rot_info,
|
|
|
|
struct drm_i915_gem_object *obj)
|
2015-03-23 19:10:36 +08:00
|
|
|
{
|
2017-02-15 16:43:57 +08:00
|
|
|
const unsigned long n_pages = obj->base.size / PAGE_SIZE;
|
drm/i915: Rewrite fb rotation GTT handling
Redo the fb rotation handling in order to:
- eliminate the NV12 special casing
- handle fb->offsets[] properly
- make the rotation handling easier for the plane code
To achieve these goals we reduce intel_rotation_info to only contain
(for each plane) the rotated view width,height,stride in tile units,
and the page offset into the object where the plane starts. Each plane
is handled exactly the same way, no special casing for NV12 or other
formats. We then store the computed rotation_info under
intel_framebuffer so that we don't have to recompute it again.
To handle fb->offsets[] we treat them as linear offsets and convert
them to x/y offsets from the start of the relevant GTT mapping (either
normal or rotated). We store the x/y offsets under intel_framebuffer,
and for some extra convenience we also store the rotated pitch (ie.
tile aligned plane height). So for each plane we have the normal
x/y offsets, rotated x/y offsets, and the rotated pitch. The normal
pitch is available already in fb->pitches[].
While we're gathering up all that extra information, we can also easily
compute the storage requirements for the framebuffer, so that we can
check that the object is big enough to hold it.
When it comes time to deal with the plane source coordinates, we first
rotate the clipped src coordinates to match the relevant GTT view
orientation, then add to them the fb x/y offsets. Next we compute
the aligned surface page offset, and as a result we're left with some
residual x/y offsets. Finally, if required by the hardware, we convert
the remaining x/y offsets into a linear offset.
For gen2/3 we simply skip computing the final page offset, and just
convert the src+fb x/y offsets directly into a linear offset since
that's what the hardware wants.
After this all platforms, including SKL+, compute these things in exactly
the same way (excluding alignment differences).
v2: Use BIT(DRM_ROTATE_270) instead of ROTATE_270 when rotating
plane src coordinates
Drop some spurious changes that got left behind during
development
v3: Split out more changes to prep patches (Daniel)
s/intel_fb->plane[].foo.bar/intel_fb->foo[].bar/ for brevity
Rename intel_surf_gtt_offset to intel_fb_gtt_offset
Kill the pointless 'plane' parameter from intel_fb_gtt_offset()
v4: Fix alignment vs. alignment-1 when calling
_intel_compute_tile_offset() from intel_fill_fb_info()
Pass the pitch in tiles instead of pixels to
intel_adjust_tile_offset() from intel_fill_fb_info()
Pass the full width/height of the rotated area to
drm_rect_rotate() for clarity
Use u32 for more offsets
v5: Preserve the upper_32_bits()/lower_32_bits() handling for the
fb ggtt offset (Sivakumar)
v6: Rebase due to drm_plane_state src/dst rects
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-2-git-send-email-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-09-15 18:16:41 +08:00
|
|
|
unsigned int size = intel_rotation_info_size(rot_info);
|
2016-05-20 18:54:06 +08:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
dma_addr_t dma_addr;
|
2015-03-23 19:10:36 +08:00
|
|
|
unsigned long i;
|
|
|
|
dma_addr_t *page_addr_list;
|
|
|
|
struct sg_table *st;
|
2015-09-21 17:45:34 +08:00
|
|
|
struct scatterlist *sg;
|
2015-03-25 18:15:26 +08:00
|
|
|
int ret = -ENOMEM;
|
2015-03-23 19:10:36 +08:00
|
|
|
|
|
|
|
/* Allocate a temporary list of source pages for random access. */
|
2017-05-17 20:23:12 +08:00
|
|
|
page_addr_list = kvmalloc_array(n_pages,
|
2016-04-08 19:11:13 +08:00
|
|
|
sizeof(dma_addr_t),
|
|
|
|
GFP_TEMPORARY);
|
2015-03-23 19:10:36 +08:00
|
|
|
if (!page_addr_list)
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
|
|
|
|
/* Allocate target SG list. */
|
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (!st)
|
|
|
|
goto err_st_alloc;
|
|
|
|
|
2015-09-15 18:16:41 +08:00
|
|
|
ret = sg_alloc_table(st, size, GFP_KERNEL);
|
2015-03-23 19:10:36 +08:00
|
|
|
if (ret)
|
|
|
|
goto err_sg_alloc;
|
|
|
|
|
|
|
|
/* Populate source page list from the object. */
|
|
|
|
i = 0;
|
2016-10-28 20:58:35 +08:00
|
|
|
for_each_sgt_dma(dma_addr, sgt_iter, obj->mm.pages)
|
2016-05-20 18:54:06 +08:00
|
|
|
page_addr_list[i++] = dma_addr;
|
2015-03-23 19:10:36 +08:00
|
|
|
|
2016-05-20 18:54:06 +08:00
|
|
|
GEM_BUG_ON(i != n_pages);
|
2016-02-16 04:54:46 +08:00
|
|
|
st->nents = 0;
|
|
|
|
sg = st->sgl;
|
|
|
|
|
2015-09-15 18:16:41 +08:00
|
|
|
for (i = 0 ; i < ARRAY_SIZE(rot_info->plane); i++) {
|
|
|
|
sg = rotate_pages(page_addr_list, rot_info->plane[i].offset,
|
|
|
|
rot_info->plane[i].width, rot_info->plane[i].height,
|
|
|
|
rot_info->plane[i].stride, st, sg);
|
2015-09-21 17:45:34 +08:00
|
|
|
}
|
|
|
|
|
2015-09-15 18:16:41 +08:00
|
|
|
DRM_DEBUG_KMS("Created rotated page mapping for object size %zu (%ux%u tiles, %u pages)\n",
|
|
|
|
obj->base.size, rot_info->plane[0].width, rot_info->plane[0].height, size);
|
2015-03-23 19:10:36 +08:00
|
|
|
|
2017-05-17 20:23:12 +08:00
|
|
|
kvfree(page_addr_list);
|
2015-03-23 19:10:36 +08:00
|
|
|
|
|
|
|
return st;
|
|
|
|
|
|
|
|
err_sg_alloc:
|
|
|
|
kfree(st);
|
|
|
|
err_st_alloc:
|
2017-05-17 20:23:12 +08:00
|
|
|
kvfree(page_addr_list);
|
2015-03-23 19:10:36 +08:00
|
|
|
|
2015-09-15 18:16:41 +08:00
|
|
|
DRM_DEBUG_KMS("Failed to create rotated mapping for object size %zu! (%ux%u tiles, %u pages)\n",
|
|
|
|
obj->base.size, rot_info->plane[0].width, rot_info->plane[0].height, size);
|
|
|
|
|
2015-03-23 19:10:36 +08:00
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
static noinline struct sg_table *
|
2015-05-06 19:35:38 +08:00
|
|
|
intel_partial_pages(const struct i915_ggtt_view *view,
|
|
|
|
struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
struct sg_table *st;
|
2016-10-28 20:58:34 +08:00
|
|
|
struct scatterlist *sg, *iter;
|
2017-01-14 08:28:25 +08:00
|
|
|
unsigned int count = view->partial.size;
|
2016-10-28 20:58:34 +08:00
|
|
|
unsigned int offset;
|
2015-05-06 19:35:38 +08:00
|
|
|
int ret = -ENOMEM;
|
|
|
|
|
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (!st)
|
|
|
|
goto err_st_alloc;
|
|
|
|
|
2016-10-28 20:58:34 +08:00
|
|
|
ret = sg_alloc_table(st, count, GFP_KERNEL);
|
2015-05-06 19:35:38 +08:00
|
|
|
if (ret)
|
|
|
|
goto err_sg_alloc;
|
|
|
|
|
2017-01-14 08:28:25 +08:00
|
|
|
iter = i915_gem_object_get_sg(obj, view->partial.offset, &offset);
|
2016-10-28 20:58:34 +08:00
|
|
|
GEM_BUG_ON(!iter);
|
|
|
|
|
2015-05-06 19:35:38 +08:00
|
|
|
sg = st->sgl;
|
|
|
|
st->nents = 0;
|
2016-10-28 20:58:34 +08:00
|
|
|
do {
|
|
|
|
unsigned int len;
|
2015-05-06 19:35:38 +08:00
|
|
|
|
2016-10-28 20:58:34 +08:00
|
|
|
len = min(iter->length - (offset << PAGE_SHIFT),
|
|
|
|
count << PAGE_SHIFT);
|
|
|
|
sg_set_page(sg, NULL, len, 0);
|
|
|
|
sg_dma_address(sg) =
|
|
|
|
sg_dma_address(iter) + (offset << PAGE_SHIFT);
|
|
|
|
sg_dma_len(sg) = len;
|
2015-05-06 19:35:38 +08:00
|
|
|
|
|
|
|
st->nents++;
|
2016-10-28 20:58:34 +08:00
|
|
|
count -= len >> PAGE_SHIFT;
|
|
|
|
if (count == 0) {
|
|
|
|
sg_mark_end(sg);
|
|
|
|
return st;
|
|
|
|
}
|
2015-05-06 19:35:38 +08:00
|
|
|
|
2016-10-28 20:58:34 +08:00
|
|
|
sg = __sg_next(sg);
|
|
|
|
iter = __sg_next(iter);
|
|
|
|
offset = 0;
|
|
|
|
} while (1);
|
2015-05-06 19:35:38 +08:00
|
|
|
|
|
|
|
err_sg_alloc:
|
|
|
|
kfree(st);
|
|
|
|
err_st_alloc:
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
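The loop in intel_partial_pages() can be read as slicing a window of pages out of a list of contiguous segments. The following is a minimal userspace sketch of that idea, not the driver code: `struct seg`, `slice_pages()` and `SKETCH_PAGE_SHIFT` are hypothetical names, plain arrays stand in for the scatterlist, and the lookup that i915_gem_object_get_sg() performs is replaced by a linear walk.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4096-byte pages */

/* Hypothetical stand-in for one scatterlist entry: a contiguous run. */
struct seg {
	unsigned long addr;  /* DMA address of the first byte */
	unsigned int length; /* length in bytes, a multiple of the page size */
};

/*
 * Copy a window of `count` pages, starting at page index `first`, from
 * src[] into dst[]. Returns the number of dst[] entries written.
 * dst[] must have room for one entry per source segment touched.
 */
static size_t slice_pages(const struct seg *src, size_t nsrc,
			  unsigned int first, unsigned int count,
			  struct seg *dst)
{
	unsigned int offset = first; /* page offset into src[i] */
	size_t i = 0, n = 0;

	/* Find the segment that contains page `first`. */
	while (i < nsrc && offset >= (src[i].length >> SKETCH_PAGE_SHIFT)) {
		offset -= src[i].length >> SKETCH_PAGE_SHIFT;
		i++;
	}

	while (count && i < nsrc) {
		unsigned int avail = src[i].length - (offset << SKETCH_PAGE_SHIFT);
		unsigned int want = count << SKETCH_PAGE_SHIFT;
		unsigned int len = avail < want ? avail : want;

		dst[n].addr = src[i].addr + ((unsigned long)offset << SKETCH_PAGE_SHIFT);
		dst[n].length = len;
		n++;

		count -= len >> SKETCH_PAGE_SHIFT;
		offset = 0; /* later segments are consumed from their start */
		i++;
	}

	assert(count == 0); /* the window must lie inside the source */
	return n;
}
```

As in the original loop, only the first segment is entered at a non-zero offset; every following segment is taken from its start until the requested page count is exhausted.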
|
|
|
|
|
2015-04-14 23:35:27 +08:00
|
|
|
static int
|
2015-03-23 19:10:36 +08:00
|
|
|
i915_get_ggtt_vma_pages(struct i915_vma *vma)
|
2014-12-11 01:27:58 +08:00
|
|
|
{
|
2017-02-15 16:43:35 +08:00
|
|
|
int ret;
|
2015-03-23 19:10:36 +08:00
|
|
|
|
2016-11-04 18:30:01 +08:00
|
|
|
/* The vma->pages are only valid within the lifespan of the borrowed
|
|
|
|
* obj->mm.pages. When the obj->mm.pages sg_table is regenerated, so
|
|
|
|
* must be the vma->pages. A simple rule is that vma->pages must only
|
|
|
|
* be accessed when the obj->mm.pages are pinned.
|
|
|
|
*/
|
|
|
|
GEM_BUG_ON(!i915_gem_object_has_pinned_pages(vma->obj));
|
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
switch (vma->ggtt_view.type) {
|
|
|
|
case I915_GGTT_VIEW_NORMAL:
|
|
|
|
vma->pages = vma->obj->mm.pages;
|
2014-12-11 01:27:58 +08:00
|
|
|
return 0;
|
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
case I915_GGTT_VIEW_ROTATED:
|
2016-08-15 17:48:47 +08:00
|
|
|
vma->pages =
|
2017-02-15 16:43:35 +08:00
|
|
|
intel_rotate_pages(&vma->ggtt_view.rotated, vma->obj);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case I915_GGTT_VIEW_PARTIAL:
|
2016-08-15 17:48:47 +08:00
|
|
|
vma->pages = intel_partial_pages(&vma->ggtt_view, vma->obj);
|
2017-02-15 16:43:35 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
2014-12-11 01:27:58 +08:00
|
|
|
WARN_ONCE(1, "GGTT view %u not implemented!\n",
|
|
|
|
vma->ggtt_view.type);
|
2017-02-15 16:43:35 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2014-12-11 01:27:58 +08:00
|
|
|
|
2017-02-15 16:43:35 +08:00
|
|
|
ret = 0;
|
|
|
|
if (unlikely(IS_ERR(vma->pages))) {
|
2016-08-15 17:48:47 +08:00
|
|
|
ret = PTR_ERR(vma->pages);
|
|
|
|
vma->pages = NULL;
|
2015-03-23 19:10:36 +08:00
|
|
|
DRM_ERROR("Failed to get pages for VMA view type %u (%d)!\n",
|
|
|
|
vma->ggtt_view.type, ret);
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2015-03-23 19:10:36 +08:00
|
|
|
return ret;
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
|
|
|
|
2017-01-11 19:23:11 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_gtt_reserve - reserve a node in an address_space (GTT)
|
2017-01-13 00:45:59 +08:00
|
|
|
* @vm: the &struct i915_address_space
|
|
|
|
* @node: the &struct drm_mm_node (typically i915_vma.node)
|
|
|
|
* @size: how much space to allocate inside the GTT,
|
|
|
|
* must be #I915_GTT_PAGE_SIZE aligned
|
|
|
|
* @offset: where to insert inside the GTT,
|
|
|
|
* must be #I915_GTT_MIN_ALIGNMENT aligned, and the node
|
|
|
|
* (@offset + @size) must fit within the address space
|
|
|
|
* @color: color to apply to node, if this node is not from a VMA,
|
|
|
|
* color must be #I915_COLOR_UNEVICTABLE
|
|
|
|
* @flags: control search and eviction behaviour
|
2017-01-11 19:23:11 +08:00
|
|
|
*
|
|
|
|
* i915_gem_gtt_reserve() tries to insert the @node at the exact @offset inside
|
|
|
|
* the address space (using @size and @color). If the @node does not fit, it
|
|
|
|
* tries to evict any overlapping nodes from the GTT, including any
|
|
|
|
* neighbouring nodes if the colors do not match (to ensure guard pages between
|
|
|
|
* differing domains). See i915_gem_evict_for_node() for the gory details
|
|
|
|
* on the eviction algorithm. #PIN_NONBLOCK may be used to prevent waiting on
|
|
|
|
* evicting active overlapping objects, and any overlapping node that is pinned
|
|
|
|
* or marked as unevictable will also result in failure.
|
|
|
|
*
|
|
|
|
* Returns: 0 on success, -ENOSPC if no suitable hole is found, -EINTR if
|
|
|
|
* asked to wait for eviction and interrupted.
|
|
|
|
*/
|
|
|
|
int i915_gem_gtt_reserve(struct i915_address_space *vm,
|
|
|
|
struct drm_mm_node *node,
|
|
|
|
u64 size, u64 offset, unsigned long color,
|
|
|
|
unsigned int flags)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!size);
|
|
|
|
GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
|
|
|
|
GEM_BUG_ON(!IS_ALIGNED(offset, I915_GTT_MIN_ALIGNMENT));
|
|
|
|
GEM_BUG_ON(range_overflows(offset, size, vm->total));
|
2017-01-15 21:47:46 +08:00
|
|
|
GEM_BUG_ON(vm == &vm->i915->mm.aliasing_ppgtt->base);
|
2017-01-16 01:27:40 +08:00
|
|
|
GEM_BUG_ON(drm_mm_node_allocated(node));
|
2017-01-11 19:23:11 +08:00
|
|
|
|
|
|
|
node->size = size;
|
|
|
|
node->start = offset;
|
|
|
|
node->color = color;
|
|
|
|
|
|
|
|
err = drm_mm_reserve_node(&vm->mm, node);
|
|
|
|
if (err != -ENOSPC)
|
|
|
|
return err;
|
|
|
|
|
2017-06-16 22:05:21 +08:00
|
|
|
if (flags & PIN_NOEVICT)
|
|
|
|
return -ENOSPC;
|
|
|
|
|
2017-01-11 19:23:11 +08:00
|
|
|
err = i915_gem_evict_for_node(vm, node, flags);
|
|
|
|
if (err == 0)
|
|
|
|
err = drm_mm_reserve_node(&vm->mm, node);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
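The control flow of i915_gem_gtt_reserve() (try the exact placement, give up early if eviction is forbidden, otherwise evict and retry once) can be sketched on a toy address space. Everything below is an illustrative assumption rather than the drm_mm or i915 API: `struct toy_space`, `toy_claim()`, `toy_evict()`, `toy_reserve()` and `SKETCH_NOEVICT` are hypothetical, and eviction here frees individual slots instead of whole nodes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS 16
#define SKETCH_NOEVICT 0x1u /* hypothetical flag standing in for PIN_NOEVICT */

enum slot_state { SLOT_FREE, SLOT_USED, SLOT_PINNED };

struct toy_space {
	enum slot_state slot[TOY_SLOTS];
};

/* Claim [first, first + count); fails with -ENOSPC if any slot is taken. */
static int toy_claim(struct toy_space *s, size_t first, size_t count)
{
	size_t i;

	for (i = first; i < first + count; i++)
		if (s->slot[i] != SLOT_FREE)
			return -ENOSPC;

	for (i = first; i < first + count; i++)
		s->slot[i] = SLOT_USED;

	return 0;
}

/* Free every slot in the range; a pinned slot makes the eviction fail. */
static int toy_evict(struct toy_space *s, size_t first, size_t count)
{
	size_t i;

	for (i = first; i < first + count; i++)
		if (s->slot[i] == SLOT_PINNED)
			return -ENOSPC;

	for (i = first; i < first + count; i++)
		s->slot[i] = SLOT_FREE;

	return 0;
}

/* Reserve an exact range, evicting unpinned occupants if allowed. */
static int toy_reserve(struct toy_space *s, size_t first, size_t count,
		       unsigned int flags)
{
	int err;

	assert(count && first + count <= TOY_SLOTS);

	err = toy_claim(s, first, count);
	if (err != -ENOSPC)
		return err;

	if (flags & SKETCH_NOEVICT)
		return -ENOSPC;

	err = toy_evict(s, first, count);
	if (err == 0)
		err = toy_claim(s, first, count);

	return err;
}
```

The shape mirrors the function above: the first claim is the fast path, and eviction is attempted only when that claim fails for lack of space and the caller permits it.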
|
|
|
|
|
2017-01-11 19:23:12 +08:00
|
|
|
static u64 random_offset(u64 start, u64 end, u64 len, u64 align)
|
|
|
|
{
|
|
|
|
u64 range, addr;
|
|
|
|
|
|
|
|
GEM_BUG_ON(range_overflows(start, len, end));
|
|
|
|
GEM_BUG_ON(round_up(start, align) > round_down(end - len, align));
|
|
|
|
|
|
|
|
range = round_down(end - len, align) - round_up(start, align);
|
|
|
|
if (range) {
|
|
|
|
if (sizeof(unsigned long) == sizeof(u64)) {
|
|
|
|
addr = get_random_long();
|
|
|
|
} else {
|
|
|
|
addr = get_random_int();
|
|
|
|
if (range > U32_MAX) {
|
|
|
|
addr <<= 32;
|
|
|
|
addr |= get_random_int();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
div64_u64_rem(addr, range, &addr);
|
|
|
|
start += addr;
|
|
|
|
}
|
|
|
|
|
|
|
|
return round_up(start, align);
|
|
|
|
}
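The arithmetic in random_offset() can be checked in isolation with a small userspace sketch. This is an illustration under stated assumptions, not the kernel function: `pick_offset()`, `align_up()` and `align_down()` are hypothetical names, the alignment is assumed to be a non-zero power of two, and the random value is passed in as `rnd` so that the result is reproducible.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for round_up()/round_down(); align must be 2^n. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static uint64_t align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Pick an aligned offset in [start, end - len] from the random value rnd.
 * The lowest candidate is align_up(start) and the highest is
 * align_down(end - len); rnd is reduced modulo the distance between them.
 */
static uint64_t pick_offset(uint64_t start, uint64_t end, uint64_t len,
			    uint64_t align, uint64_t rnd)
{
	uint64_t lo = align_up(start, align);
	uint64_t hi, range;

	assert(len <= end && start <= end - len); /* the block must fit */
	hi = align_down(end - len, align);
	assert(lo <= hi); /* at least one aligned slot exists */

	range = hi - lo;
	if (range)
		start += rnd % range;

	/* start + (rnd % range) is at most hi - 1, so rounding up stays <= hi. */
	return align_up(start, align);
}
```

For example, `pick_offset(0, 1 << 20, 4096, 4096, 12345)` yields 16384: the raw value 12345 falls inside the valid range and is rounded up to the next 4096 boundary. When only a single aligned slot exists, `range` is 0 and the random value is ignored.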
|
|
|
|
|
2017-01-11 19:23:10 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_gtt_insert - insert a node into an address_space (GTT)
|
2017-01-13 00:45:59 +08:00
|
|
|
* @vm: the &struct i915_address_space
|
|
|
|
* @node: the &struct drm_mm_node (typically i915_vma.node)
|
|
|
|
* @size: how much space to allocate inside the GTT,
|
|
|
|
* must be #I915_GTT_PAGE_SIZE aligned
|
|
|
|
* @alignment: required alignment of starting offset, may be 0 but
|
|
|
|
* if specified, this must be a power-of-two and at least
|
|
|
|
* #I915_GTT_MIN_ALIGNMENT
|
|
|
|
* @color: color to apply to node
|
|
|
|
* @start: start of any range restriction inside GTT (0 for all),
|
2017-01-11 19:23:10 +08:00
|
|
|
* must be #I915_GTT_PAGE_SIZE aligned
|
2017-01-13 00:45:59 +08:00
|
|
|
* @end: end of any range restriction inside GTT (U64_MAX for all),
|
|
|
|
* must be #I915_GTT_PAGE_SIZE aligned if not U64_MAX
|
|
|
|
* @flags: control search and eviction behaviour
|
2017-01-11 19:23:10 +08:00
|
|
|
*
|
|
|
|
* i915_gem_gtt_insert() first searches for an available hole into which
|
|
|
|
* it can insert the node. The hole address is aligned to @alignment and
|
|
|
|
* its @size must then fit entirely within the [@start, @end] bounds. The
|
|
|
|
* nodes on either side of the hole must match @color, or else a guard page
|
|
|
|
* will be inserted between the two nodes (or the node evicted). If no
|
2017-01-11 19:23:12 +08:00
|
|
|
* suitable hole is found, first a victim is randomly selected and tested
|
|
|
|
* for eviction; failing that, the LRU list of objects within the GTT
|
2017-01-11 19:23:10 +08:00
|
|
|
* is scanned to find the first set of replacement nodes to create the hole.
|
|
|
|
* Those old overlapping nodes are evicted from the GTT (and so must be
|
|
|
|
* rebound before any future use). Any node that is currently pinned cannot
|
|
|
|
* be evicted (see i915_vma_pin()). Similarly, if the node's VMA is currently
|
|
|
|
* active and #PIN_NONBLOCK is specified, that node is also skipped when
|
|
|
|
* searching for an eviction candidate. See i915_gem_evict_something() for
|
|
|
|
* the gory details on the eviction algorithm.
|
|
|
|
*
|
|
|
|
* Returns: 0 on success, -ENOSPC if no suitable hole is found, -EINTR if
|
|
|
|
* asked to wait for eviction and interrupted.
|
|
|
|
*/
|
|
|
|
int i915_gem_gtt_insert(struct i915_address_space *vm,
|
|
|
|
struct drm_mm_node *node,
|
|
|
|
u64 size, u64 alignment, unsigned long color,
|
|
|
|
u64 start, u64 end, unsigned int flags)
|
|
|
|
{
|
2017-02-03 05:04:38 +08:00
|
|
|
enum drm_mm_insert_mode mode;
|
2017-01-11 19:23:12 +08:00
|
|
|
u64 offset;
|
2017-01-11 19:23:10 +08:00
|
|
|
int err;
|
|
|
|
|
|
|
|
lockdep_assert_held(&vm->i915->drm.struct_mutex);
|
|
|
|
GEM_BUG_ON(!size);
|
|
|
|
GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
|
|
|
|
GEM_BUG_ON(alignment && !is_power_of_2(alignment));
|
|
|
|
GEM_BUG_ON(alignment && !IS_ALIGNED(alignment, I915_GTT_MIN_ALIGNMENT));
|
|
|
|
GEM_BUG_ON(start >= end);
|
|
|
|
GEM_BUG_ON(start > 0 && !IS_ALIGNED(start, I915_GTT_PAGE_SIZE));
|
|
|
|
GEM_BUG_ON(end < U64_MAX && !IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
|
2017-01-15 21:47:46 +08:00
|
|
|
GEM_BUG_ON(vm == &vm->i915->mm.aliasing_ppgtt->base);
|
2017-01-16 01:27:40 +08:00
|
|
|
GEM_BUG_ON(drm_mm_node_allocated(node));
|
2017-01-11 19:23:10 +08:00
|
|
|
|
|
|
|
if (unlikely(range_overflows(start, size, end)))
|
|
|
|
return -ENOSPC;
|
|
|
|
|
|
|
|
if (unlikely(round_up(start, alignment) > round_down(end - size, alignment)))
|
|
|
|
return -ENOSPC;
|
|
|
|
|
2017-02-03 05:04:38 +08:00
|
|
|
mode = DRM_MM_INSERT_BEST;
|
|
|
|
if (flags & PIN_HIGH)
|
|
|
|
mode = DRM_MM_INSERT_HIGH;
|
|
|
|
if (flags & PIN_MAPPABLE)
|
|
|
|
mode = DRM_MM_INSERT_LOW;
|
2017-01-11 19:23:10 +08:00
|
|
|
|
|
|
|
/* We only allocate in PAGE_SIZE/GTT_PAGE_SIZE (4096) chunks,
|
|
|
|
* so we know that we always have a minimum alignment of 4096.
|
|
|
|
* The drm_mm range manager is optimised to return results
|
|
|
|
* with zero alignment, so where possible use the optimal
|
|
|
|
* path.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(I915_GTT_MIN_ALIGNMENT > I915_GTT_PAGE_SIZE);
|
|
|
|
if (alignment <= I915_GTT_MIN_ALIGNMENT)
|
|
|
|
alignment = 0;
|
|
|
|
|
2017-02-03 05:04:38 +08:00
|
|
|
err = drm_mm_insert_node_in_range(&vm->mm, node,
|
|
|
|
size, alignment, color,
|
|
|
|
start, end, mode);
|
2017-01-11 19:23:10 +08:00
|
|
|
if (err != -ENOSPC)
|
|
|
|
return err;
|
|
|
|
|
2017-06-16 22:05:21 +08:00
|
|
|
if (flags & PIN_NOEVICT)
|
|
|
|
return -ENOSPC;
|
|
|
|
|
2017-01-11 19:23:12 +08:00
|
|
|
/* No free space, pick a slot at random.
|
|
|
|
*
|
|
|
|
* There is a pathological case here using a GTT shared between
|
|
|
|
* mmap and GPU (i.e. ggtt/aliasing_ppgtt but not full-ppgtt):
|
|
|
|
*
|
|
|
|
* |<-- 256 MiB aperture -->||<-- 1792 MiB unmappable -->|
|
|
|
|
* (64k objects) (448k objects)
|
|
|
|
*
|
|
|
|
* Now imagine that the eviction LRU is ordered top-down (just because
|
|
|
|
* pathology meets real life), and that we need to evict an object to
|
|
|
|
* make room inside the aperture. The eviction scan then has to walk
|
|
|
|
* the 448k list before it finds one within range. And now imagine that
|
|
|
|
* it has to search for a new hole between every byte inside the memcpy,
|
|
|
|
* for several simultaneous clients.
|
|
|
|
*
|
|
|
|
* On a full-ppgtt system, if we have run out of available space, there
|
|
|
|
* will be lots and lots of objects in the eviction list! Again,
|
|
|
|
* searching that LRU list may be slow if we are also applying any
|
|
|
|
* range restrictions (e.g. restriction to low 4GiB) and so, for
|
|
|
|
* simplicity and similarity between different GTT, try the single
|
|
|
|
* random replacement first.
|
|
|
|
*/
|
|
|
|
offset = random_offset(start, end,
|
|
|
|
size, alignment ?: I915_GTT_MIN_ALIGNMENT);
|
|
|
|
err = i915_gem_gtt_reserve(vm, node, size, offset, color, flags);
|
|
|
|
if (err != -ENOSPC)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
/* Randomly selected placement is pinned, do a search */
|
2017-01-11 19:23:10 +08:00
|
|
|
err = i915_gem_evict_something(vm, size, alignment, color,
|
|
|
|
start, end, flags);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2017-02-03 05:04:38 +08:00
|
|
|
return drm_mm_insert_node_in_range(&vm->mm, node,
|
|
|
|
size, alignment, color,
|
|
|
|
start, end, DRM_MM_INSERT_EVICT);
|
2017-01-11 19:23:10 +08:00
|
|
|
}
|
2017-02-14 01:15:18 +08:00
|
|
|
|
|
|
|
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
|
|
|
#include "selftests/mock_gtt.c"
|
2017-02-14 01:15:38 +08:00
|
|
|
#include "selftests/i915_gem_gtt.c"
|
2017-02-14 01:15:18 +08:00
|
|
|
#endif
|