Since commit 1233e2db19 ("drm/i915: Move object backing storage
manipulation to its own locking"), i915_gem_object_put_pages() and
specifically the i915_gem_gtt_finish_pages() may be called from outside
of the struct_mutex and so we can no longer pass I915_WAIT_LOCKED to
i915_gem_wait_for_idle.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170330085341.20311-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If we manage to tangle errorpaths and get call to callbacks,
it is better to defensively keep them as null until object init is
finished so that we get clean null deref on callsite,
instead of more cryptic wreckage with partly initialized vm objects.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-5-git-send-email-mika.kuoppala@intel.com
If we setup the vm size early, we can use the newly introduced
i915_vm_is_48bit() in majority of callsites wanting to know the vm size.
As we operate either with 3lvl or 4lvl page table structure,
wrap the vm size query inside a function which tells us if
4lvl setup is needed for particular vm, as the following
code uses the function names where level is noted.
v2: use_4lvl (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-3-git-send-email-mika.kuoppala@intel.com
The macro takes a vm pointer at some sites, and dev_priv on others
We were saved as the internal macro never deferences the pointer
given.
As the number of pdpes depend on vm configuration, make it
as a inline function that accepts vm pointer.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wsilon.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1488295691-9404-1-git-send-email-mika.kuoppala@intel.com
We are required to reload the TLBs around ppgtt switches. However, we
already do an unconditional TLB invalidate before every batch and a flush
afterwards, so this condition is already satisfied without extra flushes
around the LRI instructions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227135913.8056-2-chris@chris-wilson.co.uk
If we fail to allocate the ppgtt range after allocating the pages for
the vma, we should unwind the local allocation before reporting back the
failure.
Fixes: ff685975d9 ("drm/i915: Move allocate_va_range to GTT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227122654.27651-2-chris@chris-wilson.co.uk
Only if we allocated the layer and the lower level failed should we
remove this layer when unwinding. Otherwise we ignore the overlapping
entries by overwriting the old layer with scratch.
Fixes: c5d092a429 ("drm/i915: Remove bitmap tracking for used-pml4")
Fixes: e2b763caa6 ("drm/i915: Remove bitmap tracking for used-pdpes")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99947
Testcase: igt/drv_selftest/live_gtt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227122654.27651-1-chris@chris-wilson.co.uk
When advancing onto the next 4th level page table entry, we need to
reset our indices to 0. Currently we restart from the original address
which means we start with an offset into the next PML table.
Fixes: 894ccebee2 ("drm/i915: Micro-optimise gen8_ppgtt_insert_entries()")
Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99948
Testcase: igt/drv_selftest/live_gtt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Tested-by: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170225181122.4788-4-chris@chris-wilson.co.uk
Testing with concurrent GGTT accesses no longer show the coherency
problems from yonder, commit 5bab6f60cb ("drm/i915: Serialise updates
to GGTT with access through GGTT on Braswell"). My presumption is that
the root cause was more likely fixed by commit 3b5724d702 ("drm/i915:
Wait for writes through the GTT to land before reading back"), along
with the use of WC updates to the global gTT in commit 8448661d65
("drm/i915: Convert clflushed pagetables over to WC maps". Given
that the original symptoms can no longer be reproduced, time to remove
the workaround.
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170220124718.14796-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Prevent the overflow check from firing on machines with the full 4lvl
page tables, that are not restricted to GEN8_LEGACY_PDES.
v2: Also fix the off-by-one in the compare
Fixes: 894ccebee2 ("drm/i915: Micro-optimise gen8_ppgtt_insert_entries()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217141455.19877-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
As the aliasing GTT is only accessed via the global GTT, we will never
use more of it than we expose via the Global GTT and so we only need to
preallocate sufficient space within the ppgtt for the full GTT. Equally,
if the aliasing GTT is smaller than the global GTT, we have a serious
issue and must bail.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-21-chris@chris-wilson.co.uk
Once upon a time, back in the UMS days, we supported userspace
initialising the GTT and sharing portions of the GTT with other users.
Now, we own the GTT (both global and per-process) and the tables always
start at 0 - so we can remove i915_address_space.start and forget about
this old complication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-20-chris@chris-wilson.co.uk
We only operate on known extents (both for alloc/clear) and so we can use
both the knowledge of the bind/unbind range along with the knowledge of
the existing pagetable to avoid having to allocate temporary and
auxiliary bitmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-15-chris@chris-wilson.co.uk
We only operate on known extents (both for alloc/clear) and so we can use
both the knowledge of the bind/unbind range along with the knowledge of
the existing pagetable to avoid having to allocate temporary and
auxiliary bitmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-14-chris@chris-wilson.co.uk
We only operate on known extents (both for alloc/clear) and so we can use
both the knowledge of the bind/unbind range along with the knowledge of
the existing pagetable to avoid having to allocate temporary and
auxiliary bitmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-13-chris@chris-wilson.co.uk
The hardware does not cope very well with us changing the PD within an
active context (the context must be idle for it to re-read the PD). As
we only check whether the page is idle before changing the entry (and on
through the PD tree), we cannot reliably replace PD entries on
gen6/gen7. To fully avoid changing the tree at runtime, preallocate it
on init.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-9-chris@chris-wilson.co.uk
In the future, we need to call allocate_va_range on the aliasing-ppgtt
which means moving the call down from the vma into the vm (which is
more appropriate for calling the vm function).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-8-chris@chris-wilson.co.uk
As these are now both plain and simple kmap_atomic/kunmap_atomic pairs,
we can remove the wrappers for a small gain of clarity (in particular,
not hiding the atomic critical sections!).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-7-chris@chris-wilson.co.uk
We flush the entire page every time we update a few bytes, making the
update of a page table many, many times slower than is required. If we
create a WC map of the page for our updates, we can avoid the clflush
but incur additional cost for creating the pagetable. We amoritize that
cost by reusing page vmappings, and only changing the page protection in
batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Improve the sg iteration and in hte process eliminate a bug in
miscomputing the pml4 length as orig_nents<<PAGE_SHIFT is no longer the
full length of the sg table.
v2: Check for the end of the fourth level page table (the final pdpe)
and move onto the next.
v3: Assert that 3lvl insert_pte_entries doesn't overflow its smaller set
of PDP.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-3-chris@chris-wilson.co.uk
Inline the address computation to avoid the vfunc call for every page.
We still have to pay the high overhead of sg_page_iter_next(), but now
at least GCC can optimise the inner most loop, giving a significant
boost to some thrashing Unreal Engine workloads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-2-chris@chris-wilson.co.uk
This removes the usage of intel_ring_emit in favour of
directly writing to the ring buffer.
intel_ring_emit was preventing the compiler for optimising
fetch and increment of the current ring buffer pointer and
therefore generating very verbose code for every write.
It had no useful purpose since all ringbuffer operations
are started and ended with intel_ring_begin and
intel_ring_advance respectively, with no bail out in the
middle possible, so it is fine to increment the tail in
intel_ring_begin and let the code manage the pointer
itself.
Useless instruction removal amounts to approximately
two and half kilobytes of saved text on my build.
Not sure if this has any measurable performance
implications but executing a ton of useless instructions
on fast paths cannot be good.
v2:
* Change return from intel_ring_begin to error pointer by
popular demand.
* Move tail increment to intel_ring_advance to enable some
error checking.
v3:
* Move tail advance back into intel_ring_begin.
* Rebase and tidy.
v4:
* Complete rebase after a few months since v3.
v5:
* Remove unecessary cast and fix !debug compile. (Chris Wilson)
v6:
* Make intel_ring_offset take request as well.
* Fix recording of request postfix plus a sprinkle of asserts.
(Chris Wilson)
v7:
* Use intel_ring_offset to get the postfix. (Chris Wilson)
* Convert GVT code as well.
v8:
* Rename *out++ to *cs++.
v9:
* Fix GVT out to cs conversion in GVT.
v10:
* Rebase for new intel_ring_begin in selftests.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
It is possible whilst allocating the page-directory tree for a ppgtt
bind that the shrinker may run and reap unused parts of the tree. If the
shrinker happens to remove a chunk of the tree that the
allocate_va_range has already processed, we may then try to insert into
the dangling tree. This test uses the fault-injection framework to force
the shrinker to be invoked before we allocate new pages, i.e. new chunks
of the PD tree.
References: https://bugs.freedesktop.org/show_bug.cgi?id=99295
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
A very simple mockery, just a random manager and timeline. Useful for
inserting objects and ordering retirement; and not much else.
v2: mock_fini_ggtt() to complement mock_init_ggtt().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-7-chris@chris-wilson.co.uk
We may unload the PCI device before all users (such as dma-buf) are
completely shutdown. This may leave VMA in the global GTT which we want
to revoke, whilst keeping the objects themselves around to service the
dma-buf.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210163523.17533-2-chris@chris-wilson.co.uk
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This patch makes PPGTT page table non-shrinkable when using aliasing PPGTT
mode. It's just a temporary solution for making GVT-g work.
Fixes: 2ce5179fe8 ("drm/i915/gtt: Free unused lower-level page tables")
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1486559013-25251-2-git-send-email-zhi.a.wang@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now mark the reserved hole (drm_mm.head_node) with the special
UNEVICTABLE color, we can use the page coloring to avoid prefetching of
the CS beyond the end of the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-3-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
The drm_mm range manager (within i915_address_space) uses a special
drm_mm_node that excludes the unavailable range (beyond the end of the
drm_mm). However, we play games with the global GTT to use the head_node
to exclude the tail page but tell ourselves that the whole range is
available. This causes an issue when we try to evict using the full
range of the global GTT which is wider than the drm_mm, resulting in
complete confusion and catastrophe. One way to resolve this would be to
use a reserved node to exclude the guard page, or we can treat the
drm_mm's head_node as our guard page and assign it the appropriate
colour.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
I incorrectly converted the exclusion of the last 4096 bytes (that avoids
any potential prefetching past the end of the GTT) to PAGE_SIZE and not
to I915_GTT_PAGE_SIZE as it should be.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Now that we have selftests in place exercising truly huge allocations
we will start to hit the 512GB warning, so now seems like a good time to
remove this user-triggerable WARN.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1486047300-13198-1-git-send-email-matthew.auld@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.
In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.
v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
Apply workarounds to Geminilake, and annotate those that are applied
unconditionally when they apply to GLK based on the workaround database.
v2: Fix commit message typos. (David)
v3: Rebase.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485422218-9102-1-git-send-email-ander.conselvan.de.oliveira@intel.com
Along with GLK it was introduced the .is_lp and IS_GEN9_LP.
So, following the same simplification standard we can
put Skylake and Kabylake under the same bucket for most
of the things.
So let's add the IS_GEN9_BC for "Big Core" (non Atom based
platforms).
The i915_drv.c was let out of this patch on purpose
because that is really a decision per platform, just like
other cases where IS_KABYLAKE is different from IS_SKYLAKE.
v2: fix conflict with IS_LP and 3 new cases for this
big core bucket:
- intel_ddi.c: intel_ddi_get_link_dpll
- intel_fbc.c: find_compression_threshold
- i915_gem_gtt.c: gtt_write_workarounds
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485196357-30599-2-git-send-email-rodrigo.vivi@intel.com
According to wa_database this Wa persist on KBL as it was on SKL.
Cc: Tim Gore <tim.gore@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485196357-30599-1-git-send-email-rodrigo.vivi@intel.com
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.
s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
The aliasing_gtt is just that, an alias of the global GTT. We do not
populate it directly, instead we always use the global GTT. Catch any
attempt to incorrectly allocate ranges from the aliasing_gtt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170115134746.29325-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
It is only being used to clear a struct and set the type, after which it
is overwritten. Since we no longer check the unset bits of the union,
skipping the clear is permissible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-6-chris@chris-wilson.co.uk
Reading the ggtt_views is much more pleasant without the extra
characters from specifying the union (i.e. ggtt_view.partial rather than
ggtt_view.params.partial). To make this work inside i915_vma_compare()
with only a single memcmp requires us to ensure that there are no
uninitialised bytes within each branch of the union (we make sure the
structs are packed) and we need to store the size of each branch.
v4: Rewrite changelog and add comments explaining the assert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Parameter: good.
Parameter - bad.
One day I'll learn the syntax.
Fixes: 625d988acc ("drm/i915: Extract reserving space in the GTT to a helper")
Fixes: e007b19d7b ("drm/i915: Use the MRU stack search after evicting")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112164559.27232-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Move the GuC invalidation of its ggtt TLB to where we perform the ggtt
modification rather than proliferate it into all the callers of the
insert (which may or may not in fact have to do the insertion).
v2: Just do the guc invalidate unconditionally, (afaict) it has no impact
without the guc loaded on gen8+
v3: Conditionally invalidate the guc - just in case that register has
not been validated for other modes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk
Performing an eviction search can be very, very slow especially for a
range restricted replacement. For example, a workload like
gem_concurrent_blit will populate the entire GTT and then cause aperture
thrashing. Since the GTT is a mix of active and inactive tiny objects,
we have to search through almost 400k objects before finding anything
inside the mappable region, and as this search is required before every
operation performance falls off a cliff.
Instead of performing the full search, we do a trial replacement of the
node at a random location fitting the specified restrictions. We lose
the strict LRU property of the GTT in exchange for avoiding the slow
search (several orders of runtime improvement for gem_concurrent_blit
4KiB-global-gtt, e.g. from 5000s to 20s). The loss of LRU replacement is
(later) mitigated firstly by only doing replacement if we find no
freespace and secondly by execbuf doing a PIN_NONBLOCK search first before
it starts thrashing (i.e. the random replacement will only occur from the
already inactive set of objects).
v2: Ascii-art, and check preconditionst
v3: Rephrase final sentence in comment to explain why we don't bother
with if (i915_is_ggtt(vm)) for preferring random replacement.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-3-chris@chris-wilson.co.uk
When we evict from the GTT to make room for an object, the hole we
create is put onto the MRU stack inside the drm_mm range manager. On the
next search pass, we can speed up a PIN_HIGH allocation by referencing
that stack for the new hole.
v2: Pull together the 3 identical implements (ahem, a couple were
outdated) into a common routine for allocating a node and evicting as
necessary.
v3: Detect invalid calls to i915_gem_gtt_insert()
v4: kerneldoc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-1-chris@chris-wilson.co.uk
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.
v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Now that it's obvious what the helpers do, we can simplify the code
somewhat by using them when clearing the pdpe/pml4e with the relevant
scratch entry.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213160512.7008-1-matthew.auld@intel.com
The function name gen8_setup_page_directory_pointer is misleading, and
only serves to confuse the reader, it's not setting up a pdp, but
rather encoding a specific pml4e with a given pdp.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The function name gen8_setup_page_directory is misleading, and only
serves to confuse the reader, it's not setting up a pd, but rather
encoding a specific pdpe with a given pd.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If the DMA remap fails, one cause can be that we have too many objects
pinned in a small remapping table, such as swiotlb. (DMA remapping does
not trigger the shrinker by itself on its normal failure paths.) So try
purging all other objects (using i915_gem_shrink_all(), sparing our own
pages as we have yet to assign them to the obj->pages) and try again. If
there are no pages to reclaim (and consequently no pages to unmap), the
shrinker will report 0 and we fail with -ENOSPC as before.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-2-chris@chris-wilson.co.uk
Stolen memory is a hardware resource of known size, so use an accurate
fixed integer type rather than the ambiguous variable size_t. This was
motivated by the next patch spotting inconsistencies in our types.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-3-chris@chris-wilson.co.uk
The GuC uses a special mapping for the upper end of the Global GTT,
similar to the way it uses a special mapping for the lower end, so
exclude it from our drm_mm to prevent us using it.
v2: Rename to reflect that it is unmappable similar to the region at the
bottom of the GGTT, and couple it into the assertion that we don't feed
unmappable addresses to the GuC.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-5-chris@chris-wilson.co.uk
Directly merge drm-misc into drm-intel since Dave is on vacation and
we need the various drm-misc patches (fb format rework, drm mm fixes,
selftest framework and others). Also pulled back -rc2 in first to
resync with drm-intel-fixes and make sure I can reuse the exact rerere
solutions from drm-tip for safety, and because I'm lazy.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If we move the sanity checking from gen8_alloc_va_range_3lvl and
gen6_alloc_va_range into i915_vma_bind, we will increase our coverage to
now both callbacks. We also convert each WARN_ON over to a GEM_WARN_ON.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-2-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we remember that node_list is a circular list containing the fake
head_node, we can use a simple list_next_entry() and skip the NULL check
for the allocated check against the head_node.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161216074718.32500-3-chris@chris-wilson.co.uk
Instead of being hidden in sanitize_enable_ppgtt.
It also seems to be the place to do so nowadays.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).
v2...v200: New name for -1.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
Geminilake is mostly backwards compatible with broxton, so change most
of the IS_BROXTON() checks to IS_GEN9_LP(). Differences between the
platforms will be implemented in follow-up patches.
v2: Don't reuse broxton's path in intel_update_max_cdclk().
Don't set plane count as in broxton.
v3: Rebase
v4: Include the check intel_bios_is_port_hpd_inverted().
Commit message.
v5: Leave i915_dmc_info() out; glk's csr version != bxt's. (Rodrigo)
v6: Rebase.
v7: Convert a few mode IS_BROXTON() occurances in pps, ddi, dsi and pll
code. (Rodrigo)
v8: Squash a couple of DDI patches with more conversions. (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480667037-11215-2-git-send-email-ander.conselvan.de.oliveira@intel.com
drivers/gpu/drm/i915/./i915_trace.h: In function ‘trace_event_raw_event_i915_gem_evict’:
drivers/gpu/drm/i915/./i915_trace.h:409:24: error: ‘struct i915_address_space’ has no member named ‘dev’
__entry->dev = vm->dev->primary->index;
A couple of macros missed in the s/vm->dev/vm->i915/ conversion.
Fixes: 49d73912cb ("drm/i915: Convert vm->dev backpointer to vm->i915")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129124205.19351-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
a PT page will be released if it doesn't contain any meaningful mappings
during PPGTT page table shrinking. The PT entry in the upper level will
be set to a scratch entry.
Normally this works nicely, but in virtualization world, the PPGTT page
table is tracked by hypervisor. Releasing the PT page before modifying
the upper level PT entry would cause extra efforts.
As the tracked page has been returned to OS before losing track from
hypervisor, it could be written in any pattern. Hypervisor has to recognize
if a page is still being used as a PT page by validating these writing
patterns. It's complicated. Better let the guest modify the PT entry in
upper level PT first, then release the PT page.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://patchwork.freedesktop.org/patch/122697/msgid/1479728666-25333-1-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480402516-22275-1-git-send-email-zhi.a.wang@intel.com
We already have an i915_address_space_init, so for symmetry we should
also have a _fini, plus we already open code it twice. This then also
fixes a bug where we leak the timeline for the ggtt vm.
v2: don't forget about the struct_mutex for the ggtt path.
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161117210411.14044-1-chris@chris-wilson.co.uk
And a little bit of cascaded function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Started with removing INTEL_INFO(dev) and cascaded into a quite
big trickle of function prototype changes. Still, I think it is
for the better.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
After this patch only conversion of INTEL_INFO(p)->gen to
INTEL_GEN(dev_priv) remains before the __I915__ macro can
be removed.
v2: Tidy vlv_compute_wm. (David Weinehall)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
As a side product, had to split two other files;
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)
I tried to move code in as big chunks as possible, to make review
easier. i915_vma_compare was moved to a header temporarily.
v2:
- Use i915_gem_fence_reg.{c,h}
v3:
- Rebased
v4:
- Fix building when DEBUG_GEM is enabled by reordering a bit.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
commit bc0629a767 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanitychecks, the fix is to replace the explicit atomic_set with an
atomic_inc.
v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When supplying a view to vma_compare() it is required that the supplied
i915_address_space is the global GTT. I tested the VMA instead (which is
the current position in the rbtree and maybe from any address space).
Reported-by: Matthew Auld <matthew.auld@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98579
Fixes: db6c2b4151 ("drm/i915: Store the vma in an rbtree...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161103200852.23431-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Comparing pte index to a number of entries is wrong
when clearing a range of pte entries. Use end marker
of 'one past' to correctly point adequate number of
ptes to the scratch page.
v2: assert early instead of warning late (Chris)
v3: removed consts (Joonas)
Fixes: d209b9c3cd ("drm/i915/gtt: Split gen8_ppgtt_clear_pte_range")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98282
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worse case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for
rhashtable). The major of such tree will be very shallow and so not much
slower, and still scales much, much better than the current unsorted
list.
v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.
v2: Add a comment to indicate the xfer between timelines upon submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
We only used the RPM sequence checking inside the lowlevel GTT
accessors, when we had to rely on callers taking the wakeref on our
behalf. Now that we take the RPM wakeref inside the GTT management
routines themselves, we can forgo the sanitycheck of the callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-4-chris@chris-wilson.co.uk
We can remove the false coupling between RPM and struct mutex by the
observation that we can use the RPM wakeref as the barrier around user
mmap access. That is as we tear down the user's PTE atomically from
within rpm suspend and then to fault in new PTE requires the rpm
wakeref, means that no user access is possible through those PTE without
RPM being awake. Having made that observation, we can then remove the
presumption of having to take rpm outside of struct_mutex and so allow
fine grained acquisition of a wakeref around hw access rather than
having to remember to acquire the wakeref early on.
v2: Rejig placement of the new intel_runtime_pm_get() to be as tight
as possible around the GTT pread/pwrite.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-2-chris@chris-wilson.co.uk
Since "Dynamic page table allocations" were introduced, our page tables
can grow (being dynamically allocated) with address space range usage.
Unfortunately, their lifetime is bound to vm. This is not a huge problem
when we're not using softpin - drm_mm is creating an upper bound on used
range by causing addresses for our VMAs to eventually be reused.
With softpin, long lived contexts can drain the system out of memory
even with a single "small" object. For example:
bo = bo_alloc(size);
while(true)
offset += size;
exec(bo, offset);
Will cause us to create new allocations until all memory in the system
is used for tracking GPU pages (even though almost all PTEs in this vm
are pointing to scratch).
Let's free unused page tables in clear_range to prevent this - if no
entries are used, we can safely free it and return this information to
the caller (so that higher-level entry is pointing to scratch).
v2: Document return value and free semantics (Joonas)
v3: No newlines in vars block (Joonas)
v4: Drop redundant local 'reduce' variable
v5: Handle CI fail with enable_ppgtt=2
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-3-git-send-email-michal.winiarski@intel.com
Let's use more top-down approach, where each gen8_ppgtt_clear_* function
is responsible for clearing the struct passed as an argument and calling
relevant clear_range functions on lower-level tables.
Doing this rather than operating on PTE ranges makes the implementation
of shrinking page tables quite simple.
v2: Drop min when calculating num_entries, no negation in 48b ppgtt
check, no newlines in vars block (Joonas)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-2-git-send-email-michal.winiarski@intel.com
We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.
Remove the valid flag as it is always true.
v2: Expand commit msg, patch reorder (Mika)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
Saves 864 bytes of .rodata strings and ~100 of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Saves 1392 bytes of .rodata strings.
Also change a few function/macro prototypes in i915_gem_gtt.c
from dev to dev_priv where it made more sense to do so.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Mention function prototype changes. (David Weinehall)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Saves 1016 bytes of .rodata strings and couple hundred of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Since the GTT provides universal access to any GPU page, we can use it
to reduce our plethora of read methods to just one. It also has the
important characteristic of being exactly what the GPU sees - if there
are incoherency problems, seeing the batch as executed (rather than as
trapped inside the cpu cache) is important.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-4-chris@chris-wilson.co.uk
Recently I have been applying an optimisation to avoid stalling and
clflushing GGTT objects based on their current binding. That is we only
set-to-gtt-domain upon first bind. However, on hibernation the objects
remain bound, but they are in the CPU domain. Currently (since commit
975f7ff42e ("drm/i915: Lazily migrate the objects after hibernation"))
we only flush scanout objects as all other objects are expected to be
flushed prior to use. That breaks down in the face of the runtime
optimisation above - and we need to flush all GGTT pinned objects
(essentially ringbuffers).
To reduce the burden of extra clflushes, we only flush those objects we
cannot discard from the GGTT. Everything pinned to the scanout, or
current contexts or ringbuffers will be flushed and rebound. Other
objects, such as inactive contexts, will be left unbound and in the CPU
domain until first use after resuming.
Fixes: 7abc98fadf ("drm/i915: Only change the context object's domain...")
Fixes: 57e8853181 ("drm/i915: Use VMA for ringbuffer tracking")
References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909201957.2499-1-chris@chris-wilson.co.uk
In the next patch we want to handle reset directly by a locked waiter in
order to avoid issues with returning before the reset is handled. To
handle the reset, we must first know whether we hold the struct_mutex.
If we do not hold the struct_mtuex we can not perform the reset, but we do
not block the reset worker either (and so we can just continue to wait for
request completion) - otherwise we must relinquish the mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
As we never need to directly access the pages we allocate for scratch and
the pagetables, and always remap them into the GTT through the dma
remapper, we do not need to limit the allocations to lowmem i.e. we can
pass in the __GFP_HIGHMEM flag to the page allocation.
For backwards compatibility, e.g. certain old GPUs not liking highmem
for certain functions that may be accidentally mapped to the scratch
page by userspace, keep the GMCH probe as only allocating from DMA32.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As the scratch page is no longer shared between all VM, and each has
their own, forgo the small allocation and simply embed the scratch page
struct into the i915_address_space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
In an effort to simplify things for a future push of dev_priv instead
of dev wherever possible, always take pdev via dev_priv where
feasible, eliminating the direct access from dev. Right now this
only eliminates a few cases of dev, but it also obviates that we pass
dev into a lot of functions where dev_priv would be the more obvious
choice.
v2: Fixed one more place missing in the previous patch set
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-5-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We currently have a mix of struct device *device, struct device *kdev,
and struct device *dev (the latter forcing us to refer to
struct drm_device as something else than the normal dev).
To simplify things, always use kdev when referring to struct device.
v2: Replace the dev_to_drm_minor() macro with the inline function
kdev_to_drm_minor().
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-3-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since by design, if not entirely by practice, nothing is allowed to
access the scratch page we use to background fill the VM, then we do not
need to ensure that it is coherent between the CPU and GPU.
set_pages_uc() does a stop_machine() after changing the PAT, and that
significantly impacts upon context creation throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
Access through the GTT requires the device to be awake. Ideally
i915_vma_pin_iomap() is short-lived and the pinning demarcates the
access through the iomap. This is not entirely true, we have a mixture
of long lived pins that exceed the wakelock (such as legacy ringbuffers)
and short lived pin that do live within the wakelock (such as execlist
ringbuffers).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-17-git-send-email-chris@chris-wilson.co.uk
Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk
Redo the fb rotation handling in order to:
- eliminate the NV12 special casing
- handle fb->offsets[] properly
- make the rotation handling easier for the plane code
To achieve these goals we reduce intel_rotation_info to only contain
(for each plane) the rotated view width,height,stride in tile units,
and the page offset into the object where the plane starts. Each plane
is handled exactly the same way, no special casing for NV12 or other
formats. We then store the computed rotation_info under
intel_framebuffer so that we don't have to recompute it again.
To handle fb->offsets[] we treat them as a linear offsets and convert
them to x/y offsets from the start of the relevant GTT mapping (either
normal or rotated). We store the x/y offsets under intel_framebuffer,
and for some extra convenience we also store the rotated pitch (ie.
tile aligned plane height). So for each plane we have the normal
x/y offsets, rotated x/y offsets, and the rotated pitch. The normal
pitch is available already in fb->pitches[].
While we're gathering up all that extra information, we can also easily
compute the storage requirements for the framebuffer, so that we can
check that the object is big enough to hold it.
When it comes time to deal with the plane source coordinates, we first
rotate the clipped src coordinates to match the relevant GTT view
orientation, then add to them the fb x/y offsets. Next we compute
the aligned surface page offset, and as a result we're left with some
residual x/y offsets. Finally, if required by the hardware, we convert
the remaining x/y offsets into a linear offset.
For gen2/3 we simply skip computing the final page offset, and just
convert the src+fb x/y offsets directly into a linear offset since
that's what the hardware wants.
After this all platforms, incluing SKL+, compute these things in exactly
the same way (excluding alignemnt differences).
v2: Use BIT(DRM_ROTATE_270) instead of ROTATE_270 when rotating
plane src coordinates
Drop some spurious changes that got left behind during
development
v3: Split out more changes to prep patches (Daniel)
s/intel_fb->plane[].foo.bar/intel_fb->foo[].bar/ for brevity
Rename intel_surf_gtt_offset to intel_fb_gtt_offset
Kill the pointless 'plane' parameter from intel_fb_gtt_offset()
v4: Fix alignment vs. alignment-1 when calling
_intel_compute_tile_offset() from intel_fill_fb_info()
Pass the pitch in tiles in
stad of pixels to intel_adjust_tile_offset() from intel_fill_fb_info()
Pass the full width/height of the rotated area to
drm_rect_rotate() for clarity
Use u32 for more offsets
v5: Preserve the upper_32_bits()/lower_32_bits() handling for the
fb ggtt offset (Sivakumar)
v6: Rebase due to drm_plane_state src/dst rects
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-2-git-send-email-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we pass along the expected interruptible nature for the
wait-for-idle, we do not need to modify the global
i915->mm.interruptible for this single call. (Only the immediate call to
i915_gem_wait_for_idle() takes the interruptible status as the other
action, dma_map_sg(), is independent of i915.ko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-7-git-send-email-chris@chris-wilson.co.uk
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.
As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).
v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
This reverts commit e9f24d5fb7.
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to do close them along with the object, context or file, and so the time
has come to remove the partial fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.
v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.
A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
v2: Update inline names to be consistent with
i915_gem_object_get_active()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both distinguish GGTT and ppGTT address spaces, as well
as occasionally inspect the owner.
v2: Whitespace, tells us who owns i915_address_space
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-6-git-send-email-chris@chris-wilson.co.uk
Since we have a static if-else-chain for device probing of the global
GTT, we do not need to use a function pointer, let alone store it when
we never use it again. So use the if-else-chain to call down into the
device specific probe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-5-git-send-email-chris@chris-wilson.co.uk
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.
v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-4-git-send-email-chris@chris-wilson.co.uk
Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.
v2: Include all the probe functions
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-3-git-send-email-chris@chris-wilson.co.uk
In order to handle conflicting drivers (i.e. vgacon) having a different
setup of hardware, we have to remove those other drivers before we try
to setup our own mappings. This requires us to split GGTT initialisation
between probing for the hardware location (part of the PCI BAR) and
later establishing the kernel resources for it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-2-git-send-email-chris@chris-wilson.co.uk
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.
engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)
This allows us to extend the behaviour easily in future, for example if
we want just a command barrier without the overhead of flushing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk
Space for flushing the GPU cache prior to completing the request is
preallocated and so cannot fail - the GPU caches will always be flushed
along with the completed request. This means we no longer have to track
whether the GPU cache is dirty between batches like we had to with the
outstanding_lazy_seqno.
With the removal of the duplication in the per-backend entry points for
emitting the obsolete lazy flush, we can then further unify the
engine->emit_flush.
v2: Expand a bit on the legacy of gpu_caches_dirty
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
Ringbuffers are now being written to either through LLC or WC paths, so
treating them as simply iomem is no longer adequate. However, for the
older !llc hardware, the hardware is documentated as treating the TAIL
register update as serialising, so we can relax the barriers when filling
the rings (but even if it were not, it is still an uncached register write
and so serialising anyway.).
For simplicity, let's ignore the iomem annotation.
v2: Remove iomem from ringbuffer->virtual_address
v3: And for good measure add iomem elsewhere to keep sparse happy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-8-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-7-git-send-email-chris@chris-wilson.co.uk
One of the numerous VT-d workarounds we require is that the display
hardware reads past the end of the buffer triggering VT-d faults. This
is acknowledged in the code as being safe "since we fill the unused
portions of the GGTT with the scratch page". Alas, that is no longer
always true and so we trigger DMAR read faults.
Skylake also requires another workaround to avoid mixing VT-d and
unpopulated PTE, and so there we also need to ensure we fill unused
entries with the scratch page.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96584
Fixes: f7770bfd9f ("drm/i915: Skip clearing the GGTT on full-ppgtt systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773634-8106-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().
text data bss dec hex filename
1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko
1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
Gen8 versions of these macros were updated a few months ago
(e8ebd8e drm/i915: eliminate 'temp' in gen8_for_each macros)
originally because at least one iterator could generate an
out of bounds access, but also because eliminating the 'temp'
parameter generated smaller and faster code.
Matthew Auld recently noticed the same problem with the gen6
versions and provided a patch
https://lists.freedesktop.org/archives/intel-gfx/2016-June/099334.html
but while we're changing these, we might as well make them as
much like the gen8 versions as possible, including the style
of using "&& (..., true)" rather than ": (..., 1) : 0", and
of course eliminating the redundant 'temp'.
Furthermore, the "all_pdes" version is only used in one place,
so we can improve code efficiency by changing both the macro
parameters and the calling code to reduce extra dereferences.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466793466-23500-1-git-send-email-david.s.gordon@intel.com
We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.
v2: Tweak an error message if the wait fails for the ilk vtd w/a
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
v5:
- Let functions take struct drm_i915_private *. (Tvrtko)
- Fold vGPU related active check into the inner functions. (Kevin)
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-4-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Introduced a new vm specfic callback insert_page() to program a single pte in
ggtt or ppgtt. This allows us to map a single page in to the mappable aperture
space. This can be iterated over to access the whole object by using space as
meagre as page size.
v2: Added low level rpm assertions to insert_page routines (Chris)
v3: Added POSTING_READ post register write (Tvrtko)
v4: Rebase (Ankit)
v5: Removed wmb() and FLUSH_CTL from insert_page, caller to take care
of it (Chris)
v6: insert_page not working correctly without FLSH_CNTL write, added the
write again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.
Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)
One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.
Various uses of for_each_sg_page() are then converted to the new macros.
v2: Force inlining of the sg_iter constructor and make the union
anonymous.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
Under full-ppgtt, access to the global GTT is carefully regulated
through hardware functions (i.e. userspace cannot read and write to
arbitrary locations in the GGTT via the GPU). With this restriction in
place, we can forgo clearing stale entries from the GGTT as they will
not be accessed.
For aliasing-ppgtt, we could almost do the same except that we do allow
userspace access to the global-GTT via execbuf in order to workraound
some quirks of certain instructions. (This execbuf path is filtered out
with EINVAL on full-ppgtt.)
The most dramatic effect this will have will be during resume, as with
full-ppgtt the GGTT is only used sparingly.
References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tested-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-4-git-send-email-chris@chris-wilson.co.uk
Now that we mark the object domains for having been restored from the
hibernation image, we not need to flush everything during resume and
can instead rely on the normal domain tracking to flush only when
required. The only caveat here are objects that are pinned for use by
the hardware, whose contents must be coherent for when the device
resumes reading from then (shortly afterwards with the driver assuming
the objects are in the correct domain).
References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tested-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-3-git-send-email-chris@chris-wilson.co.uk
Pass drm_i915_private to the uncore init/fini routines and their
subservients as it is their native type.
text data bss dec hex filename
6309978 3578778 696320 10585076 a183f4 vmlinux
6309530 3578778 696320 10584628 a18234 vmlinux
a modest 400 bytes of saving, but 60 lines of code deleted!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462885804-26750-1-git-send-email-chris@chris-wilson.co.uk
Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.
The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and straight up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.
I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.
Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.
v2: Reorder init_hw vs. enable_hw functions (Chris)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
The code to switch_mm() is already handled by i915_switch_context(), the
only difference required to setup the aliasing ppgtt is that we need to
emit te switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL. This allows us to defer the
initialisation of the GPU from early device initialisation to first use,
which should marginally speed up both. The caveat is that we then defer
the context initialisation until first use - i.e. we cannot assume that
the GPU engines are initialised. For example, this means that power
contexts for rc6 (Ironlake) need to explicitly loaded, as they are.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-11-git-send-email-chris@chris-wilson.co.uk
By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).
Note that by moving the iounnmap tracking to the VMA, we actually end up
fixing a leak of the iomapping in intel_fbdev.
v1.5: Rebase prompted by Tvrtko
v2: Drop dev_priv parameter, we can recover the i915_ggtt from the vma.
v3: Move handling of ioremap space exhaustion to vmap_purge and also
allow vmallocs to recover old iomaps. Add Tvrtko's kerneldoc.
v4: Fix a use-after-free in shrinker and rearrange i915_vma_iomap
v5: Back to i915_vm_to_ggtt
v6: Use i915_vma_pin_iomap and i915_vma_unpin_iomap to mark critical
sections and ensure the VMA cannot be reaped whilst mapped.
v7: Move i915_vma_iounmap so that consumers of the API are not tempted,
and add iomem annotations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-5-git-send-email-chris@chris-wilson.co.uk
In a couple of places, we have an i915_address_space that we know is
really an i915_ggtt that we want to use. Create an inline helper to
convert from the i915_address_space subclass into its container.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-4-git-send-email-chris@chris-wilson.co.uk
Remove dev local and use to_i915() in gen8_ppgtt_notify_vgt.
v2: use dev_priv directly for QUESTION_MACROS (Joonas Lahtinen)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461323365-21256-1-git-send-email-matthew.auld@intel.com
We need to kunmap pt_vaddr and not pt itself, otherwise we end up
mapping a bunch of pages without ever unmapping them.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Fixes: d1c54acd67 ("drm/i915/gtt: Introduce kmap|kunmap for dma page")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460476663-24890-4-git-send-email-matthew.auld@intel.com
Store the edram capabilities instead of only the size of
edram. This is preparatory patch to allow edram size calculation
based on edram capability bits for gen9+. With gen9 the
edram is behind llc and is a separate entity. With hsw/bdw
it was more of a victim cache for LLC so the name 'eLLC' might
be warranted. Regardless, rename all mentions of eLLC to EDRAM to
clear the confusion.
v2: return bytes for edram size (Chris)
s/eLLC/eDRAM in output if we are gen > 8
v3: rebase, INTEL_GEN (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().
So refactor my usage into drm_malloc_gfp().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
dev_priv is what the macro works hard to extract, pass it directly.
> sed 's/\([A-Z].*(dev_priv\)->dev)/\1)/g'
v2:
- Include all wrapper macros too (Chris)
v3:
- Include sed cmdline (Chris)
v4:
- Break long line
- Rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460016485-8089-1-git-send-email-joonas.lahtinen@linux.intel.com
Looks much better without container_of everywhere.
v2:
- In i915_gem_restore_gtt_mappings too (Chris)
v3:
- Do not cause WARN by calling on non PPGTT object (Chris)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Rename and document the GGTT init functions to give a better
idea of the context where they are called from.
i915_gem_gtt_init => i915_ggtt_init_hw
i915_gem_init_global_gtt => i915_gem_init_ggtt
i915_global_gtt_cleanup => i915_ggtt_cleanup_hw
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458830866-12578-1-git-send-email-joonas.lahtinen@linux.intel.com
Lets BUG_ON and don't bother with a WARN and returning an error, so we can
remove the need to pollute the code with error handling, after all it is
a programmer error to provide NULL view. Also while we're here remove
redundant NULL ggtt_view check.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458834860-7898-1-git-send-email-matthew.auld@intel.com
Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').
v2:
s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com
In commit 0a87871626
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Oct 15 14:23:01 2015 +0200
drm/i915: restore ggtt double-bind avoidance
we wrote the ggtt_bind_vma() observing a number of cleanups we could do
over the template of aliasing_gtt_bind_vma(). Now let's apply the
cleanups we made there back to the original. The essence is to avoid
redundant variables and assignements, and by doing so make the code
easier to read.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1448015238-24639-1-git-send-email-chris@chris-wilson.co.uk
Throughout the code base, we use u32 for offsets into the global GTT. If
we ever see any hardware with a larger GGTT, then we run the real risk
of silent corruption. So test for our assumption up front so that we
have a nice reminder should the time come when it fails.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[Rebased and changed 1ull -> 1ULL, cut 80 char line]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458290579-27783-1-git-send-email-joonas.lahtinen@linux.intel.com
Use less pointers with the probing code, making it much less confusing
to read.
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Where we have a request we can use req->i915 directly instead
of going through the engine and device. Coccinelle script:
@@
function f;
identifier r;
@@
f(..., struct drm_i915_gem_request *r, ...)
{
...
- engine->dev->dev_private
+ r->i915
...
}
@@
struct drm_i915_gem_request *req;
@@
(
req->
- engine->dev->dev_private
+ i915
)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458219850-21007-1-git-send-email-tvrtko.ursulin@linux.intel.com
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
rotate_pages() checks to see if it got called with a NULL sg, and then
goes to extract it from sg->sgl. It always gets called with a NULL sg
for the first plane, so moving the initial 'sg=st->sgl' assignment out
into intel_rotate_fb_obj_pages() seems less special-casey.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-9-git-send-email-ville.syrjala@linux.intel.com
Throw out a bunch of unnecessary stuff from struct intel_rotation_info,
and pull most of the remaining stuff to live under an array of
per-color plane sub-structures.
What still remains outside the sub-structure will be reorgranized later
as well, but that requires more work elsewhere so leave it be for now.
v2: Split the vma size == luma+chroma size fix to prep patch (Daniel)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1)
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-8-git-send-email-ville.syrjala@linux.intel.com
The size of the rotated ggtt mapping ought to include the size of the
chroma plane as well. Not a huge deal since we don't expose NV12 (or any
pother planar format for that matter) yet.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Fixes: 89e3e14276 ("drm/i915: Support NV12 in rotated GGTT mapping")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The multiple levels of indirect do nothing but hinder the compiler and
the pointer chasing turns to be quite painful but painless to fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).
s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
WaIncreaseDefaultTLBEntries increases the number of TLB
entries available for GPGPU workloads and gives significant
( > 10% ) performance gain for some OCL benchmarks.
Put this in a new function that can be a place for
workarounds that are GT related but not required per ring.
This function is called on driver load and also after a
reset and on resume, so it is safe for workarounds that get
clobbered in these situations. This function currently has
just this one workaround.
v2: This was originally split into 3 patches but following
review feedback was squashed into 1.
I have not incorporated some style comments from Chris
Wilson as I felt that after defining and intialising a
temporary variable and then adding an additional if block
to only write the register if the temporary variable had
been set, this didn't really give a net gain.
v3: Resending in the hope that BAT will run
v4: Change subject line to trigger BAT (please!)
Signed-off-by: Tim Gore <tim.gore@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454586574-2343-1-git-send-email-tim.gore@intel.com
The only device specific dependency of the stolen memory setup is the
MMIO mapping and the stolen memory size. Both are already available in
i915_gtt_init(), so move the stolen initialization to there. The
clean-up code for i915_gtt_init() is in i915_global_gtt_cleanup(), so
move the stolen memory clean-up code there too.
This will be needed by an upcoming patch that needs the details of the
memory we reserve, but the change is also part of our generic goal to
move the initialization of resources with no or little dependencies on
other device specific resources towards the beginning of the init
sequence.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-8-git-send-email-imre.deak@intel.com
There was a silent conflict between
commit 0a87871626
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Oct 15 14:23:01 2015 +0200
drm/i915: restore ggtt double-bind avoidance
and
commit 5bab6f60cb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 23 18:43:32 2015 +0100
drm/i915: Serialise updates to GGTT with access through GGTT on Braswell
thankfully caught by the extra WARN safegaurd in 0a878716. Since we now
override the GGTT insert_pages callback when installing the aliasing
ppgtt, we assert that the callback is the original ggtt routine.
However, on Braswell we now use a different insertion routine to
serialise access through the GGTT with updating the PTE and hence the
conflict. To avoid the conflict, move the custom insertion routine for
Braswell down a level.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447859979-20107-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we mark the preallocated objects as bound, we should also flag them
correctly as being map-and-fenceable (if appropriate!) so that later
users do not get confused and try and rebind the pinned vma in order to
get a map-and-fenceable binding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1448029000-10616-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The device should be on for the whole duration of the update, so check
for this.
v2:
- use the existing dev_priv directly everywhere (Ville)
v3:
- check also that we are in an RPM atomic section (Chris)
- add the assert to i915_ggtt_insert_entries/clear_range too (Chris)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1450203038-5150-11-git-send-email-imre.deak@intel.com
The cherryview device shares many characteristics with the valleyview
device. When support was added to the driver for cherryview, the
corresponding device info structure included .is_valleyview = 1.
This is not correct and leads to some confusion.
This patch changes .is_valleyview to .is_cherryview in the cherryview
device info structure and simplifies the IS_CHERRYVIEW macro.
Then where appropriate, instances of IS_VALLEYVIEW are replaced with
IS_VALLEYVIEW || IS_CHERRYVIEW or equivalent.
v2: Use IS_VALLEYVIEW || IS_CHERRYVIEW instead of defining a new macro.
Also add followup patches to fix issues discovered during the first
review. (Ville)
v3: Fix some style issues and one gen check. Remove CRT related changes
as CRT is not supported on CHV. (Imre, Ville)
v4: Make a few more optimizations. (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449692975-14803-1-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
All of these iterator macros require a 'temp' argument, used merely to
hold internal partial results. We can instead declare the temporary
variable inside the macro, so the caller need not provide it.
Some of the old code contained nested iterators that actually reused the
same 'temp' variable for both inner and outer instances. It's quite
surprising that this didn't introduce bugs! But it does show that the
value of 'temp' isn't required to persist during the iterated body.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449581451-11848-2-git-send-email-david.s.gordon@intel.com
We don't need 2 separate unions.
Note that this was done intentinoally
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Wed May 6 14:35:38 2015 +0300
drm/i915: Add a partial GGTT view type
on Tvrtko's request, but without a clear justification. Rotated views
are also not checking for matching paramters in i915_ggtt_view_equal,
which seems like a bug. But this patch here doesn't change that.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1444834266-12689-2-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We'll want to avoid performing arithmetic with register offsets, so
instead calculating the vgpu PDP as pdp0_lo+offset, make the PDPs
into an array. This way we can simply loop through them.
Cc: Eddie Dong <eddie.dong@intel.com>
Cc: Jike Song <jike.song@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-25-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Zhiyuan Lv <zhiyuan.lv@intel.com>
When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.
v2: Convert MOCS setup too
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
When accessing through the GTT from one CPU whilst concurrently updating
the GGTT PTEs in another thread, the hardware likes to return random
data. As we have strong serialisation prevent us from modifying the PTE
of an active GTT mmapping, we have to conclude that it whilst modifying
other PTE's that error occurs. (I have not looked for any pattern such
as modifying PTE within the same page or cacheline as active PTE -
though checking whether revoking neighbouring objects should be enough
to test that theory.) The corruption also seems restricted to Braswell
and disappears with maxcpus=0. This patch stops all access through the
GTT by other CPUs when we update any PTE by stopping the machine around
the GGTT update.
Note that splitting up the 64 bit write into two 32 bit writes was
tried and found to fail too.
Testcase: igt/gem_concurrent_blit
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add note about 2x 32bits failing too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use 48b addresses if hw supports it (i915.enable_ppgtt=3).
Update the sanitize_enable_ppgtt for 48 bit PPGTT mode.
Note, aliasing PPGTT remains 32b only.
v2: s/full_64b/full_48b/. (Akash)
v3: Add sanitize_enable_ppgtt changes until here. (Akash)
v4: Update param description (Chris)
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was accidentally lost in
commit 75d04a3773
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 28 17:56:17 2015 +0300
drm/i915/gtt: Allocate va range only if vma is not bound
While at it implement an improved version suggested by Chris which
avoids the double-bind irrespective of what type of bind is done
first.
Note that this exact bug was already addressed in
commit d0e30adc42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jul 29 20:02:48 2015 +0100
drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
but the problem is still that originally in
commit 0875546c53
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Apr 20 09:04:05 2015 -0700
drm/i915: Fix up the vma aliasing ppgtt binding
if forgotten to take into account there case where we have a
GLOBAL_BIND before a LOCAL_BIND. This patch here fixes that.
v2: Pimp commit message and revert the partial fix.
v3: Split into two functions to specialize on aliasing_ppgtt y/n.
v4: WARN_ON for paranoia in the init sequence, since the ggtt probe
and aliasing ppgtt setup are far apart.
v5: Style nits.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://mid.gmane.org/1444911781-32607-1-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When preallocating a stolen object during early initialisation, we may
be running before we have setup the the global GTT VM state, in
particular before we have initialised the range manager and associated
lists. As this is the case, we defer binding the stolen object until we
call i915_gem_setup_global_gtt(). Not only should we defer the binding,
but we should also defer the VM list manipulation.
Fixes regression uncovered by commit a2cad9dff4
Author: Michał Winiarski <michal.winiarski@intel.com>
Date: Wed Sep 16 11:49:00 2015 +0200
drm/i915/gtt: Do not initialize drm_mm twice.
Whilst I am here remove the duplicate work leaving dangling pointers
from the error path...
v2: Typos galore before coffee.
Reported-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92099
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just adding the rotated UV plane at the end of the rotated Y plane.
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By providing a start offset into the source array of pages, and returning the
end position in the scatter-gather table, we will be able to append the UV
plane to the rotated mapping in later patches.
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It would be initialized just moments later by i915_init_vm.
Rearrange the code such that i915_init_vm() is next to its callers
inside i915_gem_gtt (and so we can make it static). After removing the
dance around the files, it is clear that we are repeating some work
inside the initializers (such as calling drm_mm_init() multiple times),
so take advantage of the refactor to also remove some redundant code and
clean up the interface.
v2: Commit msg update,
s/i915_init_vm/i915_address_space_init, move to i915_gem_gtt.c,
init address_space during i915_gem_setup_global_gtt for ggtt.
v3: Do not init global_link - we are adding it to vm_list moments later,
make i915_address_space_init static, use OOP style parameter order.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc, we can also drop the size
from free_gen8_temp_bitmaps since it's no longer used.
v2: Use GFP_TEMPORARY to make the allocations reclaimable.
v3: Drop the 2D array, just allocate a single block.
v4: Rebase to handle gen8_preallocate_top_level_pdps.
v5: Align misaligned bracket.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Correct kcalloc arguments as suggested by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When i915 drivers run inside a VM with Intel GVT-g, some explicit
notifications are needed from guest to host device model through PV
INFO page write. The notifications include:
PPGTT create
PPGTT destroy
They are used for the shadow implementation of PPGTT. Intel GVT-g
needs to write-protect the guest pages of PPGTT, and clear the write
protection when they end their life cycle.
v2:
- Use lower_32_bits()/upper_32_bits() for qword operations;
- Remove the notification of guest context creation/destroy;
Signed-off-by: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is based on Mika Kuoppala's patch below:
http://article.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/61104/match=workaround+hw+preload
The patch will preallocate the page directories for 32-bit PPGTT when
i915 runs inside a virtual machine with Intel GVT-g. With this change,
the root pointers in EXECLIST context will always keep the same.
The change is needed for vGPU because Intel GVT-g will do page table
shadowing, and needs to track all the page table changes from guest
i915 driver. However, if guest PPGTT is modified through GPU commands
like LRI, it is not possible to trap the operations in the right time,
so it will be hard to make shadow PPGTT to work correctly.
Shadow PPGTT could be much simpler with this change. Meanwhile
hypervisor could simply prohibit any attempt of PPGTT modification
through GPU command for security.
The function gen8_preallocate_top_level_pdps() in the patch is from
Mika, with only one change to set "used_pdpes" to avoid duplicated
allocation later.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And fix 0-DAY kernel test infrastructure warning.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the offset length being taken care of in ("drm/i915/gtt: Allow >=
4GB offsets in X86_32"), the code should be finally safe in 32-bit
kernels.
This reverts commit 501fd70fca
Author: Michel Thierry <michel.thierry@intel.com>
Date: Fri May 29 14:15:05 2015 +0100
drm/i915: limit PPGTT size to 2GB in 32-bit platforms
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to commit c44ef60e43 ("drm/i915/gtt:
Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
return an unsigned long, which in only 4-bytes long in 32-bit kernels.
Change return type (and other related offset variables) to u64.
Since Global GTT is always limited to 4GB, this change would not be required
in i915_gem_obj_ggtt_offset, but this is done for consistency.
v2: Remove unnecessary offset variable in do_pin, as we already have
vma->node.start (Chris).
Update GGTT offset too (Tvrtko).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rely on used_px bits instead of null checking (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out of bound
access or proactive prefetch.
Also add a scratch pdp, which the PML4 entries point to.
v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches
(and removed commit message part related to v3).
v5: Update commit message to also mention PML4 table initialization and
the new scratch pdp (Akash).
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.
Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.
v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
adding and extra clamp function; remove unnecessary pdp_start/pdp_len
variables (Akash).
v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
length (Akash).
v10: Remove pdp warning check ingen8_ppgtt_insert_pte_entries until this
commit (Akash).
Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function understands to the page directory granularity.
An object's pages may span the page directory, and so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).
v4: Rebase.
Reviewed-by: Akash Goel <akash.goel@intel.com> (v3)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.
In LRC, the addressing mode must be specified in every context
descriptor, and the base address to PML4 is stored in the reg state.
v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v7: There is no need to update the pml4 register value in
execlists_update_context. (Akash)
v8: Move pd and pdp setup functions to a previous patch, they do not
belong here. (Akash)
v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
gen8_emit_bb_start to check if emit pdps is needed. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v12: Fix pdpe start value in trace (Akash)
v13: Define all 4lvl functions in this patch directly, instead of
previous patches, add i915_page_directory_pointer_entry_alloc here,
use test_bit to detect when pdp is already allocated (Akash).
v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
funtion, as we do for pds and pts; move pd and pdp setup functions to
this patch (Akash).
v15: Added kfree(pdp) from previous patch to this (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduces the Page Map Level 4 (PML4), ie. the new top level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
v6: Keep _insert_pte_entries changes outside this patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The dynamic page allocation patch series added it for GEN6, this patch
adds them for GEN8.
v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after gen8_map_pagetable_range removal.
v7: Use generic page name (px) in DECLARE_EVENT_CLASS (Akash)
v8: Defer define of i915_page_directory_pointer_entry_alloc (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The insert_entries function was the function used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two level page tables, which
was the case for GEN7. We can reuse this for 4 level page tables, and
remove the concept of insert_entries, which was never viable past 2
level page tables anyway, but it requires a bit of rework to make the
function a bit more generic.
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
v4: Check and warn for NULL value of pdp pointer (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e. The temporal pdp local variable will be
removed once the rest of the 4-level code lands.
Also, start passing the vm pointer to the alloc functions, instead of
ppgtt.
v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
v10: Clean-up commit message (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: "Akash Goel" <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier.
32-bit ppgtt uses just 4 PDPs, while 48-bit ppgtt will have up-to 512;
this patch prepares the existing functions to query the right number of pdps
at run-time. This also means that used_pdpes should also be allocated during
ppgtt_init, as the bitmap size will depend on the ppgtt address range
selected.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
[danvet: Amend commit message as suggested by Michel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory
boundary.
Furthermore, i915_pte_count also restricts to the next page table
boundary.
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: "Akash Goel" <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge drm-intel-fixes because a bunch of atomic patch backporting
we had to do lead to horrible conflicts.
Conflicts:
drivers/gpu/drm/drm_crtc.c
Just a bit of context conflict between -next and -fixes.
drivers/gpu/drm/i915/intel_atomic.c
drivers/gpu/drm/i915/intel_display.c
Atomic conflicts, always pick the code from -next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If the device does not support the aliasing ppgtt, we must translate
user bind requests (PIN_USER) from LOCAL_BIND to a GLOBAL_BIND. However,
since this is device specific we cannot do this conveniently in the
upper layers and so must manage the vma->bound flags in the backend.
Partial revert of commit 75d04a3773 [4.2-rc1]
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Apr 28 17:56:17 2015 +0300
drm/i915/gtt: Allocate va range only if vma is not bound
Note this was spotted by Daniel originally, but we dropped the ball in
getting the fix in before the bug going wild. Sorry all.
Reported-by: Vincent Legoll vincent.legoll@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91133
References: https://bugs.freedesktop.org/show_bug.cgi?id=90224
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge fixes since it's getting out of hand again with the massive
split due to atomic between -next and 4.2-rc. All the bugfixes in
4.2-rc are addressed already (by converting more towards atomic
instead of minimal duct-tape) so just always pick the version in next
for the conflicts in modeset code.
All the other conflicts are just adjacent lines changed.
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_drv.h
drivers/gpu/drm/i915/intel_ringbuffer.h
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
After the previous patch this flag will check always clear, as it's
never set for shmem backed and userptr objects, so we can remove it.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Yeah this isn't really fixes but it's a nice cleanup to
clarify the code but not really worth the hassle of backmerging. So
just add to -fixes, we're still early in -rc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When rotated and partial views were added no one spotted the resume
path which assumes only one GGTT VMA per object and hence is now
skipping rebind of alternative views.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously we have pointed the page where the individual ppgtt
scratch structures refer to, to be the instance which GGTT setup have
allocated. So it has been shared.
To achieve full isolation between ppgtts also in this regard,
allocate per ppgtt scratch page.
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every other alloc_* function return the pointer to the page
they alloc. Follow the convention with scratch page also.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Maintain base page handling functions in order of
alloc, free, init. No functional changes.
v2: s/Introduce/Maintain (Michel)
v3: Rebase
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>