2008-07-31 03:06:12 +08:00
|
|
|
/*
|
2015-03-18 17:46:04 +08:00
|
|
|
* Copyright © 2008-2015 Intel Corporation
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
|
|
|
* copy of this software and associated documentation files (the "Software"),
|
|
|
|
* to deal in the Software without restriction, including without limitation
|
|
|
|
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
|
|
|
* and/or sell copies of the Software, and to permit persons to whom the
|
|
|
|
* Software is furnished to do so, subject to the following conditions:
|
|
|
|
*
|
|
|
|
* The above copyright notice and this permission notice (including the next
|
|
|
|
* paragraph) shall be included in all copies or substantial portions of the
|
|
|
|
* Software.
|
|
|
|
*
|
|
|
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
|
|
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
|
|
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
|
|
|
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
|
|
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
|
|
|
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
|
|
|
* IN THE SOFTWARE.
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Eric Anholt <eric@anholt.net>
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2012-10-03 01:01:07 +08:00
|
|
|
#include <drm/drmP.h>
|
2013-07-25 03:07:52 +08:00
|
|
|
#include <drm/drm_vma_manager.h>
|
2012-10-03 01:01:07 +08:00
|
|
|
#include <drm/i915_drm.h>
|
2008-07-31 03:06:12 +08:00
|
|
|
#include "i915_drv.h"
|
2016-07-20 16:21:15 +08:00
|
|
|
#include "i915_gem_dmabuf.h"
|
2015-02-10 19:05:49 +08:00
|
|
|
#include "i915_vgpu.h"
|
2009-08-25 18:15:50 +08:00
|
|
|
#include "i915_trace.h"
|
2009-08-18 04:31:43 +08:00
|
|
|
#include "intel_drv.h"
|
2016-08-04 23:32:35 +08:00
|
|
|
#include "intel_frontbuffer.h"
|
2016-04-13 22:03:25 +08:00
|
|
|
#include "intel_mocs.h"
|
2016-07-20 16:21:15 +08:00
|
|
|
#include <linux/reservation.h>
|
2011-06-28 07:18:18 +08:00
|
|
|
#include <linux/shmem_fs.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/slab.h>
|
2008-07-31 03:06:12 +08:00
|
|
|
#include <linux/swap.h>
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
#include <linux/pci.h>
|
2012-05-10 21:25:09 +08:00
|
|
|
#include <linux/dma-buf.h>
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
|
2015-01-21 21:53:48 +08:00
|
|
|
static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
|
2012-04-17 22:31:31 +08:00
|
|
|
|
2013-08-08 21:41:03 +08:00
|
|
|
static bool cpu_cache_is_coherent(struct drm_device *dev,
|
|
|
|
enum i915_cache_level level)
|
|
|
|
{
|
|
|
|
return HAS_LLC(dev) || level != I915_CACHE_NONE;
|
|
|
|
}
|
|
|
|
|
2013-08-09 19:26:45 +08:00
|
|
|
static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
|
|
|
|
return false;
|
|
|
|
|
2013-08-09 19:26:45 +08:00
|
|
|
if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return obj->pin_display;
|
|
|
|
}
|
|
|
|
|
2016-06-10 16:53:01 +08:00
|
|
|
static int
|
|
|
|
insert_mappable_node(struct drm_i915_private *i915,
|
|
|
|
struct drm_mm_node *node, u32 size)
|
|
|
|
{
|
|
|
|
memset(node, 0, sizeof(*node));
|
|
|
|
return drm_mm_insert_node_in_range_generic(&i915->ggtt.base.mm, node,
|
|
|
|
size, 0, 0, 0,
|
|
|
|
i915->ggtt.mappable_end,
|
|
|
|
DRM_MM_SEARCH_DEFAULT,
|
|
|
|
DRM_MM_CREATE_DEFAULT);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
remove_mappable_node(struct drm_mm_node *node)
|
|
|
|
{
|
|
|
|
drm_mm_remove_node(node);
|
|
|
|
}
|
|
|
|
|
2010-09-30 18:46:12 +08:00
|
|
|
/* some bookkeeping */
|
|
|
|
static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
|
2016-10-18 20:02:48 +08:00
|
|
|
u64 size)
|
2010-09-30 18:46:12 +08:00
|
|
|
{
|
2013-07-25 04:40:23 +08:00
|
|
|
spin_lock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 18:46:12 +08:00
|
|
|
dev_priv->mm.object_count++;
|
|
|
|
dev_priv->mm.object_memory += size;
|
2013-07-25 04:40:23 +08:00
|
|
|
spin_unlock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 18:46:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
|
2016-10-18 20:02:48 +08:00
|
|
|
u64 size)
|
2010-09-30 18:46:12 +08:00
|
|
|
{
|
2013-07-25 04:40:23 +08:00
|
|
|
spin_lock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 18:46:12 +08:00
|
|
|
dev_priv->mm.object_count--;
|
|
|
|
dev_priv->mm.object_memory -= size;
|
2013-07-25 04:40:23 +08:00
|
|
|
spin_unlock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 18:46:12 +08:00
|
|
|
}
|
|
|
|
|
2011-01-26 23:55:56 +08:00
|
|
|
static int
|
2012-11-15 00:14:05 +08:00
|
|
|
i915_gem_wait_for_error(struct i915_gpu_error *error)
|
2010-09-25 17:19:17 +08:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-04-14 00:35:05 +08:00
|
|
|
if (!i915_reset_in_progress(error))
|
2010-09-25 17:19:17 +08:00
|
|
|
return 0;
|
|
|
|
|
2012-07-05 04:18:41 +08:00
|
|
|
/*
|
|
|
|
* Only wait 10 seconds for the gpu reset to complete to avoid hanging
|
|
|
|
* userspace. If it takes that long something really bad is going on and
|
|
|
|
* we should simply try to bail out and fail as gracefully as possible.
|
|
|
|
*/
|
2012-11-16 00:17:22 +08:00
|
|
|
ret = wait_event_interruptible_timeout(error->reset_queue,
|
2016-04-14 00:35:05 +08:00
|
|
|
!i915_reset_in_progress(error),
|
2012-11-16 00:17:22 +08:00
|
|
|
10*HZ);
|
2012-07-05 04:18:41 +08:00
|
|
|
if (ret == 0) {
|
|
|
|
DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
|
|
|
|
return -EIO;
|
|
|
|
} else if (ret < 0) {
|
2010-09-25 17:19:17 +08:00
|
|
|
return ret;
|
2016-04-14 00:35:05 +08:00
|
|
|
} else {
|
|
|
|
return 0;
|
2012-07-05 04:18:41 +08:00
|
|
|
}
|
2010-09-25 17:19:17 +08:00
|
|
|
}
|
|
|
|
|
2010-11-26 02:00:26 +08:00
|
|
|
int i915_mutex_lock_interruptible(struct drm_device *dev)
|
2010-09-25 18:22:51 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2010-09-25 18:22:51 +08:00
|
|
|
int ret;
|
|
|
|
|
2012-11-15 00:14:05 +08:00
|
|
|
ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
|
2010-09-25 18:22:51 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ret = mutex_lock_interruptible(&dev->struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2010-09-25 17:19:17 +08:00
|
|
|
|
2008-10-23 12:40:13 +08:00
|
|
|
int
|
|
|
|
i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-10-23 12:40:13 +08:00
|
|
|
{
|
2016-03-30 21:57:10 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2016-03-18 16:42:57 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2008-10-23 12:40:13 +08:00
|
|
|
struct drm_i915_gem_get_aperture *args = data;
|
2015-07-01 18:51:10 +08:00
|
|
|
struct i915_vma *vma;
|
2010-11-24 20:23:44 +08:00
|
|
|
size_t pinned;
|
2008-10-23 12:40:13 +08:00
|
|
|
|
2010-11-24 20:23:44 +08:00
|
|
|
pinned = 0;
|
2010-09-30 18:46:12 +08:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry(vma, &ggtt->base.active_list, vm_link)
|
2016-08-04 23:32:30 +08:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2015-07-01 18:51:10 +08:00
|
|
|
pinned += vma->node.size;
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry(vma, &ggtt->base.inactive_list, vm_link)
|
2016-08-04 23:32:30 +08:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2015-07-01 18:51:10 +08:00
|
|
|
pinned += vma->node.size;
|
2010-09-30 18:46:12 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2008-10-23 12:40:13 +08:00
|
|
|
|
2016-03-30 21:57:10 +08:00
|
|
|
args->aper_size = ggtt->base.total;
|
2011-08-17 03:34:10 +08:00
|
|
|
args->aper_available_size = args->aper_size - pinned;
|
2010-11-24 20:23:44 +08:00
|
|
|
|
2008-10-23 12:40:13 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
static int
|
|
|
|
i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
|
2014-05-21 19:42:56 +08:00
|
|
|
{
|
2015-12-05 12:45:44 +08:00
|
|
|
struct address_space *mapping = obj->base.filp->f_mapping;
|
2014-11-04 20:51:40 +08:00
|
|
|
char *vaddr = obj->phys_handle->vaddr;
|
|
|
|
struct sg_table *st;
|
|
|
|
struct scatterlist *sg;
|
|
|
|
int i;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
|
|
|
|
struct page *page;
|
|
|
|
char *src;
|
|
|
|
|
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
|
|
|
if (IS_ERR(page))
|
|
|
|
return PTR_ERR(page);
|
|
|
|
|
|
|
|
src = kmap_atomic(page);
|
|
|
|
memcpy(vaddr, src, PAGE_SIZE);
|
|
|
|
drm_clflush_virt_range(vaddr, PAGE_SIZE);
|
|
|
|
kunmap_atomic(src);
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
put_page(page);
|
2014-11-04 20:51:40 +08:00
|
|
|
vaddr += PAGE_SIZE;
|
|
|
|
}
|
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_chipset_flush(to_i915(obj->base.dev));
|
2014-11-04 20:51:40 +08:00
|
|
|
|
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (st == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
if (sg_alloc_table(st, 1, GFP_KERNEL)) {
|
|
|
|
kfree(st);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
sg = st->sgl;
|
|
|
|
sg->offset = 0;
|
|
|
|
sg->length = obj->base.size;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
sg_dma_address(sg) = obj->phys_handle->busaddr;
|
|
|
|
sg_dma_len(sg) = obj->base.size;
|
|
|
|
|
|
|
|
obj->pages = st;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
BUG_ON(obj->madv == __I915_MADV_PURGED);
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 00:35:08 +08:00
|
|
|
if (WARN_ON(ret)) {
|
2014-11-04 20:51:40 +08:00
|
|
|
/* In the event of a disaster, abandon all caches and
|
|
|
|
* hope for the best.
|
|
|
|
*/
|
|
|
|
obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (obj->madv == I915_MADV_DONTNEED)
|
|
|
|
obj->dirty = 0;
|
|
|
|
|
|
|
|
if (obj->dirty) {
|
2015-12-05 12:45:44 +08:00
|
|
|
struct address_space *mapping = obj->base.filp->f_mapping;
|
2014-11-04 20:51:40 +08:00
|
|
|
char *vaddr = obj->phys_handle->vaddr;
|
2014-05-21 19:42:56 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
|
2014-11-04 20:51:40 +08:00
|
|
|
struct page *page;
|
|
|
|
char *dst;
|
|
|
|
|
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
|
|
|
if (IS_ERR(page))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
dst = kmap_atomic(page);
|
|
|
|
drm_clflush_virt_range(vaddr, PAGE_SIZE);
|
|
|
|
memcpy(dst, vaddr, PAGE_SIZE);
|
|
|
|
kunmap_atomic(dst);
|
|
|
|
|
|
|
|
set_page_dirty(page);
|
|
|
|
if (obj->madv == I915_MADV_WILLNEED)
|
2014-05-21 19:42:56 +08:00
|
|
|
mark_page_accessed(page);
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
put_page(page);
|
2014-05-21 19:42:56 +08:00
|
|
|
vaddr += PAGE_SIZE;
|
|
|
|
}
|
2014-11-04 20:51:40 +08:00
|
|
|
obj->dirty = 0;
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
sg_free_table(obj->pages);
|
|
|
|
kfree(obj->pages);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
drm_pci_free(obj->base.dev, obj->phys_handle);
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
|
|
|
|
.get_pages = i915_gem_object_get_pages_phys,
|
|
|
|
.put_pages = i915_gem_object_put_pages_phys,
|
|
|
|
.release = i915_gem_object_release_phys,
|
|
|
|
};
|
|
|
|
|
2016-08-15 01:44:40 +08:00
|
|
|
int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
|
2014-11-04 20:51:40 +08:00
|
|
|
{
|
2016-08-04 14:52:27 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
LIST_HEAD(still_in_list);
|
2014-11-04 20:51:40 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-08-15 01:44:41 +08:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2014-11-04 20:51:40 +08:00
|
|
|
|
2016-08-15 01:44:41 +08:00
|
|
|
/* Closed vma are removed from the obj->vma_list - but they may
|
|
|
|
* still have an active binding on the object. To remove those we
|
|
|
|
* must wait for all rendering to complete to the object (as unbinding
|
|
|
|
* must anyway), and retire the requests.
|
2016-08-04 14:52:27 +08:00
|
|
|
*/
|
2016-08-15 01:44:41 +08:00
|
|
|
ret = i915_gem_object_wait_rendering(obj, false);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
i915_gem_retire_requests(to_i915(obj->base.dev));
|
|
|
|
|
2016-08-04 14:52:27 +08:00
|
|
|
while ((vma = list_first_entry_or_null(&obj->vma_list,
|
|
|
|
struct i915_vma,
|
|
|
|
obj_link))) {
|
|
|
|
list_move_tail(&vma->obj_link, &still_in_list);
|
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
list_splice(&still_in_list, &obj->vma_list);
|
2014-11-04 20:51:40 +08:00
|
|
|
|
|
|
|
return ret;
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:40 +08:00
|
|
|
/**
|
|
|
|
* Ensures that all rendering to the object has completed and the object is
|
|
|
|
* safe to unbind from the GTT or access from the CPU.
|
|
|
|
* @obj: i915 gem object
|
|
|
|
* @readonly: waiting for just read access or read-write access
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
|
|
|
|
bool readonly)
|
|
|
|
{
|
|
|
|
struct reservation_object *resv;
|
|
|
|
struct i915_gem_active *active;
|
|
|
|
unsigned long active_mask;
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
|
|
|
if (!readonly) {
|
|
|
|
active = obj->last_read;
|
|
|
|
active_mask = i915_gem_object_get_active(obj);
|
|
|
|
} else {
|
|
|
|
active_mask = 1;
|
|
|
|
active = &obj->last_write;
|
|
|
|
}
|
|
|
|
|
|
|
|
for_each_active(active_mask, idx) {
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = i915_gem_active_wait(&active[idx],
|
|
|
|
&obj->base.dev->struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
resv = i915_gem_object_get_dmabuf_resv(obj);
|
|
|
|
if (resv) {
|
|
|
|
long err;
|
|
|
|
|
|
|
|
err = reservation_object_wait_timeout_rcu(resv, !readonly, true,
|
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:07 +08:00
|
|
|
/* A nonblocking variant of the above wait. Must be called prior to
|
|
|
|
* acquiring the mutex for the object, as the object state may change
|
|
|
|
* during this call. A reference must be held by the caller for the object.
|
2016-08-04 23:32:40 +08:00
|
|
|
*/
|
|
|
|
static __must_check int
|
2016-08-05 17:14:07 +08:00
|
|
|
__unsafe_wait_rendering(struct drm_i915_gem_object *obj,
|
|
|
|
struct intel_rps_client *rps,
|
|
|
|
bool readonly)
|
2016-08-04 23:32:40 +08:00
|
|
|
{
|
|
|
|
struct i915_gem_active *active;
|
|
|
|
unsigned long active_mask;
|
2016-08-05 17:14:07 +08:00
|
|
|
int idx;
|
2016-08-04 23:32:40 +08:00
|
|
|
|
2016-08-05 17:14:07 +08:00
|
|
|
active_mask = __I915_BO_ACTIVE(obj);
|
2016-08-04 23:32:40 +08:00
|
|
|
if (!active_mask)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!readonly) {
|
|
|
|
active = obj->last_read;
|
|
|
|
} else {
|
|
|
|
active_mask = 1;
|
|
|
|
active = &obj->last_write;
|
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:07 +08:00
|
|
|
for_each_active(active_mask, idx) {
|
|
|
|
int ret;
|
2016-08-04 23:32:40 +08:00
|
|
|
|
2016-08-05 17:14:07 +08:00
|
|
|
ret = i915_gem_active_wait_unlocked(&active[idx],
|
2016-09-09 21:11:49 +08:00
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
|
|
|
NULL, rps);
|
2016-08-05 17:14:07 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2016-08-04 23:32:40 +08:00
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:07 +08:00
|
|
|
return 0;
|
2016-08-04 23:32:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct intel_rps_client *to_rps_client(struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_file_private *fpriv = file->driver_priv;
|
|
|
|
|
|
|
|
return &fpriv->rps;
|
|
|
|
}
|
|
|
|
|
2014-05-21 19:42:56 +08:00
|
|
|
int
|
|
|
|
i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
|
|
|
|
int align)
|
|
|
|
{
|
|
|
|
drm_dma_handle_t *phys;
|
2014-11-04 20:51:40 +08:00
|
|
|
int ret;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
|
|
|
if (obj->phys_handle) {
|
|
|
|
if ((unsigned long)obj->phys_handle->vaddr & (align -1))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (obj->madv != I915_MADV_WILLNEED)
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-08-04 14:52:28 +08:00
|
|
|
ret = i915_gem_object_unbind(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ret = i915_gem_object_put_pages(obj);
|
2014-11-04 20:51:40 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2014-05-21 19:42:56 +08:00
|
|
|
/* create a new object */
|
|
|
|
phys = drm_pci_alloc(obj->base.dev, obj->base.size, align);
|
|
|
|
if (!phys)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
obj->phys_handle = phys;
|
2014-11-04 20:51:40 +08:00
|
|
|
obj->ops = &i915_gem_phys_ops;
|
|
|
|
|
|
|
|
return i915_gem_object_get_pages(obj);
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pwrite *args,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
|
|
|
struct drm_device *dev = obj->base.dev;
|
|
|
|
void *vaddr = obj->phys_handle->vaddr + args->offset;
|
2016-04-26 23:32:27 +08:00
|
|
|
char __user *user_data = u64_to_user_ptr(args->data_ptr);
|
2015-02-14 03:23:45 +08:00
|
|
|
int ret = 0;
|
2014-11-04 20:51:40 +08:00
|
|
|
|
|
|
|
/* We manually control the domain here and pretend that it
|
|
|
|
* remains coherent i.e. in the GTT domain, like shmem_pwrite.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_object_wait_rendering(obj, false);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2015-06-19 02:43:24 +08:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2014-05-21 19:42:56 +08:00
|
|
|
if (__copy_from_user_inatomic_nocache(vaddr, user_data, args->size)) {
|
|
|
|
unsigned long unwritten;
|
|
|
|
|
|
|
|
/* The physical object once assigned is fixed for the lifetime
|
|
|
|
* of the obj, so we can safely drop the lock and continue
|
|
|
|
* to access vaddr.
|
|
|
|
*/
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
unwritten = copy_from_user(vaddr, user_data, args->size);
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
2015-02-14 03:23:45 +08:00
|
|
|
if (unwritten) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
drm_clflush_virt_range(vaddr, args->size);
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_chipset_flush(to_i915(dev));
|
2015-02-14 03:23:45 +08:00
|
|
|
|
|
|
|
out:
|
2015-07-08 07:28:51 +08:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2015-02-14 03:23:45 +08:00
|
|
|
return ret;
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2012-11-15 19:32:30 +08:00
|
|
|
void *i915_gem_object_alloc(struct drm_device *dev)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2015-04-07 23:20:57 +08:00
|
|
|
return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
|
2012-11-15 19:32:30 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void i915_gem_object_free(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2015-04-07 23:20:57 +08:00
|
|
|
kmem_cache_free(dev_priv->objects, obj);
|
2012-11-15 19:32:30 +08:00
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
static int
|
|
|
|
i915_gem_create(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
|
|
|
uint64_t size,
|
|
|
|
uint32_t *handle_p)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2009-08-23 17:40:55 +08:00
|
|
|
int ret;
|
|
|
|
u32 handle;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
size = roundup(size, PAGE_SIZE);
|
2011-09-14 20:14:28 +08:00
|
|
|
if (size == 0)
|
|
|
|
return -EINVAL;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
|
|
|
/* Allocate the new object */
|
2016-04-23 02:14:32 +08:00
|
|
|
obj = i915_gem_object_create(dev, size);
|
2016-04-25 20:32:13 +08:00
|
|
|
if (IS_ERR(obj))
|
|
|
|
return PTR_ERR(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
ret = drm_gem_handle_create(file, &obj->base, &handle);
|
2010-10-14 20:20:40 +08:00
|
|
|
/* drop reference from allocate - handle holds it now */
|
2016-07-20 20:31:54 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2013-07-25 05:25:03 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2010-10-14 20:20:40 +08:00
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
*handle_p = handle;
|
2008-07-31 03:06:12 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
int
|
|
|
|
i915_gem_dumb_create(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
|
|
|
struct drm_mode_create_dumb *args)
|
|
|
|
{
|
|
|
|
/* have to work out size/pitch and return them */
|
2013-10-19 05:48:24 +08:00
|
|
|
args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
|
2011-02-07 10:16:14 +08:00
|
|
|
args->size = args->pitch * args->height;
|
|
|
|
return i915_gem_create(file, dev,
|
2014-12-24 11:11:17 +08:00
|
|
|
args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Creates a new mm object and returns a handle to it.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2011-02-07 10:16:14 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_create *args = data;
|
2012-04-23 22:50:50 +08:00
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
return i915_gem_create(file, dev,
|
2014-12-24 11:11:17 +08:00
|
|
|
args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
2011-12-14 20:57:32 +08:00
|
|
|
static inline int
|
|
|
|
__copy_to_user_swizzled(char __user *cpu_vaddr,
|
|
|
|
const char *gpu_vaddr, int gpu_offset,
|
|
|
|
int length)
|
|
|
|
{
|
|
|
|
int ret, cpu_offset = 0;
|
|
|
|
|
|
|
|
while (length > 0) {
|
|
|
|
int cacheline_end = ALIGN(gpu_offset + 1, 64);
|
|
|
|
int this_length = min(cacheline_end - gpu_offset, length);
|
|
|
|
int swizzled_gpu_offset = gpu_offset ^ 64;
|
|
|
|
|
|
|
|
ret = __copy_to_user(cpu_vaddr + cpu_offset,
|
|
|
|
gpu_vaddr + swizzled_gpu_offset,
|
|
|
|
this_length);
|
|
|
|
if (ret)
|
|
|
|
return ret + length;
|
|
|
|
|
|
|
|
cpu_offset += this_length;
|
|
|
|
gpu_offset += this_length;
|
|
|
|
length -= this_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
static inline int
|
2012-04-17 05:07:47 +08:00
|
|
|
__copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
|
|
|
|
const char __user *cpu_vaddr,
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
int length)
|
|
|
|
{
|
|
|
|
int ret, cpu_offset = 0;
|
|
|
|
|
|
|
|
while (length > 0) {
|
|
|
|
int cacheline_end = ALIGN(gpu_offset + 1, 64);
|
|
|
|
int this_length = min(cacheline_end - gpu_offset, length);
|
|
|
|
int swizzled_gpu_offset = gpu_offset ^ 64;
|
|
|
|
|
|
|
|
ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
|
|
|
|
cpu_vaddr + cpu_offset,
|
|
|
|
this_length);
|
|
|
|
if (ret)
|
|
|
|
return ret + length;
|
|
|
|
|
|
|
|
cpu_offset += this_length;
|
|
|
|
gpu_offset += this_length;
|
|
|
|
length -= this_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-02-19 02:15:45 +08:00
|
|
|
/*
|
|
|
|
* Pins the specified object's pages and synchronizes the object with
|
|
|
|
* GPU accesses. Sets needs_clflush to non-zero if the caller should
|
|
|
|
* flush the object from the CPU cache.
|
|
|
|
*/
|
|
|
|
int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
|
2016-08-19 00:16:47 +08:00
|
|
|
unsigned int *needs_clflush)
|
2014-02-19 02:15:45 +08:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
*needs_clflush = 0;
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
if (!i915_gem_object_has_struct_page(obj))
|
|
|
|
return -ENODEV;
|
2014-02-19 02:15:45 +08:00
|
|
|
|
2016-07-20 16:21:15 +08:00
|
|
|
ret = i915_gem_object_wait_rendering(obj, true);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-19 00:16:50 +08:00
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
|
2016-08-19 00:16:48 +08:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
2014-02-19 02:15:45 +08:00
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
/* If we're not in the cpu read domain, set ourself into the gtt
|
|
|
|
* read domain and manually flush cachelines (if required). This
|
|
|
|
* optimizes for the case when the gpu will dirty the data
|
|
|
|
* anyway again before the next pread happens.
|
|
|
|
*/
|
|
|
|
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
|
2014-02-19 02:15:45 +08:00
|
|
|
*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
|
|
|
|
obj->cache_level);
|
2016-08-19 00:16:47 +08:00
|
|
|
|
|
|
|
if (*needs_clflush && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, false);
|
2014-02-19 02:15:45 +08:00
|
|
|
if (ret)
|
2016-08-19 00:16:50 +08:00
|
|
|
goto err_unpin;
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
*needs_clflush = 0;
|
2014-02-19 02:15:45 +08:00
|
|
|
}
|
|
|
|
|
2016-08-19 00:16:50 +08:00
|
|
|
/* return with the pages pinned */
|
2016-08-19 00:16:47 +08:00
|
|
|
return 0;
|
2016-08-19 00:16:50 +08:00
|
|
|
|
|
|
|
err_unpin:
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
return ret;
|
2016-08-19 00:16:47 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int *needs_clflush)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
*needs_clflush = 0;
|
|
|
|
if (!i915_gem_object_has_struct_page(obj))
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
ret = i915_gem_object_wait_rendering(obj, false);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2014-02-19 02:15:45 +08:00
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
|
2016-08-19 00:16:48 +08:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
/* If we're not in the cpu write domain, set ourself into the
|
|
|
|
* gtt write domain and manually flush cachelines (as required).
|
|
|
|
* This optimizes for the case when the gpu will use the data
|
|
|
|
* right away and we therefore have to clflush anyway.
|
|
|
|
*/
|
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
|
|
|
|
*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
|
|
|
|
|
|
|
|
/* Same trick applies to invalidate partially written cachelines read
|
|
|
|
* before writing.
|
|
|
|
*/
|
|
|
|
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
|
|
|
|
*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
|
|
|
|
obj->cache_level);
|
|
|
|
|
|
|
|
if (*needs_clflush && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
2016-08-19 00:16:50 +08:00
|
|
|
if (ret)
|
|
|
|
goto err_unpin;
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
*needs_clflush = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((*needs_clflush & CLFLUSH_AFTER) == 0)
|
|
|
|
obj->cache_dirty = true;
|
|
|
|
|
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
|
|
|
obj->dirty = 1;
|
2016-08-19 00:16:50 +08:00
|
|
|
/* return with the pages pinned */
|
2016-08-19 00:16:47 +08:00
|
|
|
return 0;
|
2016-08-19 00:16:50 +08:00
|
|
|
|
|
|
|
err_unpin:
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
2014-02-19 02:15:45 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
/* Per-page copy function for the shmem pread fastpath.
|
|
|
|
* Flushes invalid cachelines before reading the target if
|
|
|
|
* needs_clflush is set. */
|
2009-03-11 02:44:52 +08:00
|
|
|
static int
|
2012-03-26 01:47:40 +08:00
|
|
|
shmem_pread_fast(struct page *page, int shmem_page_offset, int page_length,
|
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling, bool needs_clflush)
|
|
|
|
{
|
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
|
|
|
|
2012-03-26 01:47:43 +08:00
|
|
|
if (unlikely(page_do_bit17_swizzling))
|
2012-03-26 01:47:40 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
vaddr = kmap_atomic(page);
|
|
|
|
if (needs_clflush)
|
|
|
|
drm_clflush_virt_range(vaddr + shmem_page_offset,
|
|
|
|
page_length);
|
|
|
|
ret = __copy_to_user_inatomic(user_data,
|
|
|
|
vaddr + shmem_page_offset,
|
|
|
|
page_length);
|
|
|
|
kunmap_atomic(vaddr);
|
|
|
|
|
2012-09-05 04:02:56 +08:00
|
|
|
return ret ? -EFAULT : 0;
|
2012-03-26 01:47:40 +08:00
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:42 +08:00
|
|
|
static void
|
|
|
|
shmem_clflush_swizzled_range(char *addr, unsigned long length,
|
|
|
|
bool swizzled)
|
|
|
|
{
|
2012-03-26 01:47:43 +08:00
|
|
|
if (unlikely(swizzled)) {
|
2012-03-26 01:47:42 +08:00
|
|
|
unsigned long start = (unsigned long) addr;
|
|
|
|
unsigned long end = (unsigned long) addr + length;
|
|
|
|
|
|
|
|
/* For swizzling simply ensure that we always flush both
|
|
|
|
* channels. Lame, but simple and it works. Swizzled
|
|
|
|
* pwrite/pread is far from a hotpath - current userspace
|
|
|
|
* doesn't use it at all. */
|
|
|
|
start = round_down(start, 128);
|
|
|
|
end = round_up(end, 128);
|
|
|
|
|
|
|
|
drm_clflush_virt_range((void *)start, end - start);
|
|
|
|
} else {
|
|
|
|
drm_clflush_virt_range(addr, length);
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
/* Only difference to the fast-path function is that this can handle bit17
|
|
|
|
* and uses non-atomic copy and kmap functions. */
|
|
|
|
static int
|
|
|
|
shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
|
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling, bool needs_clflush)
|
|
|
|
{
|
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
vaddr = kmap(page);
|
|
|
|
if (needs_clflush)
|
2012-03-26 01:47:42 +08:00
|
|
|
shmem_clflush_swizzled_range(vaddr + shmem_page_offset,
|
|
|
|
page_length,
|
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 01:47:40 +08:00
|
|
|
|
|
|
|
if (page_do_bit17_swizzling)
|
|
|
|
ret = __copy_to_user_swizzled(user_data,
|
|
|
|
vaddr, shmem_page_offset,
|
|
|
|
page_length);
|
|
|
|
else
|
|
|
|
ret = __copy_to_user(user_data,
|
|
|
|
vaddr + shmem_page_offset,
|
|
|
|
page_length);
|
|
|
|
kunmap(page);
|
|
|
|
|
2012-09-05 04:02:56 +08:00
|
|
|
return ret ? - EFAULT : 0;
|
2012-03-26 01:47:40 +08:00
|
|
|
}
|
|
|
|
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
static inline unsigned long
|
|
|
|
slow_user_access(struct io_mapping *mapping,
|
|
|
|
uint64_t page_base, int page_offset,
|
|
|
|
char __user *user_data,
|
|
|
|
unsigned long length, bool pwrite)
|
|
|
|
{
|
|
|
|
void __iomem *ioaddr;
|
|
|
|
void *vaddr;
|
|
|
|
uint64_t unwritten;
|
|
|
|
|
|
|
|
ioaddr = io_mapping_map_wc(mapping, page_base, PAGE_SIZE);
|
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
|
|
|
vaddr = (void __force *)ioaddr + page_offset;
|
|
|
|
if (pwrite)
|
|
|
|
unwritten = __copy_from_user(vaddr, user_data, length);
|
|
|
|
else
|
|
|
|
unwritten = __copy_to_user(user_data, vaddr, length);
|
|
|
|
|
|
|
|
io_mapping_unmap(ioaddr);
|
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_gtt_pread(struct drm_device *dev,
|
|
|
|
struct drm_i915_gem_object *obj, uint64_t size,
|
|
|
|
uint64_t data_offset, uint64_t data_ptr)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *vma;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
struct drm_mm_node node;
|
|
|
|
char __user *user_data;
|
|
|
|
uint64_t remain;
|
|
|
|
uint64_t offset;
|
|
|
|
int ret;
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma)) {
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
ret = insert_mappable_node(dev_priv, &node, PAGE_SIZE);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret) {
|
|
|
|
remove_mappable_node(&node);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
|
|
|
user_data = u64_to_user_ptr(data_ptr);
|
|
|
|
remain = size;
|
|
|
|
offset = data_offset;
|
|
|
|
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
if (likely(!i915.prefault_disable)) {
|
2016-09-18 06:02:44 +08:00
|
|
|
ret = fault_in_pages_writeable(user_data, remain);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (ret) {
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
while (remain > 0) {
|
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
|
|
|
u32 page_base = node.start;
|
|
|
|
unsigned page_offset = offset_in_page(offset);
|
|
|
|
unsigned page_length = PAGE_SIZE - page_offset;
|
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.insert_page(&ggtt->base,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start,
|
|
|
|
I915_CACHE_NONE, 0);
|
|
|
|
wmb();
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
|
|
|
/* This is a slow read/write as it tries to read from
|
|
|
|
* and write to user memory which may result into page
|
|
|
|
* faults, and so we cannot perform this under struct_mutex.
|
|
|
|
*/
|
2016-08-19 23:54:27 +08:00
|
|
|
if (slow_user_access(&ggtt->mappable, page_base,
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
page_offset, user_data,
|
|
|
|
page_length, false)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
|
|
|
if (ret == 0 && (obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0) {
|
|
|
|
/* The user has modified the object whilst we tried
|
|
|
|
* reading from it, and we now have no idea what domain
|
|
|
|
* the pages should be in. As we have just been touching
|
|
|
|
* them directly, flush everything back to the GTT
|
|
|
|
* domain.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
}
|
|
|
|
|
|
|
|
out_unpin:
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.clear_range(&ggtt->base,
|
2016-10-13 20:02:40 +08:00
|
|
|
node.start, node.size);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
}
|
|
|
|
out:
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2009-03-11 02:44:52 +08:00
|
|
|
static int
|
2012-03-26 01:47:29 +08:00
|
|
|
i915_gem_shmem_pread(struct drm_device *dev,
|
|
|
|
struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pread *args,
|
|
|
|
struct drm_file *file)
|
2009-03-11 02:44:52 +08:00
|
|
|
{
|
2011-12-14 20:57:32 +08:00
|
|
|
char __user *user_data;
|
2009-03-11 02:44:52 +08:00
|
|
|
ssize_t remain;
|
2011-12-14 20:57:32 +08:00
|
|
|
loff_t offset;
|
2012-02-15 21:42:43 +08:00
|
|
|
int shmem_page_offset, page_length, ret = 0;
|
2011-12-14 20:57:32 +08:00
|
|
|
int obj_do_bit17_swizzling, page_do_bit17_swizzling;
|
2012-03-26 01:47:36 +08:00
|
|
|
int prefaulted = 0;
|
2012-03-26 01:47:31 +08:00
|
|
|
int needs_clflush = 0;
|
2013-02-19 01:28:02 +08:00
|
|
|
struct sg_page_iter sg_iter;
|
2009-03-11 02:44:52 +08:00
|
|
|
|
2014-02-19 02:15:45 +08:00
|
|
|
ret = i915_gem_obj_prepare_shmem_read(obj, &needs_clflush);
|
2012-09-05 04:02:56 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
|
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
2011-12-14 20:57:32 +08:00
|
|
|
offset = args->offset;
|
2016-08-19 00:16:47 +08:00
|
|
|
remain = args->size;
|
2009-03-11 02:44:52 +08:00
|
|
|
|
2013-02-19 01:28:02 +08:00
|
|
|
for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
|
|
|
|
offset >> PAGE_SHIFT) {
|
2013-03-26 21:14:18 +08:00
|
|
|
struct page *page = sg_page_iter_page(&sg_iter);
|
2012-06-01 22:20:22 +08:00
|
|
|
|
|
|
|
if (remain <= 0)
|
|
|
|
break;
|
|
|
|
|
2009-03-11 02:44:52 +08:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* shmem_page_offset = offset within page in shmem file
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
2011-05-13 05:17:11 +08:00
|
|
|
shmem_page_offset = offset_in_page(offset);
|
2009-03-11 02:44:52 +08:00
|
|
|
page_length = remain;
|
|
|
|
if ((shmem_page_offset + page_length) > PAGE_SIZE)
|
|
|
|
page_length = PAGE_SIZE - shmem_page_offset;
|
|
|
|
|
2011-12-14 20:57:32 +08:00
|
|
|
page_do_bit17_swizzling = obj_do_bit17_swizzling &&
|
|
|
|
(page_to_phys(page) & (1 << 17)) != 0;
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
ret = shmem_pread_fast(page, shmem_page_offset, page_length,
|
|
|
|
user_data, page_do_bit17_swizzling,
|
|
|
|
needs_clflush);
|
|
|
|
if (ret == 0)
|
|
|
|
goto next_page;
|
2012-03-26 01:47:29 +08:00
|
|
|
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
|
2014-01-21 17:24:25 +08:00
|
|
|
if (likely(!i915.prefault_disable) && !prefaulted) {
|
2016-09-18 06:02:44 +08:00
|
|
|
ret = fault_in_pages_writeable(user_data, remain);
|
2012-03-26 01:47:36 +08:00
|
|
|
/* Userspace is tricking us, but we've already clobbered
|
|
|
|
* its pages with the prefault and promised to write the
|
|
|
|
* data up to the first fault. Hence ignore any errors
|
|
|
|
* and just continue. */
|
|
|
|
(void)ret;
|
|
|
|
prefaulted = 1;
|
|
|
|
}
|
2009-03-11 02:44:52 +08:00
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
ret = shmem_pread_slow(page, shmem_page_offset, page_length,
|
|
|
|
user_data, page_do_bit17_swizzling,
|
|
|
|
needs_clflush);
|
2009-03-11 02:44:52 +08:00
|
|
|
|
2012-03-26 01:47:29 +08:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2012-09-05 04:02:56 +08:00
|
|
|
|
|
|
|
if (ret)
|
2011-12-14 20:57:32 +08:00
|
|
|
goto out;
|
|
|
|
|
2014-03-07 16:30:36 +08:00
|
|
|
next_page:
|
2009-03-11 02:44:52 +08:00
|
|
|
remain -= page_length;
|
2011-12-14 20:57:32 +08:00
|
|
|
user_data += page_length;
|
2009-03-11 02:44:52 +08:00
|
|
|
offset += page_length;
|
|
|
|
}
|
|
|
|
|
2010-10-14 22:26:45 +08:00
|
|
|
out:
|
2016-08-19 00:16:47 +08:00
|
|
|
i915_gem_obj_finish_shmem_access(obj);
|
2012-09-05 04:02:56 +08:00
|
|
|
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
|
|
|
* Reads data from the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of *data are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pread_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pread *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-09-27 03:23:38 +08:00
|
|
|
int ret = 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-17 17:10:42 +08:00
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!access_ok(VERIFY_WRITE,
|
2016-04-26 23:32:27 +08:00
|
|
|
u64_to_user_ptr(args->data_ptr),
|
2010-11-17 17:10:42 +08:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check source. */
|
2010-11-09 03:18:58 +08:00
|
|
|
if (args->offset > obj->base.size ||
|
|
|
|
args->size > obj->base.size - args->offset) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-08-05 17:14:16 +08:00
|
|
|
goto err;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pread(obj, args->offset, args->size);
|
|
|
|
|
2016-08-05 17:14:16 +08:00
|
|
|
ret = __unsafe_wait_rendering(obj, to_rps_client(file), true);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2012-03-26 01:47:29 +08:00
|
|
|
ret = i915_gem_shmem_pread(dev, obj, args, file);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
/* pread for non shmem backed objects */
|
2016-08-04 16:09:53 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENODEV) {
|
|
|
|
intel_runtime_pm_get(to_i915(dev));
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
ret = i915_gem_gtt_pread(dev, obj, args->size,
|
|
|
|
args->offset, args->data_ptr);
|
2016-08-04 16:09:53 +08:00
|
|
|
intel_runtime_pm_put(to_i915(dev));
|
|
|
|
}
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2010-10-14 22:26:45 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2016-08-05 17:14:16 +08:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
err:
|
|
|
|
i915_gem_object_put_unlocked(obj);
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
/* This is the fast write path which cannot handle
|
|
|
|
* page faults in the source data
|
2008-10-21 05:16:43 +08:00
|
|
|
*/
|
2008-10-31 10:38:48 +08:00
|
|
|
|
|
|
|
static inline int
|
|
|
|
fast_user_write(struct io_mapping *mapping,
|
|
|
|
loff_t page_base, int page_offset,
|
|
|
|
char __user *user_data,
|
|
|
|
int length)
|
2008-10-21 05:16:43 +08:00
|
|
|
{
|
2012-04-17 05:07:47 +08:00
|
|
|
void __iomem *vaddr_atomic;
|
|
|
|
void *vaddr;
|
2008-10-31 10:38:48 +08:00
|
|
|
unsigned long unwritten;
|
2008-10-21 05:16:43 +08:00
|
|
|
|
2010-10-27 05:21:51 +08:00
|
|
|
vaddr_atomic = io_mapping_map_atomic_wc(mapping, page_base);
|
2012-04-17 05:07:47 +08:00
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
|
|
|
vaddr = (void __force*)vaddr_atomic + page_offset;
|
|
|
|
unwritten = __copy_from_user_inatomic_nocache(vaddr,
|
2008-10-31 10:38:48 +08:00
|
|
|
user_data, length);
|
2010-10-27 05:21:51 +08:00
|
|
|
io_mapping_unmap_atomic(vaddr_atomic);
|
2010-10-14 22:03:58 +08:00
|
|
|
return unwritten;
|
2008-10-31 10:38:48 +08:00
|
|
|
}
|
|
|
|
|
2009-03-10 00:42:23 +08:00
|
|
|
/**
|
|
|
|
* This is the fast pwrite path, where we copy the data directly from the
|
|
|
|
* user into the GTT, uncached.
|
2016-07-16 03:48:07 +08:00
|
|
|
* @i915: i915 device private data
|
2016-06-03 21:02:17 +08:00
|
|
|
* @obj: i915 gem object
|
|
|
|
* @args: pwrite arguments structure
|
|
|
|
* @file: drm file pointer
|
2009-03-10 00:42:23 +08:00
|
|
|
*/
|
2008-07-31 03:06:12 +08:00
|
|
|
static int
|
2016-06-10 16:53:01 +08:00
|
|
|
i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj,
|
2009-03-10 00:42:23 +08:00
|
|
|
struct drm_i915_gem_pwrite *args,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-06-10 16:53:01 +08:00
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
struct drm_device *dev = obj->base.dev;
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *vma;
|
2016-06-10 16:53:01 +08:00
|
|
|
struct drm_mm_node node;
|
|
|
|
uint64_t remain, offset;
|
2008-07-31 03:06:12 +08:00
|
|
|
char __user *user_data;
|
2016-06-10 16:53:01 +08:00
|
|
|
int ret;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
bool hit_slow_path = false;
|
|
|
|
|
2016-08-05 17:14:23 +08:00
|
|
|
if (i915_gem_object_is_tiled(obj))
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
return -EFAULT;
|
2012-03-26 01:47:35 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
2016-08-04 23:32:34 +08:00
|
|
|
PIN_MAPPABLE | PIN_NONBLOCK);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma)) {
|
2016-06-10 16:53:01 +08:00
|
|
|
ret = insert_mappable_node(i915, &node, PAGE_SIZE);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret) {
|
|
|
|
remove_mappable_node(&node);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
}
|
2012-03-26 01:47:35 +08:00
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, true);
|
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2016-08-19 00:16:43 +08:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2016-06-10 16:53:01 +08:00
|
|
|
obj->dirty = true;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-04-26 23:32:27 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
2016-06-10 16:53:01 +08:00
|
|
|
offset = args->offset;
|
2008-07-31 03:06:12 +08:00
|
|
|
remain = args->size;
|
2016-06-10 16:53:01 +08:00
|
|
|
while (remain) {
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
2008-10-31 10:38:48 +08:00
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
2016-06-10 16:53:01 +08:00
|
|
|
u32 page_base = node.start;
|
|
|
|
unsigned page_offset = offset_in_page(offset);
|
|
|
|
unsigned page_length = PAGE_SIZE - page_offset;
|
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb(); /* flush the write before we modify the GGTT */
|
|
|
|
ggtt->base.insert_page(&ggtt->base,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
|
|
|
wmb(); /* flush modifications to the GGTT (insert_page) */
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2008-10-31 10:38:48 +08:00
|
|
|
/* If we get a fault while copying data, then (presumably) our
|
2009-03-10 00:42:23 +08:00
|
|
|
* source page isn't available. Return the error and we'll
|
|
|
|
* retry in the slow path.
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
* If the object is non-shmem backed, we retry again with the
|
|
|
|
* path that handles page fault.
|
2008-10-31 10:38:48 +08:00
|
|
|
*/
|
2016-08-19 23:54:27 +08:00
|
|
|
if (fast_user_write(&ggtt->mappable, page_base,
|
2012-03-26 01:47:35 +08:00
|
|
|
page_offset, user_data, page_length)) {
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
hit_slow_path = true;
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2016-08-19 23:54:27 +08:00
|
|
|
if (slow_user_access(&ggtt->mappable,
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
page_base,
|
|
|
|
page_offset, user_data,
|
|
|
|
page_length, true)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
|
|
|
goto out_flush;
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
2012-03-26 01:47:35 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2015-02-14 03:23:45 +08:00
|
|
|
out_flush:
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (hit_slow_path) {
|
|
|
|
if (ret == 0 &&
|
|
|
|
(obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0) {
|
|
|
|
/* The user has modified the object whilst we tried
|
|
|
|
* reading from it, and we now have no idea what domain
|
|
|
|
* the pages should be in. As we have just been touching
|
|
|
|
* them directly, flush everything back to the GTT
|
|
|
|
* domain.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-08-19 00:16:43 +08:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2012-03-26 01:47:35 +08:00
|
|
|
out_unpin:
|
2016-06-10 16:53:01 +08:00
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.clear_range(&ggtt->base,
|
2016-10-13 20:02:40 +08:00
|
|
|
node.start, node.size);
|
2016-06-10 16:53:01 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
2012-03-26 01:47:35 +08:00
|
|
|
out:
|
2009-03-10 00:42:23 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
/* Per-page copy function for the shmem pwrite fastpath.
|
|
|
|
* Flushes invalid cachelines before writing to the target if
|
|
|
|
* needs_clflush_before is set and flushes out any written cachelines after
|
|
|
|
* writing if needs_clflush is set. */
|
2008-10-03 03:24:47 +08:00
|
|
|
static int
|
2012-03-26 01:47:40 +08:00
|
|
|
shmem_pwrite_fast(struct page *page, int shmem_page_offset, int page_length,
|
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2012-03-26 01:47:40 +08:00
|
|
|
char *vaddr;
|
2008-07-31 03:06:12 +08:00
|
|
|
int ret;
|
2009-03-10 00:42:23 +08:00
|
|
|
|
2012-03-26 01:47:43 +08:00
|
|
|
if (unlikely(page_do_bit17_swizzling))
|
2012-03-26 01:47:40 +08:00
|
|
|
return -EINVAL;
|
2009-03-10 00:42:23 +08:00
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
vaddr = kmap_atomic(page);
|
|
|
|
if (needs_clflush_before)
|
|
|
|
drm_clflush_virt_range(vaddr + shmem_page_offset,
|
|
|
|
page_length);
|
2014-03-07 16:30:37 +08:00
|
|
|
ret = __copy_from_user_inatomic(vaddr + shmem_page_offset,
|
|
|
|
user_data, page_length);
|
2012-03-26 01:47:40 +08:00
|
|
|
if (needs_clflush_after)
|
|
|
|
drm_clflush_virt_range(vaddr + shmem_page_offset,
|
|
|
|
page_length);
|
|
|
|
kunmap_atomic(vaddr);
|
2009-03-10 00:42:23 +08:00
|
|
|
|
2012-09-05 04:02:55 +08:00
|
|
|
return ret ? -EFAULT : 0;
|
2009-03-10 00:42:23 +08:00
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
/* Only difference to the fast-path function is that this can handle bit17
|
|
|
|
* and uses non-atomic copy and kmap functions. */
|
2008-10-03 03:24:47 +08:00
|
|
|
static int
|
2012-03-26 01:47:40 +08:00
|
|
|
shmem_pwrite_slow(struct page *page, int shmem_page_offset, int page_length,
|
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2012-03-26 01:47:40 +08:00
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
2010-10-28 20:45:36 +08:00
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
vaddr = kmap(page);
|
2012-03-26 01:47:43 +08:00
|
|
|
if (unlikely(needs_clflush_before || page_do_bit17_swizzling))
|
2012-03-26 01:47:42 +08:00
|
|
|
shmem_clflush_swizzled_range(vaddr + shmem_page_offset,
|
|
|
|
page_length,
|
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 01:47:40 +08:00
|
|
|
if (page_do_bit17_swizzling)
|
|
|
|
ret = __copy_from_user_swizzled(vaddr, shmem_page_offset,
|
2010-10-28 20:45:36 +08:00
|
|
|
user_data,
|
|
|
|
page_length);
|
2012-03-26 01:47:40 +08:00
|
|
|
else
|
|
|
|
ret = __copy_from_user(vaddr + shmem_page_offset,
|
|
|
|
user_data,
|
|
|
|
page_length);
|
|
|
|
if (needs_clflush_after)
|
2012-03-26 01:47:42 +08:00
|
|
|
shmem_clflush_swizzled_range(vaddr + shmem_page_offset,
|
|
|
|
page_length,
|
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 01:47:40 +08:00
|
|
|
kunmap(page);
|
2009-03-10 04:42:30 +08:00
|
|
|
|
2012-09-05 04:02:55 +08:00
|
|
|
return ret ? -EFAULT : 0;
|
2009-03-10 04:42:30 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2012-03-26 01:47:28 +08:00
|
|
|
i915_gem_shmem_pwrite(struct drm_device *dev,
|
|
|
|
struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pwrite *args,
|
|
|
|
struct drm_file *file)
|
2009-03-10 04:42:30 +08:00
|
|
|
{
|
|
|
|
ssize_t remain;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
loff_t offset;
|
|
|
|
char __user *user_data;
|
2012-02-15 21:42:43 +08:00
|
|
|
int shmem_page_offset, page_length, ret = 0;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
int obj_do_bit17_swizzling, page_do_bit17_swizzling;
|
2012-03-26 01:47:28 +08:00
|
|
|
int hit_slowpath = 0;
|
2016-08-19 00:16:47 +08:00
|
|
|
unsigned int needs_clflush;
|
2013-02-19 01:28:02 +08:00
|
|
|
struct sg_page_iter sg_iter;
|
2009-03-10 04:42:30 +08:00
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
|
2012-09-05 04:02:55 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
|
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
2008-07-31 03:06:12 +08:00
|
|
|
offset = args->offset;
|
2016-08-19 00:16:47 +08:00
|
|
|
remain = args->size;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-02-19 01:28:02 +08:00
|
|
|
for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
|
|
|
|
offset >> PAGE_SHIFT) {
|
2013-03-26 21:14:18 +08:00
|
|
|
struct page *page = sg_page_iter_page(&sg_iter);
|
2012-03-26 01:47:37 +08:00
|
|
|
int partial_cacheline_write;
|
2010-10-28 20:45:36 +08:00
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
if (remain <= 0)
|
|
|
|
break;
|
|
|
|
|
2009-03-10 04:42:30 +08:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* shmem_page_offset = offset within page in shmem file
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
2011-05-13 05:17:11 +08:00
|
|
|
shmem_page_offset = offset_in_page(offset);
|
2009-03-10 04:42:30 +08:00
|
|
|
|
|
|
|
page_length = remain;
|
|
|
|
if ((shmem_page_offset + page_length) > PAGE_SIZE)
|
|
|
|
page_length = PAGE_SIZE - shmem_page_offset;
|
|
|
|
|
2012-03-26 01:47:37 +08:00
|
|
|
/* If we don't overwrite a cacheline completely we need to be
|
|
|
|
* careful to have up-to-date data by first clflushing. Don't
|
|
|
|
* overcomplicate things and flush the entire patch. */
|
2016-08-19 00:16:47 +08:00
|
|
|
partial_cacheline_write = needs_clflush & CLFLUSH_BEFORE &&
|
2012-03-26 01:47:37 +08:00
|
|
|
((shmem_page_offset | page_length)
|
|
|
|
& (boot_cpu_data.x86_clflush_size - 1));
|
|
|
|
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
page_do_bit17_swizzling = obj_do_bit17_swizzling &&
|
|
|
|
(page_to_phys(page) & (1 << 17)) != 0;
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
ret = shmem_pwrite_fast(page, shmem_page_offset, page_length,
|
|
|
|
user_data, page_do_bit17_swizzling,
|
|
|
|
partial_cacheline_write,
|
2016-08-19 00:16:47 +08:00
|
|
|
needs_clflush & CLFLUSH_AFTER);
|
2012-03-26 01:47:40 +08:00
|
|
|
if (ret == 0)
|
|
|
|
goto next_page;
|
2012-03-26 01:47:28 +08:00
|
|
|
|
|
|
|
hit_slowpath = 1;
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2012-03-26 01:47:40 +08:00
|
|
|
ret = shmem_pwrite_slow(page, shmem_page_offset, page_length,
|
|
|
|
user_data, page_do_bit17_swizzling,
|
|
|
|
partial_cacheline_write,
|
2016-08-19 00:16:47 +08:00
|
|
|
needs_clflush & CLFLUSH_AFTER);
|
2009-03-10 04:42:30 +08:00
|
|
|
|
2012-03-26 01:47:28 +08:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2012-09-05 04:02:55 +08:00
|
|
|
|
|
|
|
if (ret)
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
goto out;
|
|
|
|
|
2014-03-07 16:30:36 +08:00
|
|
|
next_page:
|
2009-03-10 04:42:30 +08:00
|
|
|
remain -= page_length;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
user_data += page_length;
|
2009-03-10 04:42:30 +08:00
|
|
|
offset += page_length;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2010-10-14 22:03:58 +08:00
|
|
|
out:
|
2016-08-19 00:16:47 +08:00
|
|
|
i915_gem_obj_finish_shmem_access(obj);
|
2012-09-05 04:02:55 +08:00
|
|
|
|
2012-03-26 01:47:28 +08:00
|
|
|
if (hit_slowpath) {
|
2012-11-15 23:53:58 +08:00
|
|
|
/*
|
|
|
|
* Fixup: Flush cpu caches in case we didn't flush the dirty
|
|
|
|
* cachelines in-line while writing and the object moved
|
|
|
|
* out of the cpu write domain while we've dropped the lock.
|
|
|
|
*/
|
2016-08-19 00:16:47 +08:00
|
|
|
if (!(needs_clflush & CLFLUSH_AFTER) &&
|
2012-11-15 23:53:58 +08:00
|
|
|
obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
|
2013-08-08 21:41:09 +08:00
|
|
|
if (i915_gem_clflush_object(obj, obj->pin_display))
|
2016-08-19 00:16:47 +08:00
|
|
|
needs_clflush |= CLFLUSH_AFTER;
|
2012-03-26 01:47:28 +08:00
|
|
|
}
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-08-19 00:16:47 +08:00
|
|
|
if (needs_clflush & CLFLUSH_AFTER)
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_chipset_flush(to_i915(dev));
|
2012-03-26 01:47:37 +08:00
|
|
|
|
2015-07-08 07:28:51 +08:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2009-03-10 04:42:30 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Writes data to the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of the buffer that were to be modified are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
|
2010-10-14 22:03:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2008-07-31 03:06:12 +08:00
|
|
|
struct drm_i915_gem_pwrite *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-11-17 17:10:42 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!access_ok(VERIFY_READ,
|
2016-04-26 23:32:27 +08:00
|
|
|
u64_to_user_ptr(args->data_ptr),
|
2010-11-17 17:10:42 +08:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2014-01-21 17:24:25 +08:00
|
|
|
if (likely(!i915.prefault_disable)) {
|
2016-09-18 06:02:44 +08:00
|
|
|
ret = fault_in_pages_readable(u64_to_user_ptr(args->data_ptr),
|
2013-07-19 13:51:24 +08:00
|
|
|
args->size);
|
|
|
|
if (ret)
|
|
|
|
return -EFAULT;
|
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check destination. */
|
2010-11-09 03:18:58 +08:00
|
|
|
if (args->offset > obj->base.size ||
|
|
|
|
args->size > obj->base.size - args->offset) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-08-05 17:14:16 +08:00
|
|
|
goto err;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
|
|
|
|
|
2016-08-05 17:14:16 +08:00
|
|
|
ret = __unsafe_wait_rendering(obj, to_rps_client(file), false);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
|
|
|
goto err_rpm;
|
|
|
|
|
2012-03-26 01:47:35 +08:00
|
|
|
ret = -EFAULT;
|
2008-07-31 03:06:12 +08:00
|
|
|
/* We can only do the GTT pwrite on untiled buffers, as otherwise
|
|
|
|
* it would end up going through the fenced access, and we'll get
|
|
|
|
* different detiling behavior between reading and writing.
|
|
|
|
* pread/pwrite currently are reading and writing from the CPU
|
|
|
|
* perspective, requiring manual detiling by the client.
|
|
|
|
*/
|
2016-06-20 22:05:52 +08:00
|
|
|
if (!i915_gem_object_has_struct_page(obj) ||
|
2013-08-09 19:26:45 +08:00
|
|
|
cpu_write_needs_clflush(obj)) {
|
2016-06-10 16:53:01 +08:00
|
|
|
ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
|
2012-03-26 01:47:35 +08:00
|
|
|
/* Note that the gtt paths might fail with non-page-backed user
|
|
|
|
* pointers (e.g. gtt mappings when moving data between
|
|
|
|
* textures). Fallback to the shmem path in that case. */
|
2010-10-14 22:03:58 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-07-17 01:42:36 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENOSPC) {
|
2014-11-04 20:51:40 +08:00
|
|
|
if (obj->phys_handle)
|
|
|
|
ret = i915_gem_phys_pwrite(obj, args, file);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
else
|
2016-08-19 00:16:47 +08:00
|
|
|
ret = i915_gem_shmem_pwrite(dev, obj, args, file);
|
2014-11-04 20:51:40 +08:00
|
|
|
}
|
2011-12-14 20:57:30 +08:00
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2010-10-14 22:03:58 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2014-11-12 22:40:35 +08:00
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
return ret;
|
2012-08-24 16:35:08 +08:00
|
|
|
|
2016-08-05 17:14:16 +08:00
|
|
|
err_rpm:
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
err:
|
|
|
|
i915_gem_object_put_unlocked(obj);
|
|
|
|
return ret;
|
2012-08-24 16:35:08 +08:00
|
|
|
}
|
|
|
|
|
2016-08-19 00:16:44 +08:00
|
|
|
static inline enum fb_op_origin
|
2016-06-18 01:46:39 +08:00
|
|
|
write_origin(struct drm_i915_gem_object *obj, unsigned domain)
|
drm/i915: Limit the busy wait on requests to 5us not 10ms!
When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quick as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 miscroseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.
The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
the system is too busy to waste cycles on spinning and we should sleep
instead.
__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100
drm/i915: Optimistically spin for the request completion
v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.
v3: Raise the limit to 10us
v4: Now 5us.
Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-3-git-send-email-chris@chris-wilson.co.uk
2015-12-11 19:32:58 +08:00
|
|
|
{
|
2016-08-19 00:17:04 +08:00
|
|
|
return (domain == I915_GEM_DOMAIN_GTT ?
|
|
|
|
obj->frontbuffer_ggtt_origin : ORIGIN_CPU);
|
2016-06-18 01:46:39 +08:00
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
2008-11-11 02:53:25 +08:00
|
|
|
* Called when user space prepares to use an object with the CPU, either
|
|
|
|
* through the mmap ioctl's mapping or a GTT mapping.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_set_domain *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-11-11 02:53:25 +08:00
|
|
|
uint32_t read_domains = args->read_domains;
|
|
|
|
uint32_t write_domain = args->write_domain;
|
2008-07-31 03:06:12 +08:00
|
|
|
int ret;
|
|
|
|
|
2008-11-11 02:53:25 +08:00
|
|
|
/* Only handle setting domains to types used by the CPU. */
|
2016-08-05 17:14:07 +08:00
|
|
|
if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
|
2008-11-11 02:53:25 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Having something in the write domain implies it's in the read
|
|
|
|
* domain, and only that read domain. Enforce that in the request.
|
|
|
|
*/
|
|
|
|
if (write_domain != 0 && read_domains != write_domain)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:07 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-08-24 16:35:09 +08:00
|
|
|
/* Try to flush the object off the GPU without holding the lock.
|
|
|
|
* We will repeat the flush holding the lock in the normal manner
|
|
|
|
* to catch cases where we are gazumped.
|
|
|
|
*/
|
2016-08-05 17:14:07 +08:00
|
|
|
ret = __unsafe_wait_rendering(obj, to_rps_client(file), !write_domain);
|
2012-08-24 16:35:09 +08:00
|
|
|
if (ret)
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err;
|
|
|
|
|
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
2012-08-24 16:35:09 +08:00
|
|
|
if (ret)
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err;
|
2012-08-24 16:35:09 +08:00
|
|
|
|
2015-01-02 18:59:29 +08:00
|
|
|
if (read_domains & I915_GEM_DOMAIN_GTT)
|
2008-11-11 02:53:25 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
|
2015-01-02 18:59:29 +08:00
|
|
|
else
|
2008-11-15 05:35:19 +08:00
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
|
2008-11-11 02:53:25 +08:00
|
|
|
|
2015-06-27 01:35:16 +08:00
|
|
|
if (write_domain != 0)
|
2016-06-18 01:46:39 +08:00
|
|
|
intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
|
2015-06-27 01:35:16 +08:00
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
return ret;
|
2016-08-05 17:14:07 +08:00
|
|
|
|
|
|
|
err:
|
|
|
|
i915_gem_object_put_unlocked(obj);
|
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Called when user space has done writes to this buffer
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_sw_finish *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-08-05 17:14:19 +08:00
|
|
|
int err = 0;
|
2010-10-17 16:45:41 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:19 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
|
|
|
/* Pinned buffers may be scanout, so flush the cache */
|
2016-08-05 17:14:19 +08:00
|
|
|
if (READ_ONCE(obj->pin_display)) {
|
|
|
|
err = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (!err) {
|
|
|
|
i915_gem_object_flush_cpu_write_domain(obj);
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
}
|
|
|
|
}
|
2008-11-15 05:35:19 +08:00
|
|
|
|
2016-08-05 17:14:19 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
|
|
|
return err;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
2016-06-03 21:02:17 +08:00
|
|
|
* i915_gem_mmap_ioctl - Maps the contents of an object, returning the address
|
|
|
|
* it is mapped to.
|
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* While the mapping holds a reference on the contents of the object, it doesn't
|
|
|
|
* imply a ref on the object itself.
|
2014-10-16 18:28:18 +08:00
|
|
|
*
|
|
|
|
* IMPORTANT:
|
|
|
|
*
|
|
|
|
* DRM driver writers who look a this function as an example for how to do GEM
|
|
|
|
* mmap support, please don't implement mmap support like here. The modern way
|
|
|
|
* to implement DRM mmap support is with an mmap offset ioctl (like
|
|
|
|
* i915_gem_mmap_gtt) and then using the mmap syscall on the DRM fd directly.
|
|
|
|
* That way debug tooling like valgrind will understand what's going on, hiding
|
|
|
|
* the mmap call in a driver private ioctl will break that. The i915 driver only
|
|
|
|
* does cpu mmaps this way because we didn't know better.
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_mmap *args = data;
|
2016-07-20 20:31:51 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-07-31 03:06:12 +08:00
|
|
|
unsigned long addr;
|
|
|
|
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 18:59:30 +08:00
|
|
|
if (args->flags & ~(I915_MMAP_WC))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-03-29 23:42:01 +08:00
|
|
|
if (args->flags & I915_MMAP_WC && !boot_cpu_has(X86_FEATURE_PAT))
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 18:59:30 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
|
|
|
if (!obj)
|
2010-08-04 21:19:46 +08:00
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-05-10 21:25:09 +08:00
|
|
|
/* prime objects have no backing filp to GEM mmap
|
|
|
|
* pages from.
|
|
|
|
*/
|
2016-07-20 20:31:51 +08:00
|
|
|
if (!obj->base.filp) {
|
2016-07-20 20:31:54 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2012-05-10 21:25:09 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
addr = vm_mmap(obj->base.filp, 0, args->size,
|
2008-07-31 03:06:12 +08:00
|
|
|
PROT_READ | PROT_WRITE, MAP_SHARED,
|
|
|
|
args->offset);
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 18:59:30 +08:00
|
|
|
if (args->flags & I915_MMAP_WC) {
|
|
|
|
struct mm_struct *mm = current->mm;
|
|
|
|
struct vm_area_struct *vma;
|
|
|
|
|
2016-05-24 07:26:11 +08:00
|
|
|
if (down_write_killable(&mm->mmap_sem)) {
|
2016-07-20 20:31:54 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2016-05-24 07:26:11 +08:00
|
|
|
return -EINTR;
|
|
|
|
}
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 18:59:30 +08:00
|
|
|
vma = find_vma(mm, addr);
|
|
|
|
if (vma)
|
|
|
|
vma->vm_page_prot =
|
|
|
|
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
|
|
|
|
else
|
|
|
|
addr = -ENOMEM;
|
|
|
|
up_write(&mm->mmap_sem);
|
2016-06-18 01:46:39 +08:00
|
|
|
|
|
|
|
/* This may race, but that's ok, it only gets set */
|
2016-08-19 00:17:04 +08:00
|
|
|
WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU);
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 18:59:30 +08:00
|
|
|
}
|
2016-07-20 20:31:54 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
if (IS_ERR((void *)addr))
|
|
|
|
return addr;
|
|
|
|
|
|
|
|
args->addr_ptr = (uint64_t) addr;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-08-19 00:17:01 +08:00
|
|
|
static unsigned int tile_row_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
u64 size;
|
|
|
|
|
|
|
|
size = i915_gem_object_get_stride(obj);
|
|
|
|
size *= i915_gem_object_get_tiling(obj) == I915_TILING_Y ? 32 : 8;
|
|
|
|
|
|
|
|
return size >> PAGE_SHIFT;
|
|
|
|
}
|
|
|
|
|
2016-08-26 02:05:19 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_mmap_gtt_version - report the current feature set for GTT mmaps
|
|
|
|
*
|
|
|
|
* A history of the GTT mmap interface:
|
|
|
|
*
|
|
|
|
* 0 - Everything had to fit into the GTT. Both parties of a memcpy had to
|
|
|
|
* aligned and suitable for fencing, and still fit into the available
|
|
|
|
* mappable space left by the pinned display objects. A classic problem
|
|
|
|
* we called the page-fault-of-doom where we would ping-pong between
|
|
|
|
* two objects that could not fit inside the GTT and so the memcpy
|
|
|
|
* would page one object in at the expense of the other between every
|
|
|
|
* single byte.
|
|
|
|
*
|
|
|
|
* 1 - Objects can be any size, and have any compatible fencing (X Y, or none
|
|
|
|
* as set via i915_gem_set_tiling() [DRM_I915_GEM_SET_TILING]). If the
|
|
|
|
* object is too large for the available space (or simply too large
|
|
|
|
* for the mappable aperture!), a view is created instead and faulted
|
|
|
|
* into userspace. (This view is aligned and sized appropriately for
|
|
|
|
* fenced access.)
|
|
|
|
*
|
|
|
|
* Restrictions:
|
|
|
|
*
|
|
|
|
* * snoopable objects cannot be accessed via the GTT. It can cause machine
|
|
|
|
* hangs on some architectures, corruption on others. An attempt to service
|
|
|
|
* a GTT page fault from a snoopable object will generate a SIGBUS.
|
|
|
|
*
|
|
|
|
* * the object must be able to fit into RAM (physical memory, though no
|
|
|
|
* limited to the mappable aperture).
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Caveats:
|
|
|
|
*
|
|
|
|
* * a new GTT page fault will synchronize rendering from the GPU and flush
|
|
|
|
* all data to system memory. Subsequent access will not be synchronized.
|
|
|
|
*
|
|
|
|
* * all mappings are revoked on runtime device suspend.
|
|
|
|
*
|
|
|
|
* * there are only 8, 16 or 32 fence registers to share between all users
|
|
|
|
* (older machines require fence register for display and blitter access
|
|
|
|
* as well). Contention of the fence registers will cause the previous users
|
|
|
|
* to be unmapped and any new access will generate new page faults.
|
|
|
|
*
|
|
|
|
* * running out of memory while servicing a fault may generate a SIGBUS,
|
|
|
|
* rather than the expected SIGSEGV.
|
|
|
|
*/
|
|
|
|
int i915_gem_mmap_gtt_version(void)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2008-11-13 02:03:55 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_fault - fault a page into the GTT
|
2016-08-15 17:49:06 +08:00
|
|
|
* @area: CPU VMA in question
|
2015-09-15 20:58:44 +08:00
|
|
|
* @vmf: fault info
|
2008-11-13 02:03:55 +08:00
|
|
|
*
|
|
|
|
* The fault handler is set up by drm_gem_mmap() when a object is GTT mapped
|
|
|
|
* from userspace. The fault handler takes care of binding the object to
|
|
|
|
* the GTT (if needed), allocating and programming a fence register (again,
|
|
|
|
* only if needed based on whether the old reg is still valid or the object
|
|
|
|
* is tiled) and inserting a new PTE into the faulting process.
|
|
|
|
*
|
|
|
|
* Note that the faulting process may involve evicting existing objects
|
|
|
|
* from the GTT and/or fence registers to make room. So performance may
|
|
|
|
* suffer if the GTT working set is large or there are few fence registers
|
|
|
|
* left.
|
2016-08-26 02:05:19 +08:00
|
|
|
*
|
|
|
|
* The current feature set supported by i915_gem_fault() and thus GTT mmaps
|
|
|
|
* is exposed via I915_PARAM_MMAP_GTT_VERSION (see i915_gem_mmap_gtt_version).
|
2008-11-13 02:03:55 +08:00
|
|
|
*/
|
2016-08-15 17:49:06 +08:00
|
|
|
int i915_gem_fault(struct vm_area_struct *area, struct vm_fault *vmf)
|
2008-11-13 02:03:55 +08:00
|
|
|
{
|
2016-08-19 00:17:01 +08:00
|
|
|
#define MIN_CHUNK_PAGES ((1 << 20) >> PAGE_SHIFT) /* 1 MiB */
|
2016-08-15 17:49:06 +08:00
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(area->vm_private_data);
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_device *dev = obj->base.dev;
|
2016-03-30 21:57:10 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2009-01-27 09:10:45 +08:00
|
|
|
bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *vma;
|
2008-11-13 02:03:55 +08:00
|
|
|
pgoff_t page_offset;
|
2016-08-19 00:17:05 +08:00
|
|
|
unsigned int flags;
|
2016-08-05 17:14:07 +08:00
|
|
|
int ret;
|
2013-11-28 04:20:34 +08:00
|
|
|
|
2008-11-13 02:03:55 +08:00
|
|
|
/* We don't use vmf->pgoff since that has the fake offset */
|
2016-08-15 17:49:06 +08:00
|
|
|
page_offset = ((unsigned long)vmf->virtual_address - area->vm_start) >>
|
2008-11-13 02:03:55 +08:00
|
|
|
PAGE_SHIFT;
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_fault(obj, page_offset, true, write);
|
|
|
|
|
2014-02-08 04:37:06 +08:00
|
|
|
/* Try to flush the object off the GPU first without holding the lock.
|
2016-08-05 17:14:07 +08:00
|
|
|
* Upon acquiring the lock, we will perform our sanity checks and then
|
2014-02-08 04:37:06 +08:00
|
|
|
* repeat the flush holding the lock in the normal manner to catch cases
|
|
|
|
* where we are gazumped.
|
|
|
|
*/
|
2016-08-05 17:14:07 +08:00
|
|
|
ret = __unsafe_wait_rendering(obj, NULL, !write);
|
2014-02-08 04:37:06 +08:00
|
|
|
if (ret)
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err;
|
|
|
|
|
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
|
|
|
goto err_rpm;
|
2014-02-08 04:37:06 +08:00
|
|
|
|
2012-12-16 20:43:36 +08:00
|
|
|
/* Access to snoopable pages through the GTT is incoherent. */
|
|
|
|
if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev)) {
|
2014-05-28 23:16:41 +08:00
|
|
|
ret = -EFAULT;
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err_unlock;
|
2012-12-16 20:43:36 +08:00
|
|
|
}
|
|
|
|
|
2016-08-19 00:17:05 +08:00
|
|
|
/* If the object is smaller than a couple of partial vma, it is
|
|
|
|
* not worth only creating a single partial vma - we may as well
|
|
|
|
* clear enough space for the full object.
|
|
|
|
*/
|
|
|
|
flags = PIN_MAPPABLE;
|
|
|
|
if (obj->base.size > 2 * MIN_CHUNK_PAGES << PAGE_SHIFT)
|
|
|
|
flags |= PIN_NONBLOCK | PIN_NONFAULT;
|
|
|
|
|
2016-08-19 00:17:02 +08:00
|
|
|
/* Now pin it into the GTT as needed */
|
2016-08-19 00:17:05 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, flags);
|
2016-08-19 00:17:02 +08:00
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
struct i915_ggtt_view view;
|
2016-08-19 00:17:01 +08:00
|
|
|
unsigned int chunk_size;
|
|
|
|
|
2016-08-19 00:17:02 +08:00
|
|
|
/* Use a partial view if it is bigger than available space */
|
2016-08-19 00:17:01 +08:00
|
|
|
chunk_size = MIN_CHUNK_PAGES;
|
|
|
|
if (i915_gem_object_is_tiled(obj))
|
|
|
|
chunk_size = max(chunk_size, tile_row_pages(obj));
|
2015-05-08 19:37:39 +08:00
|
|
|
|
2015-05-06 19:36:09 +08:00
|
|
|
memset(&view, 0, sizeof(view));
|
|
|
|
view.type = I915_GGTT_VIEW_PARTIAL;
|
|
|
|
view.params.partial.offset = rounddown(page_offset, chunk_size);
|
|
|
|
view.params.partial.size =
|
2016-08-19 00:17:02 +08:00
|
|
|
min_t(unsigned int, chunk_size,
|
2016-10-11 17:06:56 +08:00
|
|
|
vma_pages(area) - view.params.partial.offset);
|
2015-05-06 19:36:09 +08:00
|
|
|
|
2016-08-19 00:17:03 +08:00
|
|
|
/* If the partial covers the entire object, just create a
|
|
|
|
* normal VMA.
|
|
|
|
*/
|
|
|
|
if (chunk_size >= obj->base.size >> PAGE_SHIFT)
|
|
|
|
view.type = I915_GGTT_VIEW_NORMAL;
|
|
|
|
|
2016-08-19 00:17:04 +08:00
|
|
|
/* Userspace is now writing through an untracked VMA, abandon
|
|
|
|
* all hope that the hardware is able to track future writes.
|
|
|
|
*/
|
|
|
|
obj->frontbuffer_ggtt_origin = ORIGIN_CPU;
|
|
|
|
|
2016-08-19 00:17:02 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
|
|
|
|
}
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
ret = PTR_ERR(vma);
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err_unlock;
|
2016-08-15 17:49:06 +08:00
|
|
|
}
|
2010-10-28 21:44:08 +08:00
|
|
|
|
2012-11-20 18:45:17 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, write);
|
|
|
|
if (ret)
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err_unpin;
|
2012-02-16 06:50:22 +08:00
|
|
|
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_get_fence(vma);
|
2010-11-11 00:40:20 +08:00
|
|
|
if (ret)
|
2016-08-05 17:14:07 +08:00
|
|
|
goto err_unpin;
|
2010-08-08 04:45:03 +08:00
|
|
|
|
2014-06-10 19:14:40 +08:00
|
|
|
/* Finally, remap it using the new GTT offset */
|
2016-08-19 23:54:28 +08:00
|
|
|
ret = remap_io_mapping(area,
|
|
|
|
area->vm_start + (vma->ggtt_view.params.partial.offset << PAGE_SHIFT),
|
|
|
|
(ggtt->mappable_base + vma->node.start) >> PAGE_SHIFT,
|
|
|
|
min_t(u64, vma->size, area->vm_end - area->vm_start),
|
|
|
|
&ggtt->mappable);
|
|
|
|
if (ret)
|
|
|
|
goto err_unpin;
|
2015-05-06 19:36:09 +08:00
|
|
|
|
2016-08-19 00:17:02 +08:00
|
|
|
obj->fault_mappable = true;
|
2016-08-05 17:14:07 +08:00
|
|
|
err_unpin:
|
2016-08-15 17:49:06 +08:00
|
|
|
__i915_vma_unpin(vma);
|
2016-08-05 17:14:07 +08:00
|
|
|
err_unlock:
|
2008-11-13 02:03:55 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2016-08-05 17:14:07 +08:00
|
|
|
err_rpm:
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
err:
|
2008-11-13 02:03:55 +08:00
|
|
|
switch (ret) {
|
2011-02-07 21:09:31 +08:00
|
|
|
case -EIO:
|
2014-09-04 15:36:18 +08:00
|
|
|
/*
|
|
|
|
* We eat errors when the gpu is terminally wedged to avoid
|
|
|
|
* userspace unduly crashing (gl has no provisions for mmaps to
|
|
|
|
* fail). But any other -EIO isn't ours (e.g. swap in failure)
|
|
|
|
* and so needs to be reported.
|
|
|
|
*/
|
|
|
|
if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
|
2013-11-28 04:20:34 +08:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
|
|
|
}
|
2010-11-07 17:18:22 +08:00
|
|
|
case -EAGAIN:
|
2013-09-12 23:57:28 +08:00
|
|
|
/*
|
|
|
|
* EAGAIN means the gpu is hung and we'll wait for the error
|
|
|
|
* handler to reset everything when re-faulting in
|
|
|
|
* i915_mutex_lock_interruptible.
|
2011-02-07 21:09:31 +08:00
|
|
|
*/
|
2009-09-23 07:43:56 +08:00
|
|
|
case 0:
|
|
|
|
case -ERESTARTSYS:
|
2011-02-12 04:31:19 +08:00
|
|
|
case -EINTR:
|
2012-10-03 22:15:26 +08:00
|
|
|
case -EBUSY:
|
|
|
|
/*
|
|
|
|
* EBUSY is ok: this just means that another thread
|
|
|
|
* already did the job.
|
|
|
|
*/
|
2013-11-28 04:20:34 +08:00
|
|
|
ret = VM_FAULT_NOPAGE;
|
|
|
|
break;
|
2008-11-13 02:03:55 +08:00
|
|
|
case -ENOMEM:
|
2013-11-28 04:20:34 +08:00
|
|
|
ret = VM_FAULT_OOM;
|
|
|
|
break;
|
2012-10-17 17:17:16 +08:00
|
|
|
case -ENOSPC:
|
2014-01-31 19:34:57 +08:00
|
|
|
case -EFAULT:
|
2013-11-28 04:20:34 +08:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
2008-11-13 02:03:55 +08:00
|
|
|
default:
|
2012-10-17 17:17:16 +08:00
|
|
|
WARN_ONCE(ret, "unhandled error in i915_gem_fault: %i\n", ret);
|
2013-11-28 04:20:34 +08:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
2008-11-13 02:03:55 +08:00
|
|
|
}
|
2013-11-28 04:20:34 +08:00
|
|
|
return ret;
|
2008-11-13 02:03:55 +08:00
|
|
|
}
|
|
|
|
|
2009-07-10 15:18:50 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_release_mmap - remove physical page mappings
|
|
|
|
* @obj: obj in question
|
|
|
|
*
|
tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-14 23:09:05 +08:00
|
|
|
* Preserve the reservation of the mmapping with the DRM core code, but
|
2009-07-10 15:18:50 +08:00
|
|
|
* relinquish ownership of the pages back to the system.
|
|
|
|
*
|
|
|
|
* It is vital that we remove the page mapping if we have mapped a tiled
|
|
|
|
* object through the GTT and then lose the fence register due to
|
|
|
|
* resource pressure. Similarly if the object has been moved out of the
|
|
|
|
* aperture, than pages mapped into userspace must be revoked. Removing the
|
|
|
|
* mapping will then trigger a page fault on the next user access, allowing
|
|
|
|
* fixup by i915_gem_fault().
|
|
|
|
*/
|
2009-07-11 04:02:26 +08:00
|
|
|
void
|
2010-11-09 03:18:58 +08:00
|
|
|
i915_gem_release_mmap(struct drm_i915_gem_object *obj)
|
2009-07-10 15:18:50 +08:00
|
|
|
{
|
2016-04-14 00:35:12 +08:00
|
|
|
/* Serialisation between user GTT access and our code depends upon
|
|
|
|
* revoking the CPU's PTE whilst the mutex is held. The next user
|
|
|
|
* pagefault then has to wait until we release the mutex.
|
|
|
|
*/
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2010-11-24 20:23:44 +08:00
|
|
|
if (!obj->fault_mappable)
|
|
|
|
return;
|
2009-07-10 15:18:50 +08:00
|
|
|
|
2014-01-03 21:24:19 +08:00
|
|
|
drm_vma_node_unmap(&obj->base.vma_node,
|
|
|
|
obj->base.dev->anon_inode->i_mapping);
|
2016-04-14 00:35:12 +08:00
|
|
|
|
|
|
|
/* Ensure that the CPU's PTE are revoked and there are not outstanding
|
|
|
|
* memory transactions from userspace before we return. The TLB
|
|
|
|
* flushing implied above by changing the PTE above *should* be
|
|
|
|
* sufficient, an extra barrier here just provides us with a bit
|
|
|
|
* of paranoid documentation about our requirement to serialise
|
|
|
|
* memory writes before touching registers / GSM.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
2010-11-24 20:23:44 +08:00
|
|
|
obj->fault_mappable = false;
|
2009-07-10 15:18:50 +08:00
|
|
|
}
|
|
|
|
|
2014-06-16 15:57:44 +08:00
|
|
|
void
|
|
|
|
i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
|
|
|
|
list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
|
|
|
|
i915_gem_release_mmap(obj);
|
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:27 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_get_ggtt_size - return required global GTT size for an object
|
2016-08-04 23:32:28 +08:00
|
|
|
* @dev_priv: i915 device
|
2016-08-04 23:32:27 +08:00
|
|
|
* @size: object size
|
|
|
|
* @tiling_mode: tiling mode
|
|
|
|
*
|
|
|
|
* Return the required global GTT size for an object, taking into account
|
|
|
|
* potential fence register mapping.
|
|
|
|
*/
|
2016-08-04 23:32:28 +08:00
|
|
|
u64 i915_gem_get_ggtt_size(struct drm_i915_private *dev_priv,
|
|
|
|
u64 size, int tiling_mode)
|
2010-11-09 19:47:32 +08:00
|
|
|
{
|
2016-08-04 23:32:27 +08:00
|
|
|
u64 ggtt_size;
|
2010-11-09 19:47:32 +08:00
|
|
|
|
2016-08-04 23:32:27 +08:00
|
|
|
GEM_BUG_ON(size == 0);
|
2010-11-09 19:47:32 +08:00
|
|
|
|
2016-08-04 23:32:28 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 4 ||
|
2011-07-19 04:11:49 +08:00
|
|
|
tiling_mode == I915_TILING_NONE)
|
|
|
|
return size;
|
2010-11-09 19:47:32 +08:00
|
|
|
|
|
|
|
/* Previous chips need a power-of-two fence region when tiling */
|
2016-08-04 23:32:28 +08:00
|
|
|
if (IS_GEN3(dev_priv))
|
2016-08-04 23:32:27 +08:00
|
|
|
ggtt_size = 1024*1024;
|
2010-11-09 19:47:32 +08:00
|
|
|
else
|
2016-08-04 23:32:27 +08:00
|
|
|
ggtt_size = 512*1024;
|
2010-11-09 19:47:32 +08:00
|
|
|
|
2016-08-04 23:32:27 +08:00
|
|
|
while (ggtt_size < size)
|
|
|
|
ggtt_size <<= 1;
|
2010-11-09 19:47:32 +08:00
|
|
|
|
2016-08-04 23:32:27 +08:00
|
|
|
return ggtt_size;
|
2010-11-09 19:47:32 +08:00
|
|
|
}
|
|
|
|
|
2008-11-13 02:03:55 +08:00
|
|
|
/**
|
2016-08-04 23:32:27 +08:00
|
|
|
* i915_gem_get_ggtt_alignment - return required global GTT alignment
|
2016-08-04 23:32:28 +08:00
|
|
|
* @dev_priv: i915 device
|
2016-06-03 21:02:17 +08:00
|
|
|
* @size: object size
|
|
|
|
* @tiling_mode: tiling mode
|
2016-08-04 23:32:27 +08:00
|
|
|
* @fenced: is fenced alignment required or not
|
2008-11-13 02:03:55 +08:00
|
|
|
*
|
2016-08-04 23:32:27 +08:00
|
|
|
* Return the required global GTT alignment for an object, taking into account
|
2010-11-15 05:32:36 +08:00
|
|
|
* potential fence register mapping.
|
2008-11-13 02:03:55 +08:00
|
|
|
*/
|
2016-08-04 23:32:28 +08:00
|
|
|
u64 i915_gem_get_ggtt_alignment(struct drm_i915_private *dev_priv, u64 size,
|
2016-08-04 23:32:27 +08:00
|
|
|
int tiling_mode, bool fenced)
|
2008-11-13 02:03:55 +08:00
|
|
|
{
|
2016-08-04 23:32:27 +08:00
|
|
|
GEM_BUG_ON(size == 0);
|
|
|
|
|
2008-11-13 02:03:55 +08:00
|
|
|
/*
|
|
|
|
* Minimum alignment is 4k (GTT page size), but might be greater
|
|
|
|
* if a fence register is needed for the object.
|
|
|
|
*/
|
2016-08-04 23:32:28 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 4 || (!fenced && IS_G33(dev_priv)) ||
|
2011-07-19 04:11:49 +08:00
|
|
|
tiling_mode == I915_TILING_NONE)
|
2008-11-13 02:03:55 +08:00
|
|
|
return 4096;
|
|
|
|
|
2010-09-25 04:15:47 +08:00
|
|
|
/*
|
|
|
|
* Previous chips need to be aligned to the size of the smallest
|
|
|
|
* fence register that can contain the object.
|
|
|
|
*/
|
2016-08-04 23:32:28 +08:00
|
|
|
return i915_gem_get_ggtt_size(dev_priv, size, tiling_mode);
|
2010-09-25 04:15:47 +08:00
|
|
|
}
|
|
|
|
|
2012-08-11 22:41:03 +08:00
|
|
|
static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2016-08-05 17:14:14 +08:00
|
|
|
int err;
|
2012-12-20 22:11:16 +08:00
|
|
|
|
2016-08-05 17:14:14 +08:00
|
|
|
err = drm_gem_create_mmap_offset(&obj->base);
|
|
|
|
if (!err)
|
|
|
|
return 0;
|
2012-08-11 22:41:03 +08:00
|
|
|
|
2016-08-05 17:14:14 +08:00
|
|
|
/* We can idle the GPU locklessly to flush stale objects, but in order
|
|
|
|
* to claim that space for ourselves, we need to take the big
|
|
|
|
* struct_mutex to free the requests+objects and allocate our slot.
|
2012-08-11 22:41:03 +08:00
|
|
|
*/
|
2016-09-09 21:11:49 +08:00
|
|
|
err = i915_gem_wait_for_idle(dev_priv, I915_WAIT_INTERRUPTIBLE);
|
2016-08-05 17:14:14 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
2012-08-11 22:41:03 +08:00
|
|
|
|
2016-08-05 17:14:14 +08:00
|
|
|
err = i915_mutex_lock_interruptible(&dev_priv->drm);
|
|
|
|
if (!err) {
|
|
|
|
i915_gem_retire_requests(dev_priv);
|
|
|
|
err = drm_gem_create_mmap_offset(&obj->base);
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
}
|
2012-12-20 22:11:16 +08:00
|
|
|
|
2016-08-05 17:14:14 +08:00
|
|
|
return err;
|
2012-08-11 22:41:03 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
drm_gem_free_mmap_offset(&obj->base);
|
|
|
|
}
|
|
|
|
|
2014-12-24 11:11:17 +08:00
|
|
|
int
|
2011-02-07 10:16:14 +08:00
|
|
|
i915_gem_mmap_gtt(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
2014-12-24 11:11:17 +08:00
|
|
|
uint32_t handle,
|
2011-02-07 10:16:14 +08:00
|
|
|
uint64_t *offset)
|
2008-11-13 02:03:55 +08:00
|
|
|
{
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-11-13 02:03:55 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, handle);
|
2016-08-05 17:14:14 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2009-09-23 01:46:17 +08:00
|
|
|
|
2012-08-11 22:41:03 +08:00
|
|
|
ret = i915_gem_object_create_mmap_offset(obj);
|
2016-08-05 17:14:14 +08:00
|
|
|
if (ret == 0)
|
|
|
|
*offset = drm_vma_node_offset_addr(&obj->base.vma_node);
|
2008-11-13 02:03:55 +08:00
|
|
|
|
2016-08-05 17:14:14 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2010-10-17 16:45:41 +08:00
|
|
|
return ret;
|
2008-11-13 02:03:55 +08:00
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_mmap_gtt_ioctl - prepare an object for GTT mmap'ing
|
|
|
|
* @dev: DRM device
|
|
|
|
* @data: GTT mapping ioctl data
|
|
|
|
* @file: GEM object info
|
|
|
|
*
|
|
|
|
* Simply returns the fake offset to userspace so it can mmap it.
|
|
|
|
* The mmap call will end up in drm_gem_mmap(), which will set things
|
|
|
|
* up so we can get faults in the handler above.
|
|
|
|
*
|
|
|
|
* The fault handler will take care of binding the object into the GTT
|
|
|
|
* (since it may have been evicted to make room for something), allocating
|
|
|
|
* a fence register, and mapping the appropriate aperture address into
|
|
|
|
* userspace.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_mmap_gtt *args = data;
|
|
|
|
|
2014-12-24 11:11:17 +08:00
|
|
|
return i915_gem_mmap_gtt(file, dev, args->handle, &args->offset);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
2012-08-20 16:23:20 +08:00
|
|
|
/* Immediately discard the backing storage */
|
|
|
|
static void
|
|
|
|
i915_gem_object_truncate(struct drm_i915_gem_object *obj)
|
2010-10-28 20:45:36 +08:00
|
|
|
{
|
2012-08-11 22:41:05 +08:00
|
|
|
i915_gem_object_free_mmap_offset(obj);
|
2012-05-10 21:25:09 +08:00
|
|
|
|
2012-08-11 22:41:05 +08:00
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return;
|
2010-10-28 20:45:36 +08:00
|
|
|
|
2012-08-20 16:23:20 +08:00
|
|
|
/* Our goal here is to return as much of the memory as
|
|
|
|
* is possible back to the system as we are called from OOM.
|
|
|
|
* To do this we must instruct the shmfs to drop all of its
|
|
|
|
* backing pages, *now*.
|
|
|
|
*/
|
2014-03-25 21:23:06 +08:00
|
|
|
shmem_truncate_range(file_inode(obj->base.filp), 0, (loff_t)-1);
|
2012-08-20 16:23:20 +08:00
|
|
|
obj->madv = __I915_MADV_PURGED;
|
|
|
|
}
|
2010-10-28 20:45:36 +08:00
|
|
|
|
2014-03-25 21:23:06 +08:00
|
|
|
/* Try to discard unwanted pages */
|
|
|
|
static void
|
|
|
|
i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
|
2012-08-20 16:23:20 +08:00
|
|
|
{
|
2014-03-25 21:23:06 +08:00
|
|
|
struct address_space *mapping;
|
|
|
|
|
|
|
|
switch (obj->madv) {
|
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
i915_gem_object_truncate(obj);
|
|
|
|
case __I915_MADV_PURGED:
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return;
|
|
|
|
|
2015-12-05 12:45:44 +08:00
|
|
|
mapping = obj->base.filp->f_mapping,
|
2014-03-25 21:23:06 +08:00
|
|
|
invalidate_mapping_pages(mapping, 0, (loff_t)-1);
|
2010-10-28 20:45:36 +08:00
|
|
|
}
|
|
|
|
|
2010-09-27 22:51:07 +08:00
|
|
|
static void
|
2010-11-09 03:18:58 +08:00
|
|
|
i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-05-20 18:54:06 +08:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
struct page *page;
|
2013-02-19 01:28:03 +08:00
|
|
|
int ret;
|
2012-05-10 21:25:09 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
BUG_ON(obj->madv == __I915_MADV_PURGED);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 00:35:08 +08:00
|
|
|
if (WARN_ON(ret)) {
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
/* In the event of a disaster, abandon all caches and
|
|
|
|
* hope for the best.
|
|
|
|
*/
|
2013-08-09 19:26:45 +08:00
|
|
|
i915_gem_clflush_object(obj, true);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
}
|
|
|
|
|
2015-07-09 17:59:05 +08:00
|
|
|
i915_gem_gtt_finish_object(obj);
|
|
|
|
|
2011-09-13 03:30:02 +08:00
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
2009-03-13 07:56:27 +08:00
|
|
|
i915_gem_object_save_bit_17_swizzle(obj);
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->madv == I915_MADV_DONTNEED)
|
|
|
|
obj->dirty = 0;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2016-05-20 18:54:06 +08:00
|
|
|
for_each_sgt_page(page, sgt_iter, obj->pages) {
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->dirty)
|
2012-06-01 22:20:22 +08:00
|
|
|
set_page_dirty(page);
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->madv == I915_MADV_WILLNEED)
|
2012-06-01 22:20:22 +08:00
|
|
|
mark_page_accessed(page);
|
2009-09-14 23:50:29 +08:00
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 20:29:47 +08:00
|
|
|
put_page(page);
|
2009-09-14 23:50:29 +08:00
|
|
|
}
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->dirty = 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
sg_free_table(obj->pages);
|
|
|
|
kfree(obj->pages);
|
2012-06-07 22:38:42 +08:00
|
|
|
}
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
|
2013-01-15 20:39:35 +08:00
|
|
|
int
|
2012-06-07 22:38:42 +08:00
|
|
|
i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
const struct drm_i915_gem_object_ops *ops = obj->ops;
|
|
|
|
|
2012-09-05 04:02:58 +08:00
|
|
|
if (obj->pages == NULL)
|
2012-06-07 22:38:42 +08:00
|
|
|
return 0;
|
|
|
|
|
2012-09-05 04:02:54 +08:00
|
|
|
if (obj->pages_pin_count)
|
|
|
|
return -EBUSY;
|
|
|
|
|
2016-08-04 14:52:26 +08:00
|
|
|
GEM_BUG_ON(obj->bind_count);
|
2013-08-01 08:00:04 +08:00
|
|
|
|
2012-12-03 19:49:00 +08:00
|
|
|
/* ->put_pages might need to allocate memory for the bit17 swizzle
|
|
|
|
* array, hence protect them from being reaped by removing them from gtt
|
|
|
|
* lists early. */
|
2013-06-01 02:28:48 +08:00
|
|
|
list_del(&obj->global_list);
|
2012-12-03 19:49:00 +08:00
|
|
|
|
2016-04-08 19:11:11 +08:00
|
|
|
if (obj->mapping) {
|
2016-08-19 00:16:42 +08:00
|
|
|
void *ptr;
|
|
|
|
|
|
|
|
ptr = ptr_mask_bits(obj->mapping);
|
|
|
|
if (is_vmalloc_addr(ptr))
|
|
|
|
vunmap(ptr);
|
2016-04-08 19:11:14 +08:00
|
|
|
else
|
2016-08-19 00:16:42 +08:00
|
|
|
kunmap(kmap_to_page(ptr));
|
|
|
|
|
2016-04-08 19:11:11 +08:00
|
|
|
obj->mapping = NULL;
|
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
ops->put_pages(obj);
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->pages = NULL;
|
2012-06-07 22:38:42 +08:00
|
|
|
|
2014-03-25 21:23:06 +08:00
|
|
|
i915_gem_object_invalidate(obj);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-18 20:02:50 +08:00
|
|
|
static unsigned int swiotlb_max_size(void)
|
2016-10-11 16:20:21 +08:00
|
|
|
{
|
|
|
|
#if IS_ENABLED(CONFIG_SWIOTLB)
|
|
|
|
return rounddown(swiotlb_nr_tbl() << IO_TLB_SHIFT, PAGE_SIZE);
|
|
|
|
#else
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
static int
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
|
2010-10-28 20:45:36 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2010-10-28 20:45:36 +08:00
|
|
|
int page_count, i;
|
|
|
|
struct address_space *mapping;
|
2012-06-01 22:20:22 +08:00
|
|
|
struct sg_table *st;
|
|
|
|
struct scatterlist *sg;
|
2016-05-20 18:54:06 +08:00
|
|
|
struct sgt_iter sgt_iter;
|
2010-10-28 20:45:36 +08:00
|
|
|
struct page *page;
|
2013-02-19 01:28:03 +08:00
|
|
|
unsigned long last_pfn = 0; /* suppress gcc warning */
|
2016-10-18 20:02:50 +08:00
|
|
|
unsigned int max_segment;
|
2015-07-09 17:59:05 +08:00
|
|
|
int ret;
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
gfp_t gfp;
|
2010-10-28 20:45:36 +08:00
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
/* Assert that the object is not currently in any GPU domain. As it
|
|
|
|
* wasn't in the GTT, there shouldn't be any way it could have been in
|
|
|
|
* a GPU cache
|
|
|
|
*/
|
|
|
|
BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
|
|
|
|
BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
|
|
|
|
|
2016-10-11 16:20:21 +08:00
|
|
|
max_segment = swiotlb_max_size();
|
|
|
|
if (!max_segment)
|
2016-10-18 20:02:50 +08:00
|
|
|
max_segment = rounddown(UINT_MAX, PAGE_SIZE);
|
2016-10-11 16:20:21 +08:00
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (st == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
page_count = obj->base.size / PAGE_SIZE;
|
2012-06-01 22:20:22 +08:00
|
|
|
if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
|
|
|
|
kfree(st);
|
2010-10-28 20:45:36 +08:00
|
|
|
return -ENOMEM;
|
2012-06-01 22:20:22 +08:00
|
|
|
}
|
2010-10-28 20:45:36 +08:00
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
/* Get the list of pages out of our struct file. They'll be pinned
|
|
|
|
* at this point until we release them.
|
|
|
|
*
|
|
|
|
* Fail silently without starting the shrinker
|
|
|
|
*/
|
2015-12-05 12:45:44 +08:00
|
|
|
mapping = obj->base.filp->f_mapping;
|
2015-11-07 08:28:49 +08:00
|
|
|
gfp = mapping_gfp_constraint(mapping, ~(__GFP_IO | __GFP_RECLAIM));
|
2015-11-07 08:28:21 +08:00
|
|
|
gfp |= __GFP_NORETRY | __GFP_NOWARN;
|
2013-02-19 01:28:03 +08:00
|
|
|
sg = st->sgl;
|
|
|
|
st->nents = 0;
|
|
|
|
for (i = 0; i < page_count; i++) {
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
page = shmem_read_mapping_page_gfp(mapping, i, gfp);
|
|
|
|
if (IS_ERR(page)) {
|
2014-09-09 18:16:08 +08:00
|
|
|
i915_gem_shrink(dev_priv,
|
|
|
|
page_count,
|
|
|
|
I915_SHRINK_BOUND |
|
|
|
|
I915_SHRINK_UNBOUND |
|
|
|
|
I915_SHRINK_PURGEABLE);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
page = shmem_read_mapping_page_gfp(mapping, i, gfp);
|
|
|
|
}
|
|
|
|
if (IS_ERR(page)) {
|
|
|
|
/* We've tried hard to allocate the memory by reaping
|
|
|
|
* our own buffer, now let the real VM do its job and
|
|
|
|
* go down in flames if truly OOM.
|
|
|
|
*/
|
2014-05-25 20:34:10 +08:00
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
2015-07-09 17:59:05 +08:00
|
|
|
if (IS_ERR(page)) {
|
|
|
|
ret = PTR_ERR(page);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
goto err_pages;
|
2015-07-09 17:59:05 +08:00
|
|
|
}
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
}
|
2016-10-11 16:20:21 +08:00
|
|
|
if (!i ||
|
|
|
|
sg->length >= max_segment ||
|
|
|
|
page_to_pfn(page) != last_pfn + 1) {
|
2013-02-19 01:28:03 +08:00
|
|
|
if (i)
|
|
|
|
sg = sg_next(sg);
|
|
|
|
st->nents++;
|
|
|
|
sg_set_page(sg, page, PAGE_SIZE, 0);
|
|
|
|
} else {
|
|
|
|
sg->length += PAGE_SIZE;
|
|
|
|
}
|
|
|
|
last_pfn = page_to_pfn(page);
|
2013-10-08 04:15:45 +08:00
|
|
|
|
|
|
|
/* Check that the i965g/gm workaround works. */
|
|
|
|
WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
|
2010-10-28 20:45:36 +08:00
|
|
|
}
|
2016-10-11 16:20:21 +08:00
|
|
|
if (sg) /* loop terminated early; short sg table */
|
2013-06-24 23:47:48 +08:00
|
|
|
sg_mark_end(sg);
|
2012-10-19 22:51:06 +08:00
|
|
|
obj->pages = st;
|
|
|
|
|
2015-07-09 17:59:05 +08:00
|
|
|
ret = i915_gem_gtt_prepare_object(obj);
|
|
|
|
if (ret)
|
|
|
|
goto err_pages;
|
|
|
|
|
2011-09-13 03:30:02 +08:00
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
2010-10-28 20:45:36 +08:00
|
|
|
i915_gem_object_do_bit_17_swizzle(obj);
|
|
|
|
|
2016-08-05 17:14:23 +08:00
|
|
|
if (i915_gem_object_is_tiled(obj) &&
|
2014-11-20 16:26:30 +08:00
|
|
|
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
|
2010-10-28 20:45:36 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_pages:
|
2013-02-19 01:28:03 +08:00
|
|
|
sg_mark_end(sg);
|
2016-05-20 18:54:06 +08:00
|
|
|
for_each_sgt_page(page, sgt_iter, st)
|
|
|
|
put_page(page);
|
2012-06-01 22:20:22 +08:00
|
|
|
sg_free_table(st);
|
|
|
|
kfree(st);
|
2014-03-25 21:23:03 +08:00
|
|
|
|
|
|
|
/* shmemfs first checks if there is enough memory to allocate the page
|
|
|
|
* and reports ENOSPC should there be insufficient, along with the usual
|
|
|
|
* ENOMEM for a genuine allocation failure.
|
|
|
|
*
|
|
|
|
* We use ENOSPC in our driver to mean that we have run out of aperture
|
|
|
|
* space and so want to translate the error from shmemfs back to our
|
|
|
|
* usual understanding of ENOMEM.
|
|
|
|
*/
|
2015-07-09 17:59:05 +08:00
|
|
|
if (ret == -ENOSPC)
|
|
|
|
ret = -ENOMEM;
|
|
|
|
|
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
/* Ensure that the associated pages are gathered from the backing storage
|
|
|
|
* and pinned into our object. i915_gem_object_get_pages() may be called
|
|
|
|
* multiple times before they are released by a single call to
|
|
|
|
* i915_gem_object_put_pages() - once the pages are no longer referenced
|
|
|
|
* either as a result of memory pressure (reaping pages under the shrinker)
|
|
|
|
* or as the object is itself released.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2012-06-07 22:38:42 +08:00
|
|
|
const struct drm_i915_gem_object_ops *ops = obj->ops;
|
|
|
|
int ret;
|
|
|
|
|
2012-09-05 04:02:58 +08:00
|
|
|
if (obj->pages)
|
2012-06-07 22:38:42 +08:00
|
|
|
return 0;
|
|
|
|
|
2013-01-08 18:53:09 +08:00
|
|
|
if (obj->madv != I915_MADV_WILLNEED) {
|
2014-02-10 17:03:50 +08:00
|
|
|
DRM_DEBUG("Attempting to obtain a purgeable object\n");
|
2014-01-31 19:34:58 +08:00
|
|
|
return -EFAULT;
|
2013-01-08 18:53:09 +08:00
|
|
|
}
|
|
|
|
|
2012-09-05 04:02:54 +08:00
|
|
|
BUG_ON(obj->pages_pin_count);
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
ret = ops->get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2013-06-01 02:28:48 +08:00
|
|
|
list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
|
2015-04-07 23:20:25 +08:00
|
|
|
|
|
|
|
obj->get_page.sg = obj->pages->sgl;
|
|
|
|
obj->get_page.last = 0;
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
return 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-05-20 18:54:04 +08:00
|
|
|
/* The 'mapping' part of i915_gem_object_pin_map() below */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_map_type type)
|
2016-05-20 18:54:04 +08:00
|
|
|
{
|
|
|
|
unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
|
|
|
|
struct sg_table *sgt = obj->pages;
|
2016-05-20 18:54:06 +08:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
struct page *page;
|
2016-05-20 18:54:05 +08:00
|
|
|
struct page *stack_pages[32];
|
|
|
|
struct page **pages = stack_pages;
|
2016-05-20 18:54:04 +08:00
|
|
|
unsigned long i = 0;
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
pgprot_t pgprot;
|
2016-05-20 18:54:04 +08:00
|
|
|
void *addr;
|
|
|
|
|
|
|
|
/* A single page can always be kmapped */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
if (n_pages == 1 && type == I915_MAP_WB)
|
2016-05-20 18:54:04 +08:00
|
|
|
return kmap(sg_page(sgt->sgl));
|
|
|
|
|
2016-05-20 18:54:05 +08:00
|
|
|
if (n_pages > ARRAY_SIZE(stack_pages)) {
|
|
|
|
/* Too big for stack -- allocate temporary array instead */
|
|
|
|
pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
|
|
|
|
if (!pages)
|
|
|
|
return NULL;
|
|
|
|
}
|
2016-05-20 18:54:04 +08:00
|
|
|
|
2016-05-20 18:54:06 +08:00
|
|
|
for_each_sgt_page(page, sgt_iter, sgt)
|
|
|
|
pages[i++] = page;
|
2016-05-20 18:54:04 +08:00
|
|
|
|
|
|
|
/* Check that we have the expected number of pages */
|
|
|
|
GEM_BUG_ON(i != n_pages);
|
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
switch (type) {
|
|
|
|
case I915_MAP_WB:
|
|
|
|
pgprot = PAGE_KERNEL;
|
|
|
|
break;
|
|
|
|
case I915_MAP_WC:
|
|
|
|
pgprot = pgprot_writecombine(PAGE_KERNEL_IO);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
addr = vmap(pages, n_pages, 0, pgprot);
|
2016-05-20 18:54:04 +08:00
|
|
|
|
2016-05-20 18:54:05 +08:00
|
|
|
if (pages != stack_pages)
|
|
|
|
drm_free_large(pages);
|
2016-05-20 18:54:04 +08:00
|
|
|
|
|
|
|
return addr;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get, pin, and map the pages of the object into kernel space */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_map_type type)
|
2016-04-08 19:11:11 +08:00
|
|
|
{
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
enum i915_map_type has_type;
|
|
|
|
bool pinned;
|
|
|
|
void *ptr;
|
2016-04-08 19:11:11 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
|
2016-04-08 19:11:11 +08:00
|
|
|
|
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
pinned = obj->pages_pin_count > 1;
|
2016-04-08 19:11:11 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
ptr = ptr_unpack_bits(obj->mapping, has_type);
|
|
|
|
if (ptr && has_type != type) {
|
|
|
|
if (pinned) {
|
|
|
|
ret = -EBUSY;
|
|
|
|
goto err;
|
2016-04-08 19:11:11 +08:00
|
|
|
}
|
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
if (is_vmalloc_addr(ptr))
|
|
|
|
vunmap(ptr);
|
|
|
|
else
|
|
|
|
kunmap(kmap_to_page(ptr));
|
2012-12-10 19:56:17 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
ptr = obj->mapping = NULL;
|
drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-02 00:23:15 +08:00
|
|
|
}
|
2012-12-10 19:56:17 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
if (!ptr) {
|
|
|
|
ptr = i915_gem_object_map(obj, type);
|
|
|
|
if (!ptr) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto err;
|
|
|
|
}
|
2012-01-25 23:32:49 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
obj->mapping = ptr_pack_bits(ptr, type);
|
2012-11-28 00:22:52 +08:00
|
|
|
}
|
2012-01-25 23:32:49 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
return ptr;
|
2016-07-04 15:08:31 +08:00
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 19:39:58 +08:00
|
|
|
err:
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
return ERR_PTR(ret);
|
2016-07-04 15:08:31 +08:00
|
|
|
}
|
|
|
|
|
2015-04-27 20:41:17 +08:00
|
|
|
static void
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
|
|
|
i915_gem_object_retire__write(struct i915_gem_active *active,
|
|
|
|
struct drm_i915_gem_request *request)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
|
|
|
struct drm_i915_gem_object *obj =
|
|
|
|
container_of(active, struct drm_i915_gem_object, last_write);
|
2016-04-07 14:29:17 +08:00
|
|
|
|
2015-07-08 07:28:51 +08:00
|
|
|
intel_fb_obj_flush(obj, true, ORIGIN_CS);
|
2013-09-25 00:57:58 +08:00
|
|
|
}
|
2016-04-07 14:29:17 +08:00
|
|
|
|
2010-11-12 21:53:37 +08:00
|
|
|
static void
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
|
|
|
i915_gem_object_retire__read(struct i915_gem_active *active,
|
|
|
|
struct drm_i915_gem_request *request)
|
2008-11-07 08:00:31 +08:00
|
|
|
{
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
|
|
|
int idx = request->engine->id;
|
|
|
|
struct drm_i915_gem_object *obj =
|
|
|
|
container_of(active, struct drm_i915_gem_object, last_read[idx]);
|
2016-04-07 14:29:17 +08:00
|
|
|
|
2016-08-04 23:32:39 +08:00
|
|
|
GEM_BUG_ON(!i915_gem_object_has_active_engine(obj, idx));
|
2012-02-15 19:25:36 +08:00
|
|
|
|
2016-08-04 23:32:39 +08:00
|
|
|
i915_gem_object_clear_active(obj, idx);
|
|
|
|
if (i915_gem_object_is_active(obj))
|
2015-04-27 20:41:17 +08:00
|
|
|
return;
|
2015-04-16 01:11:33 +08:00
|
|
|
|
2015-07-27 17:26:26 +08:00
|
|
|
/* Bump our place on the bound list to keep it roughly in LRU order
|
|
|
|
* so that we don't steal from recently used but inactive objects
|
|
|
|
* (unless we are forced to ofc!)
|
|
|
|
*/
|
2016-08-04 14:52:44 +08:00
|
|
|
if (obj->bind_count)
|
|
|
|
list_move_tail(&obj->global_list,
|
|
|
|
&request->i915->mm.bound_list);
|
2016-07-04 15:08:31 +08:00
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-07-04 15:08:37 +08:00
|
|
|
static bool i915_context_is_banned(const struct i915_gem_context *ctx)
|
2013-08-30 21:19:28 +08:00
|
|
|
{
|
2014-01-30 22:01:15 +08:00
|
|
|
unsigned long elapsed;
|
2013-08-30 21:19:28 +08:00
|
|
|
|
2014-01-30 22:01:15 +08:00
|
|
|
if (ctx->hang_stats.banned)
|
2013-08-30 21:19:28 +08:00
|
|
|
return true;
|
|
|
|
|
2016-07-04 15:08:37 +08:00
|
|
|
elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
|
2014-12-25 00:13:39 +08:00
|
|
|
if (ctx->hang_stats.ban_period_seconds &&
|
|
|
|
elapsed <= ctx->hang_stats.ban_period_seconds) {
|
2016-07-04 15:08:37 +08:00
|
|
|
DRM_DEBUG("context hanging too fast, banning!\n");
|
|
|
|
return true;
|
2013-08-30 21:19:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-07-04 15:08:37 +08:00
|
|
|
static void i915_set_reset_status(struct i915_gem_context *ctx,
|
2014-01-31 01:04:43 +08:00
|
|
|
const bool guilty)
|
2013-06-12 20:13:20 +08:00
|
|
|
{
|
2016-07-04 15:08:37 +08:00
|
|
|
struct i915_ctx_hang_stats *hs = &ctx->hang_stats;
|
2014-01-30 22:01:15 +08:00
|
|
|
|
|
|
|
if (guilty) {
|
2016-07-04 15:08:37 +08:00
|
|
|
hs->banned = i915_context_is_banned(ctx);
|
2014-01-30 22:01:15 +08:00
|
|
|
hs->batch_active++;
|
|
|
|
hs->guilty_ts = get_seconds();
|
|
|
|
} else {
|
|
|
|
hs->batch_pending++;
|
2013-06-12 20:13:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-02-25 23:11:23 +08:00
|
|
|
struct drm_i915_gem_request *
|
2016-03-16 19:00:37 +08:00
|
|
|
i915_gem_find_active_request(struct intel_engine_cs *engine)
|
2010-09-19 19:21:28 +08:00
|
|
|
{
|
2013-12-04 19:37:09 +08:00
|
|
|
struct drm_i915_gem_request *request;
|
|
|
|
|
2016-07-02 00:23:16 +08:00
|
|
|
/* We are called by the error capture and reset at a random
|
|
|
|
* point in time. In particular, note that neither is crucially
|
|
|
|
* ordered with an interrupt. After a hang, the GPU is dead and we
|
|
|
|
* assume that no more writes can happen (we waited long enough for
|
|
|
|
* all writes that were in transaction to be flushed) - adding an
|
|
|
|
* extra delay for a recent interrupt is pointless. Hence, we do
|
|
|
|
* not need an engine->irq_seqno_barrier() before the seqno reads.
|
|
|
|
*/
|
2016-08-04 14:52:33 +08:00
|
|
|
list_for_each_entry(request, &engine->request_list, link) {
|
2016-07-02 00:23:16 +08:00
|
|
|
if (i915_gem_request_completed(request))
|
2013-12-04 19:37:09 +08:00
|
|
|
continue;
|
2013-06-12 20:13:20 +08:00
|
|
|
|
2016-09-09 21:11:54 +08:00
|
|
|
if (!i915_sw_fence_done(&request->submit))
|
|
|
|
break;
|
|
|
|
|
2014-01-31 01:04:43 +08:00
|
|
|
return request;
|
2013-12-04 19:37:09 +08:00
|
|
|
}
|
2014-01-31 01:04:43 +08:00
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
static void reset_request(struct drm_i915_gem_request *request)
|
|
|
|
{
|
|
|
|
void *vaddr = request->ring->vaddr;
|
|
|
|
u32 head;
|
|
|
|
|
|
|
|
/* As this request likely depends on state from the lost
|
|
|
|
* context, clear out all the user operations leaving the
|
|
|
|
* breadcrumb at the end (so we get the fence notifications).
|
|
|
|
*/
|
|
|
|
head = request->head;
|
|
|
|
if (request->postfix < head) {
|
|
|
|
memset(vaddr + head, 0, request->ring->size - head);
|
|
|
|
head = 0;
|
|
|
|
}
|
|
|
|
memset(vaddr + head, 0, request->postfix - head);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_reset_engine(struct intel_engine_cs *engine)
|
2014-01-31 01:04:43 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_request *request;
|
2016-09-09 21:11:53 +08:00
|
|
|
struct i915_gem_context *incomplete_ctx;
|
2014-01-31 01:04:43 +08:00
|
|
|
bool ring_hung;
|
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
if (engine->irq_seqno_barrier)
|
|
|
|
engine->irq_seqno_barrier(engine);
|
|
|
|
|
2016-03-16 19:00:37 +08:00
|
|
|
request = i915_gem_find_active_request(engine);
|
2016-09-09 21:11:53 +08:00
|
|
|
if (!request)
|
2014-01-31 01:04:43 +08:00
|
|
|
return;
|
|
|
|
|
2016-03-16 19:00:37 +08:00
|
|
|
ring_hung = engine->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG;
|
2016-10-05 04:11:29 +08:00
|
|
|
if (engine->hangcheck.seqno != intel_engine_get_seqno(engine))
|
|
|
|
ring_hung = false;
|
|
|
|
|
2016-07-04 15:08:37 +08:00
|
|
|
i915_set_reset_status(request->ctx, ring_hung);
|
2016-09-09 21:11:53 +08:00
|
|
|
if (!ring_hung)
|
|
|
|
return;
|
2014-01-02 02:15:13 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
|
|
|
|
engine->name, request->fence.seqno);
|
2014-01-02 02:15:13 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
/* Setup the CS to resume from the breadcrumb of the hung request */
|
|
|
|
engine->reset_hw(engine, request);
|
2015-09-03 20:01:40 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
/* Users of the default context do not rely on logical state
|
|
|
|
* preserved between batches. They have to emit full state on
|
|
|
|
* every batch and so it is safe to execute queued requests following
|
|
|
|
* the hang.
|
|
|
|
*
|
|
|
|
* Other contexts preserve state, now corrupt. We want to skip all
|
|
|
|
* queued requests that reference the corrupt context.
|
2015-09-03 20:01:40 +08:00
|
|
|
*/
|
2016-09-09 21:11:53 +08:00
|
|
|
incomplete_ctx = request->ctx;
|
|
|
|
if (i915_gem_context_is_default(incomplete_ctx))
|
|
|
|
return;
|
2016-07-13 16:10:31 +08:00
|
|
|
|
2016-08-04 14:52:33 +08:00
|
|
|
list_for_each_entry_continue(request, &engine->request_list, link)
|
2016-09-09 21:11:53 +08:00
|
|
|
if (request->ctx == incomplete_ctx)
|
|
|
|
reset_request(request);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
void i915_gem_reset(struct drm_i915_private *dev_priv)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
2013-12-04 19:37:09 +08:00
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_reset_engine(engine);
|
2010-09-22 17:31:52 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_restore_fences(&dev_priv->drm);
|
2013-12-07 06:11:03 +08:00
|
|
|
|
2016-09-21 21:51:06 +08:00
|
|
|
if (dev_priv->gt.awake) {
|
|
|
|
intel_sanitize_gt_powersave(dev_priv);
|
|
|
|
intel_enable_gt_powersave(dev_priv);
|
|
|
|
if (INTEL_GEN(dev_priv) >= 6)
|
|
|
|
gen6_rps_busy(dev_priv);
|
|
|
|
}
|
2016-09-09 21:11:53 +08:00
|
|
|
}
|
2015-04-27 20:41:17 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
static void nop_submit_request(struct drm_i915_gem_request *request)
|
|
|
|
{
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-09-09 21:11:53 +08:00
|
|
|
engine->submit_request = nop_submit_request;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-07-20 16:21:10 +08:00
|
|
|
/* Mark all pending requests as complete so that any concurrent
|
|
|
|
* (lockless) lookup doesn't try and wait upon the request as we
|
|
|
|
* reset it.
|
2014-01-07 19:45:14 +08:00
|
|
|
*/
|
2016-08-09 15:37:02 +08:00
|
|
|
intel_engine_init_seqno(engine, engine->last_submitted_seqno);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
drm/i915/bdw: Pin the context backing objects to GGTT on-demand
Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).
This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(
v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.
v4: Break out pin and unpin into functions. Fix style problems reported
by checkpatch
v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked. Add WARN_ONs to make sure this is the case in future.
Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-13 18:28:10 +08:00
|
|
|
/*
|
|
|
|
* Clear the execlists queue up before freeing the requests, as those
|
|
|
|
* are the ones that keep the context and ringbuffer backing objects
|
|
|
|
* pinned in place.
|
2015-03-19 02:19:22 +08:00
|
|
|
*/
|
|
|
|
|
2015-10-19 23:32:32 +08:00
|
|
|
if (i915.enable_execlists) {
|
2016-09-09 21:11:46 +08:00
|
|
|
spin_lock(&engine->execlist_lock);
|
|
|
|
INIT_LIST_HEAD(&engine->execlist_queue);
|
|
|
|
i915_gem_request_put(engine->execlist_port[0].request);
|
|
|
|
i915_gem_request_put(engine->execlist_port[1].request);
|
|
|
|
memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
|
|
|
|
spin_unlock(&engine->execlist_lock);
|
2015-03-19 02:19:22 +08:00
|
|
|
}
|
|
|
|
|
2016-07-13 16:10:31 +08:00
|
|
|
engine->i915->gt.active_engines &= ~intel_engine_flag(engine);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
|
2010-07-24 06:18:49 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2016-07-04 15:08:31 +08:00
|
|
|
|
2016-07-05 17:40:23 +08:00
|
|
|
lockdep_assert_held(&dev_priv->drm.struct_mutex);
|
2016-09-09 21:11:53 +08:00
|
|
|
set_bit(I915_WEDGED, &dev_priv->gpu_error.flags);
|
2016-07-04 15:08:31 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_context_lost(dev_priv);
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_cleanup_engine(engine);
|
2016-07-13 16:10:31 +08:00
|
|
|
mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
2010-07-24 06:18:49 +08:00
|
|
|
}
|
|
|
|
|
2010-08-21 06:25:16 +08:00
|
|
|
static void
|
2008-07-31 03:06:12 +08:00
|
|
|
i915_gem_retire_work_handler(struct work_struct *work)
|
|
|
|
{
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
struct drm_i915_private *dev_priv =
|
2016-07-04 15:08:31 +08:00
|
|
|
container_of(work, typeof(*dev_priv), gt.retire_work.work);
|
2016-07-05 17:40:23 +08:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-29 19:26:37 +08:00
|
|
|
/* Come back later if the device is busy... */
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
if (mutex_trylock(&dev->struct_mutex)) {
|
2016-07-04 15:08:31 +08:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2016-07-04 15:08:31 +08:00
|
|
|
|
|
|
|
/* Keep the retire handler running until we are finally idle.
|
|
|
|
* We do not need to do this test under locking as in the worst-case
|
|
|
|
* we queue the retire worker once too often.
|
|
|
|
*/
|
2016-07-09 17:12:06 +08:00
|
|
|
if (READ_ONCE(dev_priv->gt.awake)) {
|
|
|
|
i915_queue_hangcheck(dev_priv);
|
2016-07-04 15:08:31 +08:00
|
|
|
queue_delayed_work(dev_priv->wq,
|
|
|
|
&dev_priv->gt.retire_work,
|
2012-10-06 00:02:57 +08:00
|
|
|
round_jiffies_up_relative(HZ));
|
2016-07-09 17:12:06 +08:00
|
|
|
}
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
|
2011-01-10 05:05:44 +08:00
|
|
|
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
static void
|
|
|
|
i915_gem_idle_work_handler(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *dev_priv =
|
2016-07-04 15:08:31 +08:00
|
|
|
container_of(work, typeof(*dev_priv), gt.idle_work.work);
|
2016-07-05 17:40:23 +08:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2016-03-24 19:20:38 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2016-07-04 15:08:31 +08:00
|
|
|
bool rearm_hangcheck;
|
|
|
|
|
|
|
|
if (!READ_ONCE(dev_priv->gt.awake))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (READ_ONCE(dev_priv->gt.active_engines))
|
|
|
|
return;
|
|
|
|
|
|
|
|
rearm_hangcheck =
|
|
|
|
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
|
|
|
|
|
|
|
|
if (!mutex_trylock(&dev->struct_mutex)) {
|
|
|
|
/* Currently busy, come back later */
|
|
|
|
mod_delayed_work(dev_priv->wq,
|
|
|
|
&dev_priv->gt.idle_work,
|
|
|
|
msecs_to_jiffies(50));
|
|
|
|
goto out_rearm;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (dev_priv->gt.active_engines)
|
|
|
|
goto out_unlock;
|
drm/i915: simplify allocation of driver-internal requests
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).
So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...
v4: Rebased
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-20 03:02:53 +08:00
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-07-04 15:08:31 +08:00
|
|
|
i915_gem_batch_pool_fini(&engine->batch_pool);
|
drm/i915: simplify allocation of driver-internal requests
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).
So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...
v4: Rebased
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-20 03:02:53 +08:00
|
|
|
|
2016-07-04 15:08:31 +08:00
|
|
|
GEM_BUG_ON(!dev_priv->gt.awake);
|
|
|
|
dev_priv->gt.awake = false;
|
|
|
|
rearm_hangcheck = false;
|
2015-06-18 20:14:56 +08:00
|
|
|
|
2016-07-04 15:08:31 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 6)
|
|
|
|
gen6_rps_idle(dev_priv);
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
out_unlock:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2015-04-27 20:41:17 +08:00
|
|
|
|
2016-07-04 15:08:31 +08:00
|
|
|
out_rearm:
|
|
|
|
if (rearm_hangcheck) {
|
|
|
|
GEM_BUG_ON(!dev_priv->gt.awake);
|
|
|
|
i915_queue_hangcheck(dev_priv);
|
2015-04-27 20:41:17 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2015-04-27 20:41:17 +08:00
|
|
|
|
2016-08-04 14:52:45 +08:00
|
|
|
void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(gem);
|
|
|
|
struct drm_i915_file_private *fpriv = file->driver_priv;
|
|
|
|
struct i915_vma *vma, *vn;
|
|
|
|
|
|
|
|
mutex_lock(&obj->base.dev->struct_mutex);
|
|
|
|
list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
|
|
|
|
if (vma->vm->file == fpriv)
|
|
|
|
i915_vma_close(vma);
|
|
|
|
mutex_unlock(&obj->base.dev->struct_mutex);
|
2015-04-27 20:41:17 +08:00
|
|
|
}
|
|
|
|
|
2012-04-12 02:18:19 +08:00
|
|
|
/**
|
2012-05-25 06:03:10 +08:00
|
|
|
* i915_gem_wait_ioctl - implements DRM_IOCTL_I915_GEM_WAIT
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2012-04-12 02:18:19 +08:00
|
|
|
*
|
2012-05-25 06:03:10 +08:00
|
|
|
* Returns 0 if successful, else an error is returned with the remaining time in
|
|
|
|
* the timeout parameter.
|
|
|
|
* -ETIME: object is still busy after timeout
|
|
|
|
* -ERESTARTSYS: signal interrupted the wait
|
|
|
|
* -ENONENT: object doesn't exist
|
|
|
|
* Also possible, but rare:
|
|
|
|
* -EAGAIN: GPU wedged
|
|
|
|
* -ENOMEM: damn
|
|
|
|
* -ENODEV: Internal IRQ fail
|
|
|
|
* -E?: The add request failed
|
2015-06-18 20:14:56 +08:00
|
|
|
*
|
2012-05-25 06:03:10 +08:00
|
|
|
* The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
|
|
|
|
* non-zero timeout parameter the wait ioctl will wait for the given number of
|
|
|
|
* nanoseconds on an object becoming unbusy. Since the wait itself does so
|
|
|
|
* without holding struct_mutex the object may become re-busied before this
|
|
|
|
* function completes. A similar but shorter * race condition exists in the busy
|
|
|
|
* ioctl
|
2012-04-12 02:18:19 +08:00
|
|
|
*/
|
2012-04-06 05:47:36 +08:00
|
|
|
int
|
2012-05-25 06:03:10 +08:00
|
|
|
i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
|
2012-04-06 05:47:36 +08:00
|
|
|
{
|
2012-05-25 06:03:10 +08:00
|
|
|
struct drm_i915_gem_wait *args = data;
|
2016-08-05 17:14:17 +08:00
|
|
|
struct intel_rps_client *rps = to_rps_client(file);
|
2012-05-25 06:03:10 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-08-05 17:14:17 +08:00
|
|
|
unsigned long active;
|
|
|
|
int idx, ret = 0;
|
2014-11-25 02:49:43 +08:00
|
|
|
|
2014-09-29 21:31:26 +08:00
|
|
|
if (args->flags != 0)
|
|
|
|
return -EINVAL;
|
2012-04-06 05:47:36 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->bo_handle);
|
2016-08-05 17:14:17 +08:00
|
|
|
if (!obj)
|
2012-05-25 06:03:10 +08:00
|
|
|
return -ENOENT;
|
2012-04-06 05:47:36 +08:00
|
|
|
|
2016-08-05 17:14:17 +08:00
|
|
|
active = __I915_BO_ACTIVE(obj);
|
|
|
|
for_each_active(active, idx) {
|
|
|
|
s64 *timeout = args->timeout_ns >= 0 ? &args->timeout_ns : NULL;
|
2016-09-09 21:11:49 +08:00
|
|
|
ret = i915_gem_active_wait_unlocked(&obj->last_read[idx],
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
2016-08-05 17:14:17 +08:00
|
|
|
timeout, rps);
|
2015-04-27 20:41:17 +08:00
|
|
|
if (ret)
|
2016-08-05 17:14:17 +08:00
|
|
|
break;
|
2015-04-27 20:41:17 +08:00
|
|
|
}
|
2012-04-06 05:47:36 +08:00
|
|
|
|
2016-08-05 17:14:17 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2014-11-25 02:49:28 +08:00
|
|
|
return ret;
|
2011-04-14 05:06:03 +08:00
|
|
|
}
|
|
|
|
|
2016-04-28 16:56:39 +08:00
|
|
|
static void __i915_vma_iounmap(struct i915_vma *vma)
|
|
|
|
{
|
2016-08-04 23:32:30 +08:00
|
|
|
GEM_BUG_ON(i915_vma_is_pinned(vma));
|
2016-04-28 16:56:39 +08:00
|
|
|
|
|
|
|
if (vma->iomap == NULL)
|
|
|
|
return;
|
|
|
|
|
|
|
|
io_mapping_unmap(vma->iomap);
|
|
|
|
vma->iomap = NULL;
|
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:47 +08:00
|
|
|
int i915_vma_unbind(struct i915_vma *vma)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).
This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.
> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-01 08:00:10 +08:00
|
|
|
struct drm_i915_gem_object *obj = vma->obj;
|
2016-08-04 14:52:44 +08:00
|
|
|
unsigned long active;
|
2013-01-08 18:53:09 +08:00
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-08-04 14:52:44 +08:00
|
|
|
/* First wait upon any activity as retiring the request may
|
|
|
|
* have side-effects such as unpinning or even unbinding this vma.
|
|
|
|
*/
|
|
|
|
active = i915_vma_get_active(vma);
|
2016-08-04 14:52:47 +08:00
|
|
|
if (active) {
|
2016-08-04 14:52:44 +08:00
|
|
|
int idx;
|
|
|
|
|
2016-08-04 14:52:45 +08:00
|
|
|
/* When a closed VMA is retired, it is unbound - eek.
|
|
|
|
* In order to prevent it from being recursively closed,
|
|
|
|
* take a pin on the vma so that the second unbind is
|
|
|
|
* aborted.
|
|
|
|
*/
|
2016-08-04 23:32:30 +08:00
|
|
|
__i915_vma_pin(vma);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-08-04 14:52:44 +08:00
|
|
|
for_each_active(active, idx) {
|
|
|
|
ret = i915_gem_active_retire(&vma->last_read[idx],
|
|
|
|
&vma->vm->dev->struct_mutex);
|
|
|
|
if (ret)
|
2016-08-04 14:52:45 +08:00
|
|
|
break;
|
2016-08-04 14:52:44 +08:00
|
|
|
}
|
2012-08-20 17:23:27 +08:00
|
|
|
|
2016-08-04 23:32:30 +08:00
|
|
|
__i915_vma_unpin(vma);
|
2015-10-05 20:26:36 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2016-08-04 14:52:45 +08:00
|
|
|
|
2016-08-04 14:52:44 +08:00
|
|
|
GEM_BUG_ON(i915_vma_is_active(vma));
|
2015-10-05 20:26:36 +08:00
|
|
|
}
|
2011-04-14 05:04:09 +08:00
|
|
|
|
2016-08-04 23:32:30 +08:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2016-08-04 14:52:44 +08:00
|
|
|
return -EBUSY;
|
|
|
|
|
2016-08-04 14:52:45 +08:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
goto destroy;
|
2013-08-14 09:09:06 +08:00
|
|
|
|
2016-08-04 14:52:26 +08:00
|
|
|
GEM_BUG_ON(obj->bind_count == 0);
|
|
|
|
GEM_BUG_ON(!obj->pages);
|
2009-09-10 02:50:45 +08:00
|
|
|
|
2016-08-19 00:16:55 +08:00
|
|
|
if (i915_vma_is_map_and_fenceable(vma)) {
|
2014-02-14 21:06:07 +08:00
|
|
|
/* release the fence reg _after_ flushing */
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2014-02-14 21:06:07 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2016-04-28 16:56:39 +08:00
|
|
|
|
2016-08-19 00:17:09 +08:00
|
|
|
/* Force a pagefault for domain tracking on next user access */
|
|
|
|
i915_gem_release_mmap(obj);
|
|
|
|
|
2016-04-28 16:56:39 +08:00
|
|
|
__i915_vma_iounmap(vma);
|
2016-08-19 00:16:55 +08:00
|
|
|
vma->flags &= ~I915_VMA_CAN_FENCE;
|
2014-02-14 21:06:07 +08:00
|
|
|
}
|
2009-12-16 00:50:00 +08:00
|
|
|
|
2016-08-04 14:52:46 +08:00
|
|
|
if (likely(!vma->vm->closed)) {
|
|
|
|
trace_i915_vma_unbind(vma);
|
|
|
|
vma->vm->unbind_vma(vma);
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2016-08-04 23:32:32 +08:00
|
|
|
vma->flags &= ~(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-07-18 03:19:03 +08:00
|
|
|
drm_mm_remove_node(&vma->node);
|
2016-08-04 14:52:46 +08:00
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
|
|
|
|
|
2016-08-19 00:16:55 +08:00
|
|
|
if (vma->pages != obj->pages) {
|
|
|
|
GEM_BUG_ON(!vma->pages);
|
|
|
|
sg_free_table(vma->pages);
|
|
|
|
kfree(vma->pages);
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2016-08-15 17:48:47 +08:00
|
|
|
vma->pages = NULL;
|
2013-07-18 03:19:03 +08:00
|
|
|
|
|
|
|
/* Since the unbound list is global, only move to that list if
|
2013-08-26 17:23:47 +08:00
|
|
|
* no more VMAs exist. */
|
2016-08-04 14:52:26 +08:00
|
|
|
if (--obj->bind_count == 0)
|
|
|
|
list_move_tail(&obj->global_list,
|
|
|
|
&to_i915(obj->base.dev)->mm.unbound_list);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-12-04 17:59:09 +08:00
|
|
|
/* And finally now the object is completely decoupled from this vma,
|
|
|
|
* we can drop its hold on the backing storage and allow it to be
|
|
|
|
* reaped by the shrinker.
|
|
|
|
*/
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
|
2016-08-04 14:52:45 +08:00
|
|
|
destroy:
|
2016-08-04 23:32:32 +08:00
|
|
|
if (unlikely(i915_vma_is_closed(vma)))
|
2016-08-04 14:52:45 +08:00
|
|
|
i915_vma_destroy(vma);
|
2015-10-05 20:26:36 +08:00
|
|
|
|
2011-01-08 01:09:48 +08:00
|
|
|
return 0;
|
2015-10-05 20:26:36 +08:00
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:11 +08:00
|
|
|
int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
|
2016-09-09 21:11:49 +08:00
|
|
|
unsigned int flags)
|
2010-02-19 18:52:00 +08:00
|
|
|
{
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2016-03-24 19:20:38 +08:00
|
|
|
int ret;
|
2010-02-19 18:52:00 +08:00
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2016-06-24 21:55:52 +08:00
|
|
|
if (engine->last_context == NULL)
|
|
|
|
continue;
|
2012-08-15 05:35:14 +08:00
|
|
|
|
2016-09-09 21:11:49 +08:00
|
|
|
ret = intel_engine_idle(engine, flags);
|
2010-12-04 19:30:53 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2010-02-19 18:52:00 +08:00
|
|
|
|
2010-02-12 05:29:04 +08:00
|
|
|
return 0;
|
2010-02-19 18:52:00 +08:00
|
|
|
}
|
|
|
|
|
2014-09-11 15:43:48 +08:00
|
|
|
static bool i915_gem_valid_gtt_space(struct i915_vma *vma,
|
2012-07-26 18:49:32 +08:00
|
|
|
unsigned long cache_level)
|
|
|
|
{
|
2014-09-11 15:43:48 +08:00
|
|
|
struct drm_mm_node *gtt_space = &vma->node;
|
2012-07-26 18:49:32 +08:00
|
|
|
struct drm_mm_node *other;
|
|
|
|
|
2014-09-11 15:43:48 +08:00
|
|
|
/*
|
|
|
|
* On some machines we have to be careful when putting differing types
|
|
|
|
* of snoopable memory together to avoid the prefetcher crossing memory
|
|
|
|
* domains and dying. During vm initialisation, we decide whether or not
|
|
|
|
* these constraints apply and set the drm_mm.color_adjust
|
|
|
|
* appropriately.
|
2012-07-26 18:49:32 +08:00
|
|
|
*/
|
2014-09-11 15:43:48 +08:00
|
|
|
if (vma->vm->mm.color_adjust == NULL)
|
2012-07-26 18:49:32 +08:00
|
|
|
return true;
|
|
|
|
|
2013-07-06 05:41:06 +08:00
|
|
|
if (!drm_mm_node_allocated(gtt_space))
|
2012-07-26 18:49:32 +08:00
|
|
|
return true;
|
|
|
|
|
|
|
|
if (list_empty(>t_space->node_list))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
|
|
|
|
if (other->allocated && !other->hole_follows && other->color != cache_level)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
other = list_entry(gtt_space->node_list.next, struct drm_mm_node, node_list);
|
|
|
|
if (other->allocated && !gtt_space->hole_follows && other->color != cache_level)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
2016-08-04 23:32:31 +08:00
|
|
|
* i915_vma_insert - finds a slot for the vma in its address space
|
|
|
|
* @vma: the vma
|
2016-08-04 23:32:23 +08:00
|
|
|
* @size: requested size in bytes (can be larger than the VMA)
|
2016-08-04 23:32:31 +08:00
|
|
|
* @alignment: required alignment
|
2016-06-03 21:02:17 +08:00
|
|
|
* @flags: mask of PIN_* flags to use
|
2016-08-04 23:32:31 +08:00
|
|
|
*
|
|
|
|
* First we try to allocate some free space that meets the requirements for
|
|
|
|
* the VMA. Failiing that, if the flags permit, it will evict an old VMA,
|
|
|
|
* preferrably the oldest idle entry to make room for the new VMA.
|
|
|
|
*
|
|
|
|
* Returns:
|
|
|
|
* 0 on success, negative error code otherwise.
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
2016-08-04 23:32:31 +08:00
|
|
|
static int
|
|
|
|
i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-08-04 23:32:31 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(vma->vm->dev);
|
|
|
|
struct drm_i915_gem_object *obj = vma->obj;
|
2015-10-01 20:33:57 +08:00
|
|
|
u64 start, end;
|
2009-09-14 23:50:30 +08:00
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-08-04 23:32:32 +08:00
|
|
|
GEM_BUG_ON(vma->flags & (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND));
|
2016-08-04 23:32:31 +08:00
|
|
|
GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
|
2016-08-04 23:32:29 +08:00
|
|
|
|
|
|
|
size = max(size, vma->size);
|
|
|
|
if (flags & PIN_MAPPABLE)
|
2016-08-05 17:14:23 +08:00
|
|
|
size = i915_gem_get_ggtt_size(dev_priv, size,
|
|
|
|
i915_gem_object_get_tiling(obj));
|
2016-08-04 23:32:29 +08:00
|
|
|
|
2016-08-19 00:17:07 +08:00
|
|
|
alignment = max(max(alignment, vma->display_alignment),
|
|
|
|
i915_gem_get_ggtt_alignment(dev_priv, size,
|
|
|
|
i915_gem_object_get_tiling(obj),
|
|
|
|
flags & PIN_MAPPABLE));
|
2010-09-25 04:15:47 +08:00
|
|
|
|
2015-10-01 20:33:57 +08:00
|
|
|
start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
|
2016-08-04 23:32:29 +08:00
|
|
|
|
|
|
|
end = vma->vm->total;
|
2015-10-01 20:33:57 +08:00
|
|
|
if (flags & PIN_MAPPABLE)
|
2016-08-04 23:32:23 +08:00
|
|
|
end = min_t(u64, end, dev_priv->ggtt.mappable_end);
|
2015-10-01 20:33:57 +08:00
|
|
|
if (flags & PIN_ZONE_4G)
|
2016-01-11 19:39:27 +08:00
|
|
|
end = min_t(u64, end, (1ULL << 32) - PAGE_SIZE);
|
2015-10-01 20:33:57 +08:00
|
|
|
|
2015-05-06 19:33:58 +08:00
|
|
|
/* If binding the object/GGTT view requires more space than the entire
|
|
|
|
* aperture has, reject it early before evicting everything in a vain
|
|
|
|
* attempt to find space.
|
2010-05-27 20:18:21 +08:00
|
|
|
*/
|
2015-05-06 19:33:58 +08:00
|
|
|
if (size > end) {
|
2016-08-04 23:32:29 +08:00
|
|
|
DRM_DEBUG("Attempting to bind an object larger than the aperture: request=%llu [object=%zd] > %s aperture=%llu\n",
|
2016-08-04 23:32:23 +08:00
|
|
|
size, obj->base.size,
|
2014-02-14 21:01:11 +08:00
|
|
|
flags & PIN_MAPPABLE ? "mappable" : "total",
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
end);
|
2016-08-04 23:32:31 +08:00
|
|
|
return -E2BIG;
|
2010-05-27 20:18:21 +08:00
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
ret = i915_gem_object_get_pages(obj);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
if (ret)
|
2016-08-04 23:32:31 +08:00
|
|
|
return ret;
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
|
2012-11-20 18:45:16 +08:00
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
|
2015-12-08 19:55:07 +08:00
|
|
|
if (flags & PIN_OFFSET_FIXED) {
|
2016-08-04 23:32:31 +08:00
|
|
|
u64 offset = flags & PIN_OFFSET_MASK;
|
2016-08-04 23:32:29 +08:00
|
|
|
if (offset & (alignment - 1) || offset > end - size) {
|
2015-12-08 19:55:07 +08:00
|
|
|
ret = -EINVAL;
|
2016-08-04 23:32:29 +08:00
|
|
|
goto err_unpin;
|
2015-12-08 19:55:07 +08:00
|
|
|
}
|
2016-08-04 23:32:29 +08:00
|
|
|
|
2015-12-08 19:55:07 +08:00
|
|
|
vma->node.start = offset;
|
|
|
|
vma->node.size = size;
|
|
|
|
vma->node.color = obj->cache_level;
|
2016-08-04 23:32:29 +08:00
|
|
|
ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
|
2015-12-08 19:55:07 +08:00
|
|
|
if (ret) {
|
|
|
|
ret = i915_gem_evict_for_vma(vma);
|
|
|
|
if (ret == 0)
|
2016-08-04 23:32:29 +08:00
|
|
|
ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
|
|
|
|
if (ret)
|
|
|
|
goto err_unpin;
|
2015-12-08 19:55:07 +08:00
|
|
|
}
|
2015-10-01 20:33:57 +08:00
|
|
|
} else {
|
2016-08-04 23:32:29 +08:00
|
|
|
u32 search_flag, alloc_flag;
|
|
|
|
|
2015-12-08 19:55:07 +08:00
|
|
|
if (flags & PIN_HIGH) {
|
|
|
|
search_flag = DRM_MM_SEARCH_BELOW;
|
|
|
|
alloc_flag = DRM_MM_CREATE_TOP;
|
|
|
|
} else {
|
|
|
|
search_flag = DRM_MM_SEARCH_DEFAULT;
|
|
|
|
alloc_flag = DRM_MM_CREATE_DEFAULT;
|
|
|
|
}
|
2015-10-01 20:33:57 +08:00
|
|
|
|
2016-08-04 23:32:26 +08:00
|
|
|
/* We only allocate in PAGE_SIZE/GTT_PAGE_SIZE (4096) chunks,
|
|
|
|
* so we know that we always have a minimum alignment of 4096.
|
|
|
|
* The drm_mm range manager is optimised to return results
|
|
|
|
* with zero alignment, so where possible use the optimal
|
|
|
|
* path.
|
|
|
|
*/
|
|
|
|
if (alignment <= 4096)
|
|
|
|
alignment = 0;
|
|
|
|
|
2013-05-26 03:26:35 +08:00
|
|
|
search_free:
|
2016-08-04 23:32:29 +08:00
|
|
|
ret = drm_mm_insert_node_in_range_generic(&vma->vm->mm,
|
|
|
|
&vma->node,
|
2015-12-08 19:55:07 +08:00
|
|
|
size, alignment,
|
|
|
|
obj->cache_level,
|
|
|
|
start, end,
|
|
|
|
search_flag,
|
|
|
|
alloc_flag);
|
|
|
|
if (ret) {
|
2016-08-04 23:32:29 +08:00
|
|
|
ret = i915_gem_evict_something(vma->vm, size, alignment,
|
2015-12-08 19:55:07 +08:00
|
|
|
obj->cache_level,
|
|
|
|
start, end,
|
|
|
|
flags);
|
|
|
|
if (ret == 0)
|
|
|
|
goto search_free;
|
2009-09-21 07:22:34 +08:00
|
|
|
|
2016-08-04 23:32:29 +08:00
|
|
|
goto err_unpin;
|
2015-12-08 19:55:07 +08:00
|
|
|
}
|
2016-10-13 16:55:04 +08:00
|
|
|
|
|
|
|
GEM_BUG_ON(vma->node.start < start);
|
|
|
|
GEM_BUG_ON(vma->node.start + vma->node.size > end);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2016-08-04 23:32:24 +08:00
|
|
|
GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, obj->cache_level));
|
2014-12-11 01:27:58 +08:00
|
|
|
|
2013-06-01 02:28:48 +08:00
|
|
|
list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
|
2016-08-04 23:32:29 +08:00
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
|
2016-08-04 14:52:26 +08:00
|
|
|
obj->bind_count++;
|
2010-08-07 18:01:20 +08:00
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
return 0;
|
2013-07-18 03:19:03 +08:00
|
|
|
|
2013-07-22 18:12:38 +08:00
|
|
|
err_unpin:
|
2013-07-18 03:19:03 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-08-04 23:32:31 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2013-08-08 21:41:09 +08:00
|
|
|
bool
|
2013-08-09 19:26:45 +08:00
|
|
|
i915_gem_clflush_object(struct drm_i915_gem_object *obj,
|
|
|
|
bool force)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
/* If we don't have a page list set up, then we're not pinned
|
|
|
|
* to GPU, and we can ignore the cache flush because it'll happen
|
|
|
|
* again at bind time.
|
|
|
|
*/
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->pages == NULL)
|
2013-08-08 21:41:09 +08:00
|
|
|
return false;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-02-14 03:56:05 +08:00
|
|
|
/*
|
|
|
|
* Stolen memory is always coherent with the GPU as it is explicitly
|
|
|
|
* marked as wc by the system, or the system is cache-coherent.
|
|
|
|
*/
|
2014-11-04 20:51:40 +08:00
|
|
|
if (obj->stolen || obj->phys_handle)
|
2013-08-08 21:41:09 +08:00
|
|
|
return false;
|
2013-02-14 03:56:05 +08:00
|
|
|
|
2011-03-30 07:59:52 +08:00
|
|
|
/* If the GPU is snooping the contents of the CPU cache,
|
|
|
|
* we do not need to manually clear the CPU cache lines. However,
|
|
|
|
* the caches are only snooped when the render cache is
|
|
|
|
* flushed/invalidated. As we always have to emit invalidations
|
|
|
|
* and flushes when moving into and out of the RENDER domain, correct
|
|
|
|
* snooping behaviour occurs naturally as the result of our domain
|
|
|
|
* tracking.
|
|
|
|
*/
|
2015-01-13 21:32:52 +08:00
|
|
|
if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) {
|
|
|
|
obj->cache_dirty = true;
|
2013-08-08 21:41:09 +08:00
|
|
|
return false;
|
2015-01-13 21:32:52 +08:00
|
|
|
}
|
2011-03-30 07:59:52 +08:00
|
|
|
|
2009-08-25 18:15:50 +08:00
|
|
|
trace_i915_gem_object_clflush(obj);
|
2012-06-01 22:20:22 +08:00
|
|
|
drm_clflush_sg(obj->pages);
|
2015-01-13 21:32:52 +08:00
|
|
|
obj->cache_dirty = false;
|
2013-08-08 21:41:09 +08:00
|
|
|
|
|
|
|
return true;
|
2008-11-15 05:35:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/** Flushes the GTT write domain for the object if it's dirty. */
|
|
|
|
static void
|
2010-11-09 03:18:58 +08:00
|
|
|
i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
|
2008-11-15 05:35:19 +08:00
|
|
|
{
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-19 00:16:49 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2009-08-25 18:15:50 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
|
2008-11-15 05:35:19 +08:00
|
|
|
return;
|
|
|
|
|
2011-01-05 02:42:07 +08:00
|
|
|
/* No actual flushing is required for the GTT write domain. Writes
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-19 00:16:49 +08:00
|
|
|
* to it "immediately" go to main memory as far as we know, so there's
|
2008-11-15 05:35:19 +08:00
|
|
|
* no chipset flush. It also doesn't land in render cache.
|
2011-01-05 02:42:07 +08:00
|
|
|
*
|
|
|
|
* However, we do have to enforce the order so that all writes through
|
|
|
|
* the GTT land before any writes to the device, such as updates to
|
|
|
|
* the GATT itself.
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-19 00:16:49 +08:00
|
|
|
*
|
|
|
|
* We also have to wait a bit for the writes to land from the GTT.
|
|
|
|
* An uncached read (i.e. mmio) seems to be ideal for the round-trip
|
|
|
|
* timing. This issue has only been observed when switching quickly
|
|
|
|
* between GTT writes and CPU reads from inside the kernel on recent hw,
|
|
|
|
* and it appears to only affect discrete GTT blocks (i.e. on LLC
|
|
|
|
* system agents we cannot reproduce this behaviour).
|
2008-11-15 05:35:19 +08:00
|
|
|
*/
|
2011-01-05 02:42:07 +08:00
|
|
|
wmb();
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-19 00:16:49 +08:00
|
|
|
if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv))
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
POSTING_READ(RING_ACTHD(dev_priv->engine[RCS]->mmio_base));
|
2011-01-05 02:42:07 +08:00
|
|
|
|
2016-08-19 00:16:44 +08:00
|
|
|
intel_fb_obj_flush(obj, false, write_origin(obj, I915_GEM_DOMAIN_GTT));
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 22:01:59 +08:00
|
|
|
|
2016-08-19 00:16:51 +08:00
|
|
|
obj->base.write_domain = 0;
|
2009-08-25 18:15:50 +08:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains,
|
2016-08-19 00:16:51 +08:00
|
|
|
I915_GEM_DOMAIN_GTT);
|
2008-11-15 05:35:19 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/** Flushes the CPU write domain for the object if it's dirty. */
|
|
|
|
static void
|
2015-01-21 21:53:48 +08:00
|
|
|
i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
|
2008-11-15 05:35:19 +08:00
|
|
|
{
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
|
2008-11-15 05:35:19 +08:00
|
|
|
return;
|
|
|
|
|
2015-01-21 21:53:48 +08:00
|
|
|
if (i915_gem_clflush_object(obj, obj->pin_display))
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_chipset_flush(to_i915(obj->base.dev));
|
2013-08-08 21:41:09 +08:00
|
|
|
|
2015-07-08 07:28:51 +08:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 22:01:59 +08:00
|
|
|
|
2016-08-19 00:16:51 +08:00
|
|
|
obj->base.write_domain = 0;
|
2009-08-25 18:15:50 +08:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains,
|
2016-08-19 00:16:51 +08:00
|
|
|
I915_GEM_DOMAIN_CPU);
|
2008-11-15 05:35:19 +08:00
|
|
|
}
|
|
|
|
|
2016-08-19 00:17:08 +08:00
|
|
|
static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
struct i915_vma *vma;
|
|
|
|
|
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
|
|
|
if (!i915_vma_is_ggtt(vma))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (i915_vma_is_active(vma))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
|
|
|
|
}
|
2008-11-15 05:35:19 +08:00
|
|
|
}
|
|
|
|
|
2008-11-11 02:53:25 +08:00
|
|
|
/**
|
|
|
|
* Moves a single object to the GTT read, and possibly write domain.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @obj: object to act on
|
|
|
|
* @write: ask for write access or read only
|
2008-11-11 02:53:25 +08:00
|
|
|
*
|
|
|
|
* This function returns when the move is complete, including waiting on
|
|
|
|
* flushes to occur.
|
|
|
|
*/
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
int
|
2010-11-23 23:26:33 +08:00
|
|
|
i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
|
2008-11-11 02:53:25 +08:00
|
|
|
{
|
2009-08-25 18:15:50 +08:00
|
|
|
uint32_t old_write_domain, old_read_domains;
|
2008-11-15 05:35:19 +08:00
|
|
|
int ret;
|
2008-11-11 02:53:25 +08:00
|
|
|
|
2012-07-20 19:41:01 +08:00
|
|
|
ret = i915_gem_object_wait_rendering(obj, !write);
|
2011-01-08 01:09:48 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-07-20 16:21:15 +08:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
|
|
|
|
return 0;
|
|
|
|
|
2015-01-02 18:59:29 +08:00
|
|
|
/* Flush and acquire obj->pages so that we are coherent through
|
|
|
|
* direct access in memory with previous cached writes through
|
|
|
|
* shmemfs and that our cache domain tracking remains valid.
|
|
|
|
* For example, if the obj->filp was moved to swap without us
|
|
|
|
* being notified and releasing the pages, we would mistakenly
|
|
|
|
* continue to assume that the obj remained out of the CPU cached
|
|
|
|
* domain.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2015-01-21 21:53:48 +08:00
|
|
|
i915_gem_object_flush_cpu_write_domain(obj);
|
2009-08-25 18:15:50 +08:00
|
|
|
|
2012-10-10 02:24:37 +08:00
|
|
|
/* Serialise direct access to this object with the barriers for
|
|
|
|
* coherent writes from the GPU, by effectively invalidating the
|
|
|
|
* GTT domain upon first access.
|
|
|
|
*/
|
|
|
|
if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
|
|
|
|
mb();
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
old_write_domain = obj->base.write_domain;
|
|
|
|
old_read_domains = obj->base.read_domains;
|
2009-08-25 18:15:50 +08:00
|
|
|
|
2008-11-15 05:35:19 +08:00
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2010-11-09 03:18:58 +08:00
|
|
|
BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_GTT) != 0);
|
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_GTT;
|
2008-11-15 05:35:19 +08:00
|
|
|
if (write) {
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_GTT;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_GTT;
|
|
|
|
obj->dirty = 1;
|
2008-11-11 02:53:25 +08:00
|
|
|
}
|
|
|
|
|
2009-08-25 18:15:50 +08:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
|
|
|
old_write_domain);
|
|
|
|
|
2012-04-24 22:52:35 +08:00
|
|
|
/* And bump the LRU for this access */
|
2016-08-19 00:17:08 +08:00
|
|
|
i915_gem_object_bump_inactive_ggtt(obj);
|
2012-04-24 22:52:35 +08:00
|
|
|
|
2008-11-15 05:35:19 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-10-09 21:11:27 +08:00
|
|
|
/**
|
|
|
|
* Changes the cache-level of an object across all VMA.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @obj: object to act on
|
|
|
|
* @cache_level: new cache level to set for the object
|
2015-10-09 21:11:27 +08:00
|
|
|
*
|
|
|
|
* After this function returns, the object will be in the new cache-level
|
|
|
|
* across all GTT and the contents of the backing storage will be coherent,
|
|
|
|
* with respect to the new cache-level. In order to keep the backing storage
|
|
|
|
* coherent for all users, we only allow a single cache level to be set
|
|
|
|
* globally on the object and prevent it from being changed whilst the
|
|
|
|
* hardware is reading from the object. That is if the object is currently
|
|
|
|
* on the scanout it will be set to uncached (or equivalent display
|
|
|
|
* cache coherency) and all non-MOCS GPU access will also be uncached so
|
|
|
|
* that all direct access to the scanout remains coherent.
|
|
|
|
*/
|
2011-04-04 16:44:39 +08:00
|
|
|
int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_cache_level cache_level)
|
|
|
|
{
|
2016-08-04 14:52:27 +08:00
|
|
|
struct i915_vma *vma;
|
2015-08-12 00:47:10 +08:00
|
|
|
int ret = 0;
|
2011-04-04 16:44:39 +08:00
|
|
|
|
|
|
|
if (obj->cache_level == cache_level)
|
2015-08-12 00:47:10 +08:00
|
|
|
goto out;
|
2011-04-04 16:44:39 +08:00
|
|
|
|
2015-10-09 21:11:27 +08:00
|
|
|
/* Inspect the list of currently bound VMA and unbind any that would
|
|
|
|
* be invalid given the new cache-level. This is principally to
|
|
|
|
* catch the issue of the CS prefetch crossing page boundaries and
|
|
|
|
* reading an invalid PTE on older architectures.
|
|
|
|
*/
|
2016-08-04 14:52:27 +08:00
|
|
|
restart:
|
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
2015-10-09 21:11:27 +08:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
2016-08-04 23:32:30 +08:00
|
|
|
if (i915_vma_is_pinned(vma)) {
|
2015-10-09 21:11:27 +08:00
|
|
|
DRM_DEBUG("can not change the cache level of pinned objects\n");
|
|
|
|
return -EBUSY;
|
|
|
|
}
|
|
|
|
|
2016-08-04 14:52:27 +08:00
|
|
|
if (i915_gem_valid_gtt_space(vma, cache_level))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/* As unbinding may affect other elements in the
|
|
|
|
* obj->vma_list (due to side-effects from retiring
|
|
|
|
* an active vma), play safe and restart the iterator.
|
|
|
|
*/
|
|
|
|
goto restart;
|
2012-07-26 18:49:32 +08:00
|
|
|
}
|
|
|
|
|
2015-10-09 21:11:27 +08:00
|
|
|
/* We can reuse the existing drm_mm nodes but need to change the
|
|
|
|
* cache-level on the PTE. We could simply unbind them all and
|
|
|
|
* rebind with the correct cache-level on next use. However since
|
|
|
|
* we already have a valid slot, dma mapping, pages etc, we may as
|
|
|
|
* rewrite the PTE in the belief that doing so tramples upon less
|
|
|
|
* state and so involves less work.
|
|
|
|
*/
|
2016-08-04 14:52:26 +08:00
|
|
|
if (obj->bind_count) {
|
2015-10-09 21:11:27 +08:00
|
|
|
/* Before we change the PTE, the GPU must not be accessing it.
|
|
|
|
* If we wait upon the object, we know that all the bound
|
|
|
|
* VMA are no longer active.
|
|
|
|
*/
|
2015-04-27 20:41:14 +08:00
|
|
|
ret = i915_gem_object_wait_rendering(obj, false);
|
2011-04-04 16:44:39 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-04 14:52:27 +08:00
|
|
|
if (!HAS_LLC(obj->base.dev) && cache_level != I915_CACHE_NONE) {
|
2015-10-09 21:11:27 +08:00
|
|
|
/* Access to snoopable pages through the GTT is
|
|
|
|
* incoherent and on some machines causes a hard
|
|
|
|
* lockup. Relinquish the CPU mmaping to force
|
|
|
|
* userspace to refault in the pages and we can
|
|
|
|
* then double check if the GTT mapping is still
|
|
|
|
* valid for that pointer access.
|
|
|
|
*/
|
|
|
|
i915_gem_release_mmap(obj);
|
|
|
|
|
|
|
|
/* As we no longer need a fence for GTT access,
|
|
|
|
* we can relinquish it now (and so prevent having
|
|
|
|
* to steal a fence from someone else on the next
|
|
|
|
* fence request). Note GPU activity would have
|
|
|
|
* dropped the fence as all snoopable access is
|
|
|
|
* supposed to be linear.
|
|
|
|
*/
|
2016-08-19 00:17:00 +08:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
|
|
|
ret = i915_vma_put_fence(vma);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2015-10-09 21:11:27 +08:00
|
|
|
} else {
|
|
|
|
/* We either have incoherent backing store and
|
|
|
|
* so no GTT access or the architecture is fully
|
|
|
|
* coherent. In such cases, existing GTT mmaps
|
|
|
|
* ignore the cache bit in the PTE and we can
|
|
|
|
* rewrite it without confusing the GPU or having
|
|
|
|
* to force userspace to fault back in its mmaps.
|
|
|
|
*/
|
2011-04-04 16:44:39 +08:00
|
|
|
}
|
|
|
|
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
2015-10-09 21:11:27 +08:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
ret = i915_vma_bind(vma, cache_level, PIN_UPDATE);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2011-04-04 16:44:39 +08:00
|
|
|
}
|
|
|
|
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link)
|
2013-08-09 19:26:45 +08:00
|
|
|
vma->node.color = cache_level;
|
|
|
|
obj->cache_level = cache_level;
|
|
|
|
|
2015-08-12 00:47:10 +08:00
|
|
|
out:
|
2015-10-09 21:11:27 +08:00
|
|
|
/* Flush the dirty CPU caches to the backing storage so that the
|
|
|
|
* object is now coherent at its new cache level (with respect
|
|
|
|
* to the access domain).
|
|
|
|
*/
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (obj->cache_dirty && cpu_write_needs_clflush(obj)) {
|
2015-01-13 21:32:52 +08:00
|
|
|
if (i915_gem_clflush_object(obj, true))
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_chipset_flush(to_i915(obj->base.dev));
|
2011-04-04 16:44:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-09-22 08:01:20 +08:00
|
|
|
int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
2012-07-10 17:27:08 +08:00
|
|
|
{
|
2012-09-22 08:01:20 +08:00
|
|
|
struct drm_i915_gem_caching *args = data;
|
2012-07-10 17:27:08 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
|
|
|
if (!obj)
|
2015-05-07 19:14:55 +08:00
|
|
|
return -ENOENT;
|
2012-07-10 17:27:08 +08:00
|
|
|
|
2013-08-08 21:41:10 +08:00
|
|
|
switch (obj->cache_level) {
|
|
|
|
case I915_CACHE_LLC:
|
|
|
|
case I915_CACHE_L3_LLC:
|
|
|
|
args->caching = I915_CACHING_CACHED;
|
|
|
|
break;
|
|
|
|
|
2013-08-08 21:41:11 +08:00
|
|
|
case I915_CACHE_WT:
|
|
|
|
args->caching = I915_CACHING_DISPLAY;
|
|
|
|
break;
|
|
|
|
|
2013-08-08 21:41:10 +08:00
|
|
|
default:
|
|
|
|
args->caching = I915_CACHING_NONE;
|
|
|
|
break;
|
|
|
|
}
|
2012-07-10 17:27:08 +08:00
|
|
|
|
2016-07-20 20:31:54 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
2015-05-07 19:14:55 +08:00
|
|
|
return 0;
|
2012-07-10 17:27:08 +08:00
|
|
|
}
|
|
|
|
|
2012-09-22 08:01:20 +08:00
|
|
|
int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
2012-07-10 17:27:08 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2012-09-22 08:01:20 +08:00
|
|
|
struct drm_i915_gem_caching *args = data;
|
2012-07-10 17:27:08 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
enum i915_cache_level level;
|
|
|
|
int ret;
|
|
|
|
|
2012-09-22 08:01:20 +08:00
|
|
|
switch (args->caching) {
|
|
|
|
case I915_CACHING_NONE:
|
2012-07-10 17:27:08 +08:00
|
|
|
level = I915_CACHE_NONE;
|
|
|
|
break;
|
2012-09-22 08:01:20 +08:00
|
|
|
case I915_CACHING_CACHED:
|
2015-08-14 23:43:30 +08:00
|
|
|
/*
|
|
|
|
* Due to a HW issue on BXT A stepping, GPU stores via a
|
|
|
|
* snooped mapping may leave stale data in a corresponding CPU
|
|
|
|
* cacheline, whereas normally such cachelines would get
|
|
|
|
* invalidated.
|
|
|
|
*/
|
2016-03-02 20:10:31 +08:00
|
|
|
if (!HAS_LLC(dev) && !HAS_SNOOP(dev))
|
2015-08-14 23:43:30 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
2012-07-10 17:27:08 +08:00
|
|
|
level = I915_CACHE_LLC;
|
|
|
|
break;
|
2013-08-08 21:41:11 +08:00
|
|
|
case I915_CACHING_DISPLAY:
|
2016-10-13 18:03:00 +08:00
|
|
|
level = HAS_WT(dev_priv) ? I915_CACHE_WT : I915_CACHE_NONE;
|
2013-08-08 21:41:11 +08:00
|
|
|
break;
|
2012-07-10 17:27:08 +08:00
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2015-11-05 03:25:32 +08:00
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
2012-09-27 07:15:20 +08:00
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
2015-11-05 03:25:32 +08:00
|
|
|
goto rpm_put;
|
2012-09-27 07:15:20 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
|
|
|
if (!obj) {
|
2012-07-10 17:27:08 +08:00
|
|
|
ret = -ENOENT;
|
|
|
|
goto unlock;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_cache_level(obj, level);
|
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2012-07-10 17:27:08 +08:00
|
|
|
unlock:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2015-11-05 03:25:32 +08:00
|
|
|
rpm_put:
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
|
2012-07-10 17:27:08 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2009-11-25 13:09:39 +08:00
|
|
|
/*
|
2011-04-14 16:41:17 +08:00
|
|
|
* Prepare buffer for display plane (scanout, cursors, etc).
|
|
|
|
* Can be called from an uninterruptible phase (modesetting) and allows
|
|
|
|
* any flushes to be pipelined (for pageflips).
|
2009-11-25 13:09:39 +08:00
|
|
|
*/
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *
|
2011-04-14 16:41:17 +08:00
|
|
|
i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
|
|
|
|
u32 alignment,
|
2015-03-23 19:10:33 +08:00
|
|
|
const struct i915_ggtt_view *view)
|
2009-11-25 13:09:39 +08:00
|
|
|
{
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *vma;
|
2011-04-14 16:41:17 +08:00
|
|
|
u32 old_read_domains, old_write_domain;
|
2009-11-25 13:09:39 +08:00
|
|
|
int ret;
|
|
|
|
|
2013-08-09 19:25:09 +08:00
|
|
|
/* Mark the pin_display early so that we account for the
|
|
|
|
* display coherency whilst setting up the cache domains.
|
|
|
|
*/
|
2015-04-13 18:50:09 +08:00
|
|
|
obj->pin_display++;
|
2013-08-09 19:25:09 +08:00
|
|
|
|
2011-03-30 07:59:54 +08:00
|
|
|
/* The display engine is not coherent with the LLC cache on gen6. As
|
|
|
|
* a result, we make sure that the pinning that is about to occur is
|
|
|
|
* done with uncached PTEs. This is lowest common denominator for all
|
|
|
|
* chipsets.
|
|
|
|
*
|
|
|
|
* However for gen6+, we could do better by using the GFDT bit instead
|
|
|
|
* of uncaching, which would allow us to flush all the LLC-cached data
|
|
|
|
* with that bit in the PTE to main memory with just one PIPE_CONTROL.
|
|
|
|
*/
|
2013-08-08 21:41:10 +08:00
|
|
|
ret = i915_gem_object_set_cache_level(obj,
|
2016-10-13 18:03:00 +08:00
|
|
|
HAS_WT(to_i915(obj->base.dev)) ?
|
|
|
|
I915_CACHE_WT : I915_CACHE_NONE);
|
2016-08-15 17:49:06 +08:00
|
|
|
if (ret) {
|
|
|
|
vma = ERR_PTR(ret);
|
2013-08-09 19:25:09 +08:00
|
|
|
goto err_unpin_display;
|
2016-08-15 17:49:06 +08:00
|
|
|
}
|
2011-03-30 07:59:54 +08:00
|
|
|
|
2011-04-14 16:41:17 +08:00
|
|
|
/* As the user may map the buffer once pinned in the display plane
|
|
|
|
* (e.g. libkms for the bootup splash), we have to ensure that we
|
2016-08-19 00:17:06 +08:00
|
|
|
* always use map_and_fenceable for all scanout buffers. However,
|
|
|
|
* it may simply be too big to fit into mappable, in which case
|
|
|
|
* put it anyway and hope that userspace can cope (but always first
|
|
|
|
* try to preserve the existing ABI).
|
2011-04-14 16:41:17 +08:00
|
|
|
*/
|
2016-08-19 00:17:06 +08:00
|
|
|
vma = ERR_PTR(-ENOSPC);
|
|
|
|
if (view->type == I915_GGTT_VIEW_NORMAL)
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
|
|
|
|
PIN_MAPPABLE | PIN_NONBLOCK);
|
|
|
|
if (IS_ERR(vma))
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, 0);
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma))
|
2013-08-09 19:25:09 +08:00
|
|
|
goto err_unpin_display;
|
2011-04-14 16:41:17 +08:00
|
|
|
|
2016-08-19 00:17:07 +08:00
|
|
|
vma->display_alignment = max_t(u64, vma->display_alignment, alignment);
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
WARN_ON(obj->pin_display > i915_vma_pin_count(vma));
|
|
|
|
|
2015-01-21 21:53:48 +08:00
|
|
|
i915_gem_object_flush_cpu_write_domain(obj);
|
2010-05-27 20:18:14 +08:00
|
|
|
|
2011-04-14 16:41:17 +08:00
|
|
|
old_write_domain = obj->base.write_domain;
|
2010-11-09 03:18:58 +08:00
|
|
|
old_read_domains = obj->base.read_domains;
|
2011-04-14 16:41:17 +08:00
|
|
|
|
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2012-07-20 19:41:00 +08:00
|
|
|
obj->base.write_domain = 0;
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_GTT;
|
2009-11-25 13:09:39 +08:00
|
|
|
|
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
2011-04-14 16:41:17 +08:00
|
|
|
old_write_domain);
|
2009-11-25 13:09:39 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2013-08-09 19:25:09 +08:00
|
|
|
|
|
|
|
err_unpin_display:
|
2015-04-13 18:50:09 +08:00
|
|
|
obj->pin_display--;
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2013-08-09 19:25:09 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
|
2013-08-09 19:25:09 +08:00
|
|
|
{
|
2016-08-15 17:49:06 +08:00
|
|
|
if (WARN_ON(vma->obj->pin_display == 0))
|
2015-04-13 18:50:09 +08:00
|
|
|
return;
|
|
|
|
|
2016-08-19 00:17:07 +08:00
|
|
|
if (--vma->obj->pin_display == 0)
|
|
|
|
vma->display_alignment = 0;
|
2015-03-23 19:10:33 +08:00
|
|
|
|
2016-08-19 00:17:08 +08:00
|
|
|
/* Bump the LRU to try and avoid premature eviction whilst flipping */
|
|
|
|
if (!i915_vma_is_active(vma))
|
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
|
|
|
WARN_ON(vma->obj->pin_display > i915_vma_pin_count(vma));
|
2009-11-25 13:09:39 +08:00
|
|
|
}
|
|
|
|
|
2008-11-15 05:35:19 +08:00
|
|
|
/**
|
|
|
|
* Moves a single object to the CPU read, and possibly write domain.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @obj: object to act on
|
|
|
|
* @write: requesting write or read-only access
|
2008-11-15 05:35:19 +08:00
|
|
|
*
|
|
|
|
* This function returns when the move is complete, including waiting on
|
|
|
|
* flushes to occur.
|
|
|
|
*/
|
2012-03-26 16:10:27 +08:00
|
|
|
int
|
2010-11-12 21:42:53 +08:00
|
|
|
i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
|
2008-11-15 05:35:19 +08:00
|
|
|
{
|
2009-08-25 18:15:50 +08:00
|
|
|
uint32_t old_write_domain, old_read_domains;
|
2008-11-15 05:35:19 +08:00
|
|
|
int ret;
|
|
|
|
|
2012-07-20 19:41:01 +08:00
|
|
|
ret = i915_gem_object_wait_rendering(obj, !write);
|
2011-01-08 01:09:48 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-07-20 16:21:15 +08:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
|
|
|
|
return 0;
|
|
|
|
|
2008-11-15 05:35:19 +08:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
2008-11-11 02:53:25 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
old_write_domain = obj->base.write_domain;
|
|
|
|
old_read_domains = obj->base.read_domains;
|
2009-08-25 18:15:50 +08:00
|
|
|
|
2008-11-15 05:35:19 +08:00
|
|
|
/* Flush the CPU cache if it's still invalid. */
|
2010-11-09 03:18:58 +08:00
|
|
|
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
|
2013-08-09 19:26:45 +08:00
|
|
|
i915_gem_clflush_object(obj, false);
|
2008-11-11 02:53:25 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
|
2008-11-11 02:53:25 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2010-11-09 03:18:58 +08:00
|
|
|
BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0);
|
2008-11-15 05:35:19 +08:00
|
|
|
|
|
|
|
/* If we're writing through the CPU, then the GPU read domains will
|
|
|
|
* need to be invalidated at next use.
|
|
|
|
*/
|
|
|
|
if (write) {
|
2010-11-09 03:18:58 +08:00
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
2008-11-15 05:35:19 +08:00
|
|
|
}
|
2008-11-11 02:53:25 +08:00
|
|
|
|
2009-08-25 18:15:50 +08:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
|
|
|
old_write_domain);
|
|
|
|
|
2008-11-11 02:53:25 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Throttle our rendering by waiting until the ring has completed our requests
|
|
|
|
* emitted over 20 msec ago.
|
|
|
|
*
|
2009-06-03 15:27:35 +08:00
|
|
|
* Note that if we were to use the current jiffies each time around the loop,
|
|
|
|
* we wouldn't escape the function with any frames outstanding if the time to
|
|
|
|
* render a frame was over 20ms.
|
|
|
|
*
|
2008-07-31 03:06:12 +08:00
|
|
|
* This should get us reasonable parallelism between CPU and GPU but also
|
|
|
|
* relatively low latency when blocking on a particular request to finish.
|
|
|
|
*/
|
2009-03-13 02:23:52 +08:00
|
|
|
static int
|
2010-09-24 23:02:42 +08:00
|
|
|
i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
|
2009-03-13 02:23:52 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2010-09-24 23:02:42 +08:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2015-05-22 04:01:48 +08:00
|
|
|
unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
|
2014-11-25 02:49:27 +08:00
|
|
|
struct drm_i915_gem_request *request, *target = NULL;
|
2010-09-24 23:02:42 +08:00
|
|
|
int ret;
|
2010-01-31 18:40:48 +08:00
|
|
|
|
2012-11-15 00:14:06 +08:00
|
|
|
ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 00:35:08 +08:00
|
|
|
/* ABI: return -EIO if already wedged */
|
|
|
|
if (i915_terminally_wedged(&dev_priv->gpu_error))
|
|
|
|
return -EIO;
|
2011-01-26 23:39:14 +08:00
|
|
|
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2010-09-24 23:02:42 +08:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
|
2009-06-03 15:27:35 +08:00
|
|
|
if (time_after_eq(request->emitted_jiffies, recent_enough))
|
|
|
|
break;
|
2009-03-13 02:23:52 +08:00
|
|
|
|
2015-05-30 00:44:12 +08:00
|
|
|
/*
|
|
|
|
* Note that the request might not have been submitted yet.
|
|
|
|
* In which case emitted_jiffies will be zero.
|
|
|
|
*/
|
|
|
|
if (!request->emitted_jiffies)
|
|
|
|
continue;
|
|
|
|
|
2014-11-25 02:49:27 +08:00
|
|
|
target = request;
|
2009-06-03 15:27:35 +08:00
|
|
|
}
|
2014-11-25 02:49:28 +08:00
|
|
|
if (target)
|
2016-07-20 20:31:49 +08:00
|
|
|
i915_gem_request_get(target);
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
2009-03-13 02:23:52 +08:00
|
|
|
|
2014-11-25 02:49:27 +08:00
|
|
|
if (target == NULL)
|
2010-09-24 23:02:42 +08:00
|
|
|
return 0;
|
2009-04-07 04:55:41 +08:00
|
|
|
|
2016-09-09 21:11:49 +08:00
|
|
|
ret = i915_wait_request(target, I915_WAIT_INTERRUPTIBLE, NULL, NULL);
|
2016-07-20 20:31:49 +08:00
|
|
|
i915_gem_request_put(target);
|
2014-11-25 02:49:28 +08:00
|
|
|
|
2009-03-13 02:23:52 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
static bool
|
2016-08-04 23:32:23 +08:00
|
|
|
i915_vma_misplaced(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
{
|
2016-08-04 23:32:31 +08:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
return false;
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
|
2016-08-04 23:32:23 +08:00
|
|
|
if (vma->node.size < size)
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
return true;
|
|
|
|
|
2016-08-04 23:32:23 +08:00
|
|
|
if (alignment && vma->node.start & (alignment - 1))
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
return true;
|
|
|
|
|
2016-08-19 00:16:55 +08:00
|
|
|
if (flags & PIN_MAPPABLE && !i915_vma_is_map_and_fenceable(vma))
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
return true;
|
|
|
|
|
|
|
|
if (flags & PIN_OFFSET_BIAS &&
|
|
|
|
vma->node.start < (flags & PIN_OFFSET_MASK))
|
|
|
|
return true;
|
|
|
|
|
2015-12-08 19:55:07 +08:00
|
|
|
if (flags & PIN_OFFSET_FIXED &&
|
|
|
|
vma->node.start != (flags & PIN_OFFSET_MASK))
|
|
|
|
return true;
|
|
|
|
|
drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.
So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.
v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.
v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
v5: Add static to eb_get_batch, spotted by 0-day tester.
Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-23 14:48:08 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2015-11-20 22:16:39 +08:00
|
|
|
void __i915_vma_set_map_and_fenceable(struct i915_vma *vma)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj = vma->obj;
|
2016-08-04 23:32:28 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2015-11-20 22:16:39 +08:00
|
|
|
bool mappable, fenceable;
|
|
|
|
u32 fence_size, fence_alignment;
|
|
|
|
|
2016-08-04 23:32:28 +08:00
|
|
|
fence_size = i915_gem_get_ggtt_size(dev_priv,
|
2016-08-19 00:16:55 +08:00
|
|
|
vma->size,
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_get_tiling(obj));
|
2016-08-04 23:32:28 +08:00
|
|
|
fence_alignment = i915_gem_get_ggtt_alignment(dev_priv,
|
2016-08-19 00:16:55 +08:00
|
|
|
vma->size,
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_get_tiling(obj),
|
2016-08-04 23:32:27 +08:00
|
|
|
true);
|
2015-11-20 22:16:39 +08:00
|
|
|
|
|
|
|
fenceable = (vma->node.size == fence_size &&
|
|
|
|
(vma->node.start & (fence_alignment - 1)) == 0);
|
|
|
|
|
|
|
|
mappable = (vma->node.start + fence_size <=
|
2016-08-04 23:32:28 +08:00
|
|
|
dev_priv->ggtt.mappable_end);
|
2015-11-20 22:16:39 +08:00
|
|
|
|
2016-08-19 00:16:55 +08:00
|
|
|
if (mappable && fenceable)
|
|
|
|
vma->flags |= I915_VMA_CAN_FENCE;
|
|
|
|
else
|
|
|
|
vma->flags &= ~I915_VMA_CAN_FENCE;
|
2015-11-20 22:16:39 +08:00
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:33 +08:00
|
|
|
int __i915_vma_do_pin(struct i915_vma *vma,
|
|
|
|
u64 size, u64 alignment, u64 flags)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-08-04 23:32:33 +08:00
|
|
|
unsigned int bound = vma->flags;
|
2008-07-31 03:06:12 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0);
|
2016-08-04 23:32:32 +08:00
|
|
|
GEM_BUG_ON((flags & PIN_GLOBAL) && !i915_vma_is_ggtt(vma));
|
drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).
This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.
> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-01 08:00:10 +08:00
|
|
|
|
2016-08-04 23:32:33 +08:00
|
|
|
if (WARN_ON(bound & I915_VMA_PIN_OVERFLOW)) {
|
|
|
|
ret = -EBUSY;
|
|
|
|
goto err;
|
|
|
|
}
|
2014-10-31 21:53:53 +08:00
|
|
|
|
2016-08-04 23:32:34 +08:00
|
|
|
if ((bound & I915_VMA_BIND_MASK) == 0) {
|
2016-08-04 23:32:31 +08:00
|
|
|
ret = i915_vma_insert(vma, size, alignment, flags);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
ret = i915_vma_bind(vma, vma->obj->cache_level, flags);
|
2016-08-04 23:32:25 +08:00
|
|
|
if (ret)
|
2016-08-04 23:32:31 +08:00
|
|
|
goto err;
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2016-08-04 23:32:32 +08:00
|
|
|
if ((bound ^ vma->flags) & I915_VMA_GLOBAL_BIND)
|
2015-11-20 22:16:39 +08:00
|
|
|
__i915_vma_set_map_and_fenceable(vma);
|
2013-12-07 06:10:55 +08:00
|
|
|
|
2016-08-04 23:32:25 +08:00
|
|
|
GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags));
|
2008-07-31 03:06:12 +08:00
|
|
|
return 0;
|
2014-02-14 21:01:21 +08:00
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
err:
|
|
|
|
__i915_vma_unpin(vma);
|
|
|
|
return ret;
|
2015-03-16 20:11:13 +08:00
|
|
|
}
|
2010-05-27 20:18:18 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *
|
2015-03-16 20:11:13 +08:00
|
|
|
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view,
|
2016-08-04 23:32:23 +08:00
|
|
|
u64 size,
|
2016-08-04 23:32:22 +08:00
|
|
|
u64 alignment,
|
|
|
|
u64 flags)
|
2015-03-16 20:11:13 +08:00
|
|
|
{
|
2016-10-13 16:55:04 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
|
|
|
struct i915_address_space *vm = &dev_priv->ggtt.base;
|
2016-08-04 23:32:31 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
int ret;
|
2016-03-30 21:57:10 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
vma = i915_gem_obj_lookup_or_create_vma(obj, vm, view);
|
2016-08-04 23:32:31 +08:00
|
|
|
if (IS_ERR(vma))
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2016-08-04 23:32:31 +08:00
|
|
|
|
|
|
|
if (i915_vma_misplaced(vma, size, alignment, flags)) {
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
(i915_vma_is_pinned(vma) || i915_vma_is_active(vma)))
|
2016-08-15 17:49:06 +08:00
|
|
|
return ERR_PTR(-ENOSPC);
|
2016-08-04 23:32:31 +08:00
|
|
|
|
2016-10-13 16:55:04 +08:00
|
|
|
if (flags & PIN_MAPPABLE) {
|
|
|
|
u32 fence_size;
|
|
|
|
|
|
|
|
fence_size = i915_gem_get_ggtt_size(dev_priv, vma->size,
|
|
|
|
i915_gem_object_get_tiling(obj));
|
|
|
|
/* If the required space is larger than the available
|
|
|
|
* aperture, we will not able to find a slot for the
|
|
|
|
* object and unbinding the object now will be in
|
|
|
|
* vain. Worse, doing so may cause us to ping-pong
|
|
|
|
* the object in and out of the Global GTT and
|
|
|
|
* waste a lot of cycles under the mutex.
|
|
|
|
*/
|
|
|
|
if (fence_size > dev_priv->ggtt.mappable_end)
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
/* If NONBLOCK is set the caller is optimistically
|
|
|
|
* trying to cache the full object within the mappable
|
|
|
|
* aperture, and *must* have a fallback in place for
|
|
|
|
* situations where we cannot bind the object. We
|
|
|
|
* can be a little more lax here and use the fallback
|
|
|
|
* more often to avoid costly migrations of ourselves
|
|
|
|
* and other objects within the aperture.
|
|
|
|
*
|
|
|
|
* Half-the-aperture is used as a simple heuristic.
|
|
|
|
* More interesting would to do search for a free
|
|
|
|
* block prior to making the commitment to unbind.
|
|
|
|
* That caters for the self-harm case, and with a
|
|
|
|
* little more heuristics (e.g. NOFAULT, NOEVICT)
|
|
|
|
* we could try to minimise harm to others.
|
|
|
|
*/
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
fence_size > dev_priv->ggtt.mappable_end / 2)
|
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
WARN(i915_vma_is_pinned(vma),
|
|
|
|
"bo is already pinned in ggtt with incorrect alignment:"
|
2016-08-19 00:16:55 +08:00
|
|
|
" offset=%08x, req.alignment=%llx,"
|
|
|
|
" req.map_and_fenceable=%d, vma->map_and_fenceable=%d\n",
|
|
|
|
i915_ggtt_offset(vma), alignment,
|
2016-08-04 23:32:31 +08:00
|
|
|
!!(flags & PIN_MAPPABLE),
|
2016-08-19 00:16:55 +08:00
|
|
|
i915_vma_is_map_and_fenceable(vma));
|
2016-08-04 23:32:31 +08:00
|
|
|
ret = i915_vma_unbind(vma);
|
2014-12-11 01:27:58 +08:00
|
|
|
if (ret)
|
2016-08-15 17:49:06 +08:00
|
|
|
return ERR_PTR(ret);
|
2014-12-11 01:27:58 +08:00
|
|
|
}
|
2012-02-16 06:50:22 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
2014-10-31 21:53:52 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-08-09 16:23:33 +08:00
|
|
|
static __always_inline unsigned int __busy_read_flag(unsigned int id)
|
2015-03-16 20:11:13 +08:00
|
|
|
{
|
2016-08-05 17:14:18 +08:00
|
|
|
/* Note that we could alias engines in the execbuf API, but
|
|
|
|
* that would be very unwise as it prevents userspace from
|
|
|
|
* fine control over engine selection. Ahem.
|
|
|
|
*
|
|
|
|
* This should be something like EXEC_MAX_ENGINE instead of
|
|
|
|
* I915_NUM_ENGINES.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(I915_NUM_ENGINES > 16);
|
|
|
|
return 0x10000 << id;
|
2015-03-16 20:11:13 +08:00
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
static __always_inline unsigned int __busy_write_id(unsigned int id)
|
2015-03-16 20:11:13 +08:00
|
|
|
{
|
2016-08-10 01:08:25 +08:00
|
|
|
/* The uABI guarantees an active writer is also amongst the read
|
|
|
|
* engines. This would be true if we accessed the activity tracking
|
|
|
|
* under the lock, but as we perform the lookup of the object and
|
|
|
|
* its activity locklessly we can not guarantee that the last_write
|
|
|
|
* being active implies that we have set the same engine flag from
|
|
|
|
* last_read - hence we always set both read and write busy for
|
|
|
|
* last_write.
|
|
|
|
*/
|
|
|
|
return id | __busy_read_flag(id);
|
2016-08-05 17:14:18 +08:00
|
|
|
}
|
|
|
|
|
2016-08-09 16:23:33 +08:00
|
|
|
static __always_inline unsigned int
|
2016-08-05 17:14:18 +08:00
|
|
|
__busy_set_if_active(const struct i915_gem_active *active,
|
|
|
|
unsigned int (*flag)(unsigned int id))
|
|
|
|
{
|
2016-08-16 16:50:40 +08:00
|
|
|
struct drm_i915_gem_request *request;
|
2016-03-30 21:57:10 +08:00
|
|
|
|
2016-08-16 16:50:40 +08:00
|
|
|
request = rcu_dereference(active->request);
|
|
|
|
if (!request || i915_gem_request_completed(request))
|
|
|
|
return 0;
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2016-08-16 16:50:40 +08:00
|
|
|
/* This is racy. See __i915_gem_active_get_rcu() for an in detail
|
|
|
|
* discussion of how to handle the race correctly, but for reporting
|
|
|
|
* the busy state we err on the side of potentially reporting the
|
|
|
|
* wrong engine as being busy (but we guarantee that the result
|
|
|
|
* is at least self-consistent).
|
|
|
|
*
|
|
|
|
* As we use SLAB_DESTROY_BY_RCU, the request may be reallocated
|
|
|
|
* whilst we are inspecting it, even under the RCU read lock as we are.
|
|
|
|
* This means that there is a small window for the engine and/or the
|
|
|
|
* seqno to have been overwritten. The seqno will always be in the
|
|
|
|
* future compared to the intended, and so we know that if that
|
|
|
|
* seqno is idle (on whatever engine) our request is idle and the
|
|
|
|
* return 0 above is correct.
|
|
|
|
*
|
|
|
|
* The issue is that if the engine is switched, it is just as likely
|
|
|
|
* to report that it is busy (but since the switch happened, we know
|
|
|
|
* the request should be idle). So there is a small chance that a busy
|
|
|
|
* result is actually the wrong engine.
|
|
|
|
*
|
|
|
|
* So why don't we care?
|
|
|
|
*
|
|
|
|
* For starters, the busy ioctl is a heuristic that is by definition
|
|
|
|
* racy. Even with perfect serialisation in the driver, the hardware
|
|
|
|
* state is constantly advancing - the state we report to the user
|
|
|
|
* is stale.
|
|
|
|
*
|
|
|
|
* The critical information for the busy-ioctl is whether the object
|
|
|
|
* is idle as userspace relies on that to detect whether its next
|
|
|
|
* access will stall, or if it has missed submitting commands to
|
|
|
|
* the hardware allowing the GPU to stall. We never generate a
|
|
|
|
* false-positive for idleness, thus busy-ioctl is reliable at the
|
|
|
|
* most fundamental level, and we maintain the guarantee that a
|
|
|
|
* busy object left to itself will eventually become idle (and stay
|
|
|
|
* idle!).
|
|
|
|
*
|
|
|
|
* We allow ourselves the leeway of potentially misreporting the busy
|
|
|
|
* state because that is an optimisation heuristic that is constantly
|
|
|
|
* in flux. Being quickly able to detect the busy/idle state is much
|
|
|
|
* more important than accurate logging of exactly which engines were
|
|
|
|
* busy.
|
|
|
|
*
|
|
|
|
* For accuracy in reporting the engine, we could use
|
|
|
|
*
|
|
|
|
* result = 0;
|
|
|
|
* request = __i915_gem_active_get_rcu(active);
|
|
|
|
* if (request) {
|
|
|
|
* if (!i915_gem_request_completed(request))
|
|
|
|
* result = flag(request->engine->exec_id);
|
|
|
|
* i915_gem_request_put(request);
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* but that still remains susceptible to both hardware and userspace
|
|
|
|
* races. So we accept making the result of that race slightly worse,
|
|
|
|
* given the rarity of the race and its low impact on the result.
|
|
|
|
*/
|
|
|
|
return flag(READ_ONCE(request->engine->exec_id));
|
2015-03-16 20:11:13 +08:00
|
|
|
}
|
|
|
|
|
2016-08-09 16:23:33 +08:00
|
|
|
static __always_inline unsigned int
|
2016-08-05 17:14:18 +08:00
|
|
|
busy_check_reader(const struct i915_gem_active *active)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-08-05 17:14:18 +08:00
|
|
|
return __busy_set_if_active(active, __busy_read_flag);
|
|
|
|
}
|
2013-12-07 06:10:55 +08:00
|
|
|
|
2016-08-09 16:23:33 +08:00
|
|
|
static __always_inline unsigned int
|
2016-08-05 17:14:18 +08:00
|
|
|
busy_check_writer(const struct i915_gem_active *active)
|
|
|
|
{
|
|
|
|
return __busy_set_if_active(active, __busy_write_id);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_busy *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-08-05 17:14:18 +08:00
|
|
|
unsigned long active;
|
2010-09-25 17:19:17 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:18 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-01-16 00:51:46 +08:00
|
|
|
args->busy = 0;
|
2016-08-05 17:14:18 +08:00
|
|
|
active = __I915_BO_ACTIVE(obj);
|
|
|
|
if (active) {
|
|
|
|
int idx;
|
2010-05-21 09:08:57 +08:00
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
/* Yes, the lookups are intentionally racy.
|
|
|
|
*
|
|
|
|
* First, we cannot simply rely on __I915_BO_ACTIVE. We have
|
|
|
|
* to regard the value as stale and as our ABI guarantees
|
|
|
|
* forward progress, we confirm the status of each active
|
|
|
|
* request with the hardware.
|
|
|
|
*
|
|
|
|
* Even though we guard the pointer lookup by RCU, that only
|
|
|
|
* guarantees that the pointer and its contents remain
|
|
|
|
* dereferencable and does *not* mean that the request we
|
|
|
|
* have is the same as the one being tracked by the object.
|
|
|
|
*
|
|
|
|
* Consider that we lookup the request just as it is being
|
|
|
|
* retired and freed. We take a local copy of the pointer,
|
|
|
|
* but before we add its engine into the busy set, the other
|
|
|
|
* thread reallocates it and assigns it to a task on another
|
2016-08-16 16:50:40 +08:00
|
|
|
* engine with a fresh and incomplete seqno. Guarding against
|
|
|
|
* that requires careful serialisation and reference counting,
|
|
|
|
* i.e. using __i915_gem_active_get_request_rcu(). We don't,
|
|
|
|
* instead we expect that if the result is busy, which engines
|
|
|
|
* are busy is not completely reliable - we only guarantee
|
|
|
|
* that the object was busy.
|
2016-08-05 17:14:18 +08:00
|
|
|
*/
|
|
|
|
rcu_read_lock();
|
2010-08-04 22:36:30 +08:00
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
for_each_active(active, idx)
|
|
|
|
args->busy |= busy_check_reader(&obj->last_read[idx]);
|
2016-01-16 00:51:46 +08:00
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
/* For ABI sanity, we only care that the write engine is in
|
2016-08-10 01:08:25 +08:00
|
|
|
* the set of read engines. This should be ensured by the
|
|
|
|
* ordering of setting last_read/last_write in
|
|
|
|
* i915_vma_move_to_active(), and then in reverse in retire.
|
|
|
|
* However, for good measure, we always report the last_write
|
|
|
|
* request as a busy read as well as being a busy write.
|
2016-08-05 17:14:18 +08:00
|
|
|
*
|
|
|
|
* We don't care that the set of active read/write engines
|
|
|
|
* may change during construction of the result, as it is
|
|
|
|
* equally liable to change before userspace can inspect
|
|
|
|
* the result.
|
|
|
|
*/
|
|
|
|
args->busy |= busy_check_writer(&obj->last_write);
|
2016-01-16 00:51:46 +08:00
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
rcu_read_unlock();
|
2016-01-16 00:51:46 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-08-05 17:14:18 +08:00
|
|
|
i915_gem_object_put_unlocked(obj);
|
|
|
|
return 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2011-08-17 03:34:10 +08:00
|
|
|
return i915_gem_ring_throttle(dev, file_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2009-09-14 23:50:29 +08:00
|
|
|
int
|
|
|
|
i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2009-09-14 23:50:29 +08:00
|
|
|
struct drm_i915_gem_madvise *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-09-25 18:22:51 +08:00
|
|
|
int ret;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
|
|
|
switch (args->madv) {
|
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
case I915_MADV_WILLNEED:
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2010-10-17 16:45:41 +08:00
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file_priv, args->handle);
|
|
|
|
if (!obj) {
|
2010-10-17 16:45:41 +08:00
|
|
|
ret = -ENOENT;
|
|
|
|
goto unlock;
|
2009-09-14 23:50:29 +08:00
|
|
|
}
|
|
|
|
|
2014-11-20 16:26:30 +08:00
|
|
|
if (obj->pages &&
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_is_tiled(obj) &&
|
2014-11-20 16:26:30 +08:00
|
|
|
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
|
|
|
|
if (obj->madv == I915_MADV_WILLNEED)
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
if (args->madv == I915_MADV_WILLNEED)
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
}
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
if (obj->madv != __I915_MADV_PURGED)
|
|
|
|
obj->madv = args->madv;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
/* if the object is no longer attached, discard its backing storage */
|
2015-03-18 17:46:04 +08:00
|
|
|
if (obj->madv == I915_MADV_DONTNEED && obj->pages == NULL)
|
2009-09-21 06:13:10 +08:00
|
|
|
i915_gem_object_truncate(obj);
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
args->retained = obj->madv != __I915_MADV_PURGED;
|
2009-09-22 21:24:13 +08:00
|
|
|
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2010-10-17 16:45:41 +08:00
|
|
|
unlock:
|
2009-09-14 23:50:29 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2010-10-17 16:45:41 +08:00
|
|
|
return ret;
|
2009-09-14 23:50:29 +08:00
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
void i915_gem_object_init(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_object_ops *ops)
|
2012-08-11 22:41:06 +08:00
|
|
|
{
|
2015-04-27 20:41:17 +08:00
|
|
|
int i;
|
|
|
|
|
2013-06-01 02:28:48 +08:00
|
|
|
INIT_LIST_HEAD(&obj->global_list);
|
2016-03-16 19:00:39 +08:00
|
|
|
for (i = 0; i < I915_NUM_ENGINES; i++)
|
drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 14:52:35 +08:00
|
|
|
init_request_active(&obj->last_read[i],
|
|
|
|
i915_gem_object_retire__read);
|
|
|
|
init_request_active(&obj->last_write,
|
|
|
|
i915_gem_object_retire__write);
|
2013-08-14 17:38:33 +08:00
|
|
|
INIT_LIST_HEAD(&obj->obj_exec_link);
|
2013-07-18 03:19:03 +08:00
|
|
|
INIT_LIST_HEAD(&obj->vma_list);
|
2015-04-07 23:20:38 +08:00
|
|
|
INIT_LIST_HEAD(&obj->batch_pool_link);
|
2012-08-11 22:41:06 +08:00
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
obj->ops = ops;
|
|
|
|
|
2016-08-19 00:17:04 +08:00
|
|
|
obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
|
2012-08-11 22:41:06 +08:00
|
|
|
obj->madv = I915_MADV_WILLNEED;
|
|
|
|
|
2016-07-04 18:34:37 +08:00
|
|
|
i915_gem_info_add_obj(to_i915(obj->base.dev), obj->base.size);
|
2012-08-11 22:41:06 +08:00
|
|
|
}
|
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
|
2016-01-23 02:32:31 +08:00
|
|
|
.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE,
|
2012-06-07 22:38:42 +08:00
|
|
|
.get_pages = i915_gem_object_get_pages_gtt,
|
|
|
|
.put_pages = i915_gem_object_put_pages_gtt,
|
|
|
|
};
|
|
|
|
|
2016-10-18 20:02:49 +08:00
|
|
|
/* Note we don't consider signbits :| */
|
|
|
|
#define overflows_type(x, T) \
|
|
|
|
(sizeof(x) > sizeof(T) && (x) >> (sizeof(T) * BITS_PER_BYTE))
|
|
|
|
|
|
|
|
struct drm_i915_gem_object *
|
|
|
|
i915_gem_object_create(struct drm_device *dev, u64 size)
|
2010-04-10 03:05:06 +08:00
|
|
|
{
|
2010-04-10 03:05:07 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2011-06-28 07:18:18 +08:00
|
|
|
struct address_space *mapping;
|
2012-11-30 05:18:51 +08:00
|
|
|
gfp_t mask;
|
2016-04-25 20:32:13 +08:00
|
|
|
int ret;
|
2010-04-10 03:05:06 +08:00
|
|
|
|
2016-10-18 20:02:49 +08:00
|
|
|
/* There is a prevalence of the assumption that we fit the object's
|
|
|
|
* page count inside a 32bit _signed_ variable. Let's document this and
|
|
|
|
* catch if we ever need to fix it. In the meantime, if you do spot
|
|
|
|
* such a local variable, please consider fixing!
|
|
|
|
*/
|
|
|
|
if (WARN_ON(size >> PAGE_SHIFT > INT_MAX))
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
if (overflows_type(size, obj->base.size))
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
2012-11-15 19:32:30 +08:00
|
|
|
obj = i915_gem_object_alloc(dev);
|
2010-04-10 03:05:07 +08:00
|
|
|
if (obj == NULL)
|
2016-04-25 20:32:13 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-04-25 20:32:13 +08:00
|
|
|
ret = drm_gem_object_init(dev, &obj->base, size);
|
|
|
|
if (ret)
|
|
|
|
goto fail;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-05-25 03:48:12 +08:00
|
|
|
mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
|
|
|
|
if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
|
|
|
|
/* 965gm cannot relocate objects above 4GiB. */
|
|
|
|
mask &= ~__GFP_HIGHMEM;
|
|
|
|
mask |= __GFP_DMA32;
|
|
|
|
}
|
|
|
|
|
2015-12-05 12:45:44 +08:00
|
|
|
mapping = obj->base.filp->f_mapping;
|
2012-05-25 03:48:12 +08:00
|
|
|
mapping_set_gfp_mask(mapping, mask);
|
2011-06-28 07:18:18 +08:00
|
|
|
|
2012-06-07 22:38:42 +08:00
|
|
|
i915_gem_object_init(obj, &i915_gem_object_ops);
|
2010-09-30 18:46:12 +08:00
|
|
|
|
2010-04-10 03:05:07 +08:00
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2012-01-18 00:43:53 +08:00
|
|
|
if (HAS_LLC(dev)) {
|
|
|
|
/* On some devices, we can have the GPU use the LLC (the CPU
|
2011-03-30 07:59:55 +08:00
|
|
|
* cache) for about a 10% performance improvement
|
|
|
|
* compared to uncached. Graphics requests other than
|
|
|
|
* display scanout are coherent with the CPU in
|
|
|
|
* accessing this cache. This means in this mode we
|
|
|
|
* don't need to clflush on the CPU side, and on the
|
|
|
|
* GPU side we only need to flush internal caches to
|
|
|
|
* get data visible to the CPU.
|
|
|
|
*
|
|
|
|
* However, we maintain the display planes as UC, and so
|
|
|
|
* need to rebind when first used as such.
|
|
|
|
*/
|
|
|
|
obj->cache_level = I915_CACHE_LLC;
|
|
|
|
} else
|
|
|
|
obj->cache_level = I915_CACHE_NONE;
|
|
|
|
|
2013-07-25 05:25:03 +08:00
|
|
|
trace_i915_gem_object_create(obj);
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
return obj;
|
2016-04-25 20:32:13 +08:00
|
|
|
|
|
|
|
fail:
|
|
|
|
i915_gem_object_free(obj);
|
|
|
|
|
|
|
|
return ERR_PTR(ret);
|
2010-04-10 03:05:07 +08:00
|
|
|
}
|
|
|
|
|
2014-05-22 16:16:52 +08:00
|
|
|
static bool discard_backing_storage(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
/* If we are the last user of the backing storage (be it shmemfs
|
|
|
|
* pages or stolen etc), we know that the pages are going to be
|
|
|
|
* immediately released. In this case, we can then skip copying
|
|
|
|
* back the contents from the GPU.
|
|
|
|
*/
|
|
|
|
|
|
|
|
if (obj->madv != I915_MADV_WILLNEED)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/* At first glance, this looks racy, but then again so would be
|
|
|
|
* userspace racing mmap against close. However, the first external
|
|
|
|
* reference to the filp can only be obtained through the
|
|
|
|
* i915_gem_mmap_ioctl() which safeguards us against the user
|
|
|
|
* acquiring such a reference whilst we are in the middle of
|
|
|
|
* freeing the object.
|
|
|
|
*/
|
|
|
|
return atomic_long_read(&obj->base.filp->f_count) == 1;
|
|
|
|
}
|
|
|
|
|
2012-04-24 22:47:31 +08:00
|
|
|
void i915_gem_free_object(struct drm_gem_object *gem_obj)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2012-04-24 22:47:31 +08:00
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_device *dev = obj->base.dev;
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
drm/i915: plumb VM into bind/unbind code
As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).
This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.
> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superflous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-01 08:00:10 +08:00
|
|
|
struct i915_vma *vma, *next;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-11-28 04:20:34 +08:00
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
2011-03-20 19:20:19 +08:00
|
|
|
trace_i915_gem_object_destroy(obj);
|
|
|
|
|
2016-08-04 14:52:45 +08:00
|
|
|
/* All file-owned VMA should have been released by this point through
|
|
|
|
* i915_gem_close_object(), or earlier by i915_gem_context_close().
|
|
|
|
* However, the object may also be bound into the global GTT (e.g.
|
|
|
|
* older GPUs without per-process support, or for direct access through
|
|
|
|
* the GTT either for the user or for scanout). Those VMA still need to
|
|
|
|
* unbound now.
|
|
|
|
*/
|
2016-02-26 19:03:19 +08:00
|
|
|
list_for_each_entry_safe(vma, next, &obj->vma_list, obj_link) {
|
2016-08-04 23:32:32 +08:00
|
|
|
GEM_BUG_ON(!i915_vma_is_ggtt(vma));
|
2016-08-04 14:52:45 +08:00
|
|
|
GEM_BUG_ON(i915_vma_is_active(vma));
|
2016-08-04 23:32:32 +08:00
|
|
|
vma->flags &= ~I915_VMA_PIN_MASK;
|
2016-08-04 14:52:45 +08:00
|
|
|
i915_vma_close(vma);
|
2012-04-24 22:47:31 +08:00
|
|
|
}
|
2016-08-04 14:52:26 +08:00
|
|
|
GEM_BUG_ON(obj->bind_count);
|
2012-04-24 22:47:31 +08:00
|
|
|
|
2013-06-01 05:46:20 +08:00
|
|
|
/* Stolen objects don't hold a ref, but do hold pin count. Fix that up
|
|
|
|
* before progressing. */
|
|
|
|
if (obj->stolen)
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
|
2016-08-04 23:32:37 +08:00
|
|
|
WARN_ON(atomic_read(&obj->frontbuffer_bits));
|
2014-06-19 05:28:09 +08:00
|
|
|
|
2014-11-20 16:26:30 +08:00
|
|
|
if (obj->pages && obj->madv == I915_MADV_WILLNEED &&
|
|
|
|
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_is_tiled(obj))
|
2014-11-20 16:26:30 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
|
2013-06-01 02:28:47 +08:00
|
|
|
if (WARN_ON(obj->pages_pin_count))
|
|
|
|
obj->pages_pin_count = 0;
|
2014-05-22 16:16:52 +08:00
|
|
|
if (discard_backing_storage(obj))
|
2014-03-25 21:23:06 +08:00
|
|
|
obj->madv = I915_MADV_DONTNEED;
|
2012-06-07 22:38:42 +08:00
|
|
|
i915_gem_object_put_pages(obj);
|
2008-11-13 02:03:55 +08:00
|
|
|
|
2012-06-01 22:20:22 +08:00
|
|
|
BUG_ON(obj->pages);
|
|
|
|
|
2012-09-05 04:02:58 +08:00
|
|
|
if (obj->base.import_attach)
|
|
|
|
drm_prime_gem_destroy(&obj->base, NULL);
|
2008-11-13 02:03:55 +08:00
|
|
|
|
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forwarding porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 21:22:37 +08:00
|
|
|
if (obj->ops->release)
|
|
|
|
obj->ops->release(obj);
|
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
drm_gem_object_release(&obj->base);
|
|
|
|
i915_gem_info_remove_obj(dev_priv, obj->base.size);
|
2010-04-10 03:05:07 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
kfree(obj->bit_17);
|
2012-11-15 19:32:30 +08:00
|
|
|
i915_gem_object_free(obj);
|
2013-11-28 04:20:34 +08:00
|
|
|
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-08-05 17:14:11 +08:00
|
|
|
int i915_gem_suspend(struct drm_device *dev)
|
2014-04-09 16:19:41 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2016-08-05 17:14:11 +08:00
|
|
|
int ret;
|
2014-04-09 16:19:41 +08:00
|
|
|
|
2016-07-22 04:16:19 +08:00
|
|
|
intel_suspend_gt_powersave(dev_priv);
|
2008-11-14 07:00:55 +08:00
|
|
|
|
2013-10-16 18:50:01 +08:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2016-07-15 21:56:20 +08:00
|
|
|
|
|
|
|
/* We have to flush all the executing contexts to main memory so
|
|
|
|
* that they can saved in the hibernation image. To ensure the last
|
|
|
|
* context image is coherent, we have to switch away from it. That
|
|
|
|
* leaves the dev_priv->kernel_context still active when
|
|
|
|
* we actually suspend, and its image in memory may not match the GPU
|
|
|
|
* state. Fortunately, the kernel_context is disposable and we do
|
|
|
|
* not rely on its state.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_switch_to_kernel_context(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-09-09 21:11:50 +08:00
|
|
|
ret = i915_gem_wait_for_idle(dev_priv,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED);
|
2013-09-14 06:57:04 +08:00
|
|
|
if (ret)
|
2013-10-16 18:50:01 +08:00
|
|
|
goto err;
|
2013-09-14 06:57:04 +08:00
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-04-28 16:56:41 +08:00
|
|
|
i915_gem_context_lost(dev_priv);
|
2013-10-16 18:50:01 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
|
2015-01-27 00:03:03 +08:00
|
|
|
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
|
2016-07-04 15:08:31 +08:00
|
|
|
cancel_delayed_work_sync(&dev_priv->gt.retire_work);
|
|
|
|
flush_delayed_work(&dev_priv->gt.idle_work);
|
2010-01-07 18:39:13 +08:00
|
|
|
|
2014-11-25 19:56:33 +08:00
|
|
|
/* Assert that we sucessfully flushed all the work and
|
|
|
|
* reset the GPU back to its idle, low power state.
|
|
|
|
*/
|
2016-07-04 15:08:31 +08:00
|
|
|
WARN_ON(dev_priv->gt.awake);
|
2014-11-25 19:56:33 +08:00
|
|
|
|
2016-10-12 22:46:37 +08:00
|
|
|
/*
|
|
|
|
* Neither the BIOS, ourselves or any other kernel
|
|
|
|
* expects the system to be in execlists mode on startup,
|
|
|
|
* so we need to reset the GPU back to legacy mode. And the only
|
|
|
|
* known way to disable logical contexts is through a GPU reset.
|
|
|
|
*
|
|
|
|
* So in order to leave the system in a known default configuration,
|
|
|
|
* always reset the GPU upon unload and suspend. Afterwards we then
|
|
|
|
* clean up the GEM state tracking, flushing off the requests and
|
|
|
|
* leaving the system in a known idle state.
|
|
|
|
*
|
|
|
|
* Note that is of the upmost importance that the GPU is idle and
|
|
|
|
* all stray writes are flushed *before* we dismantle the backing
|
|
|
|
* storage for the pinned objects.
|
|
|
|
*
|
|
|
|
* However, since we are uncertain that resetting the GPU on older
|
|
|
|
* machines is a good idea, we don't - just in case it leaves the
|
|
|
|
* machine in an unusable condition.
|
|
|
|
*/
|
|
|
|
if (HAS_HW_CONTEXTS(dev)) {
|
|
|
|
int reset = intel_gpu_reset(dev_priv, ALL_ENGINES);
|
|
|
|
WARN_ON(reset && reset != -ENODEV);
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
return 0;
|
2013-10-16 18:50:01 +08:00
|
|
|
|
|
|
|
err:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-07-15 21:56:20 +08:00
|
|
|
void i915_gem_resume(struct drm_device *dev)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
|
|
|
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
|
|
|
i915_gem_restore_gtt_mappings(dev);
|
|
|
|
|
|
|
|
/* As we didn't flush the kernel context before suspend, we cannot
|
|
|
|
* guarantee that the context image is complete. So let's just reset
|
|
|
|
* it and start again.
|
|
|
|
*/
|
2016-09-09 21:11:53 +08:00
|
|
|
dev_priv->gt.resume(dev_priv);
|
2016-07-15 21:56:20 +08:00
|
|
|
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
}
|
|
|
|
|
2012-02-02 16:58:12 +08:00
|
|
|
void i915_gem_init_swizzling(struct drm_device *dev)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2012-02-02 16:58:12 +08:00
|
|
|
|
2012-01-31 23:47:55 +08:00
|
|
|
if (INTEL_INFO(dev)->gen < 5 ||
|
2012-02-02 16:58:12 +08:00
|
|
|
dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_NONE)
|
|
|
|
return;
|
|
|
|
|
|
|
|
I915_WRITE(DISP_ARB_CTL, I915_READ(DISP_ARB_CTL) |
|
|
|
|
DISP_TILE_SURFACE_SWIZZLING);
|
|
|
|
|
2016-10-13 18:03:10 +08:00
|
|
|
if (IS_GEN5(dev_priv))
|
2012-01-31 23:47:55 +08:00
|
|
|
return;
|
|
|
|
|
2012-02-02 16:58:12 +08:00
|
|
|
I915_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_SWZCTL);
|
2016-10-13 18:03:10 +08:00
|
|
|
if (IS_GEN6(dev_priv))
|
2012-04-24 20:04:12 +08:00
|
|
|
I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_SNB));
|
2016-10-13 18:03:10 +08:00
|
|
|
else if (IS_GEN7(dev_priv))
|
2012-04-24 20:04:12 +08:00
|
|
|
I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_IVB));
|
2016-10-13 18:03:10 +08:00
|
|
|
else if (IS_GEN8(dev_priv))
|
2013-11-03 12:07:04 +08:00
|
|
|
I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_BDW));
|
2012-12-19 02:31:23 +08:00
|
|
|
else
|
|
|
|
BUG();
|
2012-02-02 16:58:12 +08:00
|
|
|
}
|
2012-02-10 03:53:27 +08:00
|
|
|
|
2016-10-13 18:02:58 +08:00
|
|
|
static void init_unused_ring(struct drm_i915_private *dev_priv, u32 base)
|
2014-08-15 06:21:55 +08:00
|
|
|
{
|
|
|
|
I915_WRITE(RING_CTL(base), 0);
|
|
|
|
I915_WRITE(RING_HEAD(base), 0);
|
|
|
|
I915_WRITE(RING_TAIL(base), 0);
|
|
|
|
I915_WRITE(RING_START(base), 0);
|
|
|
|
}
|
|
|
|
|
2016-10-13 18:02:58 +08:00
|
|
|
static void init_unused_rings(struct drm_i915_private *dev_priv)
|
2014-08-15 06:21:55 +08:00
|
|
|
{
|
2016-10-13 18:02:58 +08:00
|
|
|
if (IS_I830(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, PRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB0_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB2_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB3_BASE);
|
|
|
|
} else if (IS_GEN2(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, SRB0_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB1_BASE);
|
|
|
|
} else if (IS_GEN3(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, PRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, PRB2_BASE);
|
2014-08-15 06:21:55 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-02-09 03:49:24 +08:00
|
|
|
int
|
|
|
|
i915_gem_init_hw(struct drm_device *dev)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2016-04-28 16:56:44 +08:00
|
|
|
int ret;
|
2013-02-09 03:49:24 +08:00
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
/* Double layer security blanket, see i915_gem_init() */
|
|
|
|
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
|
|
|
|
|
2016-04-13 22:26:43 +08:00
|
|
|
if (HAS_EDRAM(dev) && INTEL_GEN(dev_priv) < 9)
|
2013-07-05 02:02:04 +08:00
|
|
|
I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
|
2013-02-09 03:49:24 +08:00
|
|
|
|
2016-10-13 18:03:01 +08:00
|
|
|
if (IS_HASWELL(dev_priv))
|
2016-10-13 18:02:58 +08:00
|
|
|
I915_WRITE(MI_PREDICATE_RESULT_2, IS_HSW_GT3(dev_priv) ?
|
2013-11-29 20:56:12 +08:00
|
|
|
LOWER_SLICE_ENABLED : LOWER_SLICE_DISABLED);
|
2013-08-29 03:45:46 +08:00
|
|
|
|
2016-10-13 18:02:53 +08:00
|
|
|
if (HAS_PCH_NOP(dev_priv)) {
|
2016-10-14 17:13:06 +08:00
|
|
|
if (IS_IVYBRIDGE(dev_priv)) {
|
2014-01-23 06:39:30 +08:00
|
|
|
u32 temp = I915_READ(GEN7_MSG_CTL);
|
|
|
|
temp &= ~(WAIT_FOR_PCH_FLR_ACK | WAIT_FOR_PCH_RESET_ACK);
|
|
|
|
I915_WRITE(GEN7_MSG_CTL, temp);
|
|
|
|
} else if (INTEL_INFO(dev)->gen >= 7) {
|
|
|
|
u32 temp = I915_READ(HSW_NDE_RSTWRN_OPT);
|
|
|
|
temp &= ~RESET_PCH_HANDSHAKE_ENABLE;
|
|
|
|
I915_WRITE(HSW_NDE_RSTWRN_OPT, temp);
|
|
|
|
}
|
2013-04-06 04:12:43 +08:00
|
|
|
}
|
|
|
|
|
2013-02-09 03:49:24 +08:00
|
|
|
i915_gem_init_swizzling(dev);
|
|
|
|
|
2014-11-20 16:45:19 +08:00
|
|
|
/*
|
|
|
|
* At least 830 can leave some of the unused rings
|
|
|
|
* "active" (ie. head != tail) after resume which
|
|
|
|
* will prevent c3 entry. Makes sure all unused rings
|
|
|
|
* are totally idle.
|
|
|
|
*/
|
2016-10-13 18:02:58 +08:00
|
|
|
init_unused_rings(dev_priv);
|
2014-11-20 16:45:19 +08:00
|
|
|
|
2016-01-20 03:02:54 +08:00
|
|
|
BUG_ON(!dev_priv->kernel_context);
|
2015-05-30 00:43:37 +08:00
|
|
|
|
2015-06-18 20:11:20 +08:00
|
|
|
ret = i915_ppgtt_init_hw(dev);
|
|
|
|
if (ret) {
|
|
|
|
DRM_ERROR("PPGTT enable HW failed %d\n", ret);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Need to do basic initialisation of all rings first: */
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2016-03-16 19:00:36 +08:00
|
|
|
ret = engine->init_hw(engine);
|
2014-11-20 07:33:07 +08:00
|
|
|
if (ret)
|
2015-02-13 22:35:59 +08:00
|
|
|
goto out;
|
2014-11-20 07:33:07 +08:00
|
|
|
}
|
2013-01-22 20:12:17 +08:00
|
|
|
|
2016-04-13 22:03:25 +08:00
|
|
|
intel_mocs_init_l3cc_table(dev);
|
|
|
|
|
2015-08-12 22:43:36 +08:00
|
|
|
/* We can't enable contexts until all firmware is loaded */
|
2016-06-07 16:14:49 +08:00
|
|
|
ret = intel_guc_setup(dev);
|
2015-09-11 19:53:46 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
out:
|
|
|
|
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
return ret;
|
2010-05-21 09:08:55 +08:00
|
|
|
}
|
|
|
|
|
2016-07-20 20:31:57 +08:00
|
|
|
bool intel_sanitize_semaphores(struct drm_i915_private *dev_priv, int value)
|
|
|
|
{
|
|
|
|
if (INTEL_INFO(dev_priv)->gen < 6)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* TODO: make semaphores and Execlists play nicely together */
|
|
|
|
if (i915.enable_execlists)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (value >= 0)
|
|
|
|
return value;
|
|
|
|
|
|
|
|
#ifdef CONFIG_INTEL_IOMMU
|
|
|
|
/* Enable semaphores on SNB when IO remapping is off */
|
|
|
|
if (INTEL_INFO(dev_priv)->gen == 6 && intel_iommu_gfx_mapped)
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2012-04-24 22:47:41 +08:00
|
|
|
int i915_gem_init(struct drm_device *dev)
|
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2012-04-24 22:47:41 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
2013-03-09 02:45:53 +08:00
|
|
|
|
2014-07-25 00:04:21 +08:00
|
|
|
if (!i915.enable_execlists) {
|
2016-09-09 21:11:53 +08:00
|
|
|
dev_priv->gt.resume = intel_legacy_submission_resume;
|
2016-08-03 05:50:21 +08:00
|
|
|
dev_priv->gt.cleanup_engine = intel_engine_cleanup;
|
2014-07-25 00:04:22 +08:00
|
|
|
} else {
|
2016-09-09 21:11:53 +08:00
|
|
|
dev_priv->gt.resume = intel_lr_context_resume;
|
2016-03-16 19:00:40 +08:00
|
|
|
dev_priv->gt.cleanup_engine = intel_logical_ring_cleanup;
|
2014-07-25 00:04:21 +08:00
|
|
|
}
|
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
/* This is just a security blanket to placate dragons.
|
|
|
|
* On some systems, we very sporadically observe that the first TLBs
|
|
|
|
* used by the CS may be stale, despite us poking the TLB reset. If
|
|
|
|
* we hold the forcewake during initialisation these problems
|
|
|
|
* just magically go away.
|
|
|
|
*/
|
|
|
|
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
|
|
|
|
|
2016-05-19 23:17:16 +08:00
|
|
|
i915_gem_init_userptr(dev_priv);
|
2016-08-04 14:52:23 +08:00
|
|
|
|
|
|
|
ret = i915_gem_init_ggtt(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
goto out_unlock;
|
2013-03-09 02:45:53 +08:00
|
|
|
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
ret = i915_gem_context_init(dev);
|
2014-12-05 20:17:42 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_unlock;
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
|
2016-07-13 23:03:37 +08:00
|
|
|
ret = intel_engines_init(dev);
|
2014-11-20 07:33:07 +08:00
|
|
|
if (ret)
|
2014-12-05 20:17:42 +08:00
|
|
|
goto out_unlock;
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
|
2012-04-24 22:47:41 +08:00
|
|
|
ret = i915_gem_init_hw(dev);
|
2014-04-09 16:19:42 +08:00
|
|
|
if (ret == -EIO) {
|
2016-07-27 16:07:29 +08:00
|
|
|
/* Allow engine initialisation to fail by marking the GPU as
|
2014-04-09 16:19:42 +08:00
|
|
|
* wedged. But we only want to do this where the GPU is angry,
|
|
|
|
* for all other failure, such as an allocation failure, bail.
|
|
|
|
*/
|
|
|
|
DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
|
2016-09-09 21:11:53 +08:00
|
|
|
i915_gem_set_wedged(dev_priv);
|
2014-04-09 16:19:42 +08:00
|
|
|
ret = 0;
|
2012-04-24 22:47:41 +08:00
|
|
|
}
|
2014-12-05 20:17:42 +08:00
|
|
|
|
|
|
|
out_unlock:
|
2015-02-13 22:35:59 +08:00
|
|
|
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
|
2014-04-09 16:19:42 +08:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2012-04-24 22:47:41 +08:00
|
|
|
|
2014-04-09 16:19:42 +08:00
|
|
|
return ret;
|
2012-04-24 22:47:41 +08:00
|
|
|
}
|
|
|
|
|
2010-05-21 09:08:55 +08:00
|
|
|
void
|
2016-03-16 19:00:40 +08:00
|
|
|
i915_gem_cleanup_engines(struct drm_device *dev)
|
2010-05-21 09:08:55 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2016-03-16 19:00:36 +08:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
enum intel_engine_id id;
|
2010-05-21 09:08:55 +08:00
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 01:14:48 +08:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-03-16 19:00:40 +08:00
|
|
|
dev_priv->gt.cleanup_engine(engine);
|
2010-05-21 09:08:55 +08:00
|
|
|
}
|
|
|
|
|
2016-03-16 20:54:03 +08:00
|
|
|
void
|
|
|
|
i915_gem_load_init_fences(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2016-07-05 17:40:23 +08:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2016-08-19 00:17:00 +08:00
|
|
|
int i;
|
2016-03-16 20:54:03 +08:00
|
|
|
|
|
|
|
if (INTEL_INFO(dev_priv)->gen >= 7 && !IS_VALLEYVIEW(dev_priv) &&
|
|
|
|
!IS_CHERRYVIEW(dev_priv))
|
|
|
|
dev_priv->num_fence_regs = 32;
|
|
|
|
else if (INTEL_INFO(dev_priv)->gen >= 4 || IS_I945G(dev_priv) ||
|
|
|
|
IS_I945GM(dev_priv) || IS_G33(dev_priv))
|
|
|
|
dev_priv->num_fence_regs = 16;
|
|
|
|
else
|
|
|
|
dev_priv->num_fence_regs = 8;
|
|
|
|
|
2016-05-06 22:40:21 +08:00
|
|
|
if (intel_vgpu_active(dev_priv))
|
2016-03-16 20:54:03 +08:00
|
|
|
dev_priv->num_fence_regs =
|
|
|
|
I915_READ(vgtif_reg(avail_rs.fence_num));
|
|
|
|
|
|
|
|
/* Initialize fence registers to zero */
|
2016-08-19 00:17:00 +08:00
|
|
|
for (i = 0; i < dev_priv->num_fence_regs; i++) {
|
|
|
|
struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
|
|
|
|
|
|
|
|
fence->i915 = dev_priv;
|
|
|
|
fence->id = i;
|
|
|
|
list_add_tail(&fence->link, &dev_priv->mm.fence_list);
|
|
|
|
}
|
2016-03-16 20:54:03 +08:00
|
|
|
i915_gem_restore_fences(dev);
|
|
|
|
|
|
|
|
i915_gem_detect_bit_6_swizzle(dev);
|
2010-10-24 19:38:05 +08:00
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
void
|
2016-01-19 21:26:29 +08:00
|
|
|
i915_gem_load_init(struct drm_device *dev)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-07-04 18:34:36 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2012-11-15 19:32:30 +08:00
|
|
|
|
2015-04-07 23:20:57 +08:00
|
|
|
dev_priv->objects =
|
2012-11-15 19:32:30 +08:00
|
|
|
kmem_cache_create("i915_gem_object",
|
|
|
|
sizeof(struct drm_i915_gem_object), 0,
|
|
|
|
SLAB_HWCACHE_ALIGN,
|
|
|
|
NULL);
|
2015-04-07 23:20:58 +08:00
|
|
|
dev_priv->vmas =
|
|
|
|
kmem_cache_create("i915_gem_vma",
|
|
|
|
sizeof(struct i915_vma), 0,
|
|
|
|
SLAB_HWCACHE_ALIGN,
|
|
|
|
NULL);
|
2015-04-07 23:20:57 +08:00
|
|
|
dev_priv->requests =
|
|
|
|
kmem_cache_create("i915_gem_request",
|
|
|
|
sizeof(struct drm_i915_gem_request), 0,
|
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 23:32:41 +08:00
|
|
|
SLAB_HWCACHE_ALIGN |
|
|
|
|
SLAB_RECLAIM_ACCOUNT |
|
|
|
|
SLAB_DESTROY_BY_RCU,
|
2015-04-07 23:20:57 +08:00
|
|
|
NULL);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2013-09-18 12:12:45 +08:00
|
|
|
INIT_LIST_HEAD(&dev_priv->context_list);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
|
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.bound_list);
|
2009-08-30 03:49:51 +08:00
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.fence_list);
|
2016-07-04 15:08:31 +08:00
|
|
|
INIT_DELAYED_WORK(&dev_priv->gt.retire_work,
|
2008-07-31 03:06:12 +08:00
|
|
|
i915_gem_retire_work_handler);
|
2016-07-04 15:08:31 +08:00
|
|
|
INIT_DELAYED_WORK(&dev_priv->gt.idle_work,
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
i915_gem_idle_work_handler);
|
2016-07-02 00:23:14 +08:00
|
|
|
init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
|
2012-11-16 00:17:22 +08:00
|
|
|
init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
|
2009-09-14 23:50:28 +08:00
|
|
|
|
2010-12-19 19:42:05 +08:00
|
|
|
dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
|
|
|
|
|
2009-11-19 00:25:18 +08:00
|
|
|
init_waitqueue_head(&dev_priv->pending_flip_queue);
|
2010-10-28 19:51:39 +08:00
|
|
|
|
2011-02-21 22:43:56 +08:00
|
|
|
dev_priv->mm.interruptible = true;
|
|
|
|
|
2016-09-01 19:58:21 +08:00
|
|
|
atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
|
|
|
|
|
2016-08-04 23:32:36 +08:00
|
|
|
spin_lock_init(&dev_priv->fb_tracking.lock);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2008-12-30 18:31:46 +08:00
|
|
|
|
2016-01-19 21:26:29 +08:00
|
|
|
void i915_gem_load_cleanup(struct drm_device *dev)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
|
|
|
|
|
|
|
kmem_cache_destroy(dev_priv->requests);
|
|
|
|
kmem_cache_destroy(dev_priv->vmas);
|
|
|
|
kmem_cache_destroy(dev_priv->objects);
|
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 23:32:41 +08:00
|
|
|
|
|
|
|
/* And ensure that our DESTROY_BY_RCU slabs are truly destroyed */
|
|
|
|
rcu_barrier();
|
2016-01-19 21:26:29 +08:00
|
|
|
}
|
|
|
|
|
2016-09-21 21:51:07 +08:00
|
|
|
int i915_gem_freeze(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_shrink_all(dev_priv);
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
|
|
|
|
return 0;
|
2016-01-19 21:26:29 +08:00
|
|
|
}
|
|
|
|
|
2016-05-14 14:26:33 +08:00
|
|
|
int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
2016-09-10 03:02:18 +08:00
|
|
|
struct list_head *phases[] = {
|
|
|
|
&dev_priv->mm.unbound_list,
|
|
|
|
&dev_priv->mm.bound_list,
|
|
|
|
NULL
|
|
|
|
}, **p;
|
2016-05-14 14:26:33 +08:00
|
|
|
|
|
|
|
/* Called just before we write the hibernation image.
|
|
|
|
*
|
|
|
|
* We need to update the domain tracking to reflect that the CPU
|
|
|
|
* will be accessing all the pages to create and restore from the
|
|
|
|
* hibernation, and so upon restoration those pages will be in the
|
|
|
|
* CPU domain.
|
|
|
|
*
|
|
|
|
* To make sure the hibernation image contains the latest state,
|
|
|
|
* we update that state just before writing out the image.
|
2016-09-10 03:02:18 +08:00
|
|
|
*
|
|
|
|
* To try and reduce the hibernation image, we manually shrink
|
|
|
|
* the objects as well.
|
2016-05-14 14:26:33 +08:00
|
|
|
*/
|
|
|
|
|
2016-09-21 21:51:07 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_shrink(dev_priv, -1UL, I915_SHRINK_UNBOUND);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
2016-09-10 03:02:18 +08:00
|
|
|
for (p = phases; *p; p++) {
|
|
|
|
list_for_each_entry(obj, *p, global_list) {
|
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
}
|
2016-05-14 14:26:33 +08:00
|
|
|
}
|
2016-09-21 21:51:07 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-09-24 23:02:42 +08:00
|
|
|
void i915_gem_release(struct drm_device *dev, struct drm_file *file)
|
2009-06-03 15:27:35 +08:00
|
|
|
{
|
2010-09-24 23:02:42 +08:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2016-07-26 19:01:52 +08:00
|
|
|
struct drm_i915_gem_request *request;
|
2009-06-03 15:27:35 +08:00
|
|
|
|
|
|
|
/* Clean up our request list when the client is going away, so that
|
|
|
|
* later retire_requests won't dereference our soon-to-be-gone
|
|
|
|
* file_priv.
|
|
|
|
*/
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2016-07-26 19:01:52 +08:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_list)
|
2010-09-24 23:02:42 +08:00
|
|
|
request->file_priv = NULL;
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2015-04-27 20:41:22 +08:00
|
|
|
if (!list_empty(&file_priv->rps.link)) {
|
2015-05-22 04:01:47 +08:00
|
|
|
spin_lock(&to_i915(dev)->rps.client_lock);
|
2015-04-27 20:41:22 +08:00
|
|
|
list_del(&file_priv->rps.link);
|
2015-05-22 04:01:47 +08:00
|
|
|
spin_unlock(&to_i915(dev)->rps.client_lock);
|
2015-04-07 23:20:32 +08:00
|
|
|
}
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
int i915_gem_open(struct drm_device *dev, struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_file_private *file_priv;
|
2013-12-07 06:10:58 +08:00
|
|
|
int ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
DRM_DEBUG_DRIVER("\n");
|
|
|
|
|
|
|
|
file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
|
|
|
|
if (!file_priv)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
file->driver_priv = file_priv;
|
2016-07-04 18:34:37 +08:00
|
|
|
file_priv->dev_priv = to_i915(dev);
|
2014-02-25 23:11:24 +08:00
|
|
|
file_priv->file = file;
|
2015-04-27 20:41:22 +08:00
|
|
|
INIT_LIST_HEAD(&file_priv->rps.link);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
spin_lock_init(&file_priv->mm.lock);
|
|
|
|
INIT_LIST_HEAD(&file_priv->mm.request_list);
|
|
|
|
|
2016-07-27 16:07:27 +08:00
|
|
|
file_priv->bsd_engine = -1;
|
2016-01-15 23:12:50 +08:00
|
|
|
|
2013-12-07 06:10:58 +08:00
|
|
|
ret = i915_gem_context_open(dev, file);
|
|
|
|
if (ret)
|
|
|
|
kfree(file_priv);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2013-12-07 06:10:58 +08:00
|
|
|
return ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
|
|
|
|
|
2014-09-20 00:27:27 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_track_fb - update frontbuffer tracking
|
2015-09-15 20:58:44 +08:00
|
|
|
* @old: current GEM buffer for the frontbuffer slots
|
|
|
|
* @new: new GEM buffer for the frontbuffer slots
|
|
|
|
* @frontbuffer_bits: bitmask of frontbuffer slots
|
2014-09-20 00:27:27 +08:00
|
|
|
*
|
|
|
|
* This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
|
|
|
|
* from @old and setting them in @new. Both @old and @new can be NULL.
|
|
|
|
*/
|
2014-06-19 05:28:09 +08:00
|
|
|
void i915_gem_track_fb(struct drm_i915_gem_object *old,
|
|
|
|
struct drm_i915_gem_object *new,
|
|
|
|
unsigned frontbuffer_bits)
|
|
|
|
{
|
2016-08-04 23:32:37 +08:00
|
|
|
/* Control of individual bits within the mask are guarded by
|
|
|
|
* the owning plane->mutex, i.e. we can never see concurrent
|
|
|
|
* manipulation of individual bits. But since the bitfield as a whole
|
|
|
|
* is updated using RMW, we need to use atomics in order to update
|
|
|
|
* the bits.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES >
|
|
|
|
sizeof(atomic_t) * BITS_PER_BYTE);
|
|
|
|
|
2014-06-19 05:28:09 +08:00
|
|
|
if (old) {
|
2016-08-04 23:32:37 +08:00
|
|
|
WARN_ON(!(atomic_read(&old->frontbuffer_bits) & frontbuffer_bits));
|
|
|
|
atomic_andnot(frontbuffer_bits, &old->frontbuffer_bits);
|
2014-06-19 05:28:09 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (new) {
|
2016-08-04 23:32:37 +08:00
|
|
|
WARN_ON(atomic_read(&new->frontbuffer_bits) & frontbuffer_bits);
|
|
|
|
atomic_or(frontbuffer_bits, &new->frontbuffer_bits);
|
2015-03-16 20:11:13 +08:00
|
|
|
}
|
2013-09-25 00:57:57 +08:00
|
|
|
}
|
2015-07-10 02:29:02 +08:00
|
|
|
|
2015-12-11 02:51:23 +08:00
|
|
|
/* Like i915_gem_object_get_page(), but mark the returned page dirty */
|
|
|
|
struct page *
|
|
|
|
i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
|
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
/* Only default objects have per-page dirty tracking */
|
2016-06-20 22:05:51 +08:00
|
|
|
if (WARN_ON(!i915_gem_object_has_struct_page(obj)))
|
2015-12-11 02:51:23 +08:00
|
|
|
return NULL;
|
|
|
|
|
|
|
|
page = i915_gem_object_get_page(obj, n);
|
|
|
|
set_page_dirty(page);
|
|
|
|
return page;
|
|
|
|
}
|
|
|
|
|
2015-07-10 02:29:02 +08:00
|
|
|
/* Allocate a new GEM object and fill it with the supplied data */
|
|
|
|
struct drm_i915_gem_object *
|
|
|
|
i915_gem_object_create_from_data(struct drm_device *dev,
|
|
|
|
const void *data, size_t size)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
struct sg_table *sg;
|
|
|
|
size_t bytes;
|
|
|
|
int ret;
|
|
|
|
|
2016-04-23 02:14:32 +08:00
|
|
|
obj = i915_gem_object_create(dev, round_up(size, PAGE_SIZE));
|
2016-04-25 20:32:13 +08:00
|
|
|
if (IS_ERR(obj))
|
2015-07-10 02:29:02 +08:00
|
|
|
return obj;
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
|
|
|
if (ret)
|
|
|
|
goto fail;
|
|
|
|
|
|
|
|
ret = i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
goto fail;
|
|
|
|
|
|
|
|
i915_gem_object_pin_pages(obj);
|
|
|
|
sg = obj->pages;
|
|
|
|
bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
|
2015-12-11 02:51:24 +08:00
|
|
|
obj->dirty = 1; /* Backing store is now out of date */
|
2015-07-10 02:29:02 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
|
|
|
|
if (WARN_ON(bytes != size)) {
|
|
|
|
DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
|
|
|
return obj;
|
|
|
|
|
|
|
|
fail:
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2015-07-10 02:29:02 +08:00
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|