2008-07-31 03:06:12 +08:00
|
|
|
/*
|
2015-03-18 17:46:04 +08:00
|
|
|
* Copyright © 2008-2015 Intel Corporation
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
|
|
|
* copy of this software and associated documentation files (the "Software"),
|
|
|
|
* to deal in the Software without restriction, including without limitation
|
|
|
|
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
|
|
|
* and/or sell copies of the Software, and to permit persons to whom the
|
|
|
|
* Software is furnished to do so, subject to the following conditions:
|
|
|
|
*
|
|
|
|
* The above copyright notice and this permission notice (including the next
|
|
|
|
* paragraph) shall be included in all copies or substantial portions of the
|
|
|
|
* Software.
|
|
|
|
*
|
|
|
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
|
|
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
|
|
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
|
|
|
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
|
|
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
|
|
|
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
|
|
|
* IN THE SOFTWARE.
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Eric Anholt <eric@anholt.net>
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2013-07-25 03:07:52 +08:00
|
|
|
#include <drm/drm_vma_manager.h>
|
2012-10-03 01:01:07 +08:00
|
|
|
#include <drm/i915_drm.h>
|
2016-11-15 04:41:05 +08:00
|
|
|
#include <linux/dma-fence-array.h>
|
2017-02-13 01:20:01 +08:00
|
|
|
#include <linux/kthread.h>
|
2016-07-20 16:21:15 +08:00
|
|
|
#include <linux/reservation.h>
|
2011-06-28 07:18:18 +08:00
|
|
|
#include <linux/shmem_fs.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/slab.h>
|
2016-11-22 22:41:21 +08:00
|
|
|
#include <linux/stop_machine.h>
|
2008-07-31 03:06:12 +08:00
|
|
|
#include <linux/swap.h>
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
|
|
|
#include <linux/pci.h>
|
2012-05-10 21:25:09 +08:00
|
|
|
#include <linux/dma-buf.h>
|
2019-01-18 05:03:34 +08:00
|
|
|
#include <linux/mman.h>
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-06-13 16:44:16 +08:00
|
|
|
#include "display/intel_display.h"
|
|
|
|
#include "display/intel_frontbuffer.h"
|
|
|
|
|
2019-05-28 17:29:49 +08:00
|
|
|
#include "gem/i915_gem_clflush.h"
|
|
|
|
#include "gem/i915_gem_context.h"
|
2019-05-28 17:29:43 +08:00
|
|
|
#include "gem/i915_gem_ioctls.h"
|
2019-05-28 17:29:49 +08:00
|
|
|
#include "gem/i915_gem_pm.h"
|
|
|
|
#include "gem/i915_gemfs.h"
|
2019-06-21 15:08:02 +08:00
|
|
|
#include "gt/intel_gt.h"
|
drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-25 04:07:17 +08:00
|
|
|
#include "gt/intel_gt_pm.h"
|
2019-04-25 01:48:39 +08:00
|
|
|
#include "gt/intel_mocs.h"
|
|
|
|
#include "gt/intel_reset.h"
|
2019-07-29 19:37:20 +08:00
|
|
|
#include "gt/intel_renderstate.h"
|
2019-04-25 01:48:39 +08:00
|
|
|
#include "gt/intel_workarounds.h"
|
|
|
|
|
2019-01-16 23:33:04 +08:00
|
|
|
#include "i915_drv.h"
|
2019-05-28 17:29:50 +08:00
|
|
|
#include "i915_scatterlist.h"
|
2019-01-16 23:33:04 +08:00
|
|
|
#include "i915_trace.h"
|
|
|
|
#include "i915_vgpu.h"
|
|
|
|
|
|
|
|
#include "intel_drv.h"
|
2019-04-05 19:00:15 +08:00
|
|
|
#include "intel_pm.h"
|
2019-01-16 23:33:04 +08:00
|
|
|
|
2016-06-10 16:53:01 +08:00
|
|
|
static int
|
2016-10-28 20:58:39 +08:00
|
|
|
insert_mappable_node(struct i915_ggtt *ggtt,
|
2016-06-10 16:53:01 +08:00
|
|
|
struct drm_mm_node *node, u32 size)
|
|
|
|
{
|
|
|
|
memset(node, 0, sizeof(*node));
|
2018-06-05 23:37:58 +08:00
|
|
|
return drm_mm_insert_node_in_range(&ggtt->vm.mm, node,
|
2017-02-03 05:04:38 +08:00
|
|
|
size, 0, I915_COLOR_UNEVICTABLE,
|
|
|
|
0, ggtt->mappable_end,
|
|
|
|
DRM_MM_INSERT_LOW);
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
remove_mappable_node(struct drm_mm_node *node)
|
|
|
|
{
|
|
|
|
drm_mm_remove_node(node);
|
|
|
|
}
|
|
|
|
|
2008-10-23 12:40:13 +08:00
|
|
|
int
|
|
|
|
i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-10-23 12:40:13 +08:00
|
|
|
{
|
2019-01-28 18:23:53 +08:00
|
|
|
struct i915_ggtt *ggtt = &to_i915(dev)->ggtt;
|
2016-03-30 21:57:10 +08:00
|
|
|
struct drm_i915_gem_get_aperture *args = data;
|
2015-07-01 18:51:10 +08:00
|
|
|
struct i915_vma *vma;
|
2017-05-31 10:35:52 +08:00
|
|
|
u64 pinned;
|
2008-10-23 12:40:13 +08:00
|
|
|
|
2019-01-28 18:23:53 +08:00
|
|
|
mutex_lock(&ggtt->vm.mutex);
|
|
|
|
|
2018-06-05 23:37:58 +08:00
|
|
|
pinned = ggtt->vm.reserved;
|
2019-01-28 18:23:52 +08:00
|
|
|
list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
|
2016-08-04 23:32:30 +08:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2015-07-01 18:51:10 +08:00
|
|
|
pinned += vma->node.size;
|
2019-01-28 18:23:53 +08:00
|
|
|
|
|
|
|
mutex_unlock(&ggtt->vm.mutex);
|
2008-10-23 12:40:13 +08:00
|
|
|
|
2018-06-05 23:37:58 +08:00
|
|
|
args->aper_size = ggtt->vm.total;
|
2011-08-17 03:34:10 +08:00
|
|
|
args->aper_available_size = args->aper_size - pinned;
|
2010-11-24 20:23:44 +08:00
|
|
|
|
2008-10-23 12:40:13 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-07-03 17:17:17 +08:00
|
|
|
int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned long flags)
|
2016-08-04 14:52:27 +08:00
|
|
|
{
|
|
|
|
struct i915_vma *vma;
|
|
|
|
LIST_HEAD(still_in_list);
|
2019-05-28 17:29:51 +08:00
|
|
|
int ret = 0;
|
2016-08-15 01:44:41 +08:00
|
|
|
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2016-08-04 14:52:27 +08:00
|
|
|
|
2019-01-28 18:23:54 +08:00
|
|
|
spin_lock(&obj->vma.lock);
|
|
|
|
while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
|
|
|
|
struct i915_vma,
|
|
|
|
obj_link))) {
|
2016-08-04 14:52:27 +08:00
|
|
|
list_move_tail(&vma->obj_link, &still_in_list);
|
2019-01-28 18:23:54 +08:00
|
|
|
spin_unlock(&obj->vma.lock);
|
|
|
|
|
2019-07-03 17:17:17 +08:00
|
|
|
ret = -EBUSY;
|
|
|
|
if (flags & I915_GEM_OBJECT_UNBIND_ACTIVE ||
|
|
|
|
!i915_vma_is_active(vma))
|
|
|
|
ret = i915_vma_unbind(vma);
|
2019-01-28 18:23:54 +08:00
|
|
|
|
|
|
|
spin_lock(&obj->vma.lock);
|
2016-08-04 14:52:27 +08:00
|
|
|
}
|
2019-01-28 18:23:54 +08:00
|
|
|
list_splice(&still_in_list, &obj->vma.list);
|
|
|
|
spin_unlock(&obj->vma.lock);
|
2016-08-04 14:52:27 +08:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-05-21 19:42:56 +08:00
|
|
|
static int
|
|
|
|
i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pwrite *args,
|
2016-10-28 20:58:36 +08:00
|
|
|
struct drm_file *file)
|
2014-05-21 19:42:56 +08:00
|
|
|
{
|
|
|
|
void *vaddr = obj->phys_handle->vaddr + args->offset;
|
2016-04-26 23:32:27 +08:00
|
|
|
char __user *user_data = u64_to_user_ptr(args->data_ptr);
|
2014-11-04 20:51:40 +08:00
|
|
|
|
|
|
|
/* We manually control the domain here and pretend that it
|
|
|
|
* remains coherent i.e. in the GTT domain, like shmem_pwrite.
|
|
|
|
*/
|
2015-06-19 02:43:24 +08:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2017-01-06 23:22:38 +08:00
|
|
|
if (copy_from_user(vaddr, user_data, args->size))
|
|
|
|
return -EFAULT;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
drm_clflush_virt_range(vaddr, args->size);
|
2019-06-21 15:08:02 +08:00
|
|
|
intel_gt_chipset_flush(&to_i915(obj->base.dev)->gt);
|
2015-02-14 03:23:45 +08:00
|
|
|
|
2017-02-22 19:40:49 +08:00
|
|
|
intel_fb_obj_flush(obj, ORIGIN_CPU);
|
2017-01-06 23:22:38 +08:00
|
|
|
return 0;
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
static int
|
|
|
|
i915_gem_create(struct drm_file *file,
|
2016-12-01 22:16:37 +08:00
|
|
|
struct drm_i915_private *dev_priv,
|
2019-03-27 01:02:18 +08:00
|
|
|
u64 *size_p,
|
2019-01-16 17:15:19 +08:00
|
|
|
u32 *handle_p)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2009-08-23 17:40:55 +08:00
|
|
|
u32 handle;
|
2019-03-27 01:02:18 +08:00
|
|
|
u64 size;
|
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-03-27 01:02:18 +08:00
|
|
|
size = round_up(*size_p, PAGE_SIZE);
|
2011-09-14 20:14:28 +08:00
|
|
|
if (size == 0)
|
|
|
|
return -EINVAL;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
|
|
|
/* Allocate the new object */
|
2019-05-28 17:29:45 +08:00
|
|
|
obj = i915_gem_object_create_shmem(dev_priv, size);
|
2016-04-25 20:32:13 +08:00
|
|
|
if (IS_ERR(obj))
|
|
|
|
return PTR_ERR(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
ret = drm_gem_handle_create(file, &obj->base, &handle);
|
2010-10-14 20:20:40 +08:00
|
|
|
/* drop reference from allocate - handle holds it now */
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2013-07-25 05:25:03 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2010-10-14 20:20:40 +08:00
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
*handle_p = handle;
|
2019-04-17 21:25:07 +08:00
|
|
|
*size_p = size;
|
2008-07-31 03:06:12 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
int
|
|
|
|
i915_gem_dumb_create(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
|
|
|
struct drm_mode_create_dumb *args)
|
|
|
|
{
|
2019-05-09 20:21:57 +08:00
|
|
|
int cpp = DIV_ROUND_UP(args->bpp, 8);
|
|
|
|
u32 format;
|
|
|
|
|
|
|
|
switch (cpp) {
|
|
|
|
case 1:
|
|
|
|
format = DRM_FORMAT_C8;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
format = DRM_FORMAT_RGB565;
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
format = DRM_FORMAT_XRGB8888;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
/* have to work out size/pitch and return them */
|
2019-05-09 20:21:57 +08:00
|
|
|
args->pitch = ALIGN(args->width * cpp, 64);
|
|
|
|
|
|
|
|
/* align stride to page size so that we can remap */
|
|
|
|
if (args->pitch > intel_plane_fb_max_stride(to_i915(dev), format,
|
|
|
|
DRM_FORMAT_MOD_LINEAR))
|
|
|
|
args->pitch = ALIGN(args->pitch, 4096);
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
args->size = args->pitch * args->height;
|
2016-12-01 22:16:37 +08:00
|
|
|
return i915_gem_create(file, to_i915(dev),
|
2019-03-27 01:02:18 +08:00
|
|
|
&args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Creates a new mm object and returns a handle to it.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2011-02-07 10:16:14 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
2016-12-01 22:16:37 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2011-02-07 10:16:14 +08:00
|
|
|
struct drm_i915_gem_create *args = data;
|
2012-04-23 22:50:50 +08:00
|
|
|
|
2016-12-01 22:16:37 +08:00
|
|
|
i915_gem_flush_free_objects(dev_priv);
|
2016-10-28 20:58:42 +08:00
|
|
|
|
2016-12-01 22:16:37 +08:00
|
|
|
return i915_gem_create(file, dev_priv,
|
2019-03-27 01:02:18 +08:00
|
|
|
&args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
static int
|
2019-01-05 20:07:58 +08:00
|
|
|
shmem_pread(struct page *page, int offset, int len, char __user *user_data,
|
|
|
|
bool needs_clflush)
|
2012-03-26 01:47:40 +08:00
|
|
|
{
|
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
vaddr = kmap(page);
|
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
if (needs_clflush)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
ret = __copy_to_user(user_data, vaddr + offset, len);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
kunmap(page);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
return ret ? -EFAULT : 0;
|
2016-10-28 20:58:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pread(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pread *args)
|
|
|
|
{
|
|
|
|
unsigned int needs_clflush;
|
|
|
|
unsigned int idx, offset;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
|
|
|
char __user *user_data;
|
|
|
|
u64 remain;
|
2016-10-28 20:58:39 +08:00
|
|
|
int ret;
|
|
|
|
|
2019-05-28 17:29:48 +08:00
|
|
|
ret = i915_gem_object_prepare_read(obj, &needs_clflush);
|
2016-10-28 20:58:39 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_finish_access(obj);
|
|
|
|
if (!fence)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
remain = args->size;
|
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
2018-10-12 22:02:28 +08:00
|
|
|
unsigned int length = min_t(u64, remain, PAGE_SIZE - offset);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
|
|
|
ret = shmem_pread(page, offset, length, user_data,
|
|
|
|
needs_clflush);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
2016-10-28 20:58:39 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool
|
|
|
|
gtt_user_read(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
{
|
2017-09-02 01:12:52 +08:00
|
|
|
void __iomem *vaddr;
|
2016-10-28 20:58:39 +08:00
|
|
|
unsigned long unwritten;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_to_user_inatomic(user_data,
|
|
|
|
(void __force *)vaddr + offset,
|
|
|
|
length);
|
2016-10-28 20:58:39 +08:00
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_to_user(user_data,
|
|
|
|
(void __force *)vaddr + offset,
|
|
|
|
length);
|
2016-10-28 20:58:39 +08:00
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2016-10-28 20:58:39 +08:00
|
|
|
i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pread *args)
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
{
|
2016-10-28 20:58:39 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
struct drm_mm_node node;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
2016-10-28 20:58:39 +08:00
|
|
|
void __user *user_data;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct i915_vma *vma;
|
2016-10-28 20:58:39 +08:00
|
|
|
u64 remain, offset;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2016-10-28 20:58:39 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
2017-10-09 16:44:00 +08:00
|
|
|
PIN_MAPPABLE |
|
|
|
|
PIN_NONFAULT |
|
|
|
|
PIN_NONBLOCK);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma)) {
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out_unlock;
|
|
|
|
GEM_BUG_ON(!node.allocated);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
|
|
|
|
ret = i915_gem_object_lock_interruptible(obj);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
if (ret) {
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
if (!fence) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = args->offset;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
|
|
|
while (remain > 0) {
|
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
|
|
|
u32 page_base = node.start;
|
|
|
|
unsigned page_offset = offset_in_page(offset);
|
|
|
|
unsigned page_length = PAGE_SIZE - page_offset;
|
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.insert_page(&ggtt->vm,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2017-12-11 23:18:20 +08:00
|
|
|
if (gtt_user_read(&ggtt->iomap, page_base, page_offset,
|
2016-10-28 20:58:39 +08:00
|
|
|
user_data, page_length)) {
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
out_unpin:
|
2019-05-28 17:29:51 +08:00
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
if (node.allocated) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
}
|
2016-10-28 20:58:39 +08:00
|
|
|
out_unlock:
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2016-10-28 20:58:39 +08:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
2012-09-05 04:02:56 +08:00
|
|
|
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
|
|
|
* Reads data from the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of *data are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pread_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pread *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 20:58:39 +08:00
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-17 17:10:42 +08:00
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(u64_to_user_ptr(args->data_ptr),
|
2010-11-17 17:10:42 +08:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check source. */
|
2016-12-14 04:32:22 +08:00
|
|
|
if (range_overflows_t(u64, args->offset, args->size, obj->base.size)) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pread(obj, args->offset, args->size);
|
|
|
|
|
2016-10-28 20:58:27 +08:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
2019-02-13 17:25:04 +08:00
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2016-08-05 17:14:16 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_shmem_pread(obj, args);
|
2016-10-24 20:42:15 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENODEV)
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_gtt_pread(obj, args);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
out:
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
/* This is the fast write path which cannot handle
|
|
|
|
* page faults in the source data
|
2008-10-21 05:16:43 +08:00
|
|
|
*/
|
2008-10-31 10:38:48 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
static inline bool
|
|
|
|
ggtt_write(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
2008-10-21 05:16:43 +08:00
|
|
|
{
|
2017-09-02 01:12:52 +08:00
|
|
|
void __iomem *vaddr;
|
2008-10-31 10:38:48 +08:00
|
|
|
unsigned long unwritten;
|
2008-10-21 05:16:43 +08:00
|
|
|
|
2012-04-17 05:07:47 +08:00
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset,
|
2008-10-31 10:38:48 +08:00
|
|
|
user_data, length);
|
2016-10-28 20:58:40 +08:00
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_from_user((void __force *)vaddr + offset,
|
|
|
|
user_data, length);
|
2016-10-28 20:58:40 +08:00
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
2016-10-28 20:58:39 +08:00
|
|
|
|
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
2009-03-10 00:42:23 +08:00
|
|
|
/**
|
|
|
|
* This is the fast pwrite path, where we copy the data directly from the
|
|
|
|
* user into the GTT, uncached.
|
2016-10-28 20:58:40 +08:00
|
|
|
* @obj: i915 GEM object
|
2016-06-03 21:02:17 +08:00
|
|
|
* @args: pwrite arguments structure
|
2009-03-10 00:42:23 +08:00
|
|
|
*/
|
2008-07-31 03:06:12 +08:00
|
|
|
static int
|
2016-10-28 20:58:40 +08:00
|
|
|
i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-10-28 20:58:40 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
2016-06-10 16:53:01 +08:00
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
2019-06-14 07:21:54 +08:00
|
|
|
struct intel_runtime_pm *rpm = &i915->runtime_pm;
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
2016-06-10 16:53:01 +08:00
|
|
|
struct drm_mm_node node;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
2016-10-28 20:58:40 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
u64 remain, offset;
|
|
|
|
void __user *user_data;
|
2016-06-10 16:53:01 +08:00
|
|
|
int ret;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2012-03-26 01:47:35 +08:00
|
|
|
|
2017-10-19 14:37:33 +08:00
|
|
|
if (i915_gem_object_has_struct_page(obj)) {
|
|
|
|
/*
|
|
|
|
* Avoid waking the device up if we can fallback, as
|
|
|
|
* waking/resuming is very slow (worst-case 10-100 ms
|
|
|
|
* depending on PCI sleeps and our own resume time).
|
|
|
|
* This easily dwarfs any performance advantage from
|
|
|
|
* using the cache bypass of indirect GGTT access.
|
|
|
|
*/
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get_if_in_use(rpm);
|
2019-01-14 22:21:18 +08:00
|
|
|
if (!wakeref) {
|
2017-10-19 14:37:33 +08:00
|
|
|
ret = -EFAULT;
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* No backing pages, no fallback, we must force GGTT access */
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(rpm);
|
2017-10-19 14:37:33 +08:00
|
|
|
}
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
2017-10-09 16:44:00 +08:00
|
|
|
PIN_MAPPABLE |
|
|
|
|
PIN_NONFAULT |
|
|
|
|
PIN_NONBLOCK);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-19 00:17:00 +08:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 17:49:06 +08:00
|
|
|
if (IS_ERR(vma)) {
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
2016-06-10 16:53:01 +08:00
|
|
|
if (ret)
|
2017-10-19 14:37:33 +08:00
|
|
|
goto out_rpm;
|
2016-10-28 20:58:40 +08:00
|
|
|
GEM_BUG_ON(!node.allocated);
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
2012-03-26 01:47:35 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
|
|
|
|
ret = i915_gem_object_lock_interruptible(obj);
|
2012-03-26 01:47:35 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, true);
|
|
|
|
if (ret) {
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
if (!fence) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2016-08-19 00:16:43 +08:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2015-02-14 03:23:45 +08:00
|
|
|
|
2016-06-10 16:53:01 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = args->offset;
|
|
|
|
remain = args->size;
|
|
|
|
while (remain) {
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
2008-10-31 10:38:48 +08:00
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
2016-06-10 16:53:01 +08:00
|
|
|
u32 page_base = node.start;
|
2016-10-28 20:58:39 +08:00
|
|
|
unsigned int page_offset = offset_in_page(offset);
|
|
|
|
unsigned int page_length = PAGE_SIZE - page_offset;
|
2016-06-10 16:53:01 +08:00
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
2019-07-18 22:54:05 +08:00
|
|
|
/* flush the write before we modify the GGTT */
|
|
|
|
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.insert_page(&ggtt->vm,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
2016-06-10 16:53:01 +08:00
|
|
|
wmb(); /* flush modifications to the GGTT (insert_page) */
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2008-10-31 10:38:48 +08:00
|
|
|
/* If we get a fault while copying data, then (presumably) our
|
2009-03-10 00:42:23 +08:00
|
|
|
* source page isn't available. Return the error and we'll
|
|
|
|
* retry in the slow path.
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
* If the object is non-shmem backed, we retry again with the
|
|
|
|
* path that handles page fault.
|
2008-10-31 10:38:48 +08:00
|
|
|
*/
|
2017-12-11 23:18:20 +08:00
|
|
|
if (ggtt_write(&ggtt->iomap, page_base, page_offset,
|
2016-10-28 20:58:40 +08:00
|
|
|
user_data, page_length)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
2012-03-26 01:47:35 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2017-02-22 19:40:49 +08:00
|
|
|
intel_fb_obj_flush(obj, ORIGIN_CPU);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
2012-03-26 01:47:35 +08:00
|
|
|
out_unpin:
|
2019-05-28 17:29:51 +08:00
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
2019-07-18 22:54:05 +08:00
|
|
|
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
|
2016-06-10 16:53:01 +08:00
|
|
|
if (node.allocated) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
|
2016-06-10 16:53:01 +08:00
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
2017-10-19 14:37:33 +08:00
|
|
|
out_rpm:
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(rpm, wakeref);
|
2017-10-19 14:37:33 +08:00
|
|
|
out_unlock:
|
2016-10-28 20:58:40 +08:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
2009-03-10 00:42:23 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
/* Per-page copy function for the shmem pwrite fastpath.
|
|
|
|
* Flushes invalid cachelines before writing to the target if
|
|
|
|
* needs_clflush_before is set and flushes out any written cachelines after
|
|
|
|
* writing if needs_clflush is set.
|
|
|
|
*/
|
2009-03-10 04:42:30 +08:00
|
|
|
static int
|
2016-10-28 20:58:40 +08:00
|
|
|
shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2009-03-10 04:42:30 +08:00
|
|
|
{
|
2019-01-05 20:07:58 +08:00
|
|
|
char *vaddr;
|
2016-10-28 20:58:40 +08:00
|
|
|
int ret;
|
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
vaddr = kmap(page);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
if (needs_clflush_before)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
ret = __copy_from_user(vaddr + offset, user_data, len);
|
|
|
|
if (!ret && needs_clflush_after)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
kunmap(page);
|
|
|
|
|
|
|
|
return ret ? -EFAULT : 0;
|
2016-10-28 20:58:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
|
|
|
{
|
|
|
|
unsigned int partial_cacheline_write;
|
2016-08-19 00:16:47 +08:00
|
|
|
unsigned int needs_clflush;
|
2016-10-28 20:58:40 +08:00
|
|
|
unsigned int offset, idx;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
|
|
|
void __user *user_data;
|
|
|
|
u64 remain;
|
2016-10-28 20:58:40 +08:00
|
|
|
int ret;
|
2009-03-10 04:42:30 +08:00
|
|
|
|
2019-05-28 17:29:48 +08:00
|
|
|
ret = i915_gem_object_prepare_write(obj, &needs_clflush);
|
2016-10-28 20:58:40 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_finish_access(obj);
|
|
|
|
if (!fence)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
/* If we don't overwrite a cacheline completely we need to be
|
|
|
|
* careful to have up-to-date data by first clflushing. Don't
|
|
|
|
* overcomplicate things and flush the entire patch.
|
|
|
|
*/
|
|
|
|
partial_cacheline_write = 0;
|
|
|
|
if (needs_clflush & CLFLUSH_BEFORE)
|
|
|
|
partial_cacheline_write = boot_cpu_data.x86_clflush_size - 1;
|
2012-06-01 22:20:22 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
2018-10-12 22:02:28 +08:00
|
|
|
unsigned int length = min_t(u64, remain, PAGE_SIZE - offset);
|
2012-09-05 04:02:55 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = shmem_pwrite(page, offset, length, user_data,
|
|
|
|
(offset | length) & partial_cacheline_write,
|
|
|
|
needs_clflush & CLFLUSH_AFTER);
|
2012-09-05 04:02:55 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:40 +08:00
|
|
|
break;
|
2012-09-05 04:02:55 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2017-02-22 19:40:49 +08:00
|
|
|
intel_fb_obj_flush(obj, ORIGIN_CPU);
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
|
|
|
|
2009-03-10 04:42:30 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Writes data to the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of the buffer that were to be modified are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
|
2010-10-14 22:03:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pwrite *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-11-17 17:10:42 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(u64_to_user_ptr(args->data_ptr), args->size))
|
2010-11-17 17:10:42 +08:00
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check destination. */
|
2016-12-14 04:32:22 +08:00
|
|
|
if (range_overflows_t(u64, args->offset, args->size, obj->base.size)) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-08-05 17:14:16 +08:00
|
|
|
goto err;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2018-07-13 02:53:14 +08:00
|
|
|
/* Writes not allowed into this read-only object */
|
|
|
|
if (i915_gem_object_is_readonly(obj)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
|
|
|
|
|
2017-03-07 20:03:38 +08:00
|
|
|
ret = -ENODEV;
|
|
|
|
if (obj->ops->pwrite)
|
|
|
|
ret = obj->ops->pwrite(obj, args);
|
|
|
|
if (ret != -ENODEV)
|
|
|
|
goto err;
|
|
|
|
|
2016-10-28 20:58:27 +08:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_ALL,
|
2019-02-13 17:25:04 +08:00
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:40 +08:00
|
|
|
goto err;
|
2016-08-05 17:14:16 +08:00
|
|
|
|
2012-03-26 01:47:35 +08:00
|
|
|
ret = -EFAULT;
|
2008-07-31 03:06:12 +08:00
|
|
|
/* We can only do the GTT pwrite on untiled buffers, as otherwise
|
|
|
|
* it would end up going through the fenced access, and we'll get
|
|
|
|
* different detiling behavior between reading and writing.
|
|
|
|
* pread/pwrite currently are reading and writing from the CPU
|
|
|
|
* perspective, requiring manual detiling by the client.
|
|
|
|
*/
|
2016-06-20 22:05:52 +08:00
|
|
|
if (!i915_gem_object_has_struct_page(obj) ||
|
2016-10-24 20:42:15 +08:00
|
|
|
cpu_write_needs_clflush(obj))
|
2012-03-26 01:47:35 +08:00
|
|
|
/* Note that the gtt paths might fail with non-page-backed user
|
|
|
|
* pointers (e.g. gtt mappings when moving data between
|
2016-10-24 20:42:15 +08:00
|
|
|
* textures). Fallback to the shmem path in that case.
|
|
|
|
*/
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_gtt_pwrite_fast(obj, args);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-07-17 01:42:36 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENOSPC) {
|
2014-11-04 20:51:40 +08:00
|
|
|
if (obj->phys_handle)
|
|
|
|
ret = i915_gem_phys_pwrite(obj, args, file);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
else
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_shmem_pwrite(obj, args);
|
2014-11-04 20:51:40 +08:00
|
|
|
}
|
2011-12-14 20:57:30 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
err:
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Called when user space has done writes to this buffer
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_sw_finish *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-10-17 16:45:41 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:19 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2017-11-14 18:25:13 +08:00
|
|
|
/*
|
|
|
|
* Proxy objects are barred from CPU access, so there is no
|
|
|
|
* need to ban sw_finish as it is a nop.
|
|
|
|
*/
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Pinned buffers may be scanout, so flush the cache */
|
2017-02-22 19:40:46 +08:00
|
|
|
i915_gem_object_flush_if_display(obj);
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2017-02-22 19:40:46 +08:00
|
|
|
|
|
|
|
return 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
void i915_gem_runtime_suspend(struct drm_i915_private *i915)
|
2014-06-16 15:57:44 +08:00
|
|
|
{
|
2016-10-24 20:42:16 +08:00
|
|
|
struct drm_i915_gem_object *obj, *on;
|
2016-10-24 20:42:18 +08:00
|
|
|
int i;
|
2014-06-16 15:57:44 +08:00
|
|
|
|
2016-10-24 20:42:16 +08:00
|
|
|
/*
|
|
|
|
* Only called during RPM suspend. All users of the userfault_list
|
|
|
|
* must be holding an RPM wakeref to ensure that this can not
|
|
|
|
* run concurrently with themselves (and use the struct_mutex for
|
|
|
|
* protection between themselves).
|
|
|
|
*/
|
2016-10-24 20:42:14 +08:00
|
|
|
|
2016-10-24 20:42:16 +08:00
|
|
|
list_for_each_entry_safe(obj, on,
|
2019-06-13 15:32:54 +08:00
|
|
|
&i915->ggtt.userfault_list, userfault_link)
|
2017-10-09 16:43:57 +08:00
|
|
|
__i915_gem_object_release_mmap(obj);
|
2016-10-24 20:42:18 +08:00
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
/*
|
|
|
|
* The fence will be lost when the device powers down. If any were
|
2016-10-24 20:42:18 +08:00
|
|
|
* in use by hardware (i.e. they are pinned), we should not be powering
|
|
|
|
* down! All other fences will be reacquired by the user upon waking.
|
|
|
|
*/
|
2019-06-13 15:32:54 +08:00
|
|
|
for (i = 0; i < i915->ggtt.num_fences; i++) {
|
|
|
|
struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i];
|
2016-10-24 20:42:18 +08:00
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
/*
|
|
|
|
* Ideally we want to assert that the fence register is not
|
2017-02-03 20:57:17 +08:00
|
|
|
* live at this point (i.e. that no piece of code will be
|
|
|
|
* trying to write through fence + GTT, as that both violates
|
|
|
|
* our tracking of activity and associated locking/barriers,
|
|
|
|
* but also is illegal given that the hw is powered down).
|
|
|
|
*
|
|
|
|
* Previously we used reg->pin_count as a "liveness" indicator.
|
|
|
|
* That is not sufficient, and we need a more fine-grained
|
|
|
|
* tool if we want to have a sanity check here.
|
|
|
|
*/
|
2016-10-24 20:42:18 +08:00
|
|
|
|
|
|
|
if (!reg->vma)
|
|
|
|
continue;
|
|
|
|
|
2017-10-09 16:43:57 +08:00
|
|
|
GEM_BUG_ON(i915_vma_has_userfault(reg->vma));
|
2016-10-24 20:42:18 +08:00
|
|
|
reg->dirty = true;
|
|
|
|
}
|
2014-06-16 15:57:44 +08:00
|
|
|
}
|
|
|
|
|
2019-07-13 03:29:53 +08:00
|
|
|
static int wait_for_engines(struct intel_gt *gt)
|
2017-03-30 22:50:39 +08:00
|
|
|
{
|
2019-07-13 03:29:53 +08:00
|
|
|
if (wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT)) {
|
|
|
|
dev_err(gt->i915->drm.dev,
|
2017-12-12 03:41:35 +08:00
|
|
|
"Failed to idle engines, declaring wedged!\n");
|
2018-03-09 18:11:14 +08:00
|
|
|
GEM_TRACE_DUMP();
|
2019-07-13 03:29:53 +08:00
|
|
|
intel_gt_set_wedged(gt);
|
2017-08-26 19:09:33 +08:00
|
|
|
return -EIO;
|
2017-03-30 22:50:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-01-28 18:23:56 +08:00
|
|
|
static long
|
|
|
|
wait_for_timelines(struct drm_i915_private *i915,
|
|
|
|
unsigned int flags, long timeout)
|
|
|
|
{
|
2019-06-21 21:16:39 +08:00
|
|
|
struct intel_gt_timelines *gt = &i915->gt.timelines;
|
2019-06-21 15:08:10 +08:00
|
|
|
struct intel_timeline *tl;
|
2019-01-28 18:23:56 +08:00
|
|
|
|
|
|
|
mutex_lock(>->mutex);
|
2019-01-29 02:18:12 +08:00
|
|
|
list_for_each_entry(tl, >->active_list, link) {
|
2019-01-28 18:23:56 +08:00
|
|
|
struct i915_request *rq;
|
|
|
|
|
drm/i915: Pull i915_gem_active into the i915_active family
Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
2019-02-05 21:00:05 +08:00
|
|
|
rq = i915_active_request_get_unlocked(&tl->last_request);
|
2019-01-28 18:23:56 +08:00
|
|
|
if (!rq)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
mutex_unlock(>->mutex);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* "Race-to-idle".
|
|
|
|
*
|
|
|
|
* Switching to the kernel context is often used a synchronous
|
|
|
|
* step prior to idling, e.g. in suspend for flushing all
|
|
|
|
* current operations to memory before sleeping. These we
|
|
|
|
* want to complete as quickly as possible to avoid prolonged
|
|
|
|
* stalls, so allow the gpu to boost to maximum clocks.
|
|
|
|
*/
|
|
|
|
if (flags & I915_WAIT_FOR_IDLE_BOOST)
|
2019-02-13 17:25:04 +08:00
|
|
|
gen6_rps_boost(rq);
|
2019-01-28 18:23:56 +08:00
|
|
|
|
|
|
|
timeout = i915_request_wait(rq, flags, timeout);
|
|
|
|
i915_request_put(rq);
|
|
|
|
if (timeout < 0)
|
|
|
|
return timeout;
|
|
|
|
|
|
|
|
/* restart after reacquiring the lock */
|
|
|
|
mutex_lock(>->mutex);
|
2019-01-29 02:18:12 +08:00
|
|
|
tl = list_entry(>->active_list, typeof(*tl), link);
|
2019-01-28 18:23:56 +08:00
|
|
|
}
|
|
|
|
mutex_unlock(>->mutex);
|
|
|
|
|
|
|
|
return timeout;
|
|
|
|
}
|
|
|
|
|
2018-07-09 20:20:42 +08:00
|
|
|
int i915_gem_wait_for_idle(struct drm_i915_private *i915,
|
|
|
|
unsigned int flags, long timeout)
|
2016-10-28 20:58:46 +08:00
|
|
|
{
|
2019-07-23 17:12:18 +08:00
|
|
|
/* If the device is asleep, we have no requests outstanding */
|
|
|
|
if (!READ_ONCE(i915->gt.awake))
|
|
|
|
return 0;
|
|
|
|
|
drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-25 04:07:17 +08:00
|
|
|
GEM_TRACE("flags=%x (%s), timeout=%ld%s, awake?=%s\n",
|
2018-07-09 20:20:42 +08:00
|
|
|
flags, flags & I915_WAIT_LOCKED ? "locked" : "unlocked",
|
drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-25 04:07:17 +08:00
|
|
|
timeout, timeout == MAX_SCHEDULE_TIMEOUT ? " (forever)" : "",
|
|
|
|
yesno(i915->gt.awake));
|
2018-05-24 16:11:35 +08:00
|
|
|
|
2019-01-28 18:23:56 +08:00
|
|
|
timeout = wait_for_timelines(i915, flags, timeout);
|
|
|
|
if (timeout < 0)
|
|
|
|
return timeout;
|
|
|
|
|
2016-11-11 22:58:08 +08:00
|
|
|
if (flags & I915_WAIT_LOCKED) {
|
2018-05-03 00:38:39 +08:00
|
|
|
int err;
|
2016-11-11 22:58:08 +08:00
|
|
|
|
|
|
|
lockdep_assert_held(&i915->drm.struct_mutex);
|
|
|
|
|
2019-07-13 03:29:53 +08:00
|
|
|
err = wait_for_engines(&i915->gt);
|
2018-06-27 19:53:34 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2018-02-21 17:56:36 +08:00
|
|
|
i915_retire_requests(i915);
|
2018-05-03 00:38:39 +08:00
|
|
|
}
|
2018-06-27 19:53:34 +08:00
|
|
|
|
|
|
|
return 0;
|
2010-02-19 18:52:00 +08:00
|
|
|
}
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *
|
2015-03-16 20:11:13 +08:00
|
|
|
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view,
|
2016-08-04 23:32:23 +08:00
|
|
|
u64 size,
|
2016-08-04 23:32:22 +08:00
|
|
|
u64 alignment,
|
|
|
|
u64 flags)
|
2015-03-16 20:11:13 +08:00
|
|
|
{
|
2016-10-13 16:55:04 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2018-06-05 23:37:58 +08:00
|
|
|
struct i915_address_space *vm = &dev_priv->ggtt.vm;
|
2016-08-04 23:32:31 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
int ret;
|
2016-03-30 21:57:10 +08:00
|
|
|
|
2016-10-28 20:58:32 +08:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2018-02-20 21:42:05 +08:00
|
|
|
if (flags & PIN_MAPPABLE &&
|
|
|
|
(!view || view->type == I915_GGTT_VIEW_NORMAL)) {
|
2017-10-09 16:44:01 +08:00
|
|
|
/* If the required space is larger than the available
|
|
|
|
* aperture, we will not able to find a slot for the
|
|
|
|
* object and unbinding the object now will be in
|
|
|
|
* vain. Worse, doing so may cause us to ping-pong
|
|
|
|
* the object in and out of the Global GTT and
|
|
|
|
* waste a lot of cycles under the mutex.
|
|
|
|
*/
|
|
|
|
if (obj->base.size > dev_priv->ggtt.mappable_end)
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
/* If NONBLOCK is set the caller is optimistically
|
|
|
|
* trying to cache the full object within the mappable
|
|
|
|
* aperture, and *must* have a fallback in place for
|
|
|
|
* situations where we cannot bind the object. We
|
|
|
|
* can be a little more lax here and use the fallback
|
|
|
|
* more often to avoid costly migrations of ourselves
|
|
|
|
* and other objects within the aperture.
|
|
|
|
*
|
|
|
|
* Half-the-aperture is used as a simple heuristic.
|
|
|
|
* More interesting would to do search for a free
|
|
|
|
* block prior to making the commitment to unbind.
|
|
|
|
* That caters for the self-harm case, and with a
|
|
|
|
* little more heuristics (e.g. NOFAULT, NOEVICT)
|
|
|
|
* we could try to minimise harm to others.
|
|
|
|
*/
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
obj->base.size > dev_priv->ggtt.mappable_end / 2)
|
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2017-01-16 23:21:28 +08:00
|
|
|
vma = i915_vma_instance(obj, vm, view);
|
2019-02-21 10:08:19 +08:00
|
|
|
if (IS_ERR(vma))
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2016-08-04 23:32:31 +08:00
|
|
|
|
|
|
|
if (i915_vma_misplaced(vma, size, alignment, flags)) {
|
2017-10-09 16:44:01 +08:00
|
|
|
if (flags & PIN_NONBLOCK) {
|
|
|
|
if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
|
|
|
|
return ERR_PTR(-ENOSPC);
|
2016-08-04 23:32:31 +08:00
|
|
|
|
2017-10-09 16:44:01 +08:00
|
|
|
if (flags & PIN_MAPPABLE &&
|
2017-01-10 00:16:11 +08:00
|
|
|
vma->fence_size > dev_priv->ggtt.mappable_end / 2)
|
2016-10-13 16:55:04 +08:00
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
WARN(i915_vma_is_pinned(vma),
|
|
|
|
"bo is already pinned in ggtt with incorrect alignment:"
|
2016-08-19 00:16:55 +08:00
|
|
|
" offset=%08x, req.alignment=%llx,"
|
|
|
|
" req.map_and_fenceable=%d, vma->map_and_fenceable=%d\n",
|
|
|
|
i915_ggtt_offset(vma), alignment,
|
2016-08-04 23:32:31 +08:00
|
|
|
!!(flags & PIN_MAPPABLE),
|
2016-08-19 00:16:55 +08:00
|
|
|
i915_vma_is_map_and_fenceable(vma));
|
2016-08-04 23:32:31 +08:00
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
2016-08-15 17:49:06 +08:00
|
|
|
return ERR_PTR(ret);
|
2016-08-04 23:32:31 +08:00
|
|
|
}
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2009-09-14 23:50:29 +08:00
|
|
|
int
|
|
|
|
i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2019-05-31 04:34:59 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(dev);
|
2009-09-14 23:50:29 +08:00
|
|
|
struct drm_i915_gem_madvise *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 20:58:37 +08:00
|
|
|
int err;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
|
|
|
switch (args->madv) {
|
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
case I915_MADV_WILLNEED:
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file_priv, args->handle);
|
2016-10-28 20:58:37 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
err = mutex_lock_interruptible(&obj->mm.lock);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2017-10-14 04:26:13 +08:00
|
|
|
if (i915_gem_object_has_pages(obj) &&
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_is_tiled(obj) &&
|
2019-05-31 04:34:59 +08:00
|
|
|
i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
|
2016-11-01 18:03:17 +08:00
|
|
|
if (obj->mm.madv == I915_MADV_WILLNEED) {
|
|
|
|
GEM_BUG_ON(!obj->mm.quirked);
|
2016-10-28 20:58:35 +08:00
|
|
|
__i915_gem_object_unpin_pages(obj);
|
2016-11-01 18:03:17 +08:00
|
|
|
obj->mm.quirked = false;
|
|
|
|
}
|
|
|
|
if (args->madv == I915_MADV_WILLNEED) {
|
2016-11-04 18:30:01 +08:00
|
|
|
GEM_BUG_ON(obj->mm.quirked);
|
2016-10-28 20:58:35 +08:00
|
|
|
__i915_gem_object_pin_pages(obj);
|
2016-11-01 18:03:17 +08:00
|
|
|
obj->mm.quirked = true;
|
|
|
|
}
|
2014-11-20 16:26:30 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 20:58:35 +08:00
|
|
|
if (obj->mm.madv != __I915_MADV_PURGED)
|
|
|
|
obj->mm.madv = args->madv;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2019-05-31 04:34:59 +08:00
|
|
|
if (i915_gem_object_has_pages(obj)) {
|
|
|
|
struct list_head *list;
|
|
|
|
|
2019-05-31 04:35:00 +08:00
|
|
|
if (i915_gem_object_is_shrinkable(obj)) {
|
2019-06-10 22:54:30 +08:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&i915->mm.obj_lock, flags);
|
|
|
|
|
2019-05-31 04:35:00 +08:00
|
|
|
if (obj->mm.madv != I915_MADV_WILLNEED)
|
|
|
|
list = &i915->mm.purge_list;
|
|
|
|
else
|
2019-06-12 18:57:20 +08:00
|
|
|
list = &i915->mm.shrink_list;
|
2019-05-31 04:35:00 +08:00
|
|
|
list_move_tail(&obj->mm.link, list);
|
2019-06-10 22:54:30 +08:00
|
|
|
|
|
|
|
spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
|
2019-05-31 04:35:00 +08:00
|
|
|
}
|
2019-05-31 04:34:59 +08:00
|
|
|
}
|
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
/* if the object is no longer attached, discard its backing storage */
|
2017-10-14 04:26:13 +08:00
|
|
|
if (obj->mm.madv == I915_MADV_DONTNEED &&
|
|
|
|
!i915_gem_object_has_pages(obj))
|
2019-05-28 17:29:46 +08:00
|
|
|
i915_gem_object_truncate(obj);
|
2009-09-21 06:13:10 +08:00
|
|
|
|
2016-10-28 20:58:35 +08:00
|
|
|
args->retained = obj->mm.madv != __I915_MADV_PURGED;
|
2016-10-28 20:58:37 +08:00
|
|
|
mutex_unlock(&obj->mm.lock);
|
2009-09-22 21:24:13 +08:00
|
|
|
|
2016-10-28 20:58:37 +08:00
|
|
|
out:
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2016-10-28 20:58:37 +08:00
|
|
|
return err;
|
2009-09-14 23:50:29 +08:00
|
|
|
}
|
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
void i915_gem_sanitize(struct drm_i915_private *i915)
|
|
|
|
{
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
|
|
|
|
2018-05-31 16:22:45 +08:00
|
|
|
GEM_TRACE("\n");
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_get(&i915->uncore, FORCEWAKE_ALL);
|
2018-05-31 16:22:45 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* As we have just resumed the machine and woken the device up from
|
|
|
|
* deep PCI sleep (presumably D3_cold), assume the HW has been reset
|
|
|
|
* back to defaults, recovering from whatever wedged state we left it
|
|
|
|
* in and so worth trying to use the device once more.
|
|
|
|
*/
|
2019-07-13 03:29:53 +08:00
|
|
|
if (intel_gt_is_wedged(&i915->gt))
|
|
|
|
intel_gt_unset_wedged(&i915->gt);
|
2017-08-26 19:09:34 +08:00
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
/*
|
|
|
|
* If we inherit context state from the BIOS or earlier occupants
|
|
|
|
* of the GPU, the GPU may be in an inconsistent state when we
|
|
|
|
* try to take over. The only way to remove the earlier state
|
|
|
|
* is by resetting. However, resetting on earlier gen is tricky as
|
|
|
|
* it may impact the display and we are uncertain about the stability
|
2017-04-28 15:53:38 +08:00
|
|
|
* of the reset, so this could be applied to even earlier gen.
|
2017-01-24 19:01:35 +08:00
|
|
|
*/
|
2019-06-25 21:01:10 +08:00
|
|
|
intel_gt_sanitize(&i915->gt, false);
|
2018-05-31 16:22:45 +08:00
|
|
|
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL);
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2017-01-24 19:01:35 +08:00
|
|
|
}
|
|
|
|
|
2019-06-21 15:07:47 +08:00
|
|
|
static void init_unused_ring(struct intel_gt *gt, u32 base)
|
2014-08-15 06:21:55 +08:00
|
|
|
{
|
2019-06-21 15:07:47 +08:00
|
|
|
struct intel_uncore *uncore = gt->uncore;
|
|
|
|
|
|
|
|
intel_uncore_write(uncore, RING_CTL(base), 0);
|
|
|
|
intel_uncore_write(uncore, RING_HEAD(base), 0);
|
|
|
|
intel_uncore_write(uncore, RING_TAIL(base), 0);
|
|
|
|
intel_uncore_write(uncore, RING_START(base), 0);
|
2014-08-15 06:21:55 +08:00
|
|
|
}
|
|
|
|
|
2019-06-21 15:07:47 +08:00
|
|
|
static void init_unused_rings(struct intel_gt *gt)
|
2014-08-15 06:21:55 +08:00
|
|
|
{
|
2019-06-21 15:07:47 +08:00
|
|
|
struct drm_i915_private *i915 = gt->i915;
|
|
|
|
|
|
|
|
if (IS_I830(i915)) {
|
|
|
|
init_unused_ring(gt, PRB1_BASE);
|
|
|
|
init_unused_ring(gt, SRB0_BASE);
|
|
|
|
init_unused_ring(gt, SRB1_BASE);
|
|
|
|
init_unused_ring(gt, SRB2_BASE);
|
|
|
|
init_unused_ring(gt, SRB3_BASE);
|
|
|
|
} else if (IS_GEN(i915, 2)) {
|
|
|
|
init_unused_ring(gt, SRB0_BASE);
|
|
|
|
init_unused_ring(gt, SRB1_BASE);
|
|
|
|
} else if (IS_GEN(i915, 3)) {
|
|
|
|
init_unused_ring(gt, PRB1_BASE);
|
|
|
|
init_unused_ring(gt, PRB2_BASE);
|
2014-08-15 06:21:55 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-06-26 23:45:49 +08:00
|
|
|
int i915_gem_init_hw(struct drm_i915_private *i915)
|
2017-02-08 22:30:31 +08:00
|
|
|
{
|
2019-06-26 23:45:49 +08:00
|
|
|
struct intel_uncore *uncore = &i915->uncore;
|
|
|
|
struct intel_gt *gt = &i915->gt;
|
2016-04-28 16:56:44 +08:00
|
|
|
int ret;
|
2013-02-09 03:49:24 +08:00
|
|
|
|
2019-06-26 23:45:49 +08:00
|
|
|
BUG_ON(!i915->kernel_context);
|
2019-07-13 03:29:53 +08:00
|
|
|
ret = intel_gt_terminally_wedged(gt);
|
2019-06-26 23:45:49 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-06-21 15:07:53 +08:00
|
|
|
gt->last_init_time = ktime_get();
|
2016-10-25 20:16:02 +08:00
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
/* Double layer security blanket, see i915_gem_init() */
|
2019-06-21 15:07:53 +08:00
|
|
|
intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
|
2015-02-13 22:35:59 +08:00
|
|
|
|
2019-06-21 15:07:53 +08:00
|
|
|
if (HAS_EDRAM(i915) && INTEL_GEN(i915) < 9)
|
|
|
|
intel_uncore_rmw(uncore, HSW_IDICR, 0, IDIHASHMSK(0xf));
|
2013-02-09 03:49:24 +08:00
|
|
|
|
2019-06-21 15:07:53 +08:00
|
|
|
if (IS_HASWELL(i915))
|
|
|
|
intel_uncore_write(uncore,
|
|
|
|
MI_PREDICATE_RESULT_2,
|
|
|
|
IS_HSW_GT3(i915) ?
|
|
|
|
LOWER_SLICE_ENABLED : LOWER_SLICE_DISABLED);
|
2013-08-29 03:45:46 +08:00
|
|
|
|
2018-12-03 20:50:10 +08:00
|
|
|
/* Apply the GT workarounds... */
|
2019-06-21 15:07:53 +08:00
|
|
|
intel_gt_apply_workarounds(gt);
|
2018-12-03 20:50:10 +08:00
|
|
|
/* ...and determine whether they are sticking. */
|
2019-06-21 15:07:53 +08:00
|
|
|
intel_gt_verify_workarounds(gt, "init");
|
2018-04-11 00:12:47 +08:00
|
|
|
|
2019-06-21 15:07:53 +08:00
|
|
|
intel_gt_init_swizzling(gt);
|
2013-02-09 03:49:24 +08:00
|
|
|
|
2014-11-20 16:45:19 +08:00
|
|
|
/*
|
|
|
|
* At least 830 can leave some of the unused rings
|
|
|
|
* "active" (ie. head != tail) after resume which
|
|
|
|
* will prevent c3 entry. Makes sure all unused rings
|
|
|
|
* are totally idle.
|
|
|
|
*/
|
2019-06-21 15:07:53 +08:00
|
|
|
init_unused_rings(gt);
|
2015-05-30 00:43:37 +08:00
|
|
|
|
2019-06-21 15:07:53 +08:00
|
|
|
ret = i915_ppgtt_init_hw(gt);
|
2015-06-18 20:11:20 +08:00
|
|
|
if (ret) {
|
2018-02-07 19:15:45 +08:00
|
|
|
DRM_ERROR("Enabling PPGTT failed (%d)\n", ret);
|
2015-06-18 20:11:20 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2017-10-26 01:25:19 +08:00
|
|
|
/* We can't enable contexts until all firmware is loaded */
|
2019-07-31 07:07:40 +08:00
|
|
|
ret = intel_uc_init_hw(>->uc);
|
2018-02-07 19:15:45 +08:00
|
|
|
if (ret) {
|
2019-08-03 02:40:54 +08:00
|
|
|
i915_probe_error(i915, "Enabling uc failed (%d)\n", ret);
|
2017-10-26 01:25:19 +08:00
|
|
|
goto out;
|
2018-02-07 19:15:45 +08:00
|
|
|
}
|
2017-10-26 01:25:19 +08:00
|
|
|
|
2019-07-31 02:04:07 +08:00
|
|
|
intel_mocs_init(gt);
|
2016-04-13 22:03:25 +08:00
|
|
|
|
2019-06-21 15:07:54 +08:00
|
|
|
intel_engines_set_scheduler_caps(i915);
|
|
|
|
|
2019-06-26 23:45:49 +08:00
|
|
|
out:
|
2019-06-21 15:07:54 +08:00
|
|
|
intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);
|
2018-07-12 20:48:10 +08:00
|
|
|
return ret;
|
2010-05-21 09:08:55 +08:00
|
|
|
}
|
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
static int __intel_engines_record_defaults(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct intel_engine_cs *engine;
|
2019-04-27 00:33:34 +08:00
|
|
|
struct i915_gem_context *ctx;
|
|
|
|
struct i915_gem_engines *e;
|
2017-11-10 22:26:33 +08:00
|
|
|
enum intel_engine_id id;
|
2019-03-08 17:36:55 +08:00
|
|
|
int err = 0;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* As we reset the gpu during very early sanitisation, the current
|
|
|
|
* register state on the GPU should reflect its defaults values.
|
|
|
|
* We load a context onto the hw (with restore-inhibit), then switch
|
|
|
|
* over to a second context to save that default register state. We
|
|
|
|
* can then prime every new context with that state so they all start
|
|
|
|
* from the same default HW values.
|
|
|
|
*/
|
|
|
|
|
|
|
|
ctx = i915_gem_context_create_kernel(i915, 0);
|
|
|
|
if (IS_ERR(ctx))
|
|
|
|
return PTR_ERR(ctx);
|
|
|
|
|
2019-04-27 00:33:34 +08:00
|
|
|
e = i915_gem_context_lock_engines(ctx);
|
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
for_each_engine(engine, i915, id) {
|
2019-04-27 00:33:34 +08:00
|
|
|
struct intel_context *ce = e->engines[id];
|
2018-02-21 17:56:36 +08:00
|
|
|
struct i915_request *rq;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-04-27 00:33:34 +08:00
|
|
|
rq = intel_context_create_request(ce);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (IS_ERR(rq)) {
|
|
|
|
err = PTR_ERR(rq);
|
2019-04-27 00:33:34 +08:00
|
|
|
goto err_active;
|
2017-11-10 22:26:33 +08:00
|
|
|
}
|
|
|
|
|
2019-07-29 19:37:20 +08:00
|
|
|
err = intel_engine_emit_ctx_wa(rq);
|
|
|
|
if (err)
|
|
|
|
goto err_rq;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Failing to program the MOCS is non-fatal.The system will not
|
|
|
|
* run at peak performance. So warn the user and carry on.
|
|
|
|
*/
|
|
|
|
err = intel_mocs_emit(rq);
|
|
|
|
if (err)
|
|
|
|
dev_notice(i915->drm.dev,
|
|
|
|
"Failed to program MOCS registers; expect performance issues.\n");
|
|
|
|
|
|
|
|
err = intel_renderstate_emit(rq);
|
|
|
|
if (err)
|
|
|
|
goto err_rq;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-07-29 19:37:20 +08:00
|
|
|
err_rq:
|
2018-06-12 18:51:35 +08:00
|
|
|
i915_request_add(rq);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (err)
|
|
|
|
goto err_active;
|
|
|
|
}
|
|
|
|
|
2019-03-08 17:36:55 +08:00
|
|
|
/* Flush the default context image to memory, and enable powersaving. */
|
2019-04-25 04:07:14 +08:00
|
|
|
if (!i915_gem_load_power_context(i915)) {
|
2019-03-08 17:36:55 +08:00
|
|
|
err = -EIO;
|
2017-11-10 22:26:33 +08:00
|
|
|
goto err_active;
|
2018-07-09 20:20:43 +08:00
|
|
|
}
|
2017-11-10 22:26:33 +08:00
|
|
|
|
|
|
|
for_each_engine(engine, i915, id) {
|
2019-04-27 00:33:34 +08:00
|
|
|
struct intel_context *ce = e->engines[id];
|
|
|
|
struct i915_vma *state = ce->state;
|
2018-09-14 20:35:03 +08:00
|
|
|
void *vaddr;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
|
|
|
if (!state)
|
|
|
|
continue;
|
|
|
|
|
2019-03-08 21:25:22 +08:00
|
|
|
GEM_BUG_ON(intel_context_is_pinned(ce));
|
2019-03-08 21:25:19 +08:00
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
/*
|
|
|
|
* As we will hold a reference to the logical state, it will
|
|
|
|
* not be torn down with the context, and importantly the
|
|
|
|
* object will hold onto its vma (making it possible for a
|
|
|
|
* stray GTT write to corrupt our defaults). Unmap the vma
|
|
|
|
* from the GTT to prevent such accidents and reclaim the
|
|
|
|
* space.
|
|
|
|
*/
|
|
|
|
err = i915_vma_unbind(state);
|
|
|
|
if (err)
|
|
|
|
goto err_active;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_lock(state->obj);
|
2017-11-10 22:26:33 +08:00
|
|
|
err = i915_gem_object_set_to_cpu_domain(state->obj, false);
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock(state->obj);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (err)
|
|
|
|
goto err_active;
|
|
|
|
|
|
|
|
engine->default_state = i915_gem_object_get(state->obj);
|
drm/i915: Flush pages on acquisition
When we return pages to the system, we ensure that they are marked as
being in the CPU domain since any external access is uncontrolled and we
must assume the worst. This means that we need to always flush the pages
on acquisition if we need to use them on the GPU, and from the beginning
have used set-domain. Set-domain is overkill for the purpose as it is a
general synchronisation barrier, but our intent is to only flush the
pages being swapped in. If we move that flush into the pages acquisition
phase, we know then that when we have obj->mm.pages, they are coherent
with the GPU and need only maintain that status without resorting to
heavy handed use of set-domain.
The principle knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredicatable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaks into the uABI
that the first pagefault was synchronous. This is unintentional and
baring a few igt should be unoticed, nevertheless we bump the uABI
version for mmap-gtt to reflect the change in behaviour.
Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and is in the CPU write
domain, so a set-domain(CPU) following a gem_create() would be a minor
operation that merely checked whether we could allocate all pages for
the object. On applying this change, a set-domain(CPU) causes a clflush
as we acquire the pages. This will have a small impact on mesa as we move
the clflush here on !llc from execbuf time to create, but that should
have minimal performance impact as the same clflush exists but is now
done early and because of the clflush issue, userspace recycles bo and
so should resist allocating fresh objects.
Internally, the presumption that objects are created in the CPU
write-domain and remain so through writes to obj->mm.mapping is more
prevalent than I expected; but easy enough to catch and apply a manual
flush.
For the future, we should push the page flush from the central
set_pages() into the callers so that we can more finely control when it
is applied, but for now doing it one location is easier to validate, at
the cost of sometimes flushing when there is no need.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
2019-03-22 00:19:07 +08:00
|
|
|
i915_gem_object_set_cache_coherency(engine->default_state,
|
|
|
|
I915_CACHE_LLC);
|
2018-09-14 20:35:03 +08:00
|
|
|
|
|
|
|
/* Check we can acquire the image of the context state */
|
|
|
|
vaddr = i915_gem_object_pin_map(engine->default_state,
|
2018-09-14 20:35:04 +08:00
|
|
|
I915_MAP_FORCE_WB);
|
2018-09-14 20:35:03 +08:00
|
|
|
if (IS_ERR(vaddr)) {
|
|
|
|
err = PTR_ERR(vaddr);
|
|
|
|
goto err_active;
|
|
|
|
}
|
|
|
|
|
|
|
|
i915_gem_object_unpin_map(engine->default_state);
|
2017-11-10 22:26:33 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) {
|
|
|
|
unsigned int found = intel_engines_has_context_isolation(i915);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that classes with multiple engine instances all
|
|
|
|
* share the same basic configuration.
|
|
|
|
*/
|
|
|
|
for_each_engine(engine, i915, id) {
|
|
|
|
unsigned int bit = BIT(engine->uabi_class);
|
|
|
|
unsigned int expected = engine->default_state ? bit : 0;
|
|
|
|
|
|
|
|
if ((found & bit) != expected) {
|
|
|
|
DRM_ERROR("mismatching default context state for class %d on engine %s\n",
|
|
|
|
engine->uabi_class, engine->name);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
out_ctx:
|
2019-04-27 00:33:34 +08:00
|
|
|
i915_gem_context_unlock_engines(ctx);
|
2017-11-10 22:26:33 +08:00
|
|
|
i915_gem_context_set_closed(ctx);
|
|
|
|
i915_gem_context_put(ctx);
|
|
|
|
return err;
|
|
|
|
|
|
|
|
err_active:
|
|
|
|
/*
|
|
|
|
* If we have to abandon now, we expect the engines to be idle
|
2019-03-08 17:36:55 +08:00
|
|
|
* and ready to be torn-down. The quickest way we can accomplish
|
|
|
|
* this is by declaring ourselves wedged.
|
2017-11-10 22:26:33 +08:00
|
|
|
*/
|
2019-07-13 03:29:53 +08:00
|
|
|
intel_gt_set_wedged(&i915->gt);
|
2017-11-10 22:26:33 +08:00
|
|
|
goto out_ctx;
|
|
|
|
}
|
|
|
|
|
2018-12-04 22:15:16 +08:00
|
|
|
static int
|
|
|
|
i915_gem_init_scratch(struct drm_i915_private *i915, unsigned int size)
|
|
|
|
{
|
2019-06-21 15:08:11 +08:00
|
|
|
return intel_gt_init_scratch(&i915->gt, size);
|
2018-12-04 22:15:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_fini_scratch(struct drm_i915_private *i915)
|
|
|
|
{
|
2019-06-21 15:08:11 +08:00
|
|
|
intel_gt_fini_scratch(&i915->gt);
|
2018-12-04 22:15:16 +08:00
|
|
|
}
|
|
|
|
|
2019-04-17 15:56:28 +08:00
|
|
|
static int intel_engines_verify_workarounds(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
enum intel_engine_id id;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for_each_engine(engine, i915, id) {
|
|
|
|
if (intel_engine_verify_workarounds(engine, "load"))
|
|
|
|
err = -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2016-12-01 22:16:38 +08:00
|
|
|
int i915_gem_init(struct drm_i915_private *dev_priv)
|
2012-04-24 22:47:41 +08:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2018-05-08 17:07:05 +08:00
|
|
|
/* We need to fallback to 4K pages if host doesn't support huge gtt. */
|
|
|
|
if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
|
2017-10-07 06:18:31 +08:00
|
|
|
mkwrite_device_info(dev_priv)->page_sizes =
|
|
|
|
I915_GTT_PAGE_SIZE_4K;
|
|
|
|
|
2017-05-03 17:39:18 +08:00
|
|
|
dev_priv->mm.unordered_timeline = dma_fence_context_alloc(1);
|
2017-02-22 19:40:48 +08:00
|
|
|
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_init(dev_priv);
|
2019-01-28 18:23:56 +08:00
|
|
|
|
2017-11-23 01:26:21 +08:00
|
|
|
ret = i915_gem_init_userptr(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fetch_firmwares(&dev_priv->gt.uc);
|
2019-08-03 02:40:55 +08:00
|
|
|
intel_wopcm_init(&dev_priv->wopcm);
|
2017-12-14 06:13:47 +08:00
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
/* This is just a security blanket to placate dragons.
|
|
|
|
* On some systems, we very sporadically observe that the first TLBs
|
|
|
|
* used by the CS may be stale, despite us poking the TLB reset. If
|
|
|
|
* we hold the forcewake during initialisation these problems
|
|
|
|
* just magically go away.
|
|
|
|
*/
|
2017-11-23 01:26:21 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_get(&dev_priv->uncore, FORCEWAKE_ALL);
|
2015-02-13 22:35:59 +08:00
|
|
|
|
2019-06-21 15:08:05 +08:00
|
|
|
ret = i915_init_ggtt(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_unlock;
|
|
|
|
}
|
2013-03-09 02:45:53 +08:00
|
|
|
|
2018-12-04 22:15:16 +08:00
|
|
|
ret = i915_gem_init_scratch(dev_priv,
|
drm/i915: replace IS_GEN<N> with IS_GEN(..., N)
Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of
gen_mask to do the comparison. Now callers can pass then gen as a parameter,
so we don't require one macro for each gen.
The following spatch was used to convert the users of these macros:
@@
expression e;
@@
(
- IS_GEN2(e)
+ IS_GEN(e, 2)
|
- IS_GEN3(e)
+ IS_GEN(e, 3)
|
- IS_GEN4(e)
+ IS_GEN(e, 4)
|
- IS_GEN5(e)
+ IS_GEN(e, 5)
|
- IS_GEN6(e)
+ IS_GEN(e, 6)
|
- IS_GEN7(e)
+ IS_GEN(e, 7)
|
- IS_GEN8(e)
+ IS_GEN(e, 8)
|
- IS_GEN9(e)
+ IS_GEN(e, 9)
|
- IS_GEN10(e)
+ IS_GEN(e, 10)
|
- IS_GEN11(e)
+ IS_GEN(e, 11)
)
v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than
using the bitmask
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com
2018-12-13 02:10:43 +08:00
|
|
|
IS_GEN(dev_priv, 2) ? SZ_256K : PAGE_SIZE);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_ggtt;
|
|
|
|
}
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
|
2019-04-27 00:33:33 +08:00
|
|
|
ret = intel_engines_setup(dev_priv);
|
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_unlock;
|
|
|
|
}
|
|
|
|
|
2018-12-04 22:15:16 +08:00
|
|
|
ret = i915_gem_contexts_init(dev_priv);
|
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_scratch;
|
|
|
|
}
|
|
|
|
|
2016-12-01 22:16:38 +08:00
|
|
|
ret = intel_engines_init(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_context;
|
|
|
|
}
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
|
2017-11-10 22:26:29 +08:00
|
|
|
intel_init_gt_powersave(dev_priv);
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
ret = intel_uc_init(&dev_priv->gt.uc);
|
2017-11-10 22:26:30 +08:00
|
|
|
if (ret)
|
2017-12-13 21:43:47 +08:00
|
|
|
goto err_pm;
|
2017-11-10 22:26:30 +08:00
|
|
|
|
2017-12-14 06:13:48 +08:00
|
|
|
ret = i915_gem_init_hw(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
goto err_uc_init;
|
|
|
|
|
2019-06-26 23:45:49 +08:00
|
|
|
/* Only when the HW is re-initialised, can we replay the requests */
|
|
|
|
ret = intel_gt_resume(&dev_priv->gt);
|
|
|
|
if (ret)
|
|
|
|
goto err_init_hw;
|
|
|
|
|
2017-11-10 22:26:30 +08:00
|
|
|
/*
|
|
|
|
* Despite its name intel_init_clock_gating applies both display
|
|
|
|
* clock gating workarounds; GT mmio workarounds and the occasional
|
|
|
|
* GT power context workaround. Worse, sometimes it includes a context
|
|
|
|
* register workaround which we need to apply before we record the
|
|
|
|
* default HW state for all contexts.
|
|
|
|
*
|
|
|
|
* FIXME: break up the workarounds and apply them at the right time!
|
|
|
|
*/
|
|
|
|
intel_init_clock_gating(dev_priv);
|
|
|
|
|
2019-04-17 15:56:28 +08:00
|
|
|
ret = intel_engines_verify_workarounds(dev_priv);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2019-04-17 15:56:28 +08:00
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
ret = __intel_engines_record_defaults(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-08-03 02:40:50 +08:00
|
|
|
ret = i915_inject_load_error(dev_priv, -ENODEV);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-08-03 02:40:50 +08:00
|
|
|
ret = i915_inject_load_error(dev_priv, -EIO);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
|
2017-12-13 21:43:47 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unwinding is complicated by that we want to handle -EIO to mean
|
|
|
|
* disable GPU submission but keep KMS alive. We want to mark the
|
|
|
|
* HW as irrevisibly wedged, but keep enough state around that the
|
|
|
|
* driver doesn't explode during runtime.
|
|
|
|
*/
|
2019-06-26 23:45:49 +08:00
|
|
|
err_gt:
|
2018-06-06 22:54:41 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2019-07-13 03:29:53 +08:00
|
|
|
intel_gt_set_wedged(&dev_priv->gt);
|
2019-03-08 17:36:54 +08:00
|
|
|
i915_gem_suspend(dev_priv);
|
2018-06-06 22:54:41 +08:00
|
|
|
i915_gem_suspend_late(dev_priv);
|
|
|
|
|
2018-07-10 17:44:20 +08:00
|
|
|
i915_gem_drain_workqueue(dev_priv);
|
|
|
|
|
2018-06-06 22:54:41 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
2019-06-26 23:45:49 +08:00
|
|
|
err_init_hw:
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fini_hw(&dev_priv->gt.uc);
|
2017-12-14 06:13:48 +08:00
|
|
|
err_uc_init:
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fini(&dev_priv->gt.uc);
|
2017-12-13 21:43:47 +08:00
|
|
|
err_pm:
|
|
|
|
if (ret != -EIO) {
|
|
|
|
intel_cleanup_gt_powersave(dev_priv);
|
2019-05-01 18:32:04 +08:00
|
|
|
intel_engines_cleanup(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
}
|
|
|
|
err_context:
|
|
|
|
if (ret != -EIO)
|
|
|
|
i915_gem_contexts_fini(dev_priv);
|
2018-12-04 22:15:16 +08:00
|
|
|
err_scratch:
|
|
|
|
i915_gem_fini_scratch(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
err_ggtt:
|
|
|
|
err_unlock:
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
|
2017-12-13 21:43:47 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
|
2018-01-10 20:54:16 +08:00
|
|
|
|
2019-01-28 18:23:56 +08:00
|
|
|
if (ret != -EIO) {
|
2017-12-13 21:43:47 +08:00
|
|
|
i915_gem_cleanup_userptr(dev_priv);
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_fini(dev_priv);
|
2019-01-28 18:23:56 +08:00
|
|
|
}
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2014-04-09 16:19:42 +08:00
|
|
|
if (ret == -EIO) {
|
2018-07-26 16:50:32 +08:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2017-12-13 21:43:47 +08:00
|
|
|
/*
|
|
|
|
* Allow engine initialisation to fail by marking the GPU as
|
2014-04-09 16:19:42 +08:00
|
|
|
* wedged. But we only want to do this where the GPU is angry,
|
|
|
|
* for all other failure, such as an allocation failure, bail.
|
|
|
|
*/
|
2019-07-13 03:29:53 +08:00
|
|
|
if (!intel_gt_is_wedged(&dev_priv->gt)) {
|
2019-07-12 19:24:27 +08:00
|
|
|
i915_probe_error(dev_priv,
|
|
|
|
"Failed to initialize GPU, declaring it wedged!\n");
|
2019-07-13 03:29:53 +08:00
|
|
|
intel_gt_set_wedged(&dev_priv->gt);
|
2017-10-15 22:37:25 +08:00
|
|
|
}
|
2018-07-26 16:50:32 +08:00
|
|
|
|
|
|
|
/* Minimal basic recovery for KMS */
|
|
|
|
ret = i915_ggtt_enable_hw(dev_priv);
|
|
|
|
i915_gem_restore_gtt_mappings(dev_priv);
|
|
|
|
i915_gem_restore_fences(dev_priv);
|
|
|
|
intel_init_clock_gating(dev_priv);
|
|
|
|
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2012-04-24 22:47:41 +08:00
|
|
|
}
|
|
|
|
|
2017-12-13 21:43:47 +08:00
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
2014-04-09 16:19:42 +08:00
|
|
|
return ret;
|
2012-04-24 22:47:41 +08:00
|
|
|
}
|
|
|
|
|
2019-07-12 19:24:29 +08:00
|
|
|
void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
|
2018-06-04 17:00:32 +08:00
|
|
|
{
|
drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-25 04:07:17 +08:00
|
|
|
GEM_BUG_ON(dev_priv->gt.awake);
|
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
intel_wakeref_auto_fini(&dev_priv->ggtt.userfault_wakeref);
|
2019-05-27 19:51:14 +08:00
|
|
|
|
2018-06-04 17:00:32 +08:00
|
|
|
i915_gem_suspend_late(dev_priv);
|
2018-08-13 06:36:29 +08:00
|
|
|
intel_disable_gt_powersave(dev_priv);
|
2018-06-04 17:00:32 +08:00
|
|
|
|
|
|
|
/* Flush any outstanding unpin_work. */
|
|
|
|
i915_gem_drain_workqueue(dev_priv);
|
|
|
|
|
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fini_hw(&dev_priv->gt.uc);
|
|
|
|
intel_uc_fini(&dev_priv->gt.uc);
|
2019-05-30 21:31:05 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
|
|
|
}
|
|
|
|
|
2019-07-12 19:24:28 +08:00
|
|
|
void i915_gem_driver_release(struct drm_i915_private *dev_priv)
|
2019-05-30 21:31:05 +08:00
|
|
|
{
|
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
2019-05-01 18:32:04 +08:00
|
|
|
intel_engines_cleanup(dev_priv);
|
2018-06-04 17:00:32 +08:00
|
|
|
i915_gem_contexts_fini(dev_priv);
|
2018-12-04 22:15:16 +08:00
|
|
|
i915_gem_fini_scratch(dev_priv);
|
2018-06-04 17:00:32 +08:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2018-12-03 21:33:19 +08:00
|
|
|
intel_wa_list_free(&dev_priv->gt_wa_list);
|
|
|
|
|
2018-08-13 06:36:29 +08:00
|
|
|
intel_cleanup_gt_powersave(dev_priv);
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
|
2018-06-04 17:00:32 +08:00
|
|
|
i915_gem_cleanup_userptr(dev_priv);
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_fini(dev_priv);
|
2018-06-04 17:00:32 +08:00
|
|
|
|
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
|
|
|
|
|
|
|
WARN_ON(!list_empty(&dev_priv->contexts.list));
|
|
|
|
}
|
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
void i915_gem_init_mmio(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
i915_gem_sanitize(i915);
|
|
|
|
}
|
|
|
|
|
2017-11-11 07:24:47 +08:00
|
|
|
static void i915_gem_init__mm(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
spin_lock_init(&i915->mm.obj_lock);
|
|
|
|
|
|
|
|
init_llist_head(&i915->mm.free_list);
|
|
|
|
|
2019-05-31 04:34:59 +08:00
|
|
|
INIT_LIST_HEAD(&i915->mm.purge_list);
|
2019-06-12 18:57:20 +08:00
|
|
|
INIT_LIST_HEAD(&i915->mm.shrink_list);
|
2017-11-11 07:24:47 +08:00
|
|
|
|
2019-05-28 17:29:45 +08:00
|
|
|
i915_gem_init__objects(i915);
|
2017-11-11 07:24:47 +08:00
|
|
|
}
|
|
|
|
|
2018-03-23 20:34:49 +08:00
|
|
|
int i915_gem_init_early(struct drm_i915_private *dev_priv)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2019-02-28 18:20:34 +08:00
|
|
|
int err;
|
2017-08-16 16:52:08 +08:00
|
|
|
|
2017-11-11 07:24:47 +08:00
|
|
|
i915_gem_init__mm(dev_priv);
|
2019-04-25 04:07:14 +08:00
|
|
|
i915_gem_init__pm(dev_priv);
|
2017-10-16 19:40:37 +08:00
|
|
|
|
2016-09-01 19:58:21 +08:00
|
|
|
atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
|
|
|
|
|
2016-08-04 23:32:36 +08:00
|
|
|
spin_lock_init(&dev_priv->fb_tracking.lock);
|
2016-10-28 20:58:46 +08:00
|
|
|
|
2017-10-07 06:18:14 +08:00
|
|
|
err = i915_gemfs_init(dev_priv);
|
|
|
|
if (err)
|
|
|
|
DRM_NOTE("Unable to create a private tmpfs mount, hugepage support will be disabled(%d).\n", err);
|
|
|
|
|
2016-10-28 20:58:46 +08:00
|
|
|
return 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2008-12-30 18:31:46 +08:00
|
|
|
|
2018-03-23 20:34:49 +08:00
|
|
|
void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
|
2016-01-19 21:26:29 +08:00
|
|
|
{
|
2017-02-11 00:35:23 +08:00
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
2018-02-20 06:06:31 +08:00
|
|
|
GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
|
|
|
|
GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
|
2019-05-31 04:35:00 +08:00
|
|
|
WARN_ON(dev_priv->mm.shrink_count);
|
2016-11-18 05:04:11 +08:00
|
|
|
|
2017-10-07 06:18:14 +08:00
|
|
|
i915_gemfs_fini(dev_priv);
|
2016-01-19 21:26:29 +08:00
|
|
|
}
|
|
|
|
|
2016-09-21 21:51:07 +08:00
|
|
|
int i915_gem_freeze(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2017-04-07 18:25:49 +08:00
|
|
|
/* Discard all purgeable objects, let userspace recover those as
|
|
|
|
* required after resuming.
|
|
|
|
*/
|
2016-09-21 21:51:07 +08:00
|
|
|
i915_gem_shrink_all(dev_priv);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-06-01 22:41:25 +08:00
|
|
|
int i915_gem_freeze_late(struct drm_i915_private *i915)
|
2016-05-14 14:26:33 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
2019-06-12 18:57:20 +08:00
|
|
|
intel_wakeref_t wakeref;
|
2016-05-14 14:26:33 +08:00
|
|
|
|
2018-06-01 22:41:25 +08:00
|
|
|
/*
|
|
|
|
* Called just before we write the hibernation image.
|
2016-05-14 14:26:33 +08:00
|
|
|
*
|
|
|
|
* We need to update the domain tracking to reflect that the CPU
|
|
|
|
* will be accessing all the pages to create and restore from the
|
|
|
|
* hibernation, and so upon restoration those pages will be in the
|
|
|
|
* CPU domain.
|
|
|
|
*
|
|
|
|
* To make sure the hibernation image contains the latest state,
|
|
|
|
* we update that state just before writing out the image.
|
2016-09-10 03:02:18 +08:00
|
|
|
*
|
|
|
|
* To try and reduce the hibernation image, we manually shrink
|
2017-04-07 18:25:49 +08:00
|
|
|
* the objects as well, see i915_gem_freeze()
|
2016-05-14 14:26:33 +08:00
|
|
|
*/
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2019-06-12 18:57:20 +08:00
|
|
|
|
|
|
|
i915_gem_shrink(i915, -1UL, NULL, ~0);
|
2018-06-01 22:41:25 +08:00
|
|
|
i915_gem_drain_freed_objects(i915);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
2019-06-12 18:57:20 +08:00
|
|
|
list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) {
|
|
|
|
i915_gem_object_lock(obj);
|
|
|
|
WARN_ON(i915_gem_object_set_to_cpu_domain(obj, true));
|
|
|
|
i915_gem_object_unlock(obj);
|
2016-05-14 14:26:33 +08:00
|
|
|
}
|
2019-06-12 18:57:20 +08:00
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-09-24 23:02:42 +08:00
|
|
|
void i915_gem_release(struct drm_device *dev, struct drm_file *file)
|
2009-06-03 15:27:35 +08:00
|
|
|
{
|
2010-09-24 23:02:42 +08:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2018-02-21 17:56:36 +08:00
|
|
|
struct i915_request *request;
|
2009-06-03 15:27:35 +08:00
|
|
|
|
|
|
|
/* Clean up our request list when the client is going away, so that
|
|
|
|
* later retire_requests won't dereference our soon-to-be-gone
|
|
|
|
* file_priv.
|
|
|
|
*/
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2017-03-02 20:25:25 +08:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_link)
|
2010-09-24 23:02:42 +08:00
|
|
|
request->file_priv = NULL;
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
|
|
|
|
|
2017-06-20 19:05:45 +08:00
|
|
|
int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_file_private *file_priv;
|
2013-12-07 06:10:58 +08:00
|
|
|
int ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2016-11-09 18:45:07 +08:00
|
|
|
DRM_DEBUG("\n");
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
|
|
|
|
if (!file_priv)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
file->driver_priv = file_priv;
|
2017-06-20 19:05:45 +08:00
|
|
|
file_priv->dev_priv = i915;
|
2014-02-25 23:11:24 +08:00
|
|
|
file_priv->file = file;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
spin_lock_init(&file_priv->mm.lock);
|
|
|
|
INIT_LIST_HEAD(&file_priv->mm.request_list);
|
|
|
|
|
2016-07-27 16:07:27 +08:00
|
|
|
file_priv->bsd_engine = -1;
|
2018-06-15 18:44:29 +08:00
|
|
|
file_priv->hang_timestamp = jiffies;
|
2016-01-15 23:12:50 +08:00
|
|
|
|
2017-06-20 19:05:45 +08:00
|
|
|
ret = i915_gem_context_open(i915, file);
|
2013-12-07 06:10:58 +08:00
|
|
|
if (ret)
|
|
|
|
kfree(file_priv);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2013-12-07 06:10:58 +08:00
|
|
|
return ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
|
|
|
|
|
2014-09-20 00:27:27 +08:00
|
|
|
/**
|
|
|
|
* i915_gem_track_fb - update frontbuffer tracking
|
2015-09-15 20:58:44 +08:00
|
|
|
* @old: current GEM buffer for the frontbuffer slots
|
|
|
|
* @new: new GEM buffer for the frontbuffer slots
|
|
|
|
* @frontbuffer_bits: bitmask of frontbuffer slots
|
2014-09-20 00:27:27 +08:00
|
|
|
*
|
|
|
|
* This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
|
|
|
|
* from @old and setting them in @new. Both @old and @new can be NULL.
|
|
|
|
*/
|
2014-06-19 05:28:09 +08:00
|
|
|
void i915_gem_track_fb(struct drm_i915_gem_object *old,
|
|
|
|
struct drm_i915_gem_object *new,
|
|
|
|
unsigned frontbuffer_bits)
|
|
|
|
{
|
2016-08-04 23:32:37 +08:00
|
|
|
/* Control of individual bits within the mask are guarded by
|
|
|
|
* the owning plane->mutex, i.e. we can never see concurrent
|
|
|
|
* manipulation of individual bits. But since the bitfield as a whole
|
|
|
|
* is updated using RMW, we need to use atomics in order to update
|
|
|
|
* the bits.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES >
|
2018-09-26 18:47:07 +08:00
|
|
|
BITS_PER_TYPE(atomic_t));
|
2016-08-04 23:32:37 +08:00
|
|
|
|
2014-06-19 05:28:09 +08:00
|
|
|
if (old) {
|
2016-08-04 23:32:37 +08:00
|
|
|
WARN_ON(!(atomic_read(&old->frontbuffer_bits) & frontbuffer_bits));
|
|
|
|
atomic_andnot(frontbuffer_bits, &old->frontbuffer_bits);
|
2014-06-19 05:28:09 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (new) {
|
2016-08-04 23:32:37 +08:00
|
|
|
WARN_ON(atomic_read(&new->frontbuffer_bits) & frontbuffer_bits);
|
|
|
|
atomic_or(frontbuffer_bits, &new->frontbuffer_bits);
|
2014-06-19 05:28:09 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-02-14 01:15:13 +08:00
|
|
|
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
2017-02-14 01:15:17 +08:00
|
|
|
#include "selftests/mock_gem_device.c"
|
2018-08-30 21:48:06 +08:00
|
|
|
#include "selftests/i915_gem.c"
|
2017-02-14 01:15:13 +08:00
|
|
|
#endif
|