Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we always must wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is either using explicit fencing
between parties and wants more fine-grained access to buffers (e.g. it
may partition the buffer between uses and track fences on ranges rather
than the implicit fences tracking the whole object). It follows that
userspace needs a mechanism to avoid the kernel's serialisation on its
implicit fences before execbuf execution.
The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag need to be only
set once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).
Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.
Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).
Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-1-chris@chris-wilson.co.uk
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.
s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.
v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Soft-pinning depends upon being able to check for availabilty of an
interval and evict overlapping object from a drm_mm range manager very
quickly. Currently it uses a linear list, and so performance is dire and
not suitable as a general replacement. Worse, the current code will oops
if it tries to evict an active buffer.
It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.
For posterity since the wrong patch was pushed (i.e. that missed these
key points and had known bugs), this is the changelog that should have
been on commit 506a8e87d8 ("drm/i915: Add soft-pinning API for
execbuffer"):
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
*and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
that object is placed at that offset in the address space selected
by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
execbuffer call without relocations pointing to it
It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
address
* the object conflicts with another pinned object (either pinned by
hardware in that address space, e.g. scanouts in the aliasing ppgtt)
or within the same batch.
EBUSY is returned if the location is pinned by hardware
EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as meets
the hardware requirements) or if the placement of the object does not fit
within the address space
All other execbuffer errors apply.
Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).
v2: Combine the hole/adjusted-hole ENOSPC checks
v3: More color, more splitting, more blurb.
Fixes: 506a8e87d8 ("drm/i915: Add soft-pinning API for execbuffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).
v2...v200: New name for -1.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation as to whether we need the cmdparser at execbuffer time.
v2: Remove the inline for its single user, it is clear enough (and
shorter) without!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161124125851.6615-1-chris@chris-wilson.co.uk
I tried to avoid having to track the write for every VMA by only
tracking writes to the ggtt. However, for the purposes of frontbuffer
tracking this is insufficient as we need to invalidate around writes not
just to the the ggtt but all aliased ppgtt views of the framebuffer. By
moving the critical section to the object and only doing so for
framebuffer writes we can reduce the tracking even further by only
watching framebuffers and not vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
On LLC, or even snooped, machines rendering via the GPU ends up in the CPU
cache. This cacheline dirt also needs to be flushed to main memory when
moving to an incoherent domain, such as the display's scanout engine.
Mostly, this happens because either the object is marked as dirty from
its first use or is avoided by setting the object into the display
domain from the start.
v2: Treat WT as not requiring a clflush prior to use on the display
engine as well.
Fixes: 0f71979ab7 ("drm/i915: Performed deferred clflush inside set-cache-level")
References: https://bugs.freedesktop.org/show_bug.cgi?id=95414
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107165204.7008-1-chris@chris-wilson.co.uk
Move has_64bit_reloc into dev_priv->info. This will make it visible
in the feature listing debug output.
v2:
- Keep the struct member to keep GCC fragile but happy (Chris)
v3:
- More detailed commit message (Chris)
- Include forgotten CHV and BXT (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478162386-5018-1-git-send-email-joonas.lahtinen@linux.intel.com
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.
v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
We are not allowed to touch the GTT entries underneath an atomic section,
as they take a rpm wakelock (which is illegal from atomic context) and
in the near future acquiring the DMA address for a page within an object
may sleep for an allocation. This makes the current shortcircuit in
relocation_iomap() for performing a second relocation on an adjacent page
illegal, and we need to release the atomic iomapping, lookup the DMA,
insert it into the GTT before reentering the atomic iomap section.
As it happens, this is precisely what we do on if we are using an
iomapping over the full object and not just a single page and by
removing the shortcut, we do the right thing.
Fixes: 9c870d0367 ("drm/i915: Use RPM as the barrier for controlling...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
- first slice of the gvt device model (Zhenyu et al)
- compression support for gpu error states (Chris)
- sunset clause on gpu errors resulting in dmesg noise telling users
how to report them
- .rodata diet from Tvrtko
- switch over lots of macros to only take dev_priv (Tvrtko)
- underrun suppression for dp link training (Ville)
- lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish
from Jani
- gen9 wm fixes from Paulo&Lyude
- updated ddi programming for kbl (Rodrigo)
- respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville)
* tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits)
drm/i915: Update DRIVER_DATE to 20161024
drm/i915: Stop setting SNB min-freq-table 0 on powersave setup
drm/i915/dp: add lane_count check in intel_dp_check_link_status
drm/i915: Fix whitespace issues
drm/i915: Clean up DDI DDC/AUX CH sanitation
drm/i915: Respect alternate_ddc_pin for all DDI ports
drm/i915: Respect alternate_aux_channel for all DDI ports
drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7
drm/i915: KBL - Recommended buffer translation programming for DisplayPort
drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup
drm/i915: Add a sunset clause to GPU hang logging
drm/i915: Stop reporting error details in dmesg as well as the error-state
drm/i915/gvt: do not ignore return value of create_scratch_page
drm/i915/gvt: fix spare warnings on odd constant _Bool cast
drm/i915/gvt: mark symbols static where possible
drm/i915/gvt: fix sparse warnings on different address spaces
drm/i915/gvt: properly access enabled intel_engine_cs
drm/i915/gvt: Remove defunct vmap_batch()
drm/i915/gvt: Use common mapping routines for shadow_bb object
drm/i915/gvt: Use common mapping routines for indirect_ctx object
...
When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.
We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.
v2: Just illustrate the broken error handling rather than argue why it
is safer to ignore it, for now.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-4-chris@chris-wilson.co.uk
We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.
Remove the valid flag as it is always true.
v2: Expand commit msg, patch reorder (Mika)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Pull drm updates from Dave Airlie:
"Core:
- Fence destaging work
- DRIVER_LEGACY to split off legacy drm drivers
- drm_mm refactoring
- Splitting drm_crtc.c into chunks and documenting better
- Display info fixes
- rbtree support for prime buffer lookup
- Simple VGA DAC driver
Panel:
- Add Nexus 7 panel
- More simple panels
i915:
- Refactoring GEM naming
- Refactored vma/active tracking
- Lockless request lookups
- Better stolen memory support
- FBC fixes
- SKL watermark fixes
- VGPU improvements
- dma-buf fencing support
- Better DP dongle support
amdgpu:
- Powerplay for Iceland asics
- Improved GPU reset support
- UVD/VEC powergating support for CZ/ST
- Preinitialised VRAM buffer support
- Virtual display support
- Initial SI support
- GTT rework
- PCI shutdown callback support
- HPD IRQ storm fixes
amdkfd:
- bugfixes
tilcdc:
- Atomic modesetting support
mediatek:
- AAL + GAMMA engine support
- Hook up gamma LUT
- Temporal dithering support
imx:
- Pixel clock from devicetree
- drm bridge support for LVDS bridges
- active plane reconfiguration
- VDIC deinterlacer support
- Frame synchronisation unit support
- Color space conversion support
analogix:
- PSR support
- Better panel on/off support
rockchip:
- rk3399 vop/crtc support
- PSR support
vc4:
- Interlaced vblank timing
- 3D rendering CPU overhead reduction
- HDMI output fixes
tda998x:
- HDMI audio ASoC support
sunxi:
- Allwinner A33 support
- better TCON support
msm:
- DT binding cleanups
- Explicit fence-fd support
sti:
- remove sti415/416 support
etnaviv:
- MMUv2 refactoring
- GC3000 support
exynos:
- Refactoring HDMI DCC/PHY
- G2D pm regression fix
- Page fault issues with wait for vblank
There is no nouveau work in this tree, as Ben didn't get a pull
request in, and he was fighting moving to atomic and adding mst
support, so maybe best it waits for a cycle"
* tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits)
drm/crtc: constify drm_crtc_index parameter
drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
drm/i915: Reset the breadcrumbs IRQ more carefully
drm/i915: Force relocations via cpu if we run out of idle aperture
drm/i915: Distinguish last emitted request from last submitted request
drm/i915: Allow DP to work w/o EDID
drm/i915: Move long hpd handling into the hotplug work
drm/i915/execlists: Reinitialise context image after GPU hang
drm/i915: Use correct index for backtracking HUNG semaphores
drm/i915: Unalias obj->phys_handle and obj->userptr
drm/i915: Just clear the mmiodebug before a register access
drm/i915/gen9: only add the planes actually affected by ddb changes
drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED
drm/i915/bxt: Fix HDMI DPLL configuration
drm/i915/gen9: fix the watermark res_blocks value
drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations
drm/i915/gen9: minimum scanlines for Y tile is not always 4
drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
drm/i915/kbl: KBL also needs to run the SAGV code
...
If we run out of enough aperture space to fit the entire object, we
fallback to trying to insert a single page. However, if that also fails,
we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means
to userspace that their batch could not be fitted within the GTT.) Prior
to commit e8cb909ac3 ("drm/i915: Fallback to single page GTT
mmappings for relocations") the approach is to fallback to using the
slow CPU relocation path in case of iomapping failure, and that is the
behaviour we need to restore.
Fixes: e8cb909ac3 ("drm/i915: Fallback to single page GTT mmappings...")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk
(cherry picked from commit d7f7633557)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we run out of enough aperture space to fit the entire object, we
fallback to trying to insert a single page. However, if that also fails,
we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means
to userspace that their batch could not be fitted within the GTT.) Prior
to commit e8cb909ac3 ("drm/i915: Fallback to single page GTT
mmappings for relocations") the approach is to fallback to using the
slow CPU relocation path in case of iomapping failure, and that is the
behaviour we need to restore.
Fixes: e8cb909ac3 ("drm/i915: Fallback to single page GTT mmappings...")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk
* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we can wait upon fences before emitting the request, it becomes
trivial to wait upon any implicit fence provided by the dma-buf
reservation object.
To protect against failure, we force any asynchronous waits on a foreign
fence to timeout after 10s - so that a stall in another driver does not
permanently cripple ourselves. Still unpleasant though!
Testcase: igt/prime_vgem/fence-wait
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-21-chris@chris-wilson.co.uk
We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
With full-ppgtt, we want the user to have full control over their memory
layout, with a separate instance per context. Forcing them to use a
shared memory layout for !RCS not only duplicates the amount of work we
have to do, but also defeats the memory segregation on offer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822080350.4964-4-chris@chris-wilson.co.uk
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
(cherry picked from commit 600f436801)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
If we cannot pin the entire object into the mappable region of the GTT,
try to pin a single page instead. This is much more likely to succeed,
and prevents us falling back to the clflush slow path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-14-chris@chris-wilson.co.uk
With the introduction of the reloc page cache, we are just one step away
from refactoring the relocation write functions into one. Not only does
it tidy the code (slightly), but it greatly simplifies the control logic
much to gcc's satisfaction.
v2: Add selftests to document the relationship between the clflush
flags, the KMAP bit and packing into the page-aligned pointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.
Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
The VMA are unreferenced, they belong to the object and live until they
are closed. However, if we want to use the VMA as a cookie and use it to
keep the object alive, we want to hold onto a reference to the object
for the lifetime of the VMA cookie. To facilitate this, add a couple of
simple wrappers for managing the reference count on the object owning the
VMA.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk
request->batch_obj is only set by execbuffer for the convenience of
debugging hangs. By moving that operation to the callsite, we can
simplify all other callers and future patches. We also move the
complications of reference handling of the request->batch_obj next to
where the active tracking is set up for the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-2-git-send-email-chris@chris-wilson.co.uk
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
v2: New magic
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
If the GEM objects being rendered with in this request have been
exported via dma-buf to a third party, hook ourselves into the dma-buf
reservation object so that the third party can serialise with our
rendering via the dma-buf fences.
Testcase: igt/prime_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue. Here,
we can see that if we failed to dispatch the batch from the request, we
can forgo flushing the GPU when closing the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-5-git-send-email-chris@chris-wilson.co.uk
This reimplements the denial-of-service protection against igt from
commit 227f782e46 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in some cases (especially coupled with execlists) to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.
v2: Recycle objects prior to i915_gem_object_get_pages()
v3: Remove the reference to the ring from i915_gem_requests_ring() as it
operates on an intel_engine_cs.
v4: Since commit 9b5f4e5ed6 ("drm/i915: Retire oldest completed request
before allocating next") we no longer need the safeguard to retire
requests before get_pages(). We no longer see the huge latencies when
hitting the shrinker between allocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.
A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
v2: Update inline names to be consistent with
i915_gem_object_get_active()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
If the object is active and we need to perform a relocation upon it, we
need to take the slow relocation path. Before we do, double check the
active requests to see if they have completed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-22-git-send-email-chris@chris-wilson.co.uk
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.
engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)
This allows us to extend the behaviour easily in future, for example if
we want just a command barrier without the overhead of flushing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk
Space for flushing the GPU cache prior to completing the request is
preallocated and so cannot fail - the GPU caches will always be flushed
along with the completed request. This means we no longer have to track
whether the GPU cache is dirty between batches like we had to with the
outstanding_lazy_seqno.
With the removal of the duplication in the per-backend entry points for
emitting the obsolete lazy flush, we can then further unify the
engine->emit_flush.
v2: Expand a bit on the legacy of gpu_caches_dirty
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
'ring' is an old deprecated term for a GPU engine, so we're trying to
phase out all such terminology. eb_select_ring() not only has 'ring'
(meaning engine) in its name, but it has an ugly calling convention
whereby it returns an errno and stores a pointer-to-engine indirectly
through an output parameter. As there is only one error it ever returns
(-EINVAL), we can make it return the pointer directly, and have the
caller pass back the error code -EINVAL if the pointer result is NULL.
Thus we can replace
- ret = eb_select_ring(dev_priv, file, args, &engine);
- if (ret)
- return ret;
with
+ engine = eb_select_engine(dev_priv, file, args);
+ if (!engine)
+ return -EINVAL;
for increased clarity and maybe save a few cycles too.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-4-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Precursor for fix to secure batch execution. We will need to be able to
retrieve the batch VMA (as well as the batch itself) from the eb list,
so this patch extracts that part of eb_get_batch() into a separate
function, and moves both parts to a more logical place in the file, near
where the eb list is created.
Also, it may not be obvious, but the current execbuffer2 ioctl interface
requires that the buffer object containing the batch-to-be-executed be
the LAST entry in the exec2_list[] array (I expected it to be the
first!).
To clarify this, we can replace the rather obscure construct
"list_entry(eb->vmas.prev, ...)"
in the old version of eb_get_batch() with the equivalent but more
explicit
"list_last_entry(&eb->vmas,...)"
in the new eb_get_batch_vma() and of course add an explanatory comment.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-2-git-send-email-david.s.gordon@intel.com
Two different sets of flag bits are stored in the 'flags' member of a
'struct drm_i915_gem_exec_object2', and they're defined in two different
source files, increasing the risk of an accidental clash.
Some flags in this field are supplied by the user; these are defined in
i915_drm.h, and they start from the LSB and work up.
Other flags are defined in i915_gem_execbuffer, for internal use within
that file only; they start from the MSB and work down.
So here we add a compile-time check that the two sets of flags do not
overlap, which would cause all sorts of confusion.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-1-git-send-email-david.s.gordon@intel.com
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().
text data bss dec hex filename
1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko
1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
Just rolling out a bit of abstraction to be able to clean
up the master logic in the next step.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Git got absolutely destroyed with all our cherry-picking from
drm-intel-next-queued to various branches. It ended up inserting
intel_crtc_page_flip 2x even in intel_display.c.
Backmerge to get back to sanity.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm-intel-next-2016-05-22:
- cmd-parser support for direct reg->reg loads (Ken Graunke)
- better handle DP++ smart dongles (Ville)
- bxt guc fw loading support (Nick Hoathe)
- remove a bunch of struct typedefs from dpll code (Ander)
- tons of small work all over to avoid casting between drm_device and the i915
dev struct (Tvrtko&Chris)
- untangle request retiring from other operations, also fixes reset stat corner
cases (Chris)
- skl atomic watermark support from Matt Roper, yay!
- various wm handling bugfixes from Ville
- big pile of cdclck rework for bxt/skl (Ville)
- CABC (Content Adaptive Brigthness Control) for dsi panels (Jani&Deepak M)
- nonblocking atomic commits for plane-only updates (Maarten Lankhorst)
- bunch of PSR fixes&improvements
- untangle our map/pin/sg_iter code a bit (Dave Gordon)
drm-intel-next-2016-05-08:
- refactor stolen quirks to share code between early quirks and i915 (Joonas)
- refactor gem BO/vma funcstion (Tvrtko&Dave)
- backlight over DPCD support (Yetunde Abedisi)
- more dsi panel sequence support (Jani)
- lots of refactoring around handling iomaps, vma, ring access and related
topics culmulating in removing the duplicated request tracking in the execlist
code (Chris & Tvrtko) includes a small patch for core iomapping code
- hw state readout for bxt dsi (Ramalingam C)
- cdclk cleanups (Ville)
- dedupe chv pll code a bit (Ander)
- enable semaphores on gen8+ for legacy submission, to be able to have a direct
comparison against execlist on the same platform (Chris) Not meant to be used
for anything else but performance tuning
- lvds border bit hw state checker fix (Jani)
- rpm vs. shrinker/oom-notifier fixes (Praveen Paneri)
- l3 tuning (Imre)
- revert mst dp audio, it's totally non-functional and crash-y (Lyude)
- first official dmc for kbl (Rodrigo)
- and tons of small things all over as usual
* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (194 commits)
drm/i915: Revert async unpin and nonblocking atomic commit
drm/i915: Update DRIVER_DATE to 20160522
drm/i915: Inline sg_next() for the optimised SGL iterator
drm/i915: Introduce & use new lightweight SGL iterators
drm/i915: optimise i915_gem_object_map() for small objects
drm/i915: refactor i915_gem_object_pin_map()
drm/i915/psr: Implement PSR2 w/a for gen9
drm/i915/psr: Use ->get_aux_send_ctl functions
drm/i915/psr: Order DP aux transactions correctly
drm/i915/psr: Make idle_frames sensible again
drm/i915/psr: Try to program link training times correctly
drm/i915/userptr: Convert to drm_i915_private
drm/i915: Allow nonblocking update of pageflips.
drm/i915: Check for unpin correctness.
Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
drm/i915: Make unpin async.
drm/i915: Prepare connectors for nonblocking checks.
drm/i915: Pass atomic states to fbc update functions.
drm/i915: Remove reset_counter from intel_crtc.
drm/i915: Remove queue_flip pointer.
...
i915_gem_context_get() is a very simple wrapper around idr_find(), so
simple that it would be smaller to do the lookup inline. Also we use the
verb 'lookup' to return a pointer from a handle, freeing 'get' to imply
obtaining a reference to the context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-3-git-send-email-chris@chris-wilson.co.uk
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:
s/struct intel_context/struct i915_gem_context/
which fits much better with its constructors already conveying the
i915_gem_context prefix!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.
For example, the 32-bit version of "__copy_to_user_inatomic()" is mostly
the special cases for the constant size, and it's actually never
relevant. Every user except for one aren't actually using a constant
size anyway, and the one user that uses it is better off just using
__put_user() instead.
So get rid of the unnecessary complexity.
[ The same cleanup should likely happen to __copy_from_user_inatomic()
as well, but that one has a lot more users that I need to take a look
at first ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added. The Lustre developers seem to have
woken up from their sleep and have been doing a great job in cleaning up
the code and pruning unused or old cruft, the filesystem is almost
readable :)
Other than that, just a lot of basic coding style cleanups in the churn.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/00QACgkQMUfUDdst+ynXYQCdG9oEsw4CCItbjGfQau5YVGbd
TOcAnA19tZz+Wcg3sLT8Zsm979dgVvDt
=9UG/
-----END PGP SIGNATURE-----
Merge tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added.
The Lustre developers seem to have woken up from their sleep and have
been doing a great job in cleaning up the code and pruning unused or
old cruft, the filesystem is almost readable :)
Other than that, just a lot of basic coding style cleanups in the
churn. All have been in linux-next for a while with no reported
issues"
* tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (938 commits)
Staging: emxx_udc: emxx_udc: fixed coding style issue
staging/gdm724x: fix "alignment should match open parenthesis" issues
staging/gdm724x: Fix avoid CamelCase
staging: unisys: rename misleading var ii with frag
staging: unisys: visorhba: switch success handling to error handling
staging: unisys: visorhba: main path needs to flow down the left margin
staging: unisys: visorinput: handle_locking_key() simplifications
staging: unisys: visorhba: fail gracefully for thread creation failures
staging: unisys: visornic: comment restructuring and removing bad diction
staging: unisys: fix format string %Lx to %llx for u64
staging: unisys: remove unused struct members
staging: unisys: visorchannel: correct variable misspelling
staging: unisys: visorhba: replace functionlike macro with function
staging: dgnc: Need to check for NULL of ch
staging: dgnc: remove redundant condition check
staging: dgnc: fix 'line over 80 characters'
staging: dgnc: clean up the dgnc_get_modem_info()
staging: lustre: lnet: enable configuration per NI interface
staging: lustre: o2iblnd: properly set ibr_why
staging: lustre: o2iblnd: remove last of kiblnd_tunables_fini
...
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
This function had copies in 3 different files. Unify them in kernel.h.
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com> [drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com> [drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de> [drm/etinav/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.
The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.
All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.
A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().
So refactor my usage into drm_malloc_gfp().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>