We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.
v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
We are not allowed to touch the GTT entries underneath an atomic section,
as they take a rpm wakelock (which is illegal from atomic context) and
in the near future acquiring the DMA address for a page within an object
may sleep for an allocation. This makes the current shortcircuit in
relocation_iomap() for performing a second relocation on an adjacent page
illegal, and we need to release the atomic iomapping, lookup the DMA,
insert it into the GTT before reentering the atomic iomap section.
As it happens, this is precisely what we do on if we are using an
iomapping over the full object and not just a single page and by
removing the shortcut, we do the right thing.
Fixes: 9c870d0367 ("drm/i915: Use RPM as the barrier for controlling...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
- first slice of the gvt device model (Zhenyu et al)
- compression support for gpu error states (Chris)
- sunset clause on gpu errors resulting in dmesg noise telling users
how to report them
- .rodata diet from Tvrtko
- switch over lots of macros to only take dev_priv (Tvrtko)
- underrun suppression for dp link training (Ville)
- lspcon (hmdi 2.0 on skl/bxt) support from Shashank Sharma, polish
from Jani
- gen9 wm fixes from Paulo&Lyude
- updated ddi programming for kbl (Rodrigo)
- respect alternate aux/ddc pins (from vbt) for all ddi ports (Ville)
* tag 'drm-intel-next-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (227 commits)
drm/i915: Update DRIVER_DATE to 20161024
drm/i915: Stop setting SNB min-freq-table 0 on powersave setup
drm/i915/dp: add lane_count check in intel_dp_check_link_status
drm/i915: Fix whitespace issues
drm/i915: Clean up DDI DDC/AUX CH sanitation
drm/i915: Respect alternate_ddc_pin for all DDI ports
drm/i915: Respect alternate_aux_channel for all DDI ports
drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7
drm/i915: KBL - Recommended buffer translation programming for DisplayPort
drm/i915: Move down skl/kbl ddi iboost and n_edp_entires fixup
drm/i915: Add a sunset clause to GPU hang logging
drm/i915: Stop reporting error details in dmesg as well as the error-state
drm/i915/gvt: do not ignore return value of create_scratch_page
drm/i915/gvt: fix spare warnings on odd constant _Bool cast
drm/i915/gvt: mark symbols static where possible
drm/i915/gvt: fix sparse warnings on different address spaces
drm/i915/gvt: properly access enabled intel_engine_cs
drm/i915/gvt: Remove defunct vmap_batch()
drm/i915/gvt: Use common mapping routines for shadow_bb object
drm/i915/gvt: Use common mapping routines for indirect_ctx object
...
When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.
We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.
v2: Just illustrate the broken error handling rather than argue why it
is safer to ignore it, for now.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-4-chris@chris-wilson.co.uk
We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.
Remove the valid flag as it is always true.
v2: Expand commit msg, patch reorder (Mika)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Pull drm updates from Dave Airlie:
"Core:
- Fence destaging work
- DRIVER_LEGACY to split off legacy drm drivers
- drm_mm refactoring
- Splitting drm_crtc.c into chunks and documenting better
- Display info fixes
- rbtree support for prime buffer lookup
- Simple VGA DAC driver
Panel:
- Add Nexus 7 panel
- More simple panels
i915:
- Refactoring GEM naming
- Refactored vma/active tracking
- Lockless request lookups
- Better stolen memory support
- FBC fixes
- SKL watermark fixes
- VGPU improvements
- dma-buf fencing support
- Better DP dongle support
amdgpu:
- Powerplay for Iceland asics
- Improved GPU reset support
- UVD/VEC powergating support for CZ/ST
- Preinitialised VRAM buffer support
- Virtual display support
- Initial SI support
- GTT rework
- PCI shutdown callback support
- HPD IRQ storm fixes
amdkfd:
- bugfixes
tilcdc:
- Atomic modesetting support
mediatek:
- AAL + GAMMA engine support
- Hook up gamma LUT
- Temporal dithering support
imx:
- Pixel clock from devicetree
- drm bridge support for LVDS bridges
- active plane reconfiguration
- VDIC deinterlacer support
- Frame synchronisation unit support
- Color space conversion support
analogix:
- PSR support
- Better panel on/off support
rockchip:
- rk3399 vop/crtc support
- PSR support
vc4:
- Interlaced vblank timing
- 3D rendering CPU overhead reduction
- HDMI output fixes
tda998x:
- HDMI audio ASoC support
sunxi:
- Allwinner A33 support
- better TCON support
msm:
- DT binding cleanups
- Explicit fence-fd support
sti:
- remove sti415/416 support
etnaviv:
- MMUv2 refactoring
- GC3000 support
exynos:
- Refactoring HDMI DCC/PHY
- G2D pm regression fix
- Page fault issues with wait for vblank
There is no nouveau work in this tree, as Ben didn't get a pull
request in, and he was fighting moving to atomic and adding mst
support, so maybe best it waits for a cycle"
* tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits)
drm/crtc: constify drm_crtc_index parameter
drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
drm/i915: Reset the breadcrumbs IRQ more carefully
drm/i915: Force relocations via cpu if we run out of idle aperture
drm/i915: Distinguish last emitted request from last submitted request
drm/i915: Allow DP to work w/o EDID
drm/i915: Move long hpd handling into the hotplug work
drm/i915/execlists: Reinitialise context image after GPU hang
drm/i915: Use correct index for backtracking HUNG semaphores
drm/i915: Unalias obj->phys_handle and obj->userptr
drm/i915: Just clear the mmiodebug before a register access
drm/i915/gen9: only add the planes actually affected by ddb changes
drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED
drm/i915/bxt: Fix HDMI DPLL configuration
drm/i915/gen9: fix the watermark res_blocks value
drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations
drm/i915/gen9: minimum scanlines for Y tile is not always 4
drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
drm/i915/kbl: KBL also needs to run the SAGV code
...
If we run out of enough aperture space to fit the entire object, we
fallback to trying to insert a single page. However, if that also fails,
we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means
to userspace that their batch could not be fitted within the GTT.) Prior
to commit e8cb909ac3 ("drm/i915: Fallback to single page GTT
mmappings for relocations") the approach is to fallback to using the
slow CPU relocation path in case of iomapping failure, and that is the
behaviour we need to restore.
Fixes: e8cb909ac3 ("drm/i915: Fallback to single page GTT mmappings...")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk
(cherry picked from commit d7f7633557)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
If we run out of enough aperture space to fit the entire object, we
fallback to trying to insert a single page. However, if that also fails,
we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means
to userspace that their batch could not be fitted within the GTT.) Prior
to commit e8cb909ac3 ("drm/i915: Fallback to single page GTT
mmappings for relocations") the approach is to fallback to using the
slow CPU relocation path in case of iomapping failure, and that is the
behaviour we need to restore.
Fixes: e8cb909ac3 ("drm/i915: Fallback to single page GTT mmappings...")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk
* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we can wait upon fences before emitting the request, it becomes
trivial to wait upon any implicit fence provided by the dma-buf
reservation object.
To protect against failure, we force any asynchronous waits on a foreign
fence to timeout after 10s - so that a stall in another driver does not
permanently cripple ourselves. Still unpleasant though!
Testcase: igt/prime_vgem/fence-wait
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-21-chris@chris-wilson.co.uk
We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
With full-ppgtt, we want the user to have full control over their memory
layout, with a separate instance per context. Forcing them to use a
shared memory layout for !RCS not only duplicates the amount of work we
have to do, but also defeats the memory segregation on offer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822080350.4964-4-chris@chris-wilson.co.uk
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
(cherry picked from commit 600f436801)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
If we cannot pin the entire object into the mappable region of the GTT,
try to pin a single page instead. This is much more likely to succeed,
and prevents us falling back to the clflush slow path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-14-chris@chris-wilson.co.uk
With the introduction of the reloc page cache, we are just one step away
from refactoring the relocation write functions into one. Not only does
it tidy the code (slightly), but it greatly simplifies the control logic
much to gcc's satisfaction.
v2: Add selftests to document the relationship between the clflush
flags, the KMAP bit and packing into the page-aligned pointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.
Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
The VMA are unreferenced, they belong to the object and live until they
are closed. However, if we want to use the VMA as a cookie and use it to
keep the object alive, we want to hold onto a reference to the object
for the lifetime of the VMA cookie. To facilitate this, add a couple of
simple wrappers for managing the reference count on the object owning the
VMA.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk
request->batch_obj is only set by execbuffer for the convenience of
debugging hangs. By moving that operation to the callsite, we can
simplify all other callers and future patches. We also move the
complications of reference handling of the request->batch_obj next to
where the active tracking is set up for the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-2-git-send-email-chris@chris-wilson.co.uk
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
v2: New magic
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
If the GEM objects being rendered with in this request have been
exported via dma-buf to a third party, hook ourselves into the dma-buf
reservation object so that the third party can serialise with our
rendering via the dma-buf fences.
Testcase: igt/prime_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue. Here,
we can see that if we failed to dispatch the batch from the request, we
can forgo flushing the GPU when closing the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-5-git-send-email-chris@chris-wilson.co.uk
This reimplements the denial-of-service protection against igt from
commit 227f782e46 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in some cases (especially coupled with execlists) to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.
v2: Recycle objects prior to i915_gem_object_get_pages()
v3: Remove the reference to the ring from i915_gem_requests_ring() as it
operates on an intel_engine_cs.
v4: Since commit 9b5f4e5ed6 ("drm/i915: Retire oldest completed request
before allocating next") we no longer need the safeguard to retire
requests before get_pages(). We no longer see the huge latencies when
hitting the shrinker between allocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.
A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
v2: Update inline names to be consistent with
i915_gem_object_get_active()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
If the object is active and we need to perform a relocation upon it, we
need to take the slow relocation path. Before we do, double check the
active requests to see if they have completed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-22-git-send-email-chris@chris-wilson.co.uk
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.
engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)
This allows us to extend the behaviour easily in future, for example if
we want just a command barrier without the overhead of flushing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk
Space for flushing the GPU cache prior to completing the request is
preallocated and so cannot fail - the GPU caches will always be flushed
along with the completed request. This means we no longer have to track
whether the GPU cache is dirty between batches like we had to with the
outstanding_lazy_seqno.
With the removal of the duplication in the per-backend entry points for
emitting the obsolete lazy flush, we can then further unify the
engine->emit_flush.
v2: Expand a bit on the legacy of gpu_caches_dirty
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
'ring' is an old deprecated term for a GPU engine, so we're trying to
phase out all such terminology. eb_select_ring() not only has 'ring'
(meaning engine) in its name, but it has an ugly calling convention
whereby it returns an errno and stores a pointer-to-engine indirectly
through an output parameter. As there is only one error it ever returns
(-EINVAL), we can make it return the pointer directly, and have the
caller pass back the error code -EINVAL if the pointer result is NULL.
Thus we can replace
- ret = eb_select_ring(dev_priv, file, args, &engine);
- if (ret)
- return ret;
with
+ engine = eb_select_engine(dev_priv, file, args);
+ if (!engine)
+ return -EINVAL;
for increased clarity and maybe save a few cycles too.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-4-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Precursor for fix to secure batch execution. We will need to be able to
retrieve the batch VMA (as well as the batch itself) from the eb list,
so this patch extracts that part of eb_get_batch() into a separate
function, and moves both parts to a more logical place in the file, near
where the eb list is created.
Also, it may not be obvious, but the current execbuffer2 ioctl interface
requires that the buffer object containing the batch-to-be-executed be
the LAST entry in the exec2_list[] array (I expected it to be the
first!).
To clarify this, we can replace the rather obscure construct
"list_entry(eb->vmas.prev, ...)"
in the old version of eb_get_batch() with the equivalent but more
explicit
"list_last_entry(&eb->vmas,...)"
in the new eb_get_batch_vma() and of course add an explanatory comment.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-2-git-send-email-david.s.gordon@intel.com
Two different sets of flag bits are stored in the 'flags' member of a
'struct drm_i915_gem_exec_object2', and they're defined in two different
source files, increasing the risk of an accidental clash.
Some flags in this field are supplied by the user; these are defined in
i915_drm.h, and they start from the LSB and work up.
Other flags are defined in i915_gem_execbuffer, for internal use within
that file only; they start from the MSB and work down.
So here we add a compile-time check that the two sets of flags do not
overlap, which would cause all sorts of confusion.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468504324-12690-1-git-send-email-david.s.gordon@intel.com
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().
text data bss dec hex filename
1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko
1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
Just rolling out a bit of abstraction to be able to clean
up the master logic in the next step.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Git got absolutely destroyed with all our cherry-picking from
drm-intel-next-queued to various branches. It ended up inserting
intel_crtc_page_flip 2x even in intel_display.c.
Backmerge to get back to sanity.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm-intel-next-2016-05-22:
- cmd-parser support for direct reg->reg loads (Ken Graunke)
- better handle DP++ smart dongles (Ville)
- bxt guc fw loading support (Nick Hoathe)
- remove a bunch of struct typedefs from dpll code (Ander)
- tons of small work all over to avoid casting between drm_device and the i915
dev struct (Tvrtko&Chris)
- untangle request retiring from other operations, also fixes reset stat corner
cases (Chris)
- skl atomic watermark support from Matt Roper, yay!
- various wm handling bugfixes from Ville
- big pile of cdclck rework for bxt/skl (Ville)
- CABC (Content Adaptive Brigthness Control) for dsi panels (Jani&Deepak M)
- nonblocking atomic commits for plane-only updates (Maarten Lankhorst)
- bunch of PSR fixes&improvements
- untangle our map/pin/sg_iter code a bit (Dave Gordon)
drm-intel-next-2016-05-08:
- refactor stolen quirks to share code between early quirks and i915 (Joonas)
- refactor gem BO/vma funcstion (Tvrtko&Dave)
- backlight over DPCD support (Yetunde Abedisi)
- more dsi panel sequence support (Jani)
- lots of refactoring around handling iomaps, vma, ring access and related
topics culmulating in removing the duplicated request tracking in the execlist
code (Chris & Tvrtko) includes a small patch for core iomapping code
- hw state readout for bxt dsi (Ramalingam C)
- cdclk cleanups (Ville)
- dedupe chv pll code a bit (Ander)
- enable semaphores on gen8+ for legacy submission, to be able to have a direct
comparison against execlist on the same platform (Chris) Not meant to be used
for anything else but performance tuning
- lvds border bit hw state checker fix (Jani)
- rpm vs. shrinker/oom-notifier fixes (Praveen Paneri)
- l3 tuning (Imre)
- revert mst dp audio, it's totally non-functional and crash-y (Lyude)
- first official dmc for kbl (Rodrigo)
- and tons of small things all over as usual
* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (194 commits)
drm/i915: Revert async unpin and nonblocking atomic commit
drm/i915: Update DRIVER_DATE to 20160522
drm/i915: Inline sg_next() for the optimised SGL iterator
drm/i915: Introduce & use new lightweight SGL iterators
drm/i915: optimise i915_gem_object_map() for small objects
drm/i915: refactor i915_gem_object_pin_map()
drm/i915/psr: Implement PSR2 w/a for gen9
drm/i915/psr: Use ->get_aux_send_ctl functions
drm/i915/psr: Order DP aux transactions correctly
drm/i915/psr: Make idle_frames sensible again
drm/i915/psr: Try to program link training times correctly
drm/i915/userptr: Convert to drm_i915_private
drm/i915: Allow nonblocking update of pageflips.
drm/i915: Check for unpin correctness.
Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
drm/i915: Make unpin async.
drm/i915: Prepare connectors for nonblocking checks.
drm/i915: Pass atomic states to fbc update functions.
drm/i915: Remove reset_counter from intel_crtc.
drm/i915: Remove queue_flip pointer.
...
i915_gem_context_get() is a very simple wrapper around idr_find(), so
simple that it would be smaller to do the lookup inline. Also we use the
verb 'lookup' to return a pointer from a handle, freeing 'get' to imply
obtaining a reference to the context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-3-git-send-email-chris@chris-wilson.co.uk
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:
s/struct intel_context/struct i915_gem_context/
which fits much better with its constructors already conveying the
i915_gem_context prefix!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.
For example, the 32-bit version of "__copy_to_user_inatomic()" is mostly
the special cases for the constant size, and it's actually never
relevant. Every user except for one aren't actually using a constant
size anyway, and the one user that uses it is better off just using
__put_user() instead.
So get rid of the unnecessary complexity.
[ The same cleanup should likely happen to __copy_from_user_inatomic()
as well, but that one has a lot more users that I need to take a look
at first ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added. The Lustre developers seem to have
woken up from their sleep and have been doing a great job in cleaning up
the code and pruning unused or old cruft, the filesystem is almost
readable :)
Other than that, just a lot of basic coding style cleanups in the churn.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/00QACgkQMUfUDdst+ynXYQCdG9oEsw4CCItbjGfQau5YVGbd
TOcAnA19tZz+Wcg3sLT8Zsm979dgVvDt
=9UG/
-----END PGP SIGNATURE-----
Merge tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
"Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added.
The Lustre developers seem to have woken up from their sleep and have
been doing a great job in cleaning up the code and pruning unused or
old cruft, the filesystem is almost readable :)
Other than that, just a lot of basic coding style cleanups in the
churn. All have been in linux-next for a while with no reported
issues"
* tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (938 commits)
Staging: emxx_udc: emxx_udc: fixed coding style issue
staging/gdm724x: fix "alignment should match open parenthesis" issues
staging/gdm724x: Fix avoid CamelCase
staging: unisys: rename misleading var ii with frag
staging: unisys: visorhba: switch success handling to error handling
staging: unisys: visorhba: main path needs to flow down the left margin
staging: unisys: visorinput: handle_locking_key() simplifications
staging: unisys: visorhba: fail gracefully for thread creation failures
staging: unisys: visornic: comment restructuring and removing bad diction
staging: unisys: fix format string %Lx to %llx for u64
staging: unisys: remove unused struct members
staging: unisys: visorchannel: correct variable misspelling
staging: unisys: visorhba: replace functionlike macro with function
staging: dgnc: Need to check for NULL of ch
staging: dgnc: remove redundant condition check
staging: dgnc: fix 'line over 80 characters'
staging: dgnc: clean up the dgnc_get_modem_info()
staging: lustre: lnet: enable configuration per NI interface
staging: lustre: o2iblnd: properly set ibr_why
staging: lustre: o2iblnd: remove last of kiblnd_tunables_fini
...
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
This function had copies in 3 different files. Unify them in kernel.h.
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com> [drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com> [drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de> [drm/etinav/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.
The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.
All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.
A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().
So refactor my usage into drm_malloc_gfp().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The multiple levels of indirect do nothing but hinder the compiler and
the pointer chasing turns to be quite painful but painless to fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com
In the error-handling paths of i915_gem_do_execbuffer() and
intel_crtc_page_flip(), the local pointer-to-request variables
were expected to be either valid pointers or NULL. Since
2682708 drm/i915: simplify allocation of driver-internal requests
they could also be ERR_PTR() values, so the tests need to be
updated to accommodate this case.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453978089-29127-1-git-send-email-tvrtko.ursulin@linux.intel.com
This got broken in:
commit de1add3605
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri Jan 15 15:12:50 2016 +0000
drm/i915: Decouple execbuf uAPI from internal implementation
BSD ring flags need to be shifted before they can be considered
indices into the ring array.
Reported by Zhipeng Gong.
v2: Simplify the code. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453902069-31353-1-git-send-email-tvrtko.ursulin@linux.intel.com
Testcase: igt/gem_exec_basic # bdw-gt3
At the moment execbuf ring selection is fully coupled to
internal ring ids which is not a good thing on its own.
This dependency is also spread between two source files and
not spelled out at either side which makes it hidden and
fragile.
This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).
This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.
As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.
v2: Extract ring selection into eb_select_ring. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).
So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...
v4: Rebased
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
correct batch length.
Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
other platforms). Then we don't have to specify the batch length
at all, and the CS will instead execute until it sees the
MI_BATCH_BUFFER_END.
We still need the batch length since we do the CS TLB workaround
and copy the batch into the permanently pinned scratch object
and execute it from there. But for this we can simply use the
batch object length when the user hasn't specified the actual
batch length. So specifying the batch length becomes just a
way to optimize the batch copy a little bit.
We lost batch_len from a bunch of igts (including the quiesce batch)
so without this igt is utterly broken on 830/845. Also some igts such
as gem_cpu_reloc never specified the batch_len and so didn't work.
With MI_BATCH_BUFFER_START we don't have to fix up igt every time
someone forgets that 830/845 exist.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450110229-30450-11-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
According to PRM, some parts of HW require the addresses to be in
a canonical form, where bits [63:48] == [47]. Let's convert addresses to
canonical form prior to relocating and return converted offsets to
userspace. We also need to make sure that userspace is using addresses
in canonical form in case of softpin.
v2: Whitespace fixup, gen8_canonical_addr description (Chris, Ville)
v3: Rebase on top of softpin, fix a hole in relocate_entry,
s/expect/require (Chris)
v4: Handle softpin in validate_exec_list (Chris)
v5: Convert back to canonical form at copy_to_user time (Chris)
v6: Don't use struct exec_object2 in place of exec_object
v7: Use sign_extend64 for converting to canonical form (Joonas),
reject non-canonical and non-page-aligned offset for softpin (Chris)
v8: Convert back to non-canonical form in a function,
split the test for EXEC_OBJECT_PINNED (Chris)
v9: s/canonial/canonical, drop accidental double newline (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1451409892-13708-1-git-send-email-michal.winiarski@intel.com
Testcase: igt/gem_bad_reloc/negative-reloc-blt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92699
Cc: drm-intel-fixes@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or the
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.
The typical sequence is:
va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
*(va+offset) = ...
kunmap_atomic(va);
Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache. This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.
Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.
Separate patches deal with the cases where whole objects are (or may
be) dirtied.
v3: Mark two more pages dirty in the page-boundary-crossing
cases of the execbuffer relocation code [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski.
(Not published externally)
v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction
to allow eviction of soft-pinned objects when another soft-pinned object used
by a subsequent execbuffer overlaps reported by Michal Winiarski.
(Not published externally)
v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
pinned first after an address conflict happens to avoid repeated conflicts in
rare cases (Suggested by Chris Wilson). Expanded comment on
drm_i915_gem_exec_object2.offset to cover this new API.
v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
(Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
buffers are not considered misplaced without the user specifying
EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any
addressing workarounds. Updated object2.offset field comment again to clarify
NO_RELOC case (Chris). checkpatch cleanup.
v6: Trivial rebase on latest drm-intel-nightly
v7: Catch attempts to pin above the max virtual address size and return
EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
something at an offset above 4GB (Chris, Daniel Vetter).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Zou Nanhai <nanhai.zou@intel.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: PDT
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.
v2: Convert MOCS setup too
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Passing cliprects into the kernel for it to re-execute the batch buffer
with different CMD_DRAWRECT died out long ago. As DRI1 support has been
removed from the kernel, we can now simply reject any execbuf trying to
use this "feature".
To keep Daniel happy with the prospect of being able to reuse these
fields in the next decade, continue to ensure that current userspace is
not passing garbage in through the dead fields.
v2: Fix the cliprects_ptr check
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags
In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32-bits.
Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html
v2: Changed flag logic from neeeds_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
Fix check for entries currently using +4GB addresses, use min_t and
other polish in object_bind_to_vm (Chris)
v8: Commit message updated to point to libdrm patch.
v9: vmas are allocated in the correct ozone, so only check flag when the
vma has not been allocated. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge to catch up with 4.3. slightly more involved conflict in the
irq code, but nothing beyond adjacent changes.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Extend init/init_hw split to context init.
- Move context initialisation in to i915_gem_init_hw
- Move one off initialisation for render ring to
i915_gem_validate_context
- Move default context initialisation to logical_ring_init
Rename intel_lr_context_deferred_create to
intel_lr_context_deferred_alloc, to reflect reduced functionality &
alloc/init split.
This patch is intended to split out the allocation of resources &
initialisation to allow easier reuse of code for resume/gpu reset.
v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
Left ->init_context int intel_lr_context_deferred_alloc
(Daniel Vetter)
Remove unnecessary init flag & ring type test. (Daniel Vetter)
Improve commit message (Daniel Vetter)
v3: On init/reinit, set the hw next sequence number to the sw next
sequence number. This is set to 1 at driver load time. This prevents
the seqno being reset on reinit (Chris Wilson)
v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
on reset.
This makes it obvious which bbs are which after a reset. (David Gordon
& John Harrison)
Rebase.
v5: Rebase. Fixed rebase breakage. Put context pinning in separate
function. Removed code churn. (Thomas Daniel)
v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel)
Issue: VIZ-4798
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There have been many hard to track down bugs whereby userspace forgot to
flag a write buffer and then cause graphics corruption or a hung GPU
when that buffer was later purged under memory pressure (as the buffer
appeared clean, its pages would have been evicted rather than preserved
and any changes more recent than in the backing storage would be lost).
In retrospect this is a rare optimisation against memory pressure,
already the slow path. If we always mark the buffer as dirty when
accessed by the GPU, anything not used can still be evicted cheaply
(ideal behaviour for mark-and-sweep eviction) but we do not run the risk
of corruption. For correct read serialisation, userspace still has to
notify when the GPU writes to an object. However, there are certain
situations under which userspace may wish to tell white lies to the
kernel...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Goel, Akash" <akash.goel@intel.co>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Backmerge fixes since it's getting out of hand again with the massive
split due to atomic between -next and 4.2-rc. All the bugfixes in
4.2-rc are addressed already (by converting more towards atomic
instead of minimal duct-tape) so just always pick the version in next
for the conflicts in modeset code.
All the other conflicts are just adjacent lines changed.
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_drv.h
drivers/gpu/drm/i915/intel_ringbuffer.h
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Ensures that the batch buffer is executed by the resource streamer.
And will let userspace know whether Resource Streamer is supported in
the kernel.
v2: Don't skip 1<<15 for the exec flags (Jani Nikula)
v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson)
(from getparam patch)
v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after
I915_PARAM_HAS_GPU_RESET.
v3: Only advertise RS support for hardware that supports it.
v4: Add HAS_RESOURCE_STREAMER() macro (Chris)
Testcase: igt/gem_exec_params
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
[danvet: squash in getparam patch since it'd break bisect, suggested
by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.2.
I've one other new driver from freescale on my radar, it's been posted
and reviewed, I'd just like to get someone to give it a last look, so
maybe I'll send it or maybe I'll leave it.
There is no major nouveau changes in here, Ben was working on
something big, and we agreed it was a bit late, there wasn't anything
else he considered urgent to merge.
There might be another msm pull for some bits that are waiting on
arm-soc, I'll see how we time it.
This touches some "of" stuff, acks are in place except for the fixes
to the build in various configs,t hat I just applied.
Summary:
New drivers:
- virtio-gpu:
KMS only pieces of driver for virtio-gpu in qemu.
This is just the first part of this driver, enough to run
unaccelerated userspace on. As qemu merges more we'll start
adding the 3D features for the virgl 3d work.
- amdgpu:
a new driver from AMD to driver their newer GPUs. (VI+)
It contains a new cleaner userspace API, and is a clean
break from radeon moving forward, that AMD are going to
concentrate on. It also contains a set of register headers
auto generated from AMD internal database.
core:
- atomic modesetting API completed, enabled by default now.
- Add support for mode_id blob to atomic ioctl to complete interface.
- bunch of Displayport MST fixes
- lots of misc fixes.
panel:
- new simple panels
- fix some long-standing build issues with bridge drivers
radeon:
- VCE1 support
- add a GPU reset counter for userspace
- lots of fixes.
amdkfd:
- H/W debugger support module
- static user-mode queues
- support killing all the waves when a process terminates
- use standard DECLARE_BITMAP
i915:
- Add Broxton support
- S3, rotation support for Skylake
- RPS booting tuning
- CPT modeset sequence fixes
- ns2501 dither support
- enable cmd parser on haswell
- cdclk handling fixes
- gen8 dynamic pte allocation
- lots of atomic conversion work
exynos:
- Add atomic modesetting support
- Add iommu support
- Consolidate drm driver initialization
- and MIC, DECON and MIPI-DSI support for exynos5433
omapdrm:
- atomic modesetting support (fixes lots of things in rewrite)
tegra:
- DP aux transaction fixes
- iommu support fix
msm:
- adreno a306 support
- various dsi bits
- various 64-bit fixes
- NV12MT support
rcar-du:
- atomic and misc fixes
sti:
- fix HDMI timing complaince
tilcdc:
- use drm component API to access tda998x driver
- fix module unloading
qxl:
- stability fixes"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits)
drm/nouveau: Pause between setting gpu to D3hot and cutting the power
drm/dp/mst: close deadlock in connector destruction.
drm: Always enable atomic API
drm/vgem: Set unique to "vgem"
of: fix a build error to of_graph_get_endpoint_by_regs function
drm/dp/mst: take lock around looking up the branch device on hpd irq
drm/dp/mst: make sure mst_primary mstb is valid in work function
of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs
ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi'
drm/atomic: Don't set crtc_state->enable manually
drm/exynos: dsi: do not set TE GPIO direction by input
drm/exynos: dsi: add support for MIC driver as a bridge
drm/exynos: dsi: add support for Exynos5433
drm/exynos: dsi: make use of array for clock access
drm/exynos: dsi: make use of driver data for static values
drm/exynos: dsi: add macros for register access
drm/exynos: dsi: rename pll_clk to sclk_clk
drm/exynos: mic: add MIC driver
of: add helper for getting endpoint node of specific identifiers
drm/exynos: add Exynos5433 decon driver
...
In _i915_add_request(), the request is associated with a userland client.
Specifically it is linked to the 'file' structure and the current user process
is recorded. One problem here is that the current user process is not
necessarily the same as when the request was submitted to the driver. This is
especially true when the GPU scheduler arrives and decouples driver submission
from hardware submission. Note also that it is only in the case where the add
request comes from an execbuff call that there is a client to associate. Any
other add request call is kernel only so does not need to do it.
This patch moves the client association into a separate function. This is then
called from the execbuffer code path itself at a sensible time. It also removes
the now redundant 'file' pointer from the add request parameter list.
An extra cleanup of the client association is also added to the request clean up
code for the eventuality where the request is killed after association but
before being submitted (e.g. due to out of memory error somewhere). Once the
submission has happened, the request is on the request list and the regular
request list removal will clear the association. Note that this still needs to
happen at this point in time because the request might be kept floating around
much longer (due to someone holding a reference count) and the client should not
be worrying about this request after it has been retired.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The outstanding_lazy_request is no longer used anywhere in the driver.
Everything that was looking at it now has a request explicitly passed in from on
high. Everything that was relying upon it behind the scenes is now explicitly
creating/passing/submitting its own private request. Thus the OLR can be
removed.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that everything above has been converted to use requests, intel_ring_begin()
can be updated to take a request instead of a ring. This also means that it no
longer needs to lazily allocate a request if no-one happens to have done it
earlier.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated the various ring->dispatch_execbuffer() implementations to take a
request instead of a ring.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that everything above has been converted to use request structures, it is
possible to update the lower level move_to_active() functions to be request
based as well.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that all callers of i915_add_request() have a request pointer to hand, it is
possible to update the add request function to take a request pointer rather
than pulling it out of the OLR.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.
v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created until
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).
The call chains above _sync() now support passing a request through which most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).
The exeception is intel_crtc_page_flip() which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.
v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.
v4: Updated comments from review feedback (Tomas Elf)
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that the request is guaranteed to specify the context, it is possible to
update the context switch code to use requests rather than ring and context
pairs. This patch updates i915_switch_context() accordingly.
Also removed the warning that the request's context must match the last context
switch's context. As the context switch now gets the context object from the
request structure, there is no longer any scope for the two to become out of
step.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to explcitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.
This patch adds a flag to _add_request() to specify whether the flush is
required or not.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the
execbuffer_move_to_active() code path.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the move_to_gpu() code paths.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated a couple of trace points to use the now cached request pointer rather
than extracting it from the ring.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than just having a local request variable in the execbuff code, the
request pointer is now stored in the execbuff params structure. Also added
explicit cleanup of the request (plus wiping the OLR to match) in the error
case. This means that the execbuff code is no longer dependent upon the OLR
keeping track of the request so as to not leak it when things do go wrong. Note
that in the success case, the i915_add_request() at the end of the submission
function will tidy up the request and clear the OLR.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The alloc_request() function does not actually return the newly allocated
request. Instead, it must be pulled from ring->outstanding_lazy_request. This
patch fixes this so that code can create a request and start using it knowing
exactly which request it actually owns.
v2: Updated for new i915_gem_request_alloc() scheme.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Shrunk the parameter list of i915_gem_execbuffer_retire_commands() to a single
structure as everything it requires is available in the execbuff_params object.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The do_execbuf() function takes quite a few parameters. The actual set of
parameters is going to change with the conversion to passing requests around.
Further, it is due to grow massively with the arrival of the GPU scheduler.
This patch simplifies the prototype by passing a parameter structure instead.
Changing the parameter set in the future is then simply a matter of
adding/removing items to the structure.
Note that the structure does not contain absolutely everything that is passed
in. This is because the intention is to use this structure more extensively
later in this patch series and more especially in the GPU scheduler that is
coming soon. The latter requires hanging on to the structure as the final
hardware submission can be delayed until long after the execbuf IOCTL has
returned to user land. Thus it is unsafe to put anything in the structure that
is local to the IOCTL call itself - such as the 'args' parameter. All entries
must be copies of data or pointers to structures that are reference counted in
some way and guaranteed to exist for the duration of the batch buffer's life.
v2: Rebased to newer tree and updated for changes to the command parser.
Specifically, a code shuffle has required saving the batch start address in the
params structure.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Start of explicit request management in the execbuffer code path. This patch
adds a call to allocate a request structure before all the actual hardware work
is done. Thus guaranteeing that all that work is tagged by a known request. At
present, nothing further is done with the request, the rest comes later in the
series.
The only noticable change is that failure to get a request (e.g. due to lack of
memory) will be caught earlier in the sequence. It now occurs right at the start
before any un-undoable work has been done.
v2: Simplified the error handling path.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The i915_add_request() function is called to keep track of work that has been
written to the ring buffer. It adds epilogue commands to track progress (seqno
updates and such), moves the request structure onto the right list and other
such house keeping tasks. However, the work itself has already been written to
the ring and will get executed whether or not the add request call succeeds. So
no matter what goes wrong, there isn't a whole lot of point in failing the call.
At the moment, this is fine(ish). If the add request does bail early on and not
do the housekeeping, the request will still float around in the
ring->outstanding_lazy_request field and be picked up next time. It means
multiple pieces of work will be tagged as the same request and driver can't
actually wait for the first piece of work until something else has been
submitted. But it all sort of hangs together.
This patch series is all about removing the OLR and guaranteeing that each piece
of work gets its own personal request. That means that there is no more
'hoovering up of forgotten requests'. If the request does not get tracked then
it will be leaked. Thus the add request call _must_ not fail. The previous patch
should have already ensured that it _will_ not fail by removing the potential
for running out of ring space. This patch enforces the rule by actually removing
the early exit paths and the return code.
Note that if something does manage to fail and the epilogue commands don't get
written to the ring, the driver will still hang together. The request will be
added to the tracking lists. And as in the old case, any subsequent work will
generate a new seqno which will suffice for marking the old one as complete.
v2: Improved WARNings (Tomas Elf review request).
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Internal requirement for the alignment is that it must be a
power-of-two, so enforce rejection at the user interface to execbuffer
(which allows the caller to specify a stricter-than-expected alignment
criterion).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch doesn't have any functional change, but organize fruntbuffer
invalidate and busy by removing unecesarry signature argument for ring.
It was unsed on mark_fb_busy and only used on fb_obj_invalidate for the
same ORIGIN_CS usage. So let's clean it a bit
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Export a new context parameter that can be set/queried through the
context_{get,set}param ioctls. This parameter is passed as a context
flag and decides whether or not a GPU address mapping is allowed to
be made at address zero. The default is to allow such mappings.
Signed-off-by: David Weinehall <david.weinehall@intel.com>
Acked-by: "Zou, Nanhai" <nanhai.zou@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This trims a little overhead from the common case of not needing to
synchronize between rings.
v2: execlists is special and likes to duplicate code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_parse_cmds returns -EACCES on chained batches, which "tells the
caller to abort and dispatch the workload as a non-secure batch",
but the mechanism implementing that was broken when
flags |= I915_DISPATCH_SECURE was moved from i915_gem_execbuffer_parse
to i915_gem_do_execbuffer (17cabf571e):
i915_gem_execbuffer_parse returns the original batch_obj in this case,
and i915_gem_do_execbuffer doesn't check for that.
Don't set the secure bit in this case to make sure such batches don't
run with elevated priviledges.
Signed-off-by: Rebecca Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Stitch together commit message. Also remove a comment as
suggested by Mika. And style-align the comment while at it.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_needs_cmd_parser already checks that for us.
Suggested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
With the binding regression from the original full ppgtt patches
fixed we can throw the switch. Yay!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90190
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[Jani: tweaked commit title per Chris' suggestion]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
to vma->bound in there to avoid duplicating the conversion code all
over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.
Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
lower-level ->bind_vma functions hence unconditionally set
GLOBAL_BIND when writing the ggtt ptes.
There's still a bit room for cleanup, but that's for follow-up
patches.
v2: Fixup fumbles.
v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
It's already protected by the bkl^Wdev->struct_mutex. While at it
realign some related code.
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
We load the ppgtt ptes once per gpu reset/driver load/resume and
that's all that's needed. Note that this only blows up when we're
using the allocate_va_range funcs and not the special-purpose ones
used. With this change we can get rid of that duplication.
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
PIN_GLOBAL is set only when userspace asked for it, and that
is only the case for the gen6 PIPE_CONTROL workaround. We're not
allowed to just clear this.
The important part of the fallback is to drop the restriction to
the mappable range.
This issue has been introduced in
commit edf4427b80
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 14 11:20:56 2015 +0000
drm/i915: Fallback to using CPU relocations for large batch buffers
v2: Chris pointed out that we also miss to set PIN_GLOBAL when the
buffer is already bound. Fix this up too.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.
One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.
This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since
commit 17cabf571e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 14 11:20:57 2015 +0000
drm/i915: Trim the command parser allocations
we may then try to allocate a zero-sized object and attempt to extract
its pages. Understandably this fails.
Testcase: igt/gem_exec_nop #ivb,byt,hsw
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.
The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.
GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.
Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.
The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
let's the user know.
It's not just for gen8. If the current context has mappings change, we
need a context reload to switch
v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.
v3: Invalidate PPGTT TLBs inside alloc_va_range.
v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
neither ctx->ppgtt and aliasing_ppgtt exist.
v5: Removed references to teardown_va_range.
v6: Updated needs_pd_load_pre/post.
v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
comment about updated PDEs to object_pin/bind (Mika).
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the batch buffer is too large to fit into the aperture and we need a
GTT mapping for relocations, we currently fail. This only applies to a
subset of machines for a subset of environments, quite undesirable. We
can simply check after failing to insert the batch into the GTT as to
whether we only need a mappable binding for relocation and, if so, we can
revert to using a non-mappable binding and an alternate relocation
method. However, using relocate_entry_cpu() is excruciatingly slow for
large buffers on non-LLC as the entire buffer requires clflushing before
and after the relocation handling. Alternatively, we can implement a
third relocation method that only clflushes around the relocation entry.
This is still slower than updating through the GTT, so we prefer using
the GTT where possible, but is orders of magnitude faster as we
typically do not have to then clflush the entire buffer.
An alternative idea of using a temporary WC mapping of the backing store
is promising (it should be faster than using the GTT itself), but
requires fairly extensive arch/x86 support - along the lines of
kmap_atomic_prof_pfn() (which is not universally implemented even for
x86).
Testcase: igt/gem_exec_big #pnv,byt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a WARN_ONCE for the impossible reloc case and explain in
a short comment why we want to avoid ping-pong.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We want to port FBC to the frontbuffer tracking infrastructure, but
for that we need to know what caused the object invalidation so
we can react accordingly: CPU mmaps need manual, GTT mmaps and
flips don't need handling and ring rendering needs nukes.
v2: - s/ORIGIN_RENDER/ORIGIN_CS/ (Daniel, Rodrigo)
- Fix copy/pasted wrong documentation
- Rebase
v3: - Rebase
v4: - Don't pass the operation to flushes (Daniel).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Change 'mutliple' to 'multiple'
Change 'mutlipler' to 'multiplier'
Change 'Haswel' to 'Haswell'
Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.
This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.
OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, the command parser tries to create a secondary batch exactly
as large as the original, and vmap both. This is open to abuse by
userspace using extremely large batch objects, but only executing very
short batches. For example, this would be if userspace were to implement
a command submission ringbuffer. However, we only need to allocate pages
for just the contents of the command sequence in the batch - all
relocations copied to the secondary batch will reference the original
batch and so there can be no access to the secondary batch outside of
the explicit execution region.
Testcase: igt/gem_exec_big #ivb,byt,hsw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Skylake GT3 we have 2 Video Command Streamers (VCS), which is asymmetrical.
For example, HEVC GPU commands can be only dispatched to VCS1 ring.
But userspace has no control when using VCS1 or VCS2. This patch introduces
a mechanism to avoid the default ping-pong mode and use one specific ring
through execution flag. This mechanism is usable for all the platforms
with 2 VCS rings.
The open source usage is from these two commits in vaapi/intel:
commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
Author: Zhao, Yakui <yakui.zhao@intel.com>
Date: Mon Nov 17 12:44:19 2014 +0800
Allow the batchbuffer to be submitted with override flag
commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Mon Nov 17 12:44:22 2014 +0800
Add the override flag to assure that HEVC video command
always uses BSD ring0 for SKL GT3 machine
v2: fix whitespace (Rodrigo)
v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
v4: change the comment (Zhipeng)
v5: address Daniel's comment (Zhipeng)
Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Conflicts:
drivers/gpu/drm/i915/intel_runtime_pm.c
Separate branch so that Takashi can also pull just this refactoring
into sound-next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If not pinned VMA can become an eviction target just before it needs to be
executed which breaks the internal object lifetime rules.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87399
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 355a701838.
This had some bad side effects under normal operation, and should
have been dropped earlier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Move it to a separate function since the main do_execbuffer function
already has so much going on.
v2:
- Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
feedback)
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>