Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
Since the guc allocates and pins and object into the GGTT for its usage,
it is more natural to use that pinned VMA as our resource cookie.
v2: Embrace naming tautology
v3: Rewrite comments for guc_allocate_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-12-git-send-email-chris@chris-wilson.co.uk
Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk
Redo the fb rotation handling in order to:
- eliminate the NV12 special casing
- handle fb->offsets[] properly
- make the rotation handling easier for the plane code
To achieve these goals we reduce intel_rotation_info to only contain
(for each plane) the rotated view width,height,stride in tile units,
and the page offset into the object where the plane starts. Each plane
is handled exactly the same way, no special casing for NV12 or other
formats. We then store the computed rotation_info under
intel_framebuffer so that we don't have to recompute it again.
To handle fb->offsets[] we treat them as a linear offsets and convert
them to x/y offsets from the start of the relevant GTT mapping (either
normal or rotated). We store the x/y offsets under intel_framebuffer,
and for some extra convenience we also store the rotated pitch (ie.
tile aligned plane height). So for each plane we have the normal
x/y offsets, rotated x/y offsets, and the rotated pitch. The normal
pitch is available already in fb->pitches[].
While we're gathering up all that extra information, we can also easily
compute the storage requirements for the framebuffer, so that we can
check that the object is big enough to hold it.
When it comes time to deal with the plane source coordinates, we first
rotate the clipped src coordinates to match the relevant GTT view
orientation, then add to them the fb x/y offsets. Next we compute
the aligned surface page offset, and as a result we're left with some
residual x/y offsets. Finally, if required by the hardware, we convert
the remaining x/y offsets into a linear offset.
For gen2/3 we simply skip computing the final page offset, and just
convert the src+fb x/y offsets directly into a linear offset since
that's what the hardware wants.
After this all platforms, incluing SKL+, compute these things in exactly
the same way (excluding alignemnt differences).
v2: Use BIT(DRM_ROTATE_270) instead of ROTATE_270 when rotating
plane src coordinates
Drop some spurious changes that got left behind during
development
v3: Split out more changes to prep patches (Daniel)
s/intel_fb->plane[].foo.bar/intel_fb->foo[].bar/ for brevity
Rename intel_surf_gtt_offset to intel_fb_gtt_offset
Kill the pointless 'plane' parameter from intel_fb_gtt_offset()
v4: Fix alignment vs. alignment-1 when calling
_intel_compute_tile_offset() from intel_fill_fb_info()
Pass the pitch in tiles in
stad of pixels to intel_adjust_tile_offset() from intel_fill_fb_info()
Pass the full width/height of the rotated area to
drm_rect_rotate() for clarity
Use u32 for more offsets
v5: Preserve the upper_32_bits()/lower_32_bits() handling for the
fb ggtt offset (Sivakumar)
v6: Rebase due to drm_plane_state src/dst rects
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-2-git-send-email-ville.syrjala@linux.intel.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough to inline the operation into execbuf. The trick used is
to note that we can check for an overflow bit (keeping space available
for it inside the flags) at the same time as checking the binding bits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-17-git-send-email-chris@chris-wilson.co.uk
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.
v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.
A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
v2: Update inline names to be consistent with
i915_gem_object_get_active()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both distinguish GGTT and ppGTT address spaces, as well
as occasionally inspect the owner.
v2: Whitespace, tells us who owns i915_address_space
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-6-git-send-email-chris@chris-wilson.co.uk
Since we have a static if-else-chain for device probing of the global
GTT, we do not need to use a function pointer, let alone store it when
we never use it again. So use the if-else-chain to call down into the
device specific probe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-5-git-send-email-chris@chris-wilson.co.uk
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.
v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-4-git-send-email-chris@chris-wilson.co.uk
Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.
v2: Include all the probe functions
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-3-git-send-email-chris@chris-wilson.co.uk
In order to handle conflicting drivers (i.e. vgacon) having a different
setup of hardware, we have to remove those other drivers before we try
to setup our own mappings. This requires us to split GGTT initialisation
between probing for the hardware location (part of the PCI BAR) and
later establishing the kernel resources for it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-2-git-send-email-chris@chris-wilson.co.uk
Ringbuffers are now being written to either through LLC or WC paths, so
treating them as simply iomem is no longer adequate. However, for the
older !llc hardware, the hardware is documentated as treating the TAIL
register update as serialising, so we can relax the barriers when filling
the rings (but even if it were not, it is still an uncached register write
and so serialising anyway.).
For simplicity, let's ignore the iomem annotation.
v2: Remove iomem from ringbuffer->virtual_address
v3: And for good measure add iomem elsewhere to keep sparse happy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-8-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-7-git-send-email-chris@chris-wilson.co.uk
Gen8 versions of these macros were updated a few months ago
(e8ebd8e drm/i915: eliminate 'temp' in gen8_for_each macros)
originally because at least one iterator could generate an
out of bounds access, but also because eliminating the 'temp'
parameter generated smaller and faster code.
Matthew Auld recently noticed the same problem with the gen6
versions and provided a patch
https://lists.freedesktop.org/archives/intel-gfx/2016-June/099334.html
but while we're changing these, we might as well make them as
much like the gen8 versions as possible, including the style
of using "&& (..., true)" rather than ": (..., 1) : 0", and
of course eliminating the redundant 'temp'.
Furthermore, the "all_pdes" version is only used in one place,
so we can improve code efficiency by changing both the macro
parameters and the calling code to reduce extra dereferences.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466793466-23500-1-git-send-email-david.s.gordon@intel.com
Introduced a new vm specfic callback insert_page() to program a single pte in
ggtt or ppgtt. This allows us to map a single page in to the mappable aperture
space. This can be iterated over to access the whole object by using space as
meagre as page size.
v2: Added low level rpm assertions to insert_page routines (Chris)
v3: Added POSTING_READ post register write (Tvrtko)
v4: Rebase (Ankit)
v5: Removed wmb() and FLUSH_CTL from insert_page, caller to take care
of it (Chris)
v6: insert_page not working correctly without FLSH_CNTL write, added the
write again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Pass drm_i915_private to the uncore init/fini routines and their
subservients as it is their native type.
text data bss dec hex filename
6309978 3578778 696320 10585076 a183f4 vmlinux
6309530 3578778 696320 10584628 a18234 vmlinux
a modest 400 bytes of saving, but 60 lines of code deleted!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462885804-26750-1-git-send-email-chris@chris-wilson.co.uk
Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.
The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and straight up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.
I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.
Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.
v2: Reorder init_hw vs. enable_hw functions (Chris)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The code to switch_mm() is already handled by i915_switch_context(), the
only difference required to setup the aliasing ppgtt is that we need to
emit te switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL. This allows us to defer the
initialisation of the GPU from early device initialisation to first use,
which should marginally speed up both. The caveat is that we then defer
the context initialisation until first use - i.e. we cannot assume that
the GPU engines are initialised. For example, this means that power
contexts for rc6 (Ironlake) need to explicitly loaded, as they are.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-11-git-send-email-chris@chris-wilson.co.uk
By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).
Note that by moving the iounnmap tracking to the VMA, we actually end up
fixing a leak of the iomapping in intel_fbdev.
v1.5: Rebase prompted by Tvrtko
v2: Drop dev_priv parameter, we can recover the i915_ggtt from the vma.
v3: Move handling of ioremap space exhaustion to vmap_purge and also
allow vmallocs to recover old iomaps. Add Tvrtko's kerneldoc.
v4: Fix a use-after-free in shrinker and rearrange i915_vma_iomap
v5: Back to i915_vm_to_ggtt
v6: Use i915_vma_pin_iomap and i915_vma_unpin_iomap to mark critical
sections and ensure the VMA cannot be reaped whilst mapped.
v7: Move i915_vma_iounmap so that consumers of the API are not tempted,
and add iomem annotations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-5-git-send-email-chris@chris-wilson.co.uk
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Rename and document the GGTT init functions to give a better
idea of the context where they are called from.
i915_gem_gtt_init => i915_ggtt_init_hw
i915_gem_init_global_gtt => i915_gem_init_ggtt
i915_global_gtt_cleanup => i915_ggtt_cleanup_hw
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458830866-12578-1-git-send-email-joonas.lahtinen@linux.intel.com
Use less pointers with the probing code, making it much less confusing
to read.
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Throw out a bunch of unnecessary stuff from struct intel_rotation_info,
and pull most of the remaining stuff to live under an array of
per-color plane sub-structures.
What still remains outside the sub-structure will be reorgranized later
as well, but that requires more work elsewhere so leave it be for now.
v2: Split the vma size == luma+chroma size fix to prep patch (Daniel)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1)
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-8-git-send-email-ville.syrjala@linux.intel.com
The multiple levels of indirect do nothing but hinder the compiler and
the pointer chasing turns to be quite painful but painless to fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).
s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Otherwise a pde_shift big enough to overflow a u32 will be truncated before
assignment
Note: We never asked for ranges spanning a 4G boundary, so this issue
doesn't cause a real problem.
Signed-off-by: Alan Cox <alan@linux.intel.com>
[danvet: Add note why this isn't a real problem.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160217142043.4947.60447.stgit@localhost.localdomain
RC6 setup is shared between BIOS and Driver. BIOS sets up subset of RC6
setup registers. If those are not setup Driver should not enable RC6.
For implementing this, driver can check RC_CTRL0 and RC_CTRL1 values
to know if BIOS has enabled HW/SW RC6.
This will also enable user to control RC6 using BIOS settings alone.
RC6 related instability can be avoided by disabling via BIOS settings
till driver fixes it.
v2: Had placed logic in gen8 function by mistake. Fixed it.
Ensuring RPM is not enabled in case BIOS disabled RC6.
v3: Need to disable RPM if RC6 is disabled due to BIOS settings. (Daniel)
Runtime PM enabling happens before gen9_enable_rc6.
Moved the updation of enable_rc6 parameter in intel_uncore_sanitize.
v4: Added elaborate check for BIOS RC6 setup. Prepared check_pctx for bxt.
(Imre)
v5: Caching reserved stolen base and size in the driver private data.
Reorganized RC6 setup check. Moved from gen9_enable_rc6 to
intel_uncore_sanitize. (Imre)
v6: Rebasing on the patch submitted by Imre that moves gem_init_stolen
earlier in the load.
v7: Removed PWRCTX_MAXCNT_VCSUNIT1 check as it applies to SKL. (Imre)
v8: Fixed formatting and checkpatch issues. Fixed functional issue where
RC6 ctx size check was missing. (Imre)
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454697809-22113-1-git-send-email-sagar.a.kamble@intel.com
LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).
To avoid that this patch caches some interesting bits and values
in the engine and context structures.
Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.
Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.
This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.
v2:
* Cache the VMA instead of the address. (Chris Wilson)
* Incorporate Dave Gordon's good comments and function name.
v3:
* Extract ctx descriptor template to a function and group
functions dealing with ctx descriptor & co together near
top of the file. (Dave Gordon)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
All of these iterator macros require a 'temp' argument, used merely to
hold internal partial results. We can instead declare the temporary
variable inside the macro, so the caller need not provide it.
Some of the old code contained nested iterators that actually reused the
same 'temp' variable for both inner and outer instances. It's quite
surprising that this didn't introduce bugs! But it does show that the
value of 'temp' isn't required to persist during the iterated body.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449581451-11848-2-git-send-email-david.s.gordon@intel.com
The rotated view depends upon the rotation paramters, but thus far we
didn't bother checking for those. This seems to have been an issue
ever since this was introduce in
commit fe14d5f4e5
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Wed Dec 10 17:27:58 2014 +0000
drm/i915: Infrastructure for supporting different GGTT views per object
But userspace is allowed to reuse framebuffer backing storage with
different framebuffers with different pixel formats/stride/whatever.
And e.g. SNA indeed does this. Hence we must check for all the
paramters to match, not just that it's rotated.
v2: intel_plane_obj_offset also needs to construct the full view, to
avoid fallout since they don't fully match.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1444834266-12689-3-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We don't need 2 separate unions.
Note that this was done intentinoally
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Wed May 6 14:35:38 2015 +0300
drm/i915: Add a partial GGTT view type
on Tvrtko's request, but without a clear justification. Rotated views
are also not checking for matching paramters in i915_ggtt_view_equal,
which seems like a bug. But this patch here doesn't change that.
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1444834266-12689-2-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We tried to fix this in commit fdc454c148 ("drm/i915: Prevent out of
range pt in gen6_for_each_pde").
But the static analyzer still complains that, just before we break due
to "iter < I915_PDES", we do "pt = (pd)->page_table[iter]" with an
iter value that is bigger than I915_PDES. Of course, this isn't really
a problem since no one uses pt outside the macro. Still, every single
new usage of the macro will create a new issue for us to mark as a
false positive.
Also, Paulo re-started the discussion a while ago [1], but didn't end up
implemented.
In order to "solve" this "problem", this patch takes the ideas from
Chris and Dave, but that check would change the desired behavior of the
code, because the object (for example pdp->page_directory[iter]) can be
null during init/alloc, and C would take this as false, breaking the for
loop immediately.
This has been already verified with "static analysis tools".
[1]http://lists.freedesktop.org/archives/intel-gfx/2015-June/068548.html
v2: Make it a single statement, while preventing the common subexpression
elimination (Chris)
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just adding the rotated UV plane at the end of the rotated Y plane.
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The FBC hardware for these platforms doesn't have access to the
bios_reserved range, so it always assumes the maximum (8mb) is used.
So avoid this range while allocating.
This solves a bunch of FIFO underruns that happen if you end up
putting the CFB in that memory range. On my machine, with 32mb of
stolen, I need a 2560x1440 mode for that.
Testcase: igt/kms_frontbuffer_tracking/fbc-* (given the right setup)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to commit c44ef60e43 ("drm/i915/gtt:
Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
return an unsigned long, which in only 4-bytes long in 32-bit kernels.
Change return type (and other related offset variables) to u64.
Since Global GTT is always limited to 4GB, this change would not be required
in i915_gem_obj_ggtt_offset, but this is done for consistency.
v2: Remove unnecessary offset variable in do_pin, as we already have
vma->node.start (Chris).
Update GGTT offset too (Tvrtko).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out of bound
access or proactive prefetch.
Also add a scratch pdp, which the PML4 entries point to.
v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches
(and removed commit message part related to v3).
v5: Update commit message to also mention PML4 table initialization and
the new scratch pdp (Akash).
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v12: Fix pdpe start value in trace (Akash)
v13: Define all 4lvl functions in this patch directly, instead of
previous patches, add i915_page_directory_pointer_entry_alloc here,
use test_bit to detect when pdp is already allocated (Akash).
v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
funtion, as we do for pds and pts; move pd and pdp setup functions to
this patch (Akash).
v15: Added kfree(pdp) from previous patch to this (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduces the Page Map Level 4 (PML4), ie. the new top level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
v6: Keep _insert_pte_entries changes outside this patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier.
32-bit ppgtt uses just 4 PDPs, while 48-bit ppgtt will have up-to 512;
this patch prepares the existing functions to query the right number of pdps
at run-time. This also means that used_pdpes should also be allocated during
ppgtt_init, as the bitmap size will depend on the ppgtt address range
selected.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
[danvet: Amend commit message as suggested by Michel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>