Commit Graph

951050 Commits

Author SHA1 Message Date
Maarten Lankhorst 99f08d674e drm/i915: Add ww context handling to context_barrier_task
This is required if we want to pass a ww context in intel_context_pin
and gen6_ppgtt_pin().

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-11-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:30:38 +03:00
Maarten Lankhorst bfdf8b1d38 drm/i915: Use ww locking in intel_renderstate.
We want to start using ww locking in intel_context_pin, for this
we need to lock multiple objects, and the single i915_gem_object_lock
is not enough.

Convert to using ww-waiting, and make sure we always pin intel_context_state,
even if we don't have a renderstate object.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-10-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:30:31 +03:00
Maarten Lankhorst c43ce12328 drm/i915: Use per object locking in execbuf, v12.
Now that we changed execbuf submission slightly to allow us to do all
pinning in one place, we can now simply add ww versions on top of
struct_mutex. All we have to do is a separate path for -EDEADLK
handling, which needs to unpin all gem bo's before dropping the lock,
then starting over.

This finally allows us to do parallel submission, but because not
all of the pinning code uses the ww ctx yet, we cannot completely
drop struct_mutex yet.

Changes since v1:
- Keep struct_mutex for now. :(
Changes since v2:
- Make sure we always lock the ww context in slowpath.
Changes since v3:
- Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be
  done on normal unlock path.
- Unconditionally release vmas and context.
Changes since v4:
- Rebased on top of struct_mutex reduction.
Changes since v5:
- Remove training wheels.
Changes since v6:
- Fix accidentally broken -ENOSPC handling.
Changes since v7:
- Handle gt buffer pool better.
Changes since v8:
- Properly clear variables, to make -EDEADLK handling not BUG.
Change since v9:
- Fix unpinning fence on pnv and below.
Changes since v10:
- Make relocation gpu chaining working again.
Changes since v11:
- Remove relocation chaining, pain to make it work.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:30:07 +03:00
Maarten Lankhorst 8e4ba491b0 drm/i915: Parse command buffer earlier in eb_relocate(slow)
We want to introduce backoff logic, but we need to lock the
pool object as well for command parsing. Because of this, we
will need backoff logic for the engine pool obj, move the batch
validation up slightly to eb_lookup_vmas, and the actual command
parsing in a separate function which can get called from execbuf
relocation fast and slowpath.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-8-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:30:01 +03:00
Maarten Lankhorst 1af343cdc1 drm/i915: Remove locking from i915_gem_object_prepare_read/write
Execbuffer submission will perform its own WW locking, and we
cannot rely on the implicit lock there.

This also makes it clear that the GVT code will get a lockdep splat when
multiple batchbuffer shadows need to be performed in the same instance,
fix that up.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:29:53 +03:00
Maarten Lankhorst 80f0b679d6 drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but lets start adding the definition
first.

To use it, we have to pass a non-NULL ww to gem_object_lock, and don't
unlock directly. It is done in i915_gem_ww_ctx_fini.

Changes since v1:
- Change ww_ctx and obj order in locking functions (Jonas Lahtinen)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:29:44 +03:00
Maarten Lankhorst 8ae275c288 Revert "drm/i915/gem: Split eb_vma into its own allocation"
This reverts commit 0f1dd02295 ("drm/i915/gem: Split eb_vma into
its own allocation") and also moves all unreserving to a single
place at the end, which is a minor simplification.

With the WW locking, we will drop all references only at the
end when unlocking, so refcounting can now be removed.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-5-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:29:35 +03:00
Maarten Lankhorst fd1500fcd4 Revert "drm/i915/gem: Drop relocation slowpath".
This reverts commit 7dc8f11437 ("drm/i915/gem: Drop relocation
slowpath"). We need the slowpath relocation for taking ww-mutex
inside the page fault handler, and we will take this mutex when
pinning all objects.

We also functionally revert ef398881d2 ("drm/i915/gem: Limit
struct_mutex to eb_reserve"), as we need the struct_mutex in
the slowpath as well, and a tiny part of 003d8b9143 ("drm/i915/gem:
Only call eb_lookup_vma once during execbuf ioctl"). Specifically,
we make the -EAGAIN handling part of fallback to slowpath again.

With this, we have a proper working slowpath again, which
will allow us to do fault handling with WW locks held.

[mlankhorst: Adjusted for reloc_gpu_flush() changes]

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[mlankhorst: Removed extra reloc_gpu_flush()]
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-4-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:29:16 +03:00
Maarten Lankhorst 50ae6c61a1 drm/i915: Revert relocation chaining commits.
This reverts commit 964a9b0f61 ("drm/i915/gem: Use chained reloc batches")
and commit 0e97fbb080 ("drm/i915/gem: Use a single chained reloc batches
for a single execbuf").

When adding ww locking to execbuf, it's hard enough to deal with a
single BO that is part of relocation execution. Chaining is hard to
get right, and with GPU relocation deprecated, it's best to drop this
altogether, instead of trying to fix something we will remove.

This is not a completely 1:1 revert, we reset rq_size to 0 in
reloc_cache_init, this was from e3d291301f ("drm/i915/gem: Implement legacy
MI_STORE_DATA_IMM"), because we don't want to break the selftests. (Daniel)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-3-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:29:07 +03:00
Maarten Lankhorst 102a0a9051 Revert "drm/i915/gem: Async GPU relocations only"
This reverts commit 9e0f9464e2 ("drm/i915/gem: Async GPU relocations only"),
and related commit 7ac2d2536d ("drm/i915/gem: Delete unused code").

Async GPU relocations are not the path forward, we want to remove
GPU accelerated relocation support eventually when userspace is fixed
to use VM_BIND, and this is the first step towards that. We will keep
async gpu relocations around for now, until userspace is fixed.

Relocation support will be disabled completely on platforms where there
was never any userspace that depends on it, as the hardware doesn't
require it from at least gen9+ onward. For older platforms, the plan
is to use cpu relocations only.

The igt side is fixed in igt commit 39e9aa1032a4e ("tests/i915: Remove
subtests that rely on async relocation behavior").

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-2-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:28:46 +03:00
Chris Wilson da1ea128a6 drm/i915/gem: Free the fence after a fence-chain lookup failure
If dma_fence_chain_find_seqno() reports an error, it does so in its
preamble before it disposes of the input fence. On handling the
error, we need to drop the reference to the fence.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2292
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 13149e8baf ("drm/i915: add syncobj timeline support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200806161056.17593-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:28:21 +03:00
Chris Wilson 736e785f9b drm/i915/gem: Reduce context termination list iteration guard to RCU
As we now protect the timeline list using RCU, we can drop the
timeline->mutex for guarding the list iteration during context close, as
we are searching for an inflight request. Any new request will see the
context is banned and not be submitted. In doing so, pull the checks for
a concurrent submission of the request (notably the
i915_request_completed()) under the engine spinlock, to fully serialise
with __i915_request_submit()). That is in the case of preempt-to-busy
where the request may be completed during the __i915_request_submit(),
we need to be careful that we sample the request status after
serialising so that we don't miss the request the engine is actually
submitting.

Fixes: 4a31741521 ("drm/i915/gem: Refine occupancy test in kill_context()")
References: d22d2d073e ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622
References: https://gitlab.freedesktop.org/drm/intel/-/issues/2158
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200806105954.7766-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:28:14 +03:00
Chris Wilson dd5e024956 drm/i915/selftests: Prevent selecting 0 for our random width/align
When igt_random_offset() is a given a range of [0, PAGE_SIZE], it is
allowed to return 0. However, attempting to use a size of 0 for the
igt_lmem_write_cpu() byte poking, leads to call igt_random_offset() with
a range of [offset, offset + 0] and ask it to find a length of 4 within
it. This triggers the bug on that the requested length should fit within
the range!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200806145728.16495-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:28:08 +03:00
Chris Wilson e23005604b drm/i915/gt: Hold context/request reference while breadcrumbs are active
Currently we hold no actual reference to the request nor context while
they are attached to a breadcrumb. To avoid freeing the request/context
too early, we serialise with cancel-breadcrumbs by taking the irq
spinlock in i915_request_retire(). The alternative is to take a
reference for a new breadcrumb and release it upon signaling; removing
the more frequently hit contention point in i915_request_retire().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:24:29 +03:00
Chris Wilson 3f7dc10716 drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier
Move the __intel_breadcrumbs_arm_irq earlier, next to the disarm_irq, so
that we can make use of it in the following patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:24:25 +03:00
Chris Wilson 82adf90113 drm/i915/gt: Shrink i915_page_directory's slab bucket
kmalloc uses power-of-two slab buckets for small allocations (up to a
few pages). Since i915_page_directory is a page of pointers, plus a
couple more, this is rounded up to 8K, and we waste nearly 50% of that
allocation. Long terms this leads to poor memory utilisation, bloating
the kernel footprint, but the problem is exacerbated by our conservative
preallocation scheme for binding VMA. As we are required to allocate all
levels for each vma just in case we need to insert them upon binding,
this leads to a large multiplication factor for a single page vma. By
halving the allocation we need for the page directory structure, we
halve the impact of that factor, bringing workloads that once fitted into
memory, hopefully back to fitting into memory.

We maintain the split between i915_page_directory and i915_page_table as
we only need half the allocation for the lowest, most populous, level.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:24:23 +03:00
Chris Wilson 89351925a4 drm/i915/gt: Switch to object allocations for page directories
The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and that we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.

In the process, we drop the WC mapping as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. Though in a future step, we need to drop the kmap_atomic
approach which suggests we need to pre-map all the pages and keep them
mapped.

v2: Verify our large scratch page is suitably DMA aligned; and manually
clear the scratch since we are allocating plain struct pages full of
prior content.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:24:08 +03:00
Chris Wilson cd0452aa2a drm/i915: Preallocate stashes for vma page-directories
We need to make the DMA allocations used for page directories to be
performed up front so that we can include those allocations in our
memory reservation pass. The downside is that we have to assume the
worst case, even before we know the final layout, and always allocate
enough page directories for this object, even when there will be overlap.
This unfortunately can be quite expensive, especially as we have to
clear/reset the page directories and DMA pages, but it should only be
required during early phases of a workload when new objects are being
discovered, or after memory/eviction pressure when we need to rebind.
Once we reach steady state, the objects should not be moved and we no
longer need to preallocating the pages tables.

It should be noted that the lifetime for the page directories DMA is
more or less decoupled from individual fences as they will be shared
across objects across timelines.

v2: Only allocate enough PD space for the PTE we may use, we do not need
to allocate PD that will be left as scratch.
v3: Store the shift unto the first PD level to encapsulate the different
PTE counts for gen6/gen8.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:24:05 +03:00
Chris Wilson b3786b2937 drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling, active requests are passed to a
physical engine and associated breadcrumb interrupt handler. This causes
issues for us as we need to ensure that we do not actually try and
enable interrupts and the powermanagement required for them on the
virtual engine, as they will never be disabled. Instead, let's
specify the physical engine used for interrupt handler on a particular
breadcrumb.

v2: Drop b->irq_armed = true mocking for no interrupt HW

Fixes: 4fe6abb8f5 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:23:55 +03:00
Chris Wilson 56f581bad4 drm/i915/gt: Only transfer the virtual context to the new engine if active
One more complication of preempt-to-busy with respect to the virtual
engine is that we may have retired the last request along the virtual
engine at the same time as preparing to submit the completed request to
a new engine. That submit will be shortcircuited, but not before we have
updated the context with the new register offsets and marked the virtual
engine as bound to the new engine (by calling swap on ve->siblings[]).
As we may have just retired the completed request, we may also be in the
middle of calling virtual_context_exit() to turn off the power management
associated with the virtual engine, and that in turn walks the
ve->siblings[]. If we happen to call swap() on the array as we walk, we
will call intel_engine_pm_put() twice on the same engine.

In this patch, we prevent this by only updating the bound engine after a
successful submission which weeds out the already completed requests.

Alternatively, we could walk a non-volatile array for the pm, such as
using the engine->mask. The small advantage to performing the update
after the submit is that we then only have to do a swap for active
requests.

Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e86 ("drm/i915: Load balancing across a virtual engine"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:23:50 +03:00
Chris Wilson 2854d86632 drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs
After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must the
act of submitting a completed requests, we can then redirect those
completed requests onto a dedicated signaled_list at the time of
construction and so eliminate intel_engine_transfer_stale_breadcrumbs().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 14:23:14 +03:00
Chris Wilson c18636f763 drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs
Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.

The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.

On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:29:32 +03:00
Chris Wilson af5c6fcf40 drm/i915: Provide a fastpath for waiting on vma bindings
Before we can execute a request, we must wait for all of its vma to be
bound. This is a frequent operation for which we can optimise away a
few atomic operations (notably a cmpxchg) in lieu of the RCU protection.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-7-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:29:19 +03:00
Chris Wilson 9ff33bbcda drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()
As the conversion between idle-barrier and full i915_active_fence is
already serialised by explicit memory barriers, we can reduce the
spinlock in i915_active_acquire_preallocate_barrier() for finding an
idle-barrier to reuse to an RCU read lock to ensure the fence remains
valid, only taking the spinlock for the update of the rbtree itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-6-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:19:11 +03:00
Chris Wilson e28860ae21 drm/i915: Make the stale cached active node available for any timeline
Rather than require the next timeline after idling to match the MRU
before idling, reset the index on the node and allow it to match the
first request. However, this requires cmpxchg(u64) and so is not trivial
on 32b, so for compatibility we just fallback to keeping the cached node
pointing to the MRU timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-5-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:18:55 +03:00
Chris Wilson 99a7f4dae7 drm/i915: Keep the most recently used active-fence upon discard
Whenever an i915_active idles, we prune its tree of old fence slots to
prevent a gradual leak should it be used to track many, many timelines.
The downside is that we then have to frequently reallocate the rbtree.
A compromise is that we keep the most recently used fence slot, and
reuse that for the next active reference as that is the most likely
timeline to be reused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:18:36 +03:00
Chris Wilson 5d9341370f drm/i915: Export a preallocate variant of i915_active_acquire()
Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.

v2: Refactor active_lookup() so it can be called again before/after
locking to resolve contention. Since we protect the rbtree until we
idle, we can do a lockfree lookup, with the caveat that if another
thread performs a concurrent insertion, the rotations from the insert
may cause us to not find our target. A second pass holding the treelock
will find the target if it exists, or the place to perform our
insertion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:18:17 +03:00
Chris Wilson 04240e30ed drm/i915: Skip taking acquire mutex for no ref->active callback
If no active callback is defined for i915_active, we do not need to
serialise its enabling with the mutex. We still do only want to call the
debug activate once, and must still serialise with a concurrent retire.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:18:01 +03:00
Chris Wilson bde246d893 drm/i915/selftests: Drop stale timeline constructor assert
Since we pass around encoded parameters to the kernel context
constructor using the ce->timeline pointer, we can no longer assert that
it should be zero for mock timeline construction.

Fixes: d1bf5dd8f6 ("drm/i915/gt: Support multiple pinned timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731102206.6793-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Updated Fixes: link after rebasing and reordering into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:17:13 +03:00
Chris Wilson 13106019f7 drm/i915/gt: Pull release of node->age under the spinlock
We need to ensure that the list is valid prior to marking the node as
retrievable, otherwise we may see two threads compete over the same node
in intel_gt_get_buffer_pool(). If the first thread acquires and releases
the node in the same jiffie, the second thread may then acquire it (as
the jiffie now again matches the expected value) and claim the node
before it is put back into the list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730134049.8822-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:16:58 +03:00
Chris Wilson d1bf5dd8f6 drm/i915/gt: Support multiple pinned timelines
We may need to allocate more than one pinned context/timeline for each
engine which can utilise the per-engine HWSP, so we need to give each
a different offset within it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730183906.25422-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:16:43 +03:00
Chris Wilson eb4dedae92 drm/i915/gem: Delay tracking the GEM context until it is registered
Avoid exposing a partially constructed context by deferring the
list_add() from the initial construction to the end of registration.
Otherwise, if we peek into the list of contexts from inside debugfs, we
may see the partially constructed context and chase down some dangling
incomplete pointers.

Reported-by: CQ Tang <cq.tang@intel.com>
Fixes: 3aa9945a52 ("drm/i915: Separate GEM context construction and registration to userspace")
References: f6e8aa3871 ("drm/i915: Report the number of closed vma held by each context in debugfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: CQ Tang <cq.tang@intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730092856.23615-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:15:25 +03:00
Chris Wilson a30e4ec176 drm/i915/gt: Fix termination condition for freeing all buffer objects
A last minute change, that unfortunately broke CI so badly it declared
SUCCESS, was to refactor the debug free all buffer pool code to reuse
the normal worker, inverted the termination condition so that it instead
of discarding the nodes, they were all declared young enough and
eligible for reuse.

Fixes: 06b73c2d0b ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729110756.2344-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Updating Fixes: link after rebasing and reordering into drm-intel-gt-next]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:14:22 +03:00
Chris Wilson 62b1522cc3 drm/i915/selftests: Flush the active barriers before asserting
Before we peek at the barrier status for an assert, first serialise with
its callbacks so that we see a stable value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728153325.28351-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:14:15 +03:00
Chris Wilson 06b73c2d0b drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool
Some very low hanging fruit, but contention on the pool->lock is
noticeable between intel_gt_get_buffer_pool() and pool_retire(), with
the majority of the hold time due to the locked list iteration. If we
make the node itself RCU protected, we can perform the search for an
suitable node just under RCU, reserving taking the lock itself for
claiming the node and manipulating the list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200729080245.8070-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:14:07 +03:00
Chris Wilson a817c891c1 drm/i915/gt: Disable preparser around xcs invalidations on tgl
Unlike rcs where we have conclusive evidence from our selftesting that
disabling the preparser before performing the TLB invalidate and
relocations does impact upon the GPU execution, the evidence for the
same requirement on xcs is much more circumstantial. Let's apply the
preparser disable between batches as we invalidate the TLB as a dose of
healthy paranoia, just in case.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152110.830-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:59 +03:00
Chris Wilson 27a5dcfe73 drm/i915/gem: Remove disordered per-file request list for throttling
I915_GEM_THROTTLE dates back to the time before contexts where there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.

That is no more. We now have many timelines with a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those run independently and so make the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each to satisfy the criteria that the CPU is no more than 20ms
ahead of its oldest request.

It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amount of
contexts, so the loops over contexts/engines should be concise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:50 +03:00
Chris Wilson 3adee4ac29 drm/i915: Soften the tasklet flush frequency before waits
We include a tasklet flush before waiting on a request as a precaution
against the HW being lax in event signaling. We now have a precautionary
flush in the engine's heartbeat and so do not need to be quite so
zealous on every request wait. If we focus on the request, the only
tasklet flush that matters is if there is a delay in submitting this
request to HW, so if the request is not ready to be executed, no
advantage in reducing this wait can be gained by running the tasklet.
And there is little point in doing busy work for no result.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200715115147.11866-10-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:41 +03:00
Chris Wilson e3d0e21396 drm/i915/selftests: Mock the status_page.vma for the kernel_context
Since we assert that the kernel_context is using the perma-pinned HWSP,
make it so.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2179
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200715155858.16410-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:29 +03:00
Chris Wilson 3f6a6f343c drm/i915: Reduce i915_request.lock contention for i915_request_wait
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large number of contentions
on i915_request.lock as when the process is woken up after the fence is
signaled by an interrupt, we will then try and call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking the
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
avoids most of that contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716100754.5670-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07 13:13:06 +03:00
Linus Torvalds f4d51dffc6 Linux 5.9-rc4 2020-09-06 17:11:40 -07:00
Linus Torvalds a8205e3100 io_uring-5.9-2020-09-06
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9U/MMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg4BEAC6TQ6ctc1yNTTWwiz4UrdIKMAWP7S7wepu
 k00A9+JjLLJBVdkz9rZ2Y/SyGe12qBM+riiRSn/gNTbkd4qq2rCY2d8U1vXXKyP5
 VDXPo12zsUD5WpqdEMfXUB4+DOQs0a3DAyDLT0+K/Qw0rpOQVZA26Ovn6GOh9+Kq
 KJCllynYkwUQ/7CXsdI13ktMI1HADFOx3149SlGkbuPOggwNQrGiLJAxyUOn/i+E
 uiHy8b8o3B2nun61+Y98q+MDISLf0xXYEbHeAsvETEy52ya2iadnMo1lwS5zHzM+
 p6jOBybHaM/wz5t1V44VTBvfog1KAUtp8K0gsxcB6Ezf7LhYfTVjtvfZqpwVz1sl
 txkxhnGEBURHpdr0aCtC15cIpbDGM85ymjP4RD0YlV7oyT0+Ufx7r9jJjLf7IhZO
 FMyEFmVwSPD/NdJE9ZNx0I2v/qdIzYwfeAix4Z4bXoe51BtPrf1uBgsGWVuqNVz/
 dKVf1vK1tPF8PgIiIW/o8GI4iF3RRVQcLwJGiAWMzBS11iniJLf9mUPRVl5Bxpeb
 YRd2chm+ppHC/IgtK44x7Ce6415hnbKehE2KUr43PHB7nIMNWcRsurxLizM7ZGei
 Gv2/K9PM3+4O/b1k20xLakz4Vw3Isk4W6/Flj9LoxheBo+CUG4Rsx5UyzFEB6apA
 uXxiF6ORLg==
 =v5QQ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block

Pull more io_uring fixes from Jens Axboe:
 "Two followup fixes. One is fixing a regression from this merge window,
  the other is two commits fixing cancelation of deferred requests.

  Both have gone through full testing, and both spawned a few new
  regression test additions to liburing.

   - Don't play games with const, properly store the output iovec and
     assign it as needed.

   - Deferred request cancelation fix (Pavel)"

* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  io_uring: fix explicit async read/write mapping for large segments
2020-09-06 12:10:27 -07:00
Linus Torvalds 2ccdd9f8b2 IOMMU Fixes for Linux v5.9-rc3
Including:
 
 	- Three Intel VT-d fixes to fix address handling on 32bit, fix a
 	  NULL pointer dereference bug and serialize a hardware register
 	  access as required by the VT-d spec.
 
 	- Two patches for AMD IOMMU to force AMD GPUs into translation mode
 	  when memory encryption is active and disallow using IOMMUv2
 	  functionality. This makes the AMDGPU driver working when
 	  memory encryption is active.
 
 	- Two more fixes for AMD IOMMU to fix updating the Interrupt
 	  Remapping Table Entries.
 
 	- MAINTAINERS file update for the Qualcom IOMMU driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl9U+Q4ACgkQK/BELZcB
 GuOkexAA14LsYG8QWd1lwgf3Yh6/xGwhw/vDTHCMCNqFCMUHTLjte6uLZ9QH/Pxy
 fvwyX73ZOjsrKXOwkKvEFD9fCJDyMsPsfH3ssYNgOZfTFculcGdNc6NhLWhoKCc8
 pmM0tVikRKxhAMxWgc9EX4dwu9WRe8vobvZj0cAPFlMAkY2ZY3lDtl8bniX9cygW
 Oo2hmhud1CeLE3ffUWhWgT9T7GfTL1i6KFF8wUqrNFE53+b7pS4LbJaCWuHKjtKu
 MaBC0eE0rG0qvSng6ArdZx22a47o2j57zc0mG6IeHtGe7eUAgwNkRLQmL1RU35xW
 cDniCJXosQ7DJ0Hel12ahk1HlUQFFqMlZVZbNpYP04v8xqHNEsWvWHw2NS6xZg2T
 FozBWdnhw8HzaXpWmg14WQ+e9+rnsK/FndRxBjXP9H9bKQQNKQV8PaiUEQCZin0s
 QJohAgqyjyPLB3a/B9q1jzG5HHI2pK2s2/4ry9XBdijZ7MhADq9pAvAe9k8baU/5
 gEy6pa4PmZEMibbrfUNjIoeYz80hDaHIxIbW/fFgl5n1JJJoz+qsfZlYaddrN6nc
 XFWDg9dDgQjDaYDjK9uG2vJkbMFiqE0kRxtJdWfqZNGgSWxqVIMtf42ZvYlqH+Wb
 Zc9nBEuH27o4NZ//2mXzynwBooBTmOsLRw/kqSPOuBVMki08fqk=
 =2uqs
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
   pointer dereference bug and serialize a hardware register access as
   required by the VT-d spec.

 - two patches for AMD IOMMU to force AMD GPUs into translation mode
   when memory encryption is active and disallow using IOMMUv2
   functionality.  This makes the AMDGPU driver work when memory
   encryption is active.

 - two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
   Table Entries.

 - MAINTAINERS file update for the Qualcom IOMMU driver.

* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Handle 36bit addressing for x86-32
  iommu/amd: Do not use IOMMUv2 functionality when SME is active
  iommu/amd: Do not force direct mapping when SME is active
  iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
  iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
  iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
  iommu/vt-d: Serialize IOMMU GCMD register modifications
  MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
2020-09-06 11:58:15 -07:00
Linus Torvalds 015b3155c4 Misc fixes:
- Fix more generic entry code ABI fallout
  - Fix debug register handling bugs
  - Fix vmalloc mappings on 32-bit kernels
  - Fix kprobes instrumentation output on 32-bit kernels
  - Fix over-eager WARN_ON_ONCE() on !SMAP hardware
  - Fix NUMA debugging
  - Fix Clang related crash on !RETPOLINE kernels
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl9UljIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1yA//VecoyJOw4jb43LdkeKDGtUjCsPVZlt4w
 fw55nT4taqqbgl9mQjrJQlh8thtk7LvAqcsrEGk/SH+1fp/hDvBG0i3etyI1mPJ2
 t97MCVtD1bz2zyLpOtGN48tgiRxSazr4S9nZPCLTec+c75I3pmJssj44m/eJi/Z2
 hoj/syiO4J0BPa7a1ou++Jeyag6J+PgXdJTOMyjuqi99vqai1aTVKo8GdWMInext
 +fJNYd0ZQRj1FxVdMusDfzxOk7N7b8nAzvd30iJN67R6QwoEazO12K1F4IYQmHSq
 0rhHrwe0lTLtjmYdp/ef14kfzD7DRFN6Nv2gk/zyZsH+tjGflxTZConkFPnfoJEc
 33cNHfigh0V9TSVNDDhHnkRyy6dzCHkYHEf33KFuX3amC236TgrCEL7+oWE2rcNp
 9PJbPGlXCqNb2feNy2de4cY+KiZ2a1N/T4VcdMK6DEdENFh5T03EZgIChQEd0S99
 LNBYHqTWJdQEKfkzfAXlR4Bd2hX1LWLMM6rNcXxInrH7rWDXUCS0X9m3gLZR9DIs
 7/nXoK4OkaJdgH/D2CToDgwMNT5hlIiTGtVtB3H6Qz8eQQ4+fwTyboQDqpeG4Upy
 LfOH2h5Fo33FCgqnrua8IsgUKLwW2yJGdghJpcd9d0qfVUDEJuXGo6xe6SEHdSu/
 VEiQtFUf50U=
 =EhRy
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - more generic entry code ABI fallout

 - debug register handling bugfixes

 - fix vmalloc mappings on 32-bit kernels

 - kprobes instrumentation output fix on 32-bit kernels

 - fix over-eager WARN_ON_ONCE() on !SMAP hardware

 - NUMA debugging fix

 - fix Clang related crash on !RETPOLINE kernels

* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Unbreak 32bit fast syscall
  x86/debug: Allow a single level of #DB recursion
  x86/entry: Fix AC assertion
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386
  x86, fakenuma: Fix invalid starting node ID
  x86/mm/32: Bring back vmalloc faulting on x86_32
  x86/cmdline: Disable jump tables for cmdline.c
2020-09-06 10:28:00 -07:00
Linus Torvalds 68beef5710 xen: branch for v5.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCX1Rn1wAKCRCAXGG7T9hj
 vlEjAQC/KGC3wYw5TweWcY48xVzgvued3JLAQ6pcDlOe6osd6AEAzZcZKgL948cx
 oY0T98dxb/U+lUhbIzhpBr/30g8JbAQ=
 =Xcxp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "A small series for fixing a problem with Xen PVH guests when running
  as backends (e.g. as dom0).

  Mapping other guests' memory is now working via ZONE_DEVICE, thus not
  requiring to abuse the memory hotplug functionality for that purpose"

* tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: add helpers to allocate unpopulated memory
  memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
  xen/balloon: add header guard
2020-09-06 09:59:27 -07:00
Hans de Goede f8bd54d219 drm/i915: panel: Use atomic PWM API for devs with an external PWM controller
Now that the PWM drivers which we use have been converted to the atomic
PWM API, we can move the i915 panel code over to using the atomic PWM API.

The removes a long standing FIXME and this removes a flicker where
the backlight brightness would jump to 100% when i915 loads even if
using the fastset path.

Note that this commit also simplifies pwm_disable_backlight(), by dropping
the intel_panel_actually_set_backlight(..., 0) call. This call sets the
PWM to 0% duty-cycle. I believe that this call was only present as a
workaround for a bug in the pwm-crc.c driver where it failed to clear the
PWM_OUTPUT_ENABLE bit. This is fixed by an earlier patch in this series.

After the dropping of this workaround, the usleep call, which seems
unnecessary to begin with, has no useful effect anymore, so drop that too.

Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903112337.4113-18-hdegoede@redhat.com
2020-09-06 15:53:37 +02:00
Hans de Goede 9a6ae5b354 drm/i915: panel: Honor the VBT PWM min setting for devs with an external PWM controller
So far for devices using an external PWM controller (devices using
pwm_setup_backlight()), we have been hardcoding the minimum allowed
PWM level to 0. But several of these devices specify a non 0 minimum
setting in their VBT.

Change pwm_setup_backlight() to use get_backlight_min_vbt() to get
the minimum level.

Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903112337.4113-17-hdegoede@redhat.com
2020-09-06 15:53:24 +02:00
Hans de Goede 6b51e7d23a drm/i915: panel: Honor the VBT PWM frequency for devs with an external PWM controller
So far for devices using an external PWM controller (devices using
pwm_setup_backlight()), we have been hardcoding the period-time passed to
pwm_config() to 21333 ns.

I suspect this was done because many VBTs set the PWM frequency to 200
which corresponds to a period-time of 5000000 ns, which greatly exceeds
the PWM_MAX_PERIOD_NS define in the Crystal Cove PMIC PWM driver, which
used to be 21333.

This PWM_MAX_PERIOD_NS define was actually based on a bug in the PWM
driver where its period and duty-cycle times where off by a factor of 256.

Due to this bug the hardcoded CRC_PMIC_PWM_PERIOD_NS value of 21333 would
result in the PWM driver using its divider of 128, which would result in
a PWM output frequency of 6000000 Hz / 256 / 128 = 183 Hz. So actually
pretty close to the default VBT value of 200 Hz.

Now that this bug in the pwm-crc driver is fixed, we can actually use
the VBT defined frequency.

This is important because:

a) With the pwm-crc driver fixed it will now translate the hardcoded
CRC_PMIC_PWM_PERIOD_NS value of 21333 ns / 46 Khz to a PWM output
frequency of 23 KHz (the max it can do).

b) The pwm-lpss driver used on many models has always honored the
21333 ns / 46 Khz request

Some panels do not like such high output frequencies. E.g. on a Terra
Pad 1061 tablet, using the LPSS PWM controller, the backlight would go
from off to max, when changing the sysfs backlight brightness value from
90-100%, anything under aprox. 90% would turn the backlight fully off.

Honoring the VBT specified PWM frequency will also hopefully fix the
various bug reports which we have received about users perceiving the
backlight to flicker after a suspend/resume cycle.

Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903112337.4113-16-hdegoede@redhat.com
2020-09-06 15:53:08 +02:00
Hans de Goede 27a79cbc17 drm/i915: panel: Add get_vbt_pwm_freq() helper
Factor the code which checks and drm_dbg_kms-s the VBT PWM frequency
out of get_backlight_max_vbt().

This is a preparation patch for honering the VBT PWM frequency for
devices which use an external PWM controller (devices using
pwm_setup_backlight()).

Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903112337.4113-15-hdegoede@redhat.com
2020-09-06 15:38:05 +02:00
Hans de Goede c86b155da7 pwm: crc: Implement get_state() method
Implement the pwm_ops.get_state() method to complete the support for the
new atomic PWM API.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200903112337.4113-14-hdegoede@redhat.com
2020-09-06 15:38:05 +02:00