Commit Graph

1684 Commits

Author SHA1 Message Date
Chris Wilson 3b5724d702 drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.

The issue can be illustrated in userspace with:

	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);

	for (i = 0; i < OBJECT_SIZE / 64; i++) {
		int x = 16*i + (i%16);
		gtt[x] = i;
		clflush(&cpu[x], sizeof(cpu[x]));
		assert(cpu[x] == i);
	}

Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.

Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 22:36:45 +01:00
Chris Wilson a314d5cb4a drm/i915: Before accessing an object via the cpu, flush GTT writes
If we want to read the pages directly via the CPU, we have to be sure
that we have to flush the writes via the GTT (as the CPU can not see
the address aliasing).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-9-chris@chris-wilson.co.uk
2016-08-18 22:36:44 +01:00
Chris Wilson 43394c7d0d drm/i915: Extract i915_gem_obj_prepare_shmem_write()
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.

v2: Add i915_gem_obj_finish_shmem_access() for symmetry

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
2016-08-18 22:36:44 +01:00
Chris Wilson 1803458478 drm/i915: Fallback to single page pwrite/pread if unable to release fence
If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.

Since this is triggerable by the user (along pwrite) we have to remove
the WARN_ON(fence->pin_count).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk
2016-08-18 22:36:43 +01:00
Chris Wilson d243ad8202 drm/i915: Mark up the GTT flush following WC writes as ORIGIN_CPU
Similarly to invalidating beforehand, if the object is mmapped via
I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At
the conclusion of the write, i915_gem_object_flush_gtt_writes() we also
need to treat the origin carefully in case it may have been untracked.

See also commit aeecc9696a ("drm/i915: use ORIGIN_CPU for frontbuffer
invalidation on WC mmaps").

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-5-chris@chris-wilson.co.uk
2016-08-18 22:36:43 +01:00
Chris Wilson b19482d7ce drm/i915: Use ORIGIN_CPU for fb invalidation from pwrite
As pwrite does not use the fence for its GTT access, and may even go
through a secondary interface avoiding the main VMA, we cannot treat the
write as automatically invalidated by the hardware and so we require
ORIGIN_CPU frontbufer invalidate/flushes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-4-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-18 22:36:24 +01:00
Chris Wilson 4b30cb2334 drm/i915: vfree() no longer ignores the low bits of the address
Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.

Fixes: d31d7cb146 ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk
2016-08-18 22:36:23 +01:00
Chris Wilson 1255501d86 drm/i915: Embrace the race in busy-ioctl
Daniel Vetter proposed a new challenge to the serialisation inside the
busy-ioctl that exposed a flaw that could result in us reporting the
wrong engine as being busy. If the request is reallocated as we test
its busyness and then reassigned to this object by another thread, we
would not notice that the test itself was incorrect.

We are faced with a choice of using __i915_gem_active_get_request_rcu()
to first acquire a reference to the request preventing the race, or to
acknowledge the race and accept the limitations upon the accuracy of the
busy flags. Note that we guarantee that we never falsely report the
object as idle (providing userspace itself doesn't race), and so the
most important use of the busy-ioctl and its guarantees are fulfilled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk
2016-08-16 10:35:02 +01:00
Chris Wilson bde13ebdab drm/i915: Introduce i915_ggtt_offset()
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:14 +01:00
Chris Wilson 058d88c433 drm/i915: Track pinned VMA
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.

v2: Joonas' stylistic nitpicks, a fun rebase.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:01:13 +01:00
Chris Wilson 247177ddd5 drm/i915: Always set the vma->pages
Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk
2016-08-15 11:00:54 +01:00
Daniel Vetter cc9263874b Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because too many conflicts, and also we need to get at the
latest struct fence patches from Gustavo. Requested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-08-15 10:41:47 +02:00
Dave Airlie fc93ff608b Merge tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel into drm-next
- refactor ddi buffer programming a bit (Ville)
- large-scale renaming to untangle naming in the gem code (Chris)
- rework vma/active tracking for accurately reaping idle mappings of shared
  objects (Chris)
- misc dp sst/mst probing corner case fixes (Ville)
- tons of cleanup&tunings all around in gem
- lockless (rcu-protected) request lookup, plus use it everywhere for
  non(b)locking waits (Chris)
- pipe crc debugfs fixes (Rodrigo)
- random fixes all over

* tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel: (222 commits)
  drm/i915: Update DRIVER_DATE to 20160808
  drm/i915: fix aliasing_ppgtt leak
  drm/i915: Update comment before i915_spin_request
  drm/i915: Use drm official vblank_no_hw_counter callback.
  drm/i915: Fix copy_to_user usage for pipe_crc
  Revert "drm/i915: Track active streams also for DP SST"
  drm/i915: fix WaInsertDummyPushConstPs
  drm/i915: Assert that the request hasn't been retired
  drm/i915: Repack fence tiling mode and stride into a single integer
  drm/i915: Document and reject invalid tiling modes
  drm/i915: Remove locking for get_tiling
  drm/i915: Remove pinned check from madvise ioctl
  drm/i915: Reduce locking inside swfinish ioctl
  drm/i915: Remove (struct_mutex) locking for busy-ioctl
  drm/i915: Remove (struct_mutex) locking for wait-ioctl
  drm/i915: Do a nonblocking wait first in pread/pwrite
  drm/i915: Remove unused no-shrinker-steal
  drm/i915: Tidy generation of the GTT mmap offset
  drm/i915/shrinker: Wait before acquiring struct_mutex under oom
  drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
  ...
2016-08-15 16:53:57 +10:00
Chris Wilson 02bef8f98d drm/i915: Unbind closed vma for i915_gem_object_unbind()
Closed vma are removed from the obj->vma_list so that they cannot be
found by userspace. However, this means that when forcibly unbinding an
object, we have to wait upon all rendering to that object first in order
for the closed, but active, vma to be reaped and their bindings removed.

Reported-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d ("drm/i915: Be more careful when unbinding vma")
Fixes: 8a3b3d576c (" drm/i915: Convert non-blocking userptr waits...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>
2016-08-14 19:40:09 +01:00
Chris Wilson 35a9611ca0 drm/i915: Initialize return value for empty i915_gem_object_unbind()
If the obj->vma_list is empty, we immediately return ret. However, we
are doing so having never set it to any value, it should be zero!

Reported-by: Matthew Auld <matthew.auld@intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d ("drm/i915: Be more careful when unbinding vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-08-14 19:38:26 +01:00
Chris Wilson d31d7cb146 drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.

We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT

v2: Remove the redundant braces around pin count check and fix the marker
     in documentation (Chris)

v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
   i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
   superfluous BUG_ON.(Tvrtko)

v4:
- Rename the enums and clean up the pin_map function. (Chris)

v5: Drop the VM_NO_GUARD, minor cosmetics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 13:06:36 +01:00
Chris Wilson 2ca17b87e8 drm/i915: Add missing rpm wakelock to GGTT pread
Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access, The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 1dd5b6f202)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-11 00:59:48 +03:00
Chris Wilson fae82e59d2 drm/i915: Handle ENOSPC after failing to insert a mappable node
Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d1054ee492)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-11 00:49:02 +03:00
Chris Wilson b06bc7ec4d drm/i915: Flush GT idle status upon reset
Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b913b33c43)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-10 17:43:49 +03:00
Chris Wilson 83348ba84e drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
In commit 2529d57050 ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanims (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully make both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050 (waiter"drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
2016-08-10 10:37:35 +01:00
Chris Wilson 70cb472c6d drm/i915: Always mark the writer as also a read for busy ioctl
One of the few guarantees we want the busy ioctl to provide is that the
reported busy writer is included in the set of busy read engines. This
should be provided by the ordering of setting and retiring the active
trackers, but we can do better by explicitly setting the busy read
engine flag for the last writer.

v2: More comments inside __busy_write_id() to explain why both fields
are set.

Fixes: 3fdc13c7a3 ("drm/i915: Remove (struct_mutex) locking for busy-ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-10 10:37:21 +01:00
Chris Wilson edf6b76f64 drm/i915: Add smp_rmb() to busy ioctl's RCU dance
In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk
2016-08-09 10:17:26 +01:00
Chris Wilson 87b723a16d drm/i915: Don't check for idleness before retiring after a GPU hang
When we force the cleanup after a GPU hang, we want to retire all
requests, or else we may leak them if truly wedged (and the GPU never
advances again). Converting to the active request helpers had the issue
of doing the check against busyness before reporting the request, so if
we claim the GPU had hung but this engine hadn't we could potential skip
the request cleanup - triggering the self-check BUG.

Fixes: dcff85c844 ("drm/i915: Enable i915_gem_wait_for_idle() ...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-3-git-send-email-chris@chris-wilson.co.uk
2016-08-09 09:11:59 +01:00
Chris Wilson 3e510a8e65 drm/i915: Repack fence tiling mode and stride into a single integer
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.

v2: New magic

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:43 +01:00
Chris Wilson e883d73503 drm/i915: Remove pinned check from madvise ioctl
We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the vma the object has.

Still despite it being an overzealous check, that error code is part of
the current ABI and so we must proceed with caution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-15-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:41 +01:00
Chris Wilson c21724cc4d drm/i915: Reduce locking inside swfinish ioctl
We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore
v4: Don't be creative with goto.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-14-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:41 +01:00
Chris Wilson 3fdc13c7a3 drm/i915: Remove (struct_mutex) locking for busy-ioctl
By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-13-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:40 +01:00
Chris Wilson 033d549b81 drm/i915: Remove (struct_mutex) locking for wait-ioctl
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicial use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().

The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention
point!

v2: Break up a long line.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-12-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:40 +01:00
Chris Wilson 258a5edee0 drm/i915: Do a nonblocking wait first in pread/pwrite
If we try and read or write to an active request, we first must wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will acquire the mutex and only then start the operation
(i.e. we do not rely on state from before the initial wait).

v2: Repaint the goto labels
v3: Move the tracepoints back to the start of the ioctls

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-11-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:39 +01:00
Chris Wilson f3f6184c5f drm/i915: Tidy generation of the GTT mmap offset
If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.

One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).

v2: s/ret/err/ and use if (!err) rather than if (ret == 0)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-9-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:38 +01:00
Chris Wilson dcff85c844 drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.

As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).

v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:37 +01:00
Chris Wilson 90f4fcd56b drm/i915: Remove forced stop ring on suspend/unload
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This later step is a remanent
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:36 +01:00
Chris Wilson b8f9096d6a drm/i915: Convert non-blocking waits for requests over to using RCU
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.

v2: Move i915_gem_fault tracepoint back to the start of the function,
before the unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-2-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:34 +01:00
Chris Wilson 0eafec6d32 drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.

v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();
        call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).

v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.

v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:05 +01:00
Chris Wilson 00e60f2659 drm/i915: Move i915_gem_object_wait_rendering()
Just move it earlier so that we can use the companion nonblocking
version in a couple of more callsites without having to add a forward
declaration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:05 +01:00
Chris Wilson 573adb3962 drm/i915: Move obj->active:5 to obj->flags
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).

v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:04 +01:00
Chris Wilson faf5bf0ad6 drm/i915: Use atomics to manipulate obj->frontbuffer_bits
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:03 +01:00
Chris Wilson b5add9591c drm/i915: Make fb_tracking.lock a spinlock
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.

v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_fronbuffer.h for better kerneldoc and readability)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtien <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:02 +01:00
Chris Wilson 5d723d7afd drm/i915: Separate intel_frontbuffer into its own header
In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:01 +01:00
Chris Wilson de895082f7 drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.

v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:00 +01:00
Chris Wilson 305bc234a8 drm/i915: Make i915_vma_pin() small and inline
Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough to inline the operation into execbuf. The trick used is
to note that we can check for an overflow bit (keeping space available
for it inside the flags) at the same time as checking the binding bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-17-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:20:00 +01:00
Chris Wilson 3272db5313 drm/i915: Combine all i915_vma bitfields into a single set of flags
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:59 +01:00
Chris Wilson 59bfa1248e drm/i915: Start passing around i915_vma from execbuffer
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!

v2: Tidy parameter lists to remove one level of redirection in the hot
path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:58 +01:00
Chris Wilson 20dfbde463 drm/i915: Wrap vma->pin_count accessors with small inline helpers
In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:58 +01:00
Chris Wilson de18003328 drm/i915: Record allocated vma size
Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-13-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:57 +01:00
Chris Wilson a9f1481f41 drm/i915: Update i915_gem_get_ggtt_size/_alignment to use drm_i915_private
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:56 +01:00
Chris Wilson ad1a7d20a1 drm/i915: Update the GGTT size/alignment query functions
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.

v2: Update docs to consistently state "global GTT".

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:55 +01:00
Chris Wilson 954c469121 drm/i915: Convert 4096 alignment request to 0 for drm_mm allocations
As we always allocate in chunks of 4096 (that being both the PAGE_SIZE
and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are
aligned to at least 4096. The drm_mm allocator itself is optimised for
alignment == 0, and so by converting alignments of 4096 to 0 we can
satisfy our own requirements and still hit the faster path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-10-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:55 +01:00
Chris Wilson 3b16525cc4 drm/i915: Split insertion/binding of an object into the VM
Split the insertion into the address space's range manager and binding
of that object into the GTT to simplify the code flow when pinning a
VMA.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-9-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:54 +01:00
Chris Wilson 3750858990 drm/i915: Reduce WARN(i915_gem_valid_gtt_space) to a debug-only check
i915_gem_valid_gtt_space() is used after inserting the VMA to double
check the list - the location should have been chosen to pass all the
restrictions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:53 +01:00
Chris Wilson 91b2db6f65 drm/i915: Pad GTT views of exec objects up to user specified size
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.

v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:53 +01:00
Chris Wilson 2ffffd0f85 drm/i915: Fix up vma alignment to be u64
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.

v2: long long casts not required for u64 printk (%llx)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:52 +01:00
Chris Wilson e522ac2324 drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something()
Eviction is VM local, so we can ignore the significance of the
drm_device in the caller, and leave it to i915_gem_evict_something() to
manage itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:50 +01:00
Chris Wilson 1dd5b6f202 drm/i915: Add missing rpm wakelock to GGTT pread
Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access, The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-08-04 11:18:22 +01:00
Chris Wilson df0e9a287d Revert "drm/i915: Clean up associated VMAs on context destruction"
This reverts commit e9f24d5fb7.

The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to do close them along with the object, context or file, and so the time
has come to remove the partial fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:34 +01:00
Chris Wilson 50e046b6a0 drm/i915: Mark the context and address space as closed
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes.  First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:33 +01:00
Chris Wilson b1f788c6ac drm/i915: Release vma when the handle is closed
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.

v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.

Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:33 +01:00
Chris Wilson b0decaf75b drm/i915: Track active vma requests
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.

A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...

v2: Update inline names to be consistent with
i915_gem_object_get_active()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:32 +01:00
Chris Wilson 5cf3d28098 drm/i915: i915_vma_move_to_active prep patch
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:31 +01:00
Chris Wilson 4b8de8e68a drm/i915: Move request list retirement to i915_gem_request.c
As the list retirement is now clean of implementation details, we can
move it closer to the request management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:30 +01:00
Chris Wilson 776f32364d drm/i915: s/__i915_wait_request/i915_wait_request/
There is only one wait on request function now, so drop the "expert"
indication of leading __.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:29 +01:00
Chris Wilson fa545cbf97 drm/i915: Refactor activity tracking for requests
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.

Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.

Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.

All told, less code, simpler and faster, and more extensible.

v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:25 +01:00
Chris Wilson 21c310f2f9 drm/i915: Remove obsolete i915_gem_object_flush_active()
Since we track requests, and requests are always added to the GPU fully
formed, we never have to flush the incomplete request and know that the
given request will eventually complete without any further action on our
part.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-15-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:24 +01:00
Chris Wilson efdf7c0605 drm/i915: Rename request->list to link for consistency
We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:23 +01:00
Chris Wilson 8cac6f6c41 drm/i915: Refactor blocking waits
Tidy up the for loops that handle waiting for read/write vs read-only
access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-13-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:23 +01:00
Chris Wilson d72d908b56 drm/i915: Mark up i915_gem_active for locking annotation
The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.

Before:	6298417 3619610  696320 10614347         a1f64b vmlinux
After:	6298417 3619610  696320 10614347         a1f64b vmlinux
(with i915 builtin)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:22 +01:00
Chris Wilson 27c01aaef0 drm/i915: Prepare i915_gem_active for annotations
In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:21 +01:00
Chris Wilson 381f371b25 drm/i915: Introduce i915_gem_active for request tracking
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:20 +01:00
Chris Wilson 4717ca9eec drm/i915: Kill drop_pages()
The drop_pages() function is a dangerous trap in that it can release the
passed in object pointer and so unless the caller is aware, it can
easily trick us into using the stale object afterwards. Move it into its
solitary callsite where we know it is safe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-9-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:19 +01:00
Chris Wilson aa653a685d drm/i915: Be more careful when unbinding vma
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:18 +01:00
Chris Wilson 15717de219 drm/i915: Count how many VMA are bound for an object
Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:17 +01:00
Chris Wilson f6b9d5cabd drm/i915: Split early global GTT initialisation
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.

v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-4-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:15 +01:00
Chris Wilson 97d6d7ab68 drm/i915: Update GGTT initialisation functions to take drm_i915_private
Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.

v2: Include all the probe functions

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-3-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:14 +01:00
Chris Wilson ddf07be7a2 drm/i915: Simplify calling engine->sync_to
Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predicatablity then means we do not have to check
for the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:32 +01:00
Chris Wilson 5b043f4e60 drm/i915: Unify legacy/execlists submit_execbuf callbacks
Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_contexts(), but in time that
will also be handled gracefully.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-30-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-21-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:31 +01:00
Chris Wilson 8e63717837 drm/i915: Simplify request_alloc by returning the allocated request
If is simpler and leads to more readable code through the callstack if
the allocation returns the allocated struct through the return value.

The importance of this is that it no longer looks like we accidentally
allocate requests as side-effect of calling certain functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-19-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-9-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:20 +01:00
Chris Wilson 7e37f889b5 drm/i915: Rename struct intel_ringbuffer to struct intel_ring
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.

s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
2016-08-02 22:58:16 +01:00
Linus Torvalds 731c7d3a20 Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux
Merge drm updates from Dave Airlie:
 "This is the main drm pull request for 4.8.

  I'm down with a cold at the moment so hopefully this isn't in too bad
  a state, I finished pulling stuff last week mostly (nouveau fixes just
  went in today), so only this message should be influenced by illness.
  Apologies to anyone who's major feature I missed :-)

  Core:
        Lockless GEM BO freeing
        Non-blocking atomic work
        Documentation changes (rst/sphinx)
        Prep for new fencing changes
        Simple display helpers
        Master/auth changes
        Register/unregister rework
        Loads of trivial patches/fixes.

  New stuff:
        ARM Mali display driver (not the 3D chip)
        sii902x RGB->HDMI bridge

  Panel:
        Support for new panels
        Improved backlight support

  Bridge:
        Convert ADV7511 to bridge driver
        ADV7533 support
        TC358767 (DSI/DPI to eDP) encoder chip support

  i915:
        BXT support enabled by default
        GVT-g infrastructure
        GuC command submission and fixes
        BXT workarounds
        SKL/BKL workarounds
        Demidlayering device registration
        Thundering herd fixes
        Missing pci ids
        Atomic updates

  amdgpu/radeon:
        ATPX improvements for better dGPU power control on PX systems
        New power features for CZ/BR/ST
        Pipelined BO moves and evictions in TTM
        GPU scheduler improvements
        GPU reset improvements
        Overclocking on dGPUs with amdgpu
        Polaris powermanagement enabled

  nouveau:
        GK20A/GM20B volt and clock improvements.
        Initial support for GP100/GP104 GPUs, GP104 will not yet support
        acceleration due to NVIDIA having not released firmware for them as of yet.

  exynos:
        Exynos5433 SoC with IOMMU support.

  vc4:
        Shader validation for branching

  imx-drm:
        Atomic mode setting conversion
        Reworked DMFC FIFO allocation
        External bridge support

  analogix-dp:
        RK3399 eDP support
        Lots of fixes.

  rockchip:
        Lots of small fixes.

  msm:
        DT bindings cleanups
        Shrinker and madvise support
        ASoC HDMI codec support

  tegra:
        Host1x driver cleanups
        SOR reworking for DP support
        Runtime PM support

  omapdrm:
        PLL enhancements
        Header refactoring
        Gamma table support

  arcgpu:
        Simulator support

  virtio-gpu:
        Atomic modesetting fixes.

  rcar-du:
        Misc fixes.

  mediatek:
        MT8173 HDMI support

  sti:
        ASOC HDMI codec support
        Minor fixes

  fsl-dcu:
        Suspend/resume support
        Bridge support

  amdkfd:
        Minor fixes.

  etnaviv:
        Enable GPU clock gating

  hisilicon:
        Vblank and other fixes"

* tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux: (1575 commits)
  drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
  drm/nouveau/acpi: fix lockup with PCIe runtime PM
  drm/nouveau/acpi: check for function 0x1B before using it
  drm/nouveau/acpi: return supported DSM functions
  drm/nouveau/acpi: ensure matching ACPI handle and supported functions
  drm/nouveau/fbcon: fix font width not divisible by 8
  drm/amd/powerplay: remove enable_clock_power_gatings_tasks from initialize and resume events
  drm/amd/powerplay: move clockgating to after ungating power in pp for uvd/vce
  drm/amdgpu: add query device id and revision id into system info entry at CGS
  drm/amdgpu: add new definition in bif header
  drm/amd/powerplay: rename smum header guards
  drm/amdgpu: enable UVD context buffer for older HW
  drm/amdgpu: fix default UVD context size
  drm/amdgpu: fix incorrect type of info_id
  drm/amdgpu: make amdgpu_cgs_call_acpi_method as static
  drm/amdgpu: comment out unused defaults_staturn_pro static const structure to fix the build
  drm/amdgpu: enable UVD VM only on polaris
  drm/amdgpu: increase timeout of IB test
  drm/amdgpu: add destroy session when generate VCE destroy msg.
  drm/amd: fix deadlock of job_list_lock V2
  ...
2016-08-01 21:44:08 -04:00
Chris Wilson 7e21d6484d drm/i915: Remove stray intel_engine_cs ring identifiers from i915_gem.c
A few places we use ring when referring to the struct intel_engine_cs. An
anachronism we are pruning out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-9-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-4-git-send-email-chris@chris-wilson.co.uk
2016-07-27 16:23:05 +01:00
Chris Wilson c80ff16e11 drm/i915: Use engine to refer to the user's BSD intel_engine_cs
This patch transitions the execbuf engine selection away from using the
ring nomenclature - though we still refer to the user's incoming
selector as their user_ring_id.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-2-git-send-email-chris@chris-wilson.co.uk
2016-07-27 16:23:05 +01:00
Chris Wilson 15f7bbc735 drm/i915: Only clear the client pointer when tearing down the file
Upon release of the file (i.e. the user calls close(fd)), we decouple
all objects from the client list so that we don't chase the dangling
file_priv. As we always inspect file_priv first, we only need to nullify
that pointer and can safely ignore the list_head.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469530913-17180-3-git-send-email-chris@chris-wilson.co.uk
2016-07-26 13:00:59 +01:00
Chris Wilson 2529d57050 drm/i915: Drop racy markup of missed-irqs from idle-worker
During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469351421-13493-1-git-send-email-chris@chris-wilson.co.uk
2016-07-24 10:57:19 +01:00
Chris Wilson 54b4f68f18 Revert "drm/i915: Enable RC6 immediately"
This reverts commit b12e0ee208 ("drm/i915: Enable RC6 immediately"),
as it was never meant to be sent anywhere other than the bug report for
experimentation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469132179-4052-1-git-send-email-chris@chris-wilson.co.uk
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-21 21:43:19 +01:00
Chris Wilson b12e0ee208 drm/i915: Enable RC6 immediately
Now that PCU communication is reasonably fast, we do not need to defer
RC6 initialisation to a workqueue.

References: https://bugs.freedesktop.org/show_bug.cgi?id=97017
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-07-21 18:30:22 +01:00
Chris Wilson 39df91905d drm/i915: Convert i915_semaphores_is_enabled over to early sanitize
Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.

s/i915_semaphores_is_enabled/i915.semaphores/

v2: Add the state to the debug dmesg as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:43 +01:00
Chris Wilson 34911fd30c drm/i915: Rename drm_gem_object_unreference_unlocked in preparation for lockless free
Whilst this ultimately wraps kref_put_mutex(), our goal here is the
lockless variant, so keep the _unlocked() suffix until we need it no
more.

s/drm_gem_object_unreference_unlocked/i915_gem_object_put_unlocked/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-6-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:13 +01:00
Chris Wilson f8c417cdb1 drm/i915: Rename drm_gem_object_unreference in preparation for lockless free
Ultimately wraps kref_put(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_unreference/i915_gem_object_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:12 +01:00
Chris Wilson 25dc556a2a drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get
Ultimately wraps kref_get(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_reference/i915_gem_object_get/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:11 +01:00
Chris Wilson 03ac0642f6 drm/i915: Wrap drm_gem_object_lookup in i915_gem_object_lookup
For symmetry with a forthcoming i915_gem_object_get() and
i915_gem_object_put().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-3-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:10 +01:00
Chris Wilson e8a261ea63 drm/i915: Rename request reference/unreference to get/put
Now that we derive requests from struct fence, swap over to its
nomenclature for references. It's shorter and more idiomatic across the
kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:09 +01:00
Chris Wilson c13d87ea53 drm/i915: Wait on external rendering for GEM objects
When transitioning to the GTT or CPU domain we wait on all rendering
from i915 to complete (with the optimisation of allowing concurrent read
access by both the GPU and client). We don't yet ensure all rendering
from third parties (tracked by implicit fences on the dma-buf) is
complete. Since implicitly tracked rendering by third parties will
ignore our cache-domain tracking, we have to always wait upon rendering
from third-parties when transitioning to direct access to the backing
store. We still rely on clients notifying us of cache domain changes
(i.e. they need to move to the GTT read or write domain after doing a CPU
access before letting the third party render again).

v2:
This introduces a potential WARN_ON into i915_gem_object_free() as the
current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
hit this path we first need to render with the GPU, have a dma-buf
attached with an unsignaled fence and then interrupt the wait. It does
get fixed later in the series (when i915_vma_unbind() only waits on the
active VMA and not all, including third-party, rendering.

To offset that risk, use the __i915_vma_unbind_no_wait hack.

Testcase: igt/prime_vgem/basic-fence-read
Testcase: igt/prime_vgem/basic-fence-mmap
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-8-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson 197be2ae8b drm/i915: Disable waitboosting for mmioflips/semaphores
Since commit a6f766f397 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson c4b0930bf4 drm/i915: Mark all current requests as complete before resetting them
Following a GPU reset upon hang, we retire all the requests and then
mark them all as complete. If we mark them as complete first, we both
keep the normal retirement order (completed first then retired) and
provide a small optimisation for concurrent lookups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-3-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Chris Wilson 05235c5354 drm/i915: Move GEM request routines to i915_gem_request.c
Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.

v2: Move __i915_add_request() across as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20 09:29:53 +01:00
Daniel Vetter 62f90b38f3 drm/i915: Update missing kerneldoc
Not sure why so much slips through when 0day is catching these. Hopefully
the much faster sphinx toolchain helps in unlazying people.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468612088-9721-10-git-send-email-daniel.vetter@ffwll.ch
2016-07-19 10:34:24 +02:00
Chris Wilson d1054ee492 drm/i915: Handle ENOSPC after failing to insert a mappable node
Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-19 08:59:40 +01:00
Chris Wilson 5ab57c7020 drm/i915: Flush logical context image out to memory upon suspend
Before suspend, and especially before building the hibernation image, we
need to context image to be coherent in memory. To do this we require
that we perform a context switch to a disposable context (i.e. the
dev_priv->kernel_context) - when that switch is complete, all other
context images will be complete. This leaves the kernel_context image as
incomplete, but fortunately that is disposable and we can do a quick
fixup of the logical state after resuming.

v2: Share the nearly identical code to switch to the kernel context with
eviction.
v3: Explain why we need the switch and reset.

Testcase: igt/gem_exec_suspend # bsw
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-2-git-send-email-chris@chris-wilson.co.uk
2016-07-15 20:41:05 +01:00
Chris Wilson b7137e0cf1 drm/i915: Defer enabling rc6 til after we submit the first batch/context
Some hardware requires a valid render context before it can initiate
rc6 power gating of the GPU; the default state of the GPU is not
sufficient and may lead to undefined behaviour. The first execution of
any batch will load the "golden render state", at which point it is safe
to enable rc6. As we do not forcibly load the kernel context at resume,
we have to hook into the batch submission to be sure that the render
state is setup before enabling rc6.

However, since we don't enable powersaving until that first batch, we
queued a delayed task in order to guarantee that the batch is indeed
submitted.

v2: Rearrange intel_disable_gt_powersave() to match.
v3: Apply user specified cur_freq (or idle_freq if not set).
v4: Give in, and supply a delayed work to autoenable rc6
v5: Mika suggested a couple of better names for delayed_resume_work
v6: Rebalance rpm_put around the autoenable task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-14 15:24:21 +01:00
Chris Wilson b913b33c43 drm/i915: Flush GT idle status upon reset
Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-07-14 15:21:17 +01:00
Chris Wilson 5b58592530 drm/i915/breadcrumbs: Queue hangcheck before sleeping
Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138d ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit 232af392fd)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-14 15:48:32 +02:00
Matthew Auld 3fef3a5be3 drm/i915: remove superfluous i915_gem_object_free_mmap_offset call
This should already be handled by drm_gem_object_release, which is
called later on.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467720019-31876-1-git-send-email-matthew.auld@intel.com
2016-07-14 16:11:55 +03:00
Tvrtko Ursulin 8b3e2d3639 drm/i915: Unify engine init loop
With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.

v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
2016-07-14 11:17:06 +01:00
Chris Wilson c961561382 drm/i915: Kick hangcheck from retire worker
Let's ensure that we cannot run indefinitely without the hangcheck
worker being queued. We removed it from being kicked on every request
because we were kicking it a few millions times in every hangcheck
interval and only once is necessary! However, that leaves us with the
issue of what if userspace never waits for a request, or runs out of
resources, what if userspace just issues a request then spins on
BUSY_IOCTL?

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-3-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-11 13:48:42 +01:00
Chris Wilson 232af392fd drm/i915/breadcrumbs: Queue hangcheck before sleeping
Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138d ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-11 13:48:22 +01:00
Chris Wilson 91c8a326a1 drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.

   text	   data	    bss	    dec	    hex	filename
1068757	   4565	    416	1073738	 10624a	drivers/gpu/drm/i915/i915.ko
1066949	   4565	    416	1071930	 105b3a	drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)

and for good measure the dev_priv->dev backpointer was removed entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
2016-07-05 11:58:45 +01:00
Chris Wilson b1379d4964 drm/i915: Replace lockless_dereference(bool) with READ_ONCE()
After Joonas complained about using READ_ONCE() on the only use of the
variable in the function, where the intent was to simply document that
the read was intentionally racy and unlocked, I switched the READ_ONCE()
over to lockless_dereference(). However, in linux-next that has a
stronger type-check to only allow pointers and is no longer
interchangeable with READ_ONCE(), see commit 331b6d8c7a
("locking/barriers: Validate lockless_dereference() is used on a pointer
type")

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 67d97da349 ("drm/i915: Only start retire worker when idle")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467705276-707-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2016-07-05 10:58:07 +01:00
Dave Gordon f19ec8cb5a drm/i915: convert a few more E->dev_private to to_i915(E)
Also remove some redundant dev and dev_priv locals

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467626365-29871-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-2-git-send-email-chris@chris-wilson.co.uk
2016-07-04 12:54:44 +01:00
Chris Wilson fac5e23e3c drm/i915: Mass convert dev->dev_private to to_i915(dev)
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().

   text    data     bss     dec     hex filename
1073824    4562     416 1078802  107612 drivers/gpu/drm/i915/i915.ko
1068976    4562     416 1073954  106322 drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:

@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04 12:54:07 +01:00
Chris Wilson 7b4d3a16dd drm/i915: Remove stop-rings debugfs interface
Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:23 +01:00
Chris Wilson df4ba5099f drm/i915: Add background commentary to "waitboosting"
Describe the intent of boosting the GPU frequency to maximum before
waiting on the GPU.

RPS waitboosting was introduced with commit b29c19b645 ("drm/i915:
Boost RPS frequency for CPU stalls") but lacked a concise comment in the
code to explain itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-5-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:22 +01:00
Chris Wilson 0e6883b043 drm/i915: Restore waitboost credit to the synchronous waiter
Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for say the first frame in a
benchmark, but for many more synchronous workloads (like OpenCL) we find
the GPU clocks remain too low. By noting a wait which would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-4-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:21 +01:00
Chris Wilson e307d62d5f drm/i915: Remove redundant queue_delayed_work() from throttle ioctl
We know, by design, that whilst the GPU is active (and thus we are
throttling) the retire_worker is queued. Therefore attempting to requeue
it with queue_delayed_work() is a no-op and we can safely remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-3-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:21 +01:00
Chris Wilson 1b51bce27b drm/i915: Do not keep postponing the idle-work
Rather than persistently postponing the idle-work everytime somebody
calls i915_gem_retire_requests() (potentially ensuring that we never
reach the idle state), queue the work the first time we detect all
requests are complete. Then if in 100ms, more requests have been queued,
we will abort the idle-worker and wait again until all the new requests
have been completed.

Of course, this does depend upon the idle worker cancelling itself
gracefully from the previous patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-2-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:20 +01:00
Chris Wilson 67d97da349 drm/i915: Only start retire worker when idle
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.

v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
2016-07-04 08:18:19 +01:00
Chris Wilson c81d46138d drm/i915: Convert trace-irq to the breadcrumb waiter
If we convert the tracing over from direct use of ring->irq_get() and
over to the breadcrumb infrastructure, we only have a single user of the
ring->irq_get and so we will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).

Process context is preferred over softirq (or even hardirq) for a couple
of reasons:

 - we already utilize process context to have fast wakeup of a single
   client (i.e. the client waiting for the GPU inspects the seqno for
   itself following an interrupt to avoid the overhead of a context
   switch before it returns to userspace)

 - engine->irq_seqno() is not suitable for use from an softirq/hardirq
   context as we may require long waits (100-250us) to ensure the seqno
   write is posted before we read it from the CPU

A signaling framework is a requirement for enabling dma-fences.

v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree everytime.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.
v5: Make failure to allocate the thread fatal.
v6: Rename kthreads to i915/signal:%u

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:02:34 +01:00
Chris Wilson 1137fa8615 drm/i915: Stop setting wraparound seqno on initialisation
We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. seqno wraparound incurs a full GPU stall so not forcing it
will eliminate one jitter from the early system. Using the testcases, we
have very deterministic testing which given how difficult it would be to
debug an issue (GPU hang) stemming from a wraparound using pure
postmortem analysis I see no value in forcing a wrap during boot.

Advancing the global next_seqno after a GPU reset is equally pointless.

References? https://bugs.freedesktop.org/show_bug.cgi?id=95023
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-15-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:00:55 +01:00
Chris Wilson f69a02c9d5 drm/i915: Spin after waking up for an interrupt
When waiting for an interrupt (waiting for the engine to complete some
work), we know we are the only waiter to be woken on this engine. We also
know when the GPU has nearly completed our request (or at least started
processing it), so after being woken and we detect that the GPU is
active and working on our request, allow us the bottom-half (the first
waiter who wakes up to handle checking the seqno after the interrupt) to
spin for a very short while to reduce client latencies.

The impact is minimal, there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later. However,
it is tempting to adjust irq_seqno_barrier to include the spin. The
problem is first ensuring that the "start-of-request" seqno is coherent
as we use that as our basis for judging when it is ok to spin. If we
could, spinning there could dramatically shorten some sleeps, and allow
us to make the barriers more conservative to handle missed seqno writes
on more platforms (all gen7+ are known to have the occasional issue, at
least).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:47 +01:00
Chris Wilson 688e6c7258 drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:43 +01:00
Chris Wilson 1f15b76f1e drm/i915: Separate GPU hang waitqueue from advance
Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each requests) without any hassle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:31 +01:00
Chris Wilson 05535726d3 drm/i915: Delay queuing hangcheck to wait-request
We can forgo queuing the hangcheck from the start of every request to
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing every waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

As pointed out by Tvrtko, it is possible for a GPU hang to go unnoticed
for as long as nobody is waiting for the GPU. Though this rare, during
that time we may be consuming more power than if we had promptly
recovered, and in the most extreme case we may exhaust all memory before
forcing the hangcheck. Something to be wary off in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-2-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:28 +01:00
Chris Wilson 0c5eed6514 drm/i915: Remove request->reset_counter
Since commit 2ed53a94d8 ("drm/i915: On GPU reset, set the HWS
breadcrumb to the last seqno") once a hang is completed, the seqno is
advanced past all current requests. With this we know that if we wake up
from waiting for a request, if a hang has occurred and reset completed,
our request will be considered complete (i.e.
i915_gem_request_completed() returns true). Therefore we only need to
worry about the situation where a hang has occurred, but not yet reset,
where we may need to release our struct_mutex. Since we don't need to
detect the completed reset using the global gpu_error->reset_counter
anymore, we do not need to track the reset_counter epoch inside the
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467211874-11552-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
2016-06-29 17:06:41 +01:00
Chris Wilson 6e5a5beb8e drm/i915: Split idling from forcing context switch
We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.

v2: Tweak an error message if the wait fails for the ilk vtd w/a

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:03:14 +01:00
Chris Wilson 62e6300768 drm/i915: Skip idling an idle engine
During suspend (or module unload), if we have never accessed the engine
(i.e. userspace never submitted a batch to it), the engine is idle. Then
we attempt to idle the engine by forcing it to the default context,
which actually means we submit a render batch to setup the golden
context state and then wait for it to complete. We can skip this
entirely as we know the engine is idle.

v2: Drop incorrect comment.

References: https://bugs.freedesktop.org/show_bug.cgi?id=95634
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-1-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:02:16 +01:00
Chris Wilson aeecc9696a drm/i915: use ORIGIN_CPU for frontbuffer invalidation on WC mmaps
... instead of the previous ORIGIN_GTT. This should actually
invalidate FBC once something is written on the frontbuffer using WC
mmaps. The problem with ORIGIN_GTT is that the automatic hardware
tracking is not able to detect the WC writes as it can detect the GTT
writes.

This should help fix the SKL bug where nothing happens when you type
your username/password on lightdm.

This patch was originally pasted on an email by Chris and converted to
an actual git patch by Paulo.

v2 (from Paulo):
 - Make it a full variable instead of a bit-field (Daniel)
 - Use WRITE_ONCE (Chris)
v3 (from Paulo):
 - Remove huge comment since now we have WRITE_ONCE (Chris)
 - Remove uneeded new line (Chris)
 - Add Chris' Signed-off-by, authorized via IRC

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466185599-26401-1-git-send-email-paulo.r.zanoni@intel.com
2016-06-20 17:47:36 -03:00
Chris Wilson 6eae0059f9 drm/i915: pwrite/pread do not require obj->base.filp, just pages
The idea behind relaxing the restriction for pread/pwrite was to handle
!obj->base.flip, i.e. non-shmemfs backed objects, which only requires
that the object provide struct pages.

v2: Remove excess (). Note enough editing after copy'n'paste.
v3: Use new i915_gem_object_has_struct_page()

Testcase: igt/prime_vgem/read
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-2-git-send-email-chris@chris-wilson.co.uk
2016-06-20 15:58:08 +01:00
Chris Wilson b9bcd14a2b drm/i915: Extract checking for backing struct pages to a helper
Currently to see if an object is backed by struct pages (as opposed to
being a simple pointer to stolen memory, for example) we do a manual
check on the obj->ops->flags. This is quite shouty and before adding
more checks in future, we should make it a bit calmer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-1-git-send-email-chris@chris-wilson.co.uk
2016-06-20 15:57:54 +01:00
Ankitprasad Sharma b50a53715f drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)

v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)

v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)

v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)

v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration

v12: Use page-by-page copy for slow user access too (Chris)

v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)

v14: Corrected datatypes/initializations (Tvrtko)

Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-13 10:04:38 +01:00
Ankitprasad Sharma 4f1959ee33 drm/i915: Use insert_page for pwrite_fast
In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
we try a nonblocking pin for the whole object (since that is fastest if
reused), then failing that we try to grab one page in the mappable
aperture. It also allows us to handle objects larger than the mappable
aperture (e.g. if we need to pwrite with vGPU restricting the aperture
to a measely 8MiB or something like that).

v2: Pin pages before starting pwrite, Combined duplicate loops (Chris)

v3: Combined loops based on local patch by Chris (Chris)

v4: Added i915 wrapper function for drm_mm_insert_node_in_range (Chris)

v5: Renamed wrapper function for drm_mm_insert_node_in_range (Chris)

v5: Added wrapper for drm_mm_remove_node() (Chris)

v6: Added get_pages call before pinning the pages (Tvrtko)
Added remove_mappable_node() wrapper for drm_mm_remove_node() (Chris)

v7: Added size argument for insert_mappable_node (Tvrtko)

v8: Do not put_pages after pwrite, do memset of node in the wrapper
function (insert_mappable_node) (Chris)

v9: Rebase (Ankit)

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-13 10:03:55 +01:00
Dave Gordon e556f7c168 drm/i915/guc: fix GuC loading/submission check
The last stage of the GuC loader also sanitises the GuC submission
settings, so should be called unconditionally (even on platforms
without a GuC) to ensure consistent settings; in particular, this
prevents any attempt to use GuC submission on GuCless platforms!

Also fix error path handling and clarify DRM_INFO fallback message.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-06-07 14:21:58 +01:00
Tvrtko Ursulin 14bb2c1179 drm/i915: Fix a buch of kerneldoc warnings
Just a bunch of stale kerneldocs generating warnings when
building the docs. Mostly function parameters so not very
useful but still.

v2: Tidy.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1464958937-23344-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-06-06 13:04:26 +01:00
Daniel Vetter 5599617ec0 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Git got absolutely destroyed with all our cherry-picking from
drm-intel-next-queued to various branches. It ended up inserting
intel_crtc_page_flip 2x even in intel_display.c.

Backmerge to get back to sanity.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-06-02 09:54:12 +02:00
Dave Airlie 66fd7a66e8 Merge branch 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel into drm-next
drm-intel-next-2016-05-22:
- cmd-parser support for direct reg->reg loads (Ken Graunke)
- better handle DP++ smart dongles (Ville)
- bxt guc fw loading support (Nick Hoathe)
- remove a bunch of struct typedefs from dpll code (Ander)
- tons of small work all over to avoid casting between drm_device and the i915
  dev struct (Tvrtko&Chris)
- untangle request retiring from other operations, also fixes reset stat corner
  cases (Chris)
- skl atomic watermark support from Matt Roper, yay!
- various wm handling bugfixes from Ville
- big pile of cdclck rework for bxt/skl (Ville)
- CABC (Content Adaptive Brigthness Control) for dsi panels (Jani&Deepak M)
- nonblocking atomic commits for plane-only updates (Maarten Lankhorst)
- bunch of PSR fixes&improvements
- untangle our map/pin/sg_iter code a bit (Dave Gordon)
drm-intel-next-2016-05-08:
- refactor stolen quirks to share code between early quirks and i915 (Joonas)
- refactor gem BO/vma funcstion (Tvrtko&Dave)
- backlight over DPCD support (Yetunde Abedisi)
- more dsi panel sequence support (Jani)
- lots of refactoring around handling iomaps, vma, ring access and related
  topics culmulating in removing the duplicated request tracking in the execlist
  code (Chris & Tvrtko) includes a small patch for core iomapping code
- hw state readout for bxt dsi (Ramalingam C)
- cdclk cleanups (Ville)
- dedupe chv pll code a bit (Ander)
- enable semaphores on gen8+ for legacy submission, to be able to have a direct
  comparison against execlist on the same platform (Chris) Not meant to be used
  for anything else but performance tuning
- lvds border bit hw state checker fix (Jani)
- rpm vs. shrinker/oom-notifier fixes (Praveen Paneri)
- l3 tuning (Imre)
- revert mst dp audio, it's totally non-functional and crash-y (Lyude)
- first official dmc for kbl (Rodrigo)
- and tons of small things all over as usual

* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (194 commits)
  drm/i915: Revert async unpin and nonblocking atomic commit
  drm/i915: Update DRIVER_DATE to 20160522
  drm/i915: Inline sg_next() for the optimised SGL iterator
  drm/i915: Introduce & use new lightweight SGL iterators
  drm/i915: optimise i915_gem_object_map() for small objects
  drm/i915: refactor i915_gem_object_pin_map()
  drm/i915/psr: Implement PSR2 w/a for gen9
  drm/i915/psr: Use ->get_aux_send_ctl functions
  drm/i915/psr: Order DP aux transactions correctly
  drm/i915/psr: Make idle_frames sensible again
  drm/i915/psr: Try to program link training times correctly
  drm/i915/userptr: Convert to drm_i915_private
  drm/i915: Allow nonblocking update of pageflips.
  drm/i915: Check for unpin correctness.
  Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
  drm/i915: Make unpin async.
  drm/i915: Prepare connectors for nonblocking checks.
  drm/i915: Pass atomic states to fbc update functions.
  drm/i915: Remove reset_counter from intel_crtc.
  drm/i915: Remove queue_flip pointer.
  ...
2016-06-02 07:58:36 +10:00
Al Viro 93c76a3d43 file_inode(f)->i_mapping is f->f_mapping
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-29 18:56:09 -04:00
Dave Airlie 7fa1d27b63 Merge tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel into drm-next
I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.

[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Stop automatically retiring requests after a GPU hang
  drm/i915: Unify intel_ring_begin()
  drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
  drm/i915/psr: Try to program link training times correctly
  drm/i915/bxt: Adjusting the error in horizontal timings retrieval
  drm/i915: Don't leave old junk in ilk active watermarks on readout
  drm/i915: s/DPPL/DPLL/ for SKL DPLLs
  drm/i915: Fix gen8 semaphores id for legacy mode
  drm/i915: Set crtc_state->lane_count for HDMI
  drm/i915/BXT: Retrieving the horizontal timing for DSI
  drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
  drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
  drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
  drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
  drm/i915: Respect DP++ adaptor TMDS clock limit
  drm: Add helper for DP++ adaptors
2016-05-27 16:08:38 +10:00
Chris Wilson e2efd13007 drm/i915: Rename struct intel_context
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:

	s/struct intel_context/struct i915_gem_context/

which fits much better with its constructors already conveying the
i915_gem_context prefix!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
2016-05-24 15:27:14 +01:00
Linus Torvalds 84787c572d Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - Oleg's "wait/ptrace: assume __WALL if the child is traced".  It's a
   kernel-based workaround for existing userspace issues.

 - A few hotfixes

 - befs cleanups

 - nilfs2 updates

 - sys_wait() changes

 - kexec updates

 - kdump

 - scripts/gdb updates

 - the last of the MM queue

 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (84 commits)
  kgdb: depends on VT
  drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
  drm/radeon: make radeon_mn_get wait for mmap_sem killable
  drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
  uprobes: wait for mmap_sem for write killable
  prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
  exec: make exec path waiting for mmap_sem killable
  aio: make aio_setup_ring killable
  coredump: make coredump_wait wait for mmap_sem for write killable
  vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
  ipc, shm: make shmem attach/detach wait for mmap_sem killable
  mm, fork: make dup_mmap wait for mmap_sem for write killable
  mm, proc: make clear_refs killable
  mm: make vm_brk killable
  mm, elf: handle vm_brk error
  mm, aout: handle vm_brk failures
  mm: make vm_munmap killable
  mm: make vm_mmap killable
  mm: make mmap_sem for write waits killable for mm syscalls
  MAINTAINERS: add co-maintainer for scripts/gdb
  ...
2016-05-23 19:42:28 -07:00
Michal Hocko 80a89a5e85 drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
i915_gem_mmap_ioctl relies on mmap_sem for write.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Linus Torvalds 1d6da87a32 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Here's the main drm pull request for 4.7, it's been a busy one, and
  I've been a bit more distracted in real life this merge window.  Lots
  more ARM drivers, not sure if it'll ever end.  I think I've at least
  one more coming the next merge window.

  But changes are all over the place, support for AMD Polaris GPUs is in
  here, some missing GM108 support for nouveau (found in some Lenovos),
  a bunch of MST and skylake fixes.

  I've also noticed a few fixes from Arnd in my inbox, that I'll try and
  get in asap, but I didn't think they should hold this up.

  New drivers:
   - Hisilicon kirin display driver
   - Mediatek MT8173 display driver
   - ARC PGU - bitstreamer on Synopsys ARC SDP boards
   - Allwinner A13 initial RGB output driver
   - Analogix driver for DisplayPort IP found in exynos and rockchip

  DRM Core:
   - UAPI headers fixes and C++ safety
   - DRM connector reference counting
   - DisplayID mode parsing for Dell 5K monitors
   - Removal of struct_mutex from drivers
   - Connector registration cleanups
   - MST robustness fixes
   - MAINTAINERS updates
   - Lockless GEM object freeing
   - Generic fbdev deferred IO support

  panel:
   - Support for a bunch of new panels

  i915:
   - VBT refactoring
   - PLL computation cleanups
   - DSI support for BXT
   - Color manager support
   - More atomic patches
   - GEM improvements
   - GuC fw loading fixes
   - DP detection fixes
   - SKL GPU hang fixes
   - Lots of BXT fixes

  radeon/amdgpu:
   - Initial Polaris support
   - GPUVM/Scheduler/Clock/Power improvements
   - ASYNC pageflip support
   - New mesa feature support

  nouveau:
   - GM108 support
   - Power sensor support improvements
   - GR init + ucode fixes.
   - Use GPU provided topology information

  vmwgfx:
   - Add host messaging support

  gma500:
   - Some cleanups and fixes

  atmel:
   - Bridge support
   - Async atomic commit support

  fsl-dcu:
   - Timing controller for LCD support
   - Pixel clock polarity support

  rcar-du:
   - Misc fixes

  exynos:
   - Pipeline clock support
   - Exynoss4533 SoC support
   - HW trigger mode support
   - export HDMI_PHY clock
   - DECON5433 fixes
   - Use generic prime functions
   - use DMA mapping APIs

  rockchip:
   - Lots of little fixes

  vc4:
   - Render node support
   - Gamma ramp support
   - DPI output support

  msm:
   - Mostly cleanups and fixes
   - Conversion to generic struct fence

  etnaviv:
   - Fix for prime buffer handling
   - Allow hangcheck to be coalesced with other wakeups

  tegra:
   - Gamme table size fix"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
  drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
  drm/edid: move displayid validation to it's own function.
  drm/displayid: Iterate over all DisplayID blocks
  drm/edid: move displayid tiled block parsing into separate function.
  drm: Nuke ->vblank_disable_allowed
  drm/vmwgfx: Report vmwgfx version to vmware.log
  drm/vmwgfx: Add VMWare host messaging capability
  drm/vmwgfx: Kill some lockdep warnings
  drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
  drm/nouveau/core: recognise GM108 chipsets
  drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
  drm/nouveau/gr/gk104-: share implementation of ppc exception init
  drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
  drm/nouveau/bios/pll: check BIT table version before trying to parse it
  drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
  drm/nouveau/volt/gk104: round up in gk104_volt_set
  drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
  drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
  drm/nouveau/fb/gf100-: allocate mmu debug buffers
  drm/nouveau/fb: allow chipset-specific actions for oneinit()
  ...
2016-05-23 11:48:48 -07:00
Dave Gordon fce91f22ff drm/i915/guc: add enable_guc_loading parameter
Split the function of "enable_guc_submission" into two separate
options.  The new one ("enable_guc_loading") controls only the
*fetching and loading* of the GuC firmware image. The existing
one is redefined to control only the *use* of the GuC for batch
submission once the firmware is loaded.

In addition, the degree of control has been refined from a simple
bool to an integer key, allowing several options:
-1 (default)     whatever the platform default is
 0  DISABLE      don't load/use the GuC
 1  BEST EFFORT  try to load/use the GuC, fallback if not available
 2  REQUIRE      must load/use the GuC, else leave the GPU wedged

The new platform default (as coded here) will be to attempt to
load the GuC iff the device has a GuC that requires firmware,
but not yet to use it for submission. A later patch will change
to enable it if appropriate.

v4:
    Changed some error-message levels, mostly ERROR->INFO, per
    review comments by Tvrtko Ursulin.

v5:
    Dropped one more error message, disabled GuC submission on
    hypothetical firmware-free devices [Tvrtko Ursulin].

v6:
    Logging tidy by Tvrtko Ursulin:
     * Do not log falling back to execlists when wedging the GPU.
     * Do not log fw load errors when load was disabled by user.
     * Pass down some error code from fw load for log message to
       make more sense.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Fiedorowicz, Lukasz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> (v6)
2016-05-23 14:21:52 +01:00
Dave Gordon 1a3d189837 drm/i915/guc: distinguish HAS_GUC() from HAS_GUC_UCODE/HAS_GUC_SCHED
For now, anything with a GuC requires uCode loading, and then supports
command submission once loaded. But these are logically distinct from
simply "having a GuC", so we need a separate macro for the latter. Then,
various tests should use this new macro rather than HAS_GUC_UCODE() or
testing enable_guc_submission.

v4:
    Added a couple more uses of the new macro.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-05-23 14:21:52 +01:00
Dave Gordon f09d675f02 drm/i915/guc: rename loader entry points
The GuC initialisation code could do other things apart from loading
firmware, so here we rename the three primary entry points to remove any
specific reference to "ucode" (no functional changes, just renaming).

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-05-23 14:21:52 +01:00
Chris Wilson 157d2c7fad drm/i915: Stop automatically retiring requests after a GPU hang
Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

v2: Not automatically retiring requests forces us to recheck for
available ringspace.

Fixes: f4457ae71f ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats/ban-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-4-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit e075a32f51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-05-23 16:21:32 +03:00
Ville Syrjälä 5fbd0418ee drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.

The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and straight up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.

I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.

Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.

v2: Reorder init_hw vs. enable_hw functions (Chris)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit ac840ae535)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-05-23 11:10:48 +03:00
Linus Torvalds 2f37dd131c Staging and IIO driver update for 4.7-rc1
Here's the big staging and iio driver update for 4.7-rc1.
 
 I think we almost broke even with this release, only adding a few more
 lines than we removed, which isn't bad overall given that there's a
 bunch of new iio drivers added.  The Lustre developers seem to have
 woken up from their sleep and have been doing a great job in cleaning up
 the code and pruning unused or old cruft, the filesystem is almost
 readable :)
 
 Other than that, just a lot of basic coding style cleanups in the churn.
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlc/00QACgkQMUfUDdst+ynXYQCdG9oEsw4CCItbjGfQau5YVGbd
 TOcAnA19tZz+Wcg3sLT8Zsm979dgVvDt
 =9UG/
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver updates from Greg KH:
 "Here's the big staging and iio driver update for 4.7-rc1.

  I think we almost broke even with this release, only adding a few more
  lines than we removed, which isn't bad overall given that there's a
  bunch of new iio drivers added.

  The Lustre developers seem to have woken up from their sleep and have
  been doing a great job in cleaning up the code and pruning unused or
  old cruft, the filesystem is almost readable :)

  Other than that, just a lot of basic coding style cleanups in the
  churn.  All have been in linux-next for a while with no reported
  issues"

* tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (938 commits)
  Staging: emxx_udc: emxx_udc: fixed coding style issue
  staging/gdm724x: fix "alignment should match open parenthesis" issues
  staging/gdm724x: Fix avoid CamelCase
  staging: unisys: rename misleading var ii with frag
  staging: unisys: visorhba: switch success handling to error handling
  staging: unisys: visorhba: main path needs to flow down the left margin
  staging: unisys: visorinput: handle_locking_key() simplifications
  staging: unisys: visorhba: fail gracefully for thread creation failures
  staging: unisys: visornic: comment restructuring and removing bad diction
  staging: unisys: fix format string %Lx to %llx for u64
  staging: unisys: remove unused struct members
  staging: unisys: visorchannel: correct variable misspelling
  staging: unisys: visorhba: replace functionlike macro with function
  staging: dgnc: Need to check for NULL of ch
  staging: dgnc: remove redundant condition check
  staging: dgnc: fix 'line over 80 characters'
  staging: dgnc: clean up the dgnc_get_modem_info()
  staging: lustre: lnet: enable configuration per NI interface
  staging: lustre: o2iblnd: properly set ibr_why
  staging: lustre: o2iblnd: remove last of kiblnd_tunables_fini
  ...
2016-05-20 22:20:48 -07:00
Dave Gordon 85d1225ec0 drm/i915: Introduce & use new lightweight SGL iterators
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.

Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)

One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.

Various uses of for_each_sg_page() are then converted to the new macros.

v2: Force inlining of the sg_iter constructor and make the union
anonymous.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
2016-05-20 13:43:00 +01:00
Dave Gordon b338fa473e drm/i915: optimise i915_gem_object_map() for small objects
We're using this function for ringbuffers and other "small" objects, so
it's worth avoiding an extra malloc()/free() cycle if the page array is
small enough to put on the stack. Here we've chosen an arbitrary cutoff
of 32 (4k) pages, which is big enough for a ringbuffer (4 pages) or a
context image (currently up to 22 pages).

v5:
    change name of local array [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-3-git-send-email-chris@chris-wilson.co.uk
2016-05-20 13:42:58 +01:00
Dave Gordon dd6034c67a drm/i915: refactor i915_gem_object_pin_map()
The recently-added i915_gem_object_pin_map() can be further optimised
for "small" objects. To facilitate this, and simplify the error paths
before adding the new code, this patch pulls out the "mapping" part of
the operation (involving local allocations which must be undone before
return) into its own subfunction.

The next patch will then insert the new optimisation into the middle of
the now-separated subfunction.

This reorganisation will probably not affect the generated code, as the
compiler will most likely inline it anyway, but it makes the logical
structure a bit clearer and easier to modify.

v2:
    Restructure loop-over-pages & error check [Chris Wilson]

v3:
    Add page count to debug messages [Chris Wilson]
    Convert WARN_ON() to GEM_BUG_ON()

v4:
    Drop the DEBUG messages [Tvrtko Ursulin]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-2-git-send-email-chris@chris-wilson.co.uk
2016-05-20 13:42:36 +01:00
Chris Wilson 72778cb266 drm/i915/userptr: Convert to drm_i915_private
userptr directly only uses drm_device in a single interface where it
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going from drm_device).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk
2016-05-19 17:57:08 +01:00
Daniel Vetter 9a652cc01e Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge request by Jani to get at

commit 249c4f538b
Author: Deepak M <m.deepak@intel.com>
Date:   Wed Mar 30 17:03:39 2016 +0300

    drm: Add new DCS commands in the enum list

Some simple conflicts in intel_dp.c.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-05-17 12:15:49 +02:00
Chris Wilson a8ad0bd84f drm: Remove unused drm_device from drm_gem_object_lookup()
drm_gem_object_lookup() has never required the drm_device for its file
local translation of the user handle to the GEM object. Let's remove the
unused parameter and save some space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Fixup kerneldoc too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-05-17 08:47:30 +02:00
Chris Wilson 461fb99c15 drm/i915: Update domain tracking for GEM objects on hibernation
When creating the hibernation image, the CPU will read the pages of all
objects and thus conflict with our domain tracking. We need to update
our domain tracking to accurately reflect the state on restoration.

v2: Perform the domain tracking inside freeze, before the image is
written, rather than upon restoration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-2-git-send-email-chris@chris-wilson.co.uk
2016-05-14 08:51:39 +01:00
Chris Wilson e075a32f51 drm/i915: Stop automatically retiring requests after a GPU hang
Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

v2: Not automatically retiring requests forces us to recheck for
available ringspace.

Fixes: f4457ae71f ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats/ban-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-4-git-send-email-chris@chris-wilson.co.uk
2016-05-13 12:39:20 +01:00
Chris Wilson e6db746908 drm/i915: Stop retiring requests from busy/wait ioctls
In order to reduce the workload of the caller, we do not want to
actually have to retire requests of others when checking the busy status
of this object. This applies to both busy/wait ioctls as the wait ioctl
has a semantically equivalent mode to the busy ioctl.

At the present time, this is only a minor improvement to reduce the
workload of the busy ioctl under the struct_mutex which helps to reduce
its impact upon contention of struct_mutex. However, since it is mostly
a victim in highly contended scenarios, the impact is very minor until
we can eliminate the struct_mutex requirement for busy-ioctl in the near
future.

v2: Mention the patches intended limited impact. It is just paving the
way for greater changes whilst reducing the impact of a bugfix in the
next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-3-git-send-email-chris@chris-wilson.co.uk
2016-05-13 12:38:59 +01:00
Tvrtko Ursulin 7e22dbbbae drm/i915: Replace "INTEL_INFO->gen == x" checks with IS_GENx
This way optimization from a previous patch works even better.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2016-05-11 12:27:27 +01:00
Ville Syrjälä ac840ae535 drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.

The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and straight up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.

I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.

Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.

v2: Reorder init_hw vs. enable_hw functions (Chris)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-05-10 12:29:33 +03:00
Chris Wilson c033666a94 drm/i915: Store a i915 backpointer from engine, and use it
text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
2016-05-09 13:41:24 +01:00
Tvrtko Ursulin 7d99373975 drm/i915: Simplify intel_mark_busy/idle
They use dev_priv exclusively so pass it in instead of dev
for smaller source and binary.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461844620-35360-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-05-04 10:13:48 +01:00
Dave Airlie fffb675106 Merge tag 'drm-intel-next-2016-04-25' of git://anongit.freedesktop.org/drm-intel into drm-next
- more userptr cornercase fixes from Chris
- clean up and tune forcewake handling (Tvrtko)
- more underrun fixes from Ville, mostly for ilk to appeas CI
- fix unclaimed register warnings on vlv/chv and enable the debug code to catch
  them by default (Ville)
- skl gpu hang fixes for gt3/4 (Mika Kuoppala)
- edram improvements for gen9+ (Mika again)
- clean up gpu reset corner cases (Chris)
- fix ctx/ring machine deaths on snb/ilk (Chris)
- MOCS programming for all engines (Peter Antoine)
- robustify/clean up vlv/chv irq handler (Ville)
- split gen8+ irq handlers into ack/handle phase (Ville)
- tons of bxt rpm fixes (mostly around firmware interactions), from Imre
- hook up panel fitting for dsi panels (Ville)
- more runtime PM fixes all over from Imre
- shrinker polish (Chris)
- more guc fixes from Alex Dai and Dave Gordon
- tons of bugfixes and small polish all over (but with a big focus on bxt)

* tag 'drm-intel-next-2016-04-25' of git://anongit.freedesktop.org/drm-intel: (142 commits)
  drm/i915: Update DRIVER_DATE to 20160425
  drm/i915/bxt: Explicitly clear the Turbo control register
  drm/i915: Correct the i915_frequency_info debugfs output
  drm/i915: Macros to convert PM time interval values to microseconds
  drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
  drm/i915: Fake HDMI live status
  drm/i915/bxt: Force reprogramming a PHY with invalid HW state
  drm/i915/bxt: Wait for PHY1 GRC done if PHY0 was already enabled
  drm/i915/bxt: Use PHY0 GRC value for HW state verification
  drm/i915: use dev_priv directly in gen8_ppgtt_notify_vgt
  drm/i915/bxt: Enable DC5 during runtime resume
  drm/i915/bxt: Sanitize DC state tracking during system resume
  drm/i915/bxt: Don't uninit/init display core twice during system suspend/resume
  drm/i915: Inline intel_suspend_complete
  drm/i915/kbl: Don't WARN for expected secondary MISC IO power well request
  drm/i915: Fix eDP low vswing for Broadwell
  drm/i915: check for ERR_PTR from i915_gem_object_pin_map()
  drm/i915/guc: local optimisations and updating comments
  drm/i915/guc: drop cached copy of 'wq_head'
  drm/i915/guc: keep GuC doorbell & process descriptor mapped in kernel
  ...
2016-05-04 17:25:30 +10:00
Gustavo Padovan 3ed605bc8a kernel.h: add u64_to_user_ptr()
This function had copies in 3 different files. Unify them in kernel.h.

Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com>	[drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com>		[drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de>		[drm/etinav/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-29 17:03:49 -07:00
Chris Wilson 0e4ca1008e drm/i915: Fix ordering of sanitize ppgtt and sanitize execlists
The i915.enable_ppgtt option depends upon the state of
i915.enable_execlists option - so we need to sanitize execlists first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-2-git-send-email-chris@chris-wilson.co.uk
2016-04-29 13:59:05 +01:00
Chris Wilson e7ae86bab9 drm/i915: Unify GPU resets upon shutdown
Both execlists and legacy need to reset the context (and mode) of the
GPU before we lose control of the system. By resetting the GPU, we
revert back to default settings. This simplifies the life of any
subsequent driver (in particular for virtualized setups) as it does not
then have to try and recover from an unknown condition. As both paths
need to reset for the same reason, move the reset to a common point.

This unifies the resets added in a647828afc (drm/i915: Also perform gpu
reset under execlist mode) and 8e96d9c4d9 (drm/i915: reset the GPU on
context fini).

v2: Restrict the reset to "modern" gen (where we enable HW contexts) to
try and avoid leaving the machine in an unusable state with a risky
reset on older GPU. This should keep the status quo as to who performs
resets (i.e. currently only GPUs with HW contexts perform a reset on
shutdown).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: "Niu, Bing" <bing.niu@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-25-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Tvrtko Ursulin e39d42fa7e drm/i915: Stop tracking execlists retired requests
With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
request on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver causing performance hits on some tests. Most
dramatical of which was igt/gem_close_race which had run time
in tens of minutes which is now reduced to tens of seconds.

Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson a16a405259 drm/i915: Track the previous pinned context inside the request
As the contexts are accessed by the hardware until the switch is completed
to a new context, the hardware may still be writing to the context object
after the breadcrumb is visible. We must not unpin/unbind/prune that
object whilst still active and so we keep the previous context pinned until
the following request. We can generalise the tracking we already do via
the engine->last_context and move it to the request so that it works
equally for execlists and GuC.

v2: Drop the execlists double pin as that exposes a race inside the lrc
irq handler as it tries to access the context after it may be retired.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-22-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 73db04cfa8 drm/i915: Move releasing of the GEM request from free to retire/cancel
If we move the release of the GEM request (i.e. decoupling it from the
various lists used for client and context tracking) after it is complete
(either by the GPU retiring the request, or by the caller cancelling the
request), we can remove the requirement that the final unreference of
the GEM request need to be under the struct_mutex.

The careful reader may notice that one or two impossible NULL pointer
tests are dropped for readability. These pointers cannot be NULL since
they are assigned during request construction and never unset.

v2,v3: Rebalance execlists by moving the context unpinning.
v4: Rebase onto -nightly
v5: Avoid trying to rebalance execlist/GuC context pinning, leave that
to the next step

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-21-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 24f1d3cc09 drm/i915: Refactor execlists default context pinning
Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special case handling of the default context
that was required to mask that it was not being pinned normally.

v2: Rebalance context_queue after rebasing.
v3: Rebase to -nightly (not 40 patches in)
v4: Rebase onto request_alloc unwinding

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-19-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson bfa0120073 drm/i915: Manually unwind after a failed request allocation
In the next patches, we want to move the work out of freeing the request
and into its retirement (so that we can free the request without
requiring the struct_mutex). This means that we cannot rely on
unreferencing the request to completely teardown the request any more
and so we need to manually unwind the failed allocation. In doing so, we
reorder the allocation in order to make the unwind simple (and ensure
that we don't try to unwind a partial request that may have modified
global state) and so we end up pushing the initial preallocation down
into the engine request initialisation functions where we have the
requisite control over the state of the request.

Moving the initial preallocation into the engine is less than ideal: it
moves logic to handle a specific problem with request handling out of
the common code. On the other hand, it does allow those backends
significantly more flexibility in performing its allocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-14-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 0251a96322 drm/i915: Remove the identical implementations of request space reservation
Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between legacy/execlists and so the tautology
can be removed. In the process, we move the reserved space tracking
from the ringbuffer on to the request. This is to enable us to reorder
the reserved space allocation in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson f9326be5f1 drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use
The code to switch_mm() is already handled by i915_switch_context(), the
only difference required to setup the aliasing ppgtt is that we need to
emit te switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL. This allows us to defer the
initialisation of the GPU from early device initialisation to first use,
which should marginally speed up both. The caveat is that we then defer
the context initialisation until first use - i.e. we cannot assume that
the GPU engines are initialised. For example, this means that power
contexts for rc6 (Ironlake) need to explicitly loaded, as they are.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-11-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson d200cda6bd drm/i915: Remove early l3-remap
Since we do the l3-remap on context switch, we can remove the redundant
early call to set the mapping prior to performing the first context
switch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-10-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson b0ebde395d drm/i915: L3 cache remapping is part of context switching
Move the i915_gem_l3_remap function such that it next to the context
switching, which is where we perform the L3 remap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-8-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson b2e862d08e drm/i915: Mark the current context as lost on suspend
In order to force a reload of the context image upon resume, we first
need to mark its absence on suspend. Currently we are failing to restore
the golden context state and any context w/a to the default context
after resume.

One oversight corrected, is that we had forgotten to reapply the L3
remapping when restoring the lost default context.

v2: Remove deprecated WARN.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-7-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson 8ef8561f2c drm/i915: Move ioremap_wc tracking onto VMA
By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).

Note that by moving the iounnmap tracking to the VMA, we actually end up
fixing a leak of the iomapping in intel_fbdev.

v1.5: Rebase prompted by Tvrtko
v2: Drop dev_priv parameter, we can recover the i915_ggtt from the vma.
v3: Move handling of ioremap space exhaustion to vmap_purge and also
allow vmallocs to recover old iomaps. Add Tvrtko's kerneldoc.
v4: Fix a use-after-free in shrinker and rearrange i915_vma_iomap
v5: Back to i915_vm_to_ggtt
v6: Use i915_vma_pin_iomap and i915_vma_unpin_iomap to mark critical
sections and ensure the VMA cannot be reaped whilst mapped.
v7: Move i915_vma_iounmap so that consumers of the API are not tempted,
and add iomem annotations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-5-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson fe3db79b0b drm/i915: Propagate error from drm_gem_object_init()
Propagate the real error from drm_gem_object_init(). Note this also
fixes some confusion in the error return from i915_gem_alloc_object...

v2:
(Matthew Auld)
  - updated new users of gem_alloc_object from latest drm-nightly
  - replaced occurrences of IS_ERR_OR_NULL() with IS_ERR()
v3:
(Joonas Lahtinen)
  - fix double "From:" in commit message
  - add goto teardown path
v4:
(Matthew Auld)
  - rebase with i915_gem_alloc_object name change

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461587533-8841-1-git-send-email-matthew.auld@intel.com
[Joonas: Removed spurious " = NULL" from _init() function]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-04-28 12:28:58 +03:00
Tvrtko Ursulin ff5ec22dad drm/i915: Simplify i915_gem_obj_ggtt_bound_view
Can use the new vma->is_gtt to simplify the check and
get rid of the local variables.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461240286-25968-1-git-send-email-tvrtko.ursulin@linux.intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-04-25 12:34:55 +01:00
Tvrtko Ursulin 8aac2220cc drm/i915: Simplify i915_gem_obj_ggtt_offset_view
Can use the new vma->is_ggtt to simplify the check and
remove the local variables.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
2016-04-25 12:34:51 +01:00
Tvrtko Ursulin 598b9ec8ef drm/i915: Simplify i915_gem_obj_to_ggtt_view
Can use vma->is_ggtt to simplify the check and also switch the
BUG_ON to GEM_BUG_ON which is more appropriate for this.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
2016-04-25 12:34:34 +01:00
Tvrtko Ursulin 8da32727ac drm/i915: Remove i915_gem_obj_size
Only caller is i915_gem_obj_ggtt_size which only cares about
GGTT so simplify it and implement under that name.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-04-25 12:34:05 +01:00
Dave Gordon d37cd8a887 drm/i915: rename i915_gem_alloc_object() to i915_gem_object_create()
Because having both i915_gem_object_alloc() and i915_gem_alloc_object()
(with different return conventions) is just too confusing!

(i915_gem_object_alloc() is the low-level memory allocator, and remains
unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO
initialises the newly-allocated object.)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com
2016-04-25 12:31:34 +01:00
Dave Airlie 605b28c859 Merge tag 'drm-intel-next-2016-04-11' of git://anongit.freedesktop.org/drm-intel into drm-next
- make modeset hw state checker atomic aware (Maarten)
- close races in gpu stuck detection/seqno reading (Chris)
- tons&tons of small improvements from Chris Wilson all over the gem code
- more dsi/bxt work from Ramalingam&Jani
- macro polish from Joonas
- guc fw loading fixes (Arun&Dave)
- vmap notifier (acked by Andrew) + i915 support by Chris Wilson
- create bottom half for execlist irq processing (Chris Wilson)
- vlv/chv pll cleanup (Ville)
- rework DP detection, especially sink detection (Shubhangi Shrivastava)
- make color manager support fully atomic (Maarten)
- avoid livelock on chv in execlist irq handler (Chris)

* tag 'drm-intel-next-2016-04-11' of git://anongit.freedesktop.org/drm-intel: (82 commits)
  drm/i915: Update DRIVER_DATE to 20160411
  drm/i915: Avoid allocating a vmap arena for a single page
  drm,i915: Introduce drm_malloc_gfp()
  drm/i915/shrinker: Restrict vmap purge to objects with vmaps
  drm/i915: Refactor duplicate object vmap functions
  drm/i915: Consolidate common error handling in intel_pin_and_map_ringbuffer_obj
  drm/i915/dmabuf: Tighten struct_mutex for unmap_dma_buf
  drm/i915: implement WaClearTdlStateAckDirtyBits
  drm/i915/bxt: Reversed polarity of PORT_PLL_REF_SEL bit
  drm/i915: Rename hw state checker to hw state verifier.
  drm/i915: Move modeset state verifier calls.
  drm/i915: Make modeset state verifier take crtc as argument.
  drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
  drm/i915: Use simplest form for flushing the single cacheline in the HWS
  drm/i915: Harden detection of missed interrupts
  drm/i915: Separate out the seqno-barrier from engine->get_seqno
  drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
  drm/i915: Fixup the free space logic in ring_prepare
  drm/i915: Simplify check for idleness in hangcheck
  drm/i915: Apply a mb between emitting the request and hangcheck
  ...
2016-04-22 09:03:31 +10:00
Dave Airlie 49047962ec Linux 4.6-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXCva8AAoJEHm+PkMAQRiGXBoIAIkrjxdbuT2nS9A3tHwkiFXa
 6/Th1UjbNaoLuZ+MckQHayAD9NcWY9lVjOUmFsSiSWMCQK/rTWDl8x5ITputrY2V
 VuhrJCwI7huEtu6GpRaJaUgwtdOjhIHz1Ue2MCdNIbKX3l+LjVyyJ9Vo8rruvZcR
 fC7kiivH04fYX58oQ+SHymCg54ny3qJEPT8i4+g26686m11hvZLI3UAs2PAn6ut+
 atCjxdQ4yLN3DWsbjuA7wYGWhTgFloxL4TIoisuOUc3FXnSi/ivIbXZvu4lUfisz
 LA2JBhfII3AEMBWG9xfGbXPijJTT4q7yNlTD0oYcnMtAt/Roh2F04asqB1LetEY=
 =bri6
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc3' into drm-next

Backmerge 4.6-rc3 for i915.

Linux 4.6-rc3
2016-04-22 08:32:51 +10:00
Daniel Vetter f74418a400 drm/vma_manage: Drop has_offset
It's racy, creating mmap offsets is a slowpath, so better to remove it
to avoid drivers doing broken things.

The only user is i915, and it's ok there because everything (well
almost) is protected by dev->struct_mutex in i915-gem.

While at it add a note in the create_mmap_offset kerneldoc that
drivers must release it again. And then I also noticed that
drm_gem_object_release entirely lacks kerneldoc.

Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459330852-27668-14-git-send-email-daniel.vetter@ffwll.ch
2016-04-20 12:58:53 +02:00
Tvrtko Ursulin 791bee125b drm/i915: Remove a couple pointless WARN_ONs
Just two WARN_ONs followed by pointer dereference I spotted by accident.

v2: Remove some more of the same.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461080770-14693-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-20 09:59:17 +01:00
Ingo Molnar 6666ea558b Linux 4.6-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b
 KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3
 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z
 HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi
 AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu
 AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg=
 =Q7r3
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc4' into x86/asm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:38:52 +02:00
Peter Antoine 0ccdacf694 drm/i915/mocs: Program MOCS for all engines on init
Allow for the MOCS to be programmed for all engines.
Currently we program the MOCS when the first render batch
goes through. This works on most platforms but fails on
platforms that do not run a render batch early,
i.e. headless servers. The patch now programs all initialised engines
on init and the RCS is programmed again within the initial batch. This
is done for predictable consistency with regards to the hardware
context.

Hardware context loading sets the values of the MOCS for RCS
and L3CC. Programming them from within the batch makes sure that
the render context is valid, no matter what the previous state of
the saved-context was.

v2: posted correct version to the mailing list.
v3: moved programming to within engine->init_hw() (Chris Wilson)
v4: code formatting and white-space changes. (Chris Wilson)

Testcase: igt/gem_mocs_settings
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson aa9b78104f drm/i915: Late request cancellations are harmful
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.

The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.

All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.

A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson 349f2ccff9 drm/i915: Move the mb() following release-mmap into release-mmap
As paranoia, we want to ensure that the CPU's PTEs have been revoked for
the object before we return from i915_gem_release_mmap(). This allows us
to rely on there being no outstanding memory accesses from userspace
and guarantees serialisation of the code against concurrent access just
by calling i915_gem_release_mmap().

v2: Reduce the mb() into a wmb() following the revoke.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-13-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson f4457ae71f drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson 299259a3a9 drm/i915: Store the reset counter when constructing a request
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson d98c52cf4f drm/i915: Tighten reset_counter for reset status
In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson c19ae989b0 drm/i915: Hide the atomic_read(reset_counter) behind a helper
This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson d501b1d2b1 drm/i915: Add GEM debugging Kconfig option
Currently there is a #define to enable extra BUG_ON for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the usual
caveats of not enabling unless explicitly requested.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-3-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Mika Kuoppala 3accaf7e73 drm/i915: Store and use edram capabilities
Store the edram capabilities instead of only the size of
edram. This is preparatory patch to allow edram size calculation
based on edram capability bits for gen9+. With gen9 the
edram is behind llc and is a separate entity. With hsw/bdw
it was more of a victim cache for LLC so the name 'eLLC' might
be warranted. Regardless, rename all mentions of eLLC to EDRAM to
clear the confusion.

v2: return bytes for edram size (Chris)
    s/eLLC/eDRAM in output if we are gen > 8

v3: rebase, INTEL_GEN (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 12:27:37 +03:00
Mika Kuoppala 666fbcf5c2 drm/i915: Don't program eLLC IDI hash mask for gen9+
For gen9 onwards, eDRAM is a true memory side cache. So
there is no need to program idi hash mask as it is for eLLC
only.

v2: INTEL_GEN (Chris), s/has/hash (Matthew)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-04-14 12:27:37 +03:00
Daniel Vetter 3970285319 Linux 4.6-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXCva8AAoJEHm+PkMAQRiGXBoIAIkrjxdbuT2nS9A3tHwkiFXa
 6/Th1UjbNaoLuZ+MckQHayAD9NcWY9lVjOUmFsSiSWMCQK/rTWDl8x5ITputrY2V
 VuhrJCwI7huEtu6GpRaJaUgwtdOjhIHz1Ue2MCdNIbKX3l+LjVyyJ9Vo8rruvZcR
 fC7kiivH04fYX58oQ+SHymCg54ny3qJEPT8i4+g26686m11hvZLI3UAs2PAn6ut+
 atCjxdQ4yLN3DWsbjuA7wYGWhTgFloxL4TIoisuOUc3FXnSi/ivIbXZvu4lUfisz
 LA2JBhfII3AEMBWG9xfGbXPijJTT4q7yNlTD0oYcnMtAt/Roh2F04asqB1LetEY=
 =bri6
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc3' into drm-intel-next-queued

Linux 4.6-rc3

Backmerge requested by Chris Wilson to make his patches apply cleanly.
Tiny conflict in vmalloc.c with the (properly acked and all) patch in
drm-intel-next:

commit 4da56b99d9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 4 14:46:42 2016 +0100

    mm/vmap: Add a notifier for when we run out of vmap address space

and Linus' tree.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-04-11 19:25:13 +02:00
Chris Wilson fb8621d3be drm/i915: Avoid allocating a vmap arena for a single page
If we want a contiguous mapping of a single page sized object, we can
forgo using vmap() and just use a regular kmap(). Note that this is only
suitable if the desired pgprot_t is compatible.

v2: Use is_vmalloc_addr()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2016-04-11 17:13:36 +01:00
Chris Wilson f2a85e1975 drm,i915: Introduce drm_malloc_gfp()
I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().

So refactor my usage into drm_malloc_gfp().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
2016-04-11 17:13:10 +01:00
Chris Wilson 0a798eb92e drm/i915: Refactor duplicate object vmap functions
We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the mapping into the
obj->pages lifetime, then we can reuse an obj->mapping for both and at
the same time couple it into the shrinker. There is a third vmapping
routine in the cmdparser that maps only a range within the object, for
the time being that is left alone, but will eventually use these routines
in order to cache the mapping between invocations.

v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper

v4: Rename vmap to map as we don't wish to imply the type of mapping
involved, just that it contiguously maps the object into kernel space.
Add kerneldoc and lockdep annotations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2016-04-11 17:11:44 +01:00
Chris Wilson 7c90b7de73 drm/i915: Apply a mb between emitting the request and hangcheck
Seal the request and mark it as pending execution before we submit it to
hardware. We assume that the actual submission cannot fail (that
guarantee is provided by preallocating space in the request for the
submission). As we may inspect this state without holding any locks
during hangcheck we should apply a barrier to ensure that we do
not see a more recent value in the HWS than we are tracking.

Based on a patch by Mika Kuoppala.

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-8-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2016-04-08 11:44:48 +01:00
Chris Wilson 29dcb57026 drm/i915: Move the hw semaphore initialisation from GEM to the engine
Since we are setting engine local values that are tied to the hardware,
move it out of i915_gem_init_seqno() into the intel_ring_init_seqno()
backend, next to where the other hw semaphore registers are written.

v2: Make the explanatory comment about always resetting the semaphores to
0 irrespective of the value of the reset seqno.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-4-git-send-email-chris@chris-wilson.co.uk
2016-04-08 11:43:38 +01:00
Chris Wilson 2ed53a94d8 drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno
After the GPU reset and we discard all of the incomplete requests, mark
the GPU as having advanced to the last_submitted_seqno (as having
completed the requests and ready for fresh work). The impact of this is
negligible, as all the requests will be considered completed by this
point, it just brings the HWS into line with expectations for external
viewers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-2-git-send-email-chris@chris-wilson.co.uk
2016-04-08 11:43:11 +01:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Tvrtko Ursulin 27af5eea54 drm/i915: Move execlists irq handler to a bottom half
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
 Modules linked in: [redacted for brevity]
 CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
 Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
 Workqueue: i915 gen6_pm_rps_work [i915]
 task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
 RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
 RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
 RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
 RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
 RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
 R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
 R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
 FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
  0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
  0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
 Call Trace:
  <IRQ>
  [<ffffffff8104a716>] irq_exit+0x86/0x90
  [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
  [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
  <EOI>
  [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
  [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
  [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
  [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
  [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
  [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
  [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
  [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
  [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
  [<ffffffff8105ab29>] process_one_work+0x139/0x350
  [<ffffffff8105b186>] worker_thread+0x126/0x490
  [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
  [<ffffffff8105fa64>] kthread+0xc4/0xe0
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
  [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.

When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.

By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.

Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.

Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)

More interesting scenario with regards to throughput is
"gem_latency -n 100" which  shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
   * execlists_lock should be taken as spin_lock_bh when
     queuing work from userspace now. (Chris Wilson)
   * uncore.lock must be taken with spin_lock_irq when
     submitting requests since that now runs from either
     softirq or process context.

v3:
   * Expanded commit message with more testing data;
   * converted missed locking sites to _bh;
   * added execlist_lock comment. (Chris Wilson)

v4:
   * Mention dispatch parallelism in commit. (Chris Wilson)
   * Do not hold uncore.lock over MMIO reads since the block
     is already serialised per-engine via the tasklet itself.
     (Chris Wilson)
   * intel_lrc_irq_handler should be static. (Chris Wilson)
   * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
   * Document and WARN that tasklet cannot be active/pending
     on engine cleanup. (Chris Wilson/Imre Deak)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-04 14:08:52 +01:00
Joonas Lahtinen 72e96d6450 drm/i915: Refer to GGTT {,VM} consistently
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.

Refer to the GGTT as "ggtt" instead of indirectly through chaining.

As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!

v2:
- Added some more after grepping sources with Chris

v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
  (Chris)

v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.

v5:
- Make patch checker happy

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-31 17:55:43 +03:00
Borislav Petkov 568a58e5df x86/mm/pat, x86/cpufeature: Remove cpu_has_pat
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1459266123-21878-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:32:43 +02:00
Joonas Lahtinen d85489d314 drm/i915: Rename GGTT init functions
Rename and document the GGTT init functions to give a better
idea of the context where they are called from.

i915_gem_gtt_init => i915_ggtt_init_hw
i915_gem_init_global_gtt => i915_gem_init_ggtt
i915_global_gtt_cleanup => i915_ggtt_cleanup_hw

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458830866-12578-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-03-30 13:47:18 +03:00
Matthew Auld ade7daa164 drm/i915: BUG_ON when ggtt_view is NULL
Lets BUG_ON and don't bother with a WARN and returning an error, so we can
remove the need to pollute the code with error handling, after all it is
a programmer error to provide NULL view. Also while we're here remove
redundant NULL ggtt_view check.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458834860-7898-1-git-send-email-matthew.auld@intel.com
2016-03-30 13:47:18 +03:00
Dave Gordon b4ac5afc6b drm/i915: replace for_each_engine()
Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').

v2:
    s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com
2016-03-24 14:34:11 +00:00
Joonas Lahtinen 62106b4f6b drm/i915: Rename dev_priv->gtt to dev_priv->ggtt
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.

Fix a couple of whitespace problems while at it.

v2:
- Fix a typo in commit message.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-03-18 15:18:15 +02:00
Tvrtko Ursulin 39dabecd99 drm/i915: Use shorter route to dev_private where possible
Where we have a request we can use req->i915 directly instead
of going through the engine and device. Coccinelle script:

@@
function f;
identifier r;
@@
f(..., struct drm_i915_gem_request *r, ...)
{
...
- engine->dev->dev_private
+ r->i915
...
}
@@
struct drm_i915_gem_request *req;
@@
(
  req->
- engine->dev->dev_private
+ i915
)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458219850-21007-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-18 09:50:37 +00:00
Tvrtko Ursulin a112dbad44 drm/i915: Remove unused variable in i915_gem_request_add_to_client
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-18 09:50:36 +00:00
Imre Deak 40ae4e1661 drm/i915: Move load time gem_load_init earlier
The only steps requiring device access is the fence and swizzling
initialization, so split these out keeping them in their current place
and move the rest of init steps earlier.

v2-v3:
- unchanged
v4:
- move call to i915_gem_detect_bit_6_swizzle() to
  i915_gem_load_init_fences() and preserve the original order of
  the detection of HW fence capailities wrt. swizzling (Chris)

CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458132843-21860-1-git-send-email-imre.deak@intel.com
2016-03-17 15:22:05 +02:00
Mika Kuoppala ee4b6faf96 drm/i915: Modify reset func to handle per engine resets
In full gpu reset we prime all engines and reset domains corresponding to
each engine. Per engine reset is just a special case of this process
wherein only a single engine is reset. This change is aimed to modify
relevant functions to achieve this. There are some other steps we carry out
in case of engine reset which are addressed in later patches.

Reset func now accepts a mask of all engines that need to be reset. Where
per engine resets are supported, error handler populates the mask
accordingly otherwise all engines are specified.

v2: ALL_ENGINES mask fixup, better for_each_ring_masked (Chris)
v3: Whitespace fixes (Chris)
v4: Rebase due to s/ring/engine

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458143640-20563-1-git-send-email-mika.kuoppala@intel.com
2016-03-17 15:01:15 +02:00
Tvrtko Ursulin 117897f42c drm/i915: More renaming of rings to engines
This time using only sed and a few by hand.

v2: Rename also intel_ring_id and intel_ring_initialized.
v3: Fixed typo in intel_ring_initialized.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458126040-33105-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-16 15:33:30 +00:00
Tvrtko Ursulin 666796da7a drm/i915: More intel_engine_cs renaming
Some trivial ones, first pass done with Coccinelle:

@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)

But that didn't fully work so I cleaned it up with:

for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:24 +00:00
Tvrtko Ursulin 4a570db57c drm/i915: Rename intel_engine_cs struct members
below and a couple manual fixups.

@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs *J;
+ struct intel_engine_cs *engine;
...
}
@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs J;
+ struct intel_engine_cs engine;
...
}
@@
struct drm_i915_private *d;
@@
(
- d->ring
+ d->engine
)
@@
struct i915_execbuffer_params *p;
@@
(
- p->ring
+ p->engine
)
@@
struct intel_ringbuffer *r;
@@
(
- r->ring
+ r->engine
)
@@
struct drm_i915_gem_request *req;
@@
(
- req->ring
+ req->engine
)

v2: Script missed the tracepoint code - fixed up by hand.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:17 +00:00
Tvrtko Ursulin 0bc40be85f drm/i915: Rename intel_engine_cs function parameters
@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:10 +00:00
Tvrtko Ursulin e2f8039147 drm/i915: Rename local struct intel_engine_cs variables
Done by the Coccinelle script below plus a manual
intervention to GEN8_RING_SEMAPHORE_INIT.

@@
expression E;
@@
- struct intel_engine_cs *ring = E;
+ struct intel_engine_cs *engine = E;
<+...
- ring
+ engine
...+>
@@
@@
- struct intel_engine_cs *ring;
+ struct intel_engine_cs *engine;
<+...
- ring
+ engine
...+>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:00 +00:00
Tvrtko Ursulin ca377809d6 drm/i915: Avoid snooping with userptr where not supported
commit e5756c10d8
   Author: Imre Deak <imre.deak@intel.com>
   Date:   Fri Aug 14 18:43:30 2015 +0300

       drm/i915/bxt: don't allow cached GEM mappings on A stepping

Added an exception of disallowing snooping for Broxton A
stepping hardware but userptr was still enabling it regardless.

Move the check to HAS_SNOOP now that it is used from multiple
call sites and use it.

v2: Userptr cannot be supported when it cannot be coherent and
    generalize the code better. (Chris Wilson)

v3: Make has_snoop true only when !has_llc. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456920631-34302-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-02 13:46:21 +00:00
Chris Wilson 596c592319 drm/i915: Reduce the pointer dance of i915_is_ggtt()
The multiple levels of indirect do nothing but hinder the compiler and
the pointer chasing turns to be quite painful but painless to fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-02-26 13:15:39 +00:00
Chris Wilson 1c7f4bca5a drm/i915: Rename vma->*_list to *_link for consistency
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).

s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-02-26 13:15:39 +00:00
Ben Widawsky 1db6e2e7dc drm/i915: Check for get_pages instead of shmem (filp)
This behavior of checking for a shmem backed GEM object was introduced here:
commit 4c914c0c7c
Author: Brad Volkin <bradley.d.volkin@intel.com>
Date:   Tue Feb 18 10:15:45 2014 -0800

    drm/i915: Refactor shmem pread setup

It is possible for an object to not be a shmem backed GEM object (for example
userptr objects). An example of how we hit this failure can be found through
copy_batch() in the command parser because we allocate a userptr object for the
batch which contains privileged instructions. Userptr calls
drm_gem_private_object_init() which explicitly sets the filp to none.

NOTE: I manually retyped this from a test machine. So I haven't even compiled
this exact patch.

v2: Use same logic as from a2a4f916c2f (Kristian, Dave Gordon)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455047053-2644-1-git-send-email-benjamin.widawsky@intel.com
2016-02-15 18:26:52 +01:00
Tvrtko Ursulin 3f441b825d drm/i915: Use appropriate spinlock flavour
We know this never runs from interrupt context so
don't need to use the flags variant.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-02-15 16:10:18 +00:00
Daniel Vetter 1ffedc0677 Revert "drm/i915: fix context/engine cleanup order"
This reverts commit 1b39a917a9.

Chris retracted his reviewed-by (which I failed to notice) and somehow
it blows up (I did it again!) as reported by Mika with the below
backtrace on module reload:

[   58.170374] IP: [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
...
[   58.170469] Call Trace:
[   58.170479]  [<ffffffffa00d0ed4>] i915_gem_cleanup_engines+0x34/0x60
[i915]
[   58.170493]  [<ffffffffa0154520>] i915_driver_unload+0x140/0x220
[i915]
[   58.170497]  [<ffffffff8154a4f4>] drm_dev_unregister+0x24/0xa0
[   58.170501]  [<ffffffff8154aace>] drm_put_dev+0x1e/0x60
[   58.170506]  [<ffffffffa00912a0>] i915_pci_remove+0x10/0x20 [i915]
[   58.170510]  [<ffffffff814766e4>] pci_device_remove+0x34/0xb0
[   58.170514]  [<ffffffff8156e7d5>] __device_release_driver+0x95/0x140
[   58.170518]  [<ffffffff8156e97c>] driver_detach+0xbc/0xc0
[   58.170521]  [<ffffffff8156d883>] bus_remove_driver+0x53/0xd0
[   58.170525]  [<ffffffff8156f3a7>] driver_unregister+0x27/0x50
[   58.170528]  [<ffffffff81475725>] pci_unregister_driver+0x25/0x70
[   58.170531]  [<ffffffff8154c274>] drm_pci_exit+0x74/0x90
[   58.170543]  [<ffffffffa0154cb0>] i915_exit+0x20/0x1aa [i915]
[   58.170548]  [<ffffffff8111846f>] SyS_delete_module+0x18f/0x1f0

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-02-15 10:50:13 +01:00
Nick Hoath 1b39a917a9 drm/i915: fix context/engine cleanup order
Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.

v2: Also make the fix in i915_load_modeset_init, not just in
    i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
    Rebased and updated commentary above (Dave Gordon).

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: David Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1453504211-7982-2-git-send-email-david.s.gordon@intel.com
2016-02-10 08:40:18 +01:00
Chris Wilson de4726649b drm/i915: Allow i915_gem_object_get_page() on userptr as well
commit 033908aed5
Author: Dave Gordon <david.s.gordon@intel.com>
Date:   Thu Dec 10 18:51:23 2015 +0000

    drm/i915: mark GEM object pages dirty when mapped & written by the CPU

introduced a check into i915_gem_object_get_dirty_pages() that returned
a NULL pointer when called with a bad object, one that was not backed by
shmemfs. This WARN was too strict as we can work on all struct page
backed objects, and resulted in a WARN + GPF for existing userspace. In
order to differentiate the various types of objects, add a new flags field
to the i915_gem_object_ops struct to describe their capabilities, with
the first flag being whether the object has struct pages.

v2: Drop silly const before an integer in the structure declaration.

Testcase: igt/gem_userptr_blits/relocations
Reported-and-tested-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Tested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453487551-16799-1-git-send-email-chris@chris-wilson.co.uk
2016-02-03 10:21:24 -08:00
Tvrtko Ursulin e5292823c1 drm/i915: Make LRC (un)pinning work on context and engine
Previously intel_lr_context_(un)pin were operating on requests
which is in conflict with their names.

If we make them take a context and an engine, it makes the names
make more sense and it also makes future fixes possible.

v2: Rebase for default_context/kernel_context change.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
2016-01-28 17:23:15 +00:00
Imre Deak d64aa096a4 drm/i915: Sanitize i915_gem_load() init and clean-up
Factor out common clean-up code for the GEM load time init function.
Also rename i915_gem_load() to i915_gem_load_init() to have a better
match with its new clean-up function.

No functional change.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-5-git-send-email-imre.deak@intel.com
2016-01-27 17:43:15 +02:00
Imre Deak a8a4058925 drm/i915: Sanitize GEM shrinker init and clean-up
Factor out the common GEM shrinker clean-up code and call the shrinker
init function from the same function from where the corresponding
shrinker clean-up function is called. Also add sanity checking to the
shrinker and OOM registration calls.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-4-git-send-email-imre.deak@intel.com
2016-01-27 17:43:14 +02:00
Daniel Vetter 9a15a87338 Revert "drm/i915: Fix context/engine cleanup order"
This reverts commit 1803c035ef.

It seems to blow up on module unload due to a use-after free hitting a
BUG_ON with CONFIG_DEBUG_SG. Quoting from Tvrtko's mail:

"I've decoded the instructions and it pointed to SG_MAGIC checking:

488b8098010000  mov 0x198(%rax),%rax
ba21436587      mov $0x87654321,%edx
488b00          mov (%rax),%rax       *** CRASH

"Grep showed 0x87654321 is SG_MAGIC, so likely candidate for this code
pattern is:

static inline struct page *sg_page(struct scatterlist *sg)
{
    BUG_ON(sg->sg_magic != SG_MAGIC);
    BUG_ON(sg_is_chain(sg));
    return (struct page *)((sg)->page_link & ~0x3);
}

"Which would mean the offender is in intel_logical_ring_cleanup is most
likely:

...
    if (ring->status_page.obj) {
        kunmap(sg_page(ring->status_page.obj->pages->sgl));
        ring->status_page.obj = NULL;
    }
...

"I think that the i915_gem_context_fini will do a final unref on
dev_priv->kernel_context and then the ring buff has a copy which is
left dangling because:

    lrc_setup_hardware_status_page(ring,
        dev_priv->kernel_context->engine[ring->id].state);

and:

ring->status_page.obj = default_ctx_obj;

"Where default_ctx_obj == dev_priv->kernel_context->engine[ring->id].state
So indeed looks like the unload ordering is the trigger.  In fact it
is almost the same fragility wrt/ kernel_context hidden dependency I
expressed my worry about in an e-mail yesterday or so. It only shows
if CONFIG_DEBUG_SG is set, otherwise it accesses freed memory and
probably just survives."

This causes serious trouble in our CI system since it took out all
gen8+ machines. Not yet clear why this wasn't caught in pre-merge
testing.

Backtrace from CI, for posterity:

[  163.737836] general protection fault: 0000 [#1] PREEMPT SMP
[  163.737849] Modules linked in: ax88179_178a usbnet mii snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei i2c_hid e1000e ptp pps_core [last unloaded: snd_hda_intel]
[  163.737902] CPU: 0 PID: 5812 Comm: rmmod Tainted: G     U  W       4.5.0-rc1-gfxbench+ #1
[  163.737911] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[  163.737920] task: ffff8800bb99cf80 ti: ffff88022ff2c000 task.ti: ffff88022ff2c000
[  163.737928] RIP: 0010:[<ffffffffa018f723>]  [<ffffffffa018f723>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[  163.737969] RSP: 0018:ffff88022ff2fd30  EFLAGS: 00010282
[  163.737975] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800bb2f31b8 RCX: 0000000000000002
[  163.737982] RDX: 0000000087654321 RSI: 000000000000000d RDI: ffff8800bb2f31f0
[  163.737989] RBP: ffff88022ff2fd40 R08: 0000000000000000 R09: 0000000000000001
[  163.737996] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bb2f0000
[  163.738003] R13: ffff8800bb2f8fc8 R14: ffff8800bb285668 R15: 000055af1ae55210
[  163.738010] FS:  00007f187014b700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[  163.738021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  163.738030] CR2: 0000558f84e4cbc8 CR3: 000000022cd55000 CR4: 00000000003406f0
[  163.738039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  163.738048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  163.738057] Stack:
[  163.738062]  ffff8800bb2f31b8 ffff8800bb2f0000 ffff88022ff2fd70 ffffffffa0180414
[  163.738079]  ffff8800bb2f0000 ffff8800bb285668 ffff8800bb2856c8 ffffffffa0242460
[  163.738094]  ffff88022ff2fd98 ffffffffa0202d30 ffff8800bb285668 ffff8800bb285668
[  163.738109] Call Trace:
[  163.738140]  [<ffffffffa0180414>] i915_gem_cleanup_engines+0x34/0x60 [i915]
[  163.738185]  [<ffffffffa0202d30>] i915_driver_unload+0x150/0x270 [i915]
[  163.738198]  [<ffffffff815100f4>] drm_dev_unregister+0x24/0xa0
[  163.738208]  [<ffffffff815106ce>] drm_put_dev+0x1e/0x60
[  163.738225]  [<ffffffffa01412a0>] i915_pci_remove+0x10/0x20 [i915]
[  163.738237]  [<ffffffff8143d9b4>] pci_device_remove+0x34/0xb0
[  163.738249]  [<ffffffff81533d15>] __device_release_driver+0x95/0x140
[  163.738259]  [<ffffffff81533eb6>] driver_detach+0xb6/0xc0
[  163.738268]  [<ffffffff81532de3>] bus_remove_driver+0x53/0xd0
[  163.738278]  [<ffffffff815348d7>] driver_unregister+0x27/0x50
[  163.738289]  [<ffffffff8143ca15>] pci_unregister_driver+0x25/0x70
[  163.738299]  [<ffffffff81511de4>] drm_pci_exit+0x74/0x90
[  163.738337]  [<ffffffffa02034a9>] i915_exit+0x20/0x1a5 [i915]
[  163.738349]  [<ffffffff8110400f>] SyS_delete_module+0x18f/0x1f0
[  163.738361]  [<ffffffff817b8a9b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  163.738370] Code: ff d0 48 89 df e8 de a1 fd ff 48 8d 7b 38 e8 25 ab fd ff 48 8b 83 90 00 00 00 48 85 c0 74 25 48 8b 80 98 01 00 00 ba 21 43 65 87 <48> 8b 00 48 39 10 75 3c f6 40 08 01 75 38 48 c7 83 90 00 00 00
[  163.738459] RIP  [<ffffffffa018f723>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[  163.738498]  RSP <ffff88022ff2fd30>
[  163.738507] ---[ end trace 68f69ce4740fa44f ]---

Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-01-27 13:47:50 +01:00
Nick Hoath 1803c035ef drm/i915: Fix context/engine cleanup order
Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.

v2: Also make the fix in i915_load_modeset_init, not just in
    i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
    Rebased and updated commentary above (Dave Gordon).

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453405067-32890-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-25 19:09:03 +01:00
Chris Wilson 426960bed3 drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids
Tvrtko was looking through the execbuffer-ioctl and noticed that the
uABI was tightly coupled to our internal engine identifiers. Close
inspection also revealed that we leak those internal engine identifiers
through the busy-ioctl, and those internal identifiers already do not
match the user identifiers. Fortuitiously, there is only one user of the
set of busy rings from the busy-ioctl, and they only wish to choose
between the RENDER and the BLT engines.

Let's fix the userspace ABI while we still can.

v2: Update the uAPI documentation to explain the identifiers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_busy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk
2016-01-21 11:00:35 +00:00
Tvrtko Ursulin de1add3605 drm/i915: Decouple execbuf uAPI from internal implementation
At the moment execbuf ring selection is fully coupled to
internal ring ids which is not a good thing on its own.

This dependency is also spread between two source files and
not spelled out at either side which makes it hidden and
fragile.

This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).

This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.

As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.

v2: Extract ring selection into eb_select_ring. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-21 10:55:44 +00:00
Dave Gordon e28e404c3e drm/i915: tidy up a few leftovers
There are a few bits of code which the transformations implemented by
the previous patch reveal to be suboptimal, once the notion of a per-
ring default context has gone away. So this tidies up the leftovers.

It could have been squashed into the previous patch, but that would have
made that patch less clearly a simple transformation. In particular, any
change which alters the code block structure or indentation has been
deferred into this separate patch, because such things tend to make
diffs more difficult to read.

v4:	Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-4-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00
Dave Gordon ed54c1a1d1 drm/i915: abolish separate per-ring default_context pointers
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.

All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.

From an idea by Chris Wilson.

v2:	transform an extra instance of ring->default_context introduced by
    42f1cae8c drm/i915: Restore inhibiting the load of the default context
    That patch's commentary includes:
	v2: Mark the global default context as uninitialized on GPU reset so
	    that the context-local workarounds are reloaded upon re-enabling
    The code implementing that now also benefits from the replacement of
    the multiple (per-ring) pointers to the default context with a single
    pointer to the unique kernel context.

v4:	Rebased, remove underused local (Nick Hoath)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00
Dave Gordon 2682708839 drm/i915: simplify allocation of driver-internal requests
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).

So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
	err = i915_gem_request_alloc(ring, user_ctx, &req);
	if (err) ...
NEW:
	req = i915_gem_request_alloc(ring, user_ctx);
	if (IS_ERR(req)) ...
OLD:
	err = i915_gem_request_alloc(ring, ring->default_context, &req);
	if (err) ...
NEW:
	req = i915_gem_request_alloc(ring, NULL);
	if (IS_ERR(req)) ...

v4:	Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00
Tvrtko Ursulin e0313db047 drm/i915: Only grab timestamps when needed
No need to call ktime_get_raw_ns twice per unlimited wait and can
also elimate a local variable.

v2: Added comment about silencing the compiler warning. (Daniel Vetter)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870672-13901-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-18 09:58:29 +00:00
Dave Airlie 1df59b8497 Merge tag 'drm-intel-next-fixes-2016-01-14' of git://anongit.freedesktop.org/drm-intel into drm-next
misc i915 fixes all over the place.

* tag 'drm-intel-next-fixes-2016-01-14' of git://anongit.freedesktop.org/drm-intel:
  drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page
  drm/i915: Widen return value for reservation_object_wait_timeout_rcu to long.
  drm/i915: intel_hpd_init(): Fix suspend/resume reprobing
  drm/i915: shut up gen8+ SDE irq dmesg noise, again
  drm/i915: Restore inhibiting the load of the default context
  drm/i915: Tune down rpm wakelock debug checks
  drm/i915: Avoid writing relocs with addresses in non-canonical form
  drm/i915: Move Braswell stop_machine GGTT insertion workaround
2016-01-18 07:02:19 +10:00
Michel Thierry 48ea1e32c3 drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page
Kernel and userspace are able to handle 4GB (1<<32) address space range,
but "A32 Stateless Model" is not. According to documentation, A32 accesses
are based on General State Base Address and bound checking is in place.
Because size field (instruction State Base Address) limitation, it is not
possible to address full 4GB memory region.

A32 Stateless Model is used by some libraries and without this patch, the
last page of 4GB address space is not accessible in 32bit processes.

Reported-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452512367-23614-1-git-send-email-michel.thierry@intel.com
Cc: drm-intel-fixes@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 1892faa9ec)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-01-13 10:50:55 +02:00
Dave Airlie ade1ba7346 Merge tag 'drm-intel-next-2015-12-18' of git://anongit.freedesktop.org/drm-intel into drm-next
- fix atomic watermark recomputation logic (Maarten)
- modeset sequence fixes for LPT (Ville)
- more kbl enabling&prep work (Rodrigo, Wayne)
- first bits for mst audio
- page dirty tracking fixes from Dave Gordon
- new get_eld hook from Takashi, also included in the sound tree
- fixup cursor handling when placed at address 0 (Ville)
- refactor VBT parsing code (Jani)
- rpm wakelock debug infrastructure ( Imre)
- fbdev is pinned again (Chris)
- tune the busywait logic to avoid wasting cpu cycles (Chris)

* tag 'drm-intel-next-2015-12-18' of git://anongit.freedesktop.org/drm-intel: (81 commits)
  drm/i915: Update DRIVER_DATE to 20151218
  drm/i915/skl: Default to noncoherent access up to F0
  drm/i915: Only spin whilst waiting on the current request
  drm/i915: Limit the busy wait on requests to 5us not 10ms!
  drm/i915: Break busywaiting for requests on pending signals
  drm/i915: don't enable autosuspend on platforms without RPM support
  drm/i915/backlight: prefer dev_priv over dev pointer
  drm/i915: Disable primary plane if we fail to reconstruct BIOS fb (v2)
  drm/i915: Pin the ifbdev for the info->system_base GGTT mmapping
  drm/i915: Set the map-and-fenceable flag for preallocated objects
  drm/i915: mdelay(10) considered harmful
  drm/i915: check that we are in an RPM atomic section in GGTT PTE updaters
  drm/i915: add support for checking RPM atomic sections
  drm/i915: check that we hold an RPM wakelock ref before we put it
  drm/i915: add support for checking if we hold an RPM reference
  drm/i915: use assert_rpm_wakelock_held instead of opencoding it
  drm/i915: add assert_rpm_wakelock_held helper
  drm/i915: remove HAS_RUNTIME_PM check from RPM get/put/assert helpers
  drm/i915: get a permanent RPM reference on platforms w/o RPM support
  drm/i915: refactor RPM disabling due to RC6 being disabled
  ...
2015-12-23 14:22:09 +10:00
Chris Wilson 821485dc2a drm/i915: Only spin whilst waiting on the current request
Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.

v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.

v3: Try another colour for the seqno names.
v4: Another colour for the function names.

v5: Remove the forced coherency when checking for the active request. On
reflection and plenty of recent experimentation, the issue is not a
cache coherency problem - but an irq/seqno ordering problem (timing issue).
Here, we do not need the w/a to force ordering of the read with an
interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-4-git-send-email-chris@chris-wilson.co.uk
2015-12-18 17:11:56 +01:00
Chris Wilson ca5b721e23 drm/i915: Limit the busy wait on requests to 5us not 10ms!
When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quick as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 miscroseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
 the system is too busy to waste cycles on spinning and we should sleep
instead.

__i915_spin_request was introduced in
commit 2def4ad99b [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:41 2015 +0100

     drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.

v3: Raise the limit to 10us
v4: Now 5us.

Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-3-git-send-email-chris@chris-wilson.co.uk
2015-12-18 17:11:52 +01:00
Chris Wilson 91b0c352ac drm/i915: Break busywaiting for requests on pending signals
The busywait in __i915_spin_request() does not respect pending signals
and so may consume the entire timeslice for the task instead of
returning to userspace to handle the signal.

In the worst case this could cause a delay in signal processing of 20ms,
which would be a noticeable jitter in cursor tracking. If a higher
resolution signal was being used, for example to provide fairness of a
server timeslices between clients, we could expect to detect some
unfairness between clients (i.e. some windows not updating as fast as
others). This issue was noticed when inspecting a report of poor
interactivity resulting from excessively high __i915_spin_request usage.

Fixes regression from
commit 2def4ad99b [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:41 2015 +0100

     drm/i915: Optimistically spin for the request completion

v2: Try to assess the impact of the bug

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-2-git-send-email-chris@chris-wilson.co.uk
2015-12-18 17:11:47 +01:00
Chris Wilson d0710abbcd drm/i915: Set the map-and-fenceable flag for preallocated objects
As we mark the preallocated objects as bound, we should also flag them
correctly as being map-and-fenceable (if appropriate!) so that later
users do not get confused and try and rebind the pinned vma in order to
get a map-and-fenceable binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1448029000-10616-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-12-17 16:59:24 +01:00
Dave Airlie 51bce5bc38 Merge tag 'drm-intel-next-2015-12-04-1' of git://anongit.freedesktop.org/drm-intel into drm-next
This is the "fix igt basic test set issues" edition.
- more PSR fixes from Rodrigo, getting closer
- tons of fifo underrun fixes from Ville
- runtime pm fixes from Imre, Daniel Stone
- fix SDE interrupt handling properly (Jani Nikula)
- hsw/bdw fdi modeset sequence fixes (Ville)
- "don't register bad VGA connectors and fall over" fixes (Ville)
- more fbc fixes from Paulo
- and a grand total of exactly one feature item: Implement dma-buf/fence based
  cross-driver sync in the i915 pageflip path (Alex Goins)

* tag 'drm-intel-next-2015-12-04-1' of git://anongit.freedesktop.org/drm-intel: (70 commits)
  drm/i915: Update DRIVER_DATE to 20151204
  drm/i915/skl: Add SKL GT4 PCI IDs
  Revert "drm/i915: Extend LRC pinning to cover GPU context writeback"
  drm/i915: Correct the Ref clock value for BXT
  drm/i915: Restore skl_gt3 device info
  drm/i915: Fix RPS pointer passed from wait_ioctl to i915_wait_request
  Revert "drm/i915: Remove superfluous NULL check"
  drm/i915: Clean up device info structure definitions
  drm/i915: Remove superfluous NULL check
  drm/i915: Handle cdclk limits on broadwell.
  i915: wait for fence in prepare_plane_fb
  i915: wait for fence in mmio_flip_work_func
  drm/i915: Extend LRC pinning to cover GPU context writeback
  drm/i915/guc: Clean up locks in GuC
  drm/i915: only recompress FBC after flushing a drawing operation
  drm/i915: get rid of FBC {,de}activation messages
  drm/i915: kill fbc.uncompressed_size
  drm/i915: use a single intel_fbc_work struct
  drm/i915: check for FBC planes in the same place as the pipes
  drm/i915: alloc/free the FBC CFB during enable/disable
  ...
2015-12-15 11:01:04 +10:00
Daniel Vetter 618100f8a8 Add get_eld audio component for i915/HD-audio
Here are the patchset to add get_eld op to audio component for
 communicating more directly between i915 and HD-audio.
 
 Currently, the HDMI/DP audio status and ELD are notified and obtained
 via the hardware-level communication over HD-audio unsolicited event
 and verbs although the graphics driver holds the exactly same
 information.  As we already have a notification via audio component,
 this is another step forward; namely, the audio driver may fetch
 directly the audio status and ELD via the new component op.
 
 The commits are based on Dave's latest drm-next branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWaXTVAAoJEGwxgFQ9KSmkdZIQAKg2NQN1CGOIb+ce80UmJ9ap
 myZH42aZXytPrMAhHQCw2mvf8aL18EKyQcGv+anLdFM+AqlzZvH/exfOblkr88lg
 ZMGr0bXVNa2Lfyrgt9blUwH58uETQZC4P3BrI0cPqIJBdPagDxnxoaV1e/21g/9p
 0a2bhiz0gn4OFb83vpi5pL4hGv+BQwwlmkOujcVg7yxR7ylnYI419NM9Z+Lbmfq7
 p5jId6Q3EwBw6vpWryOI2TElM3VDThoOGCOtkfmZx6o4fDZ0bdl8CYVCgRFwZZCe
 kk01Caa+5+CW88MlJ1VX6gLy0WRlPY0AFreCWKgy5HCUNqew9ruhUeMj4+C1oHpj
 ui/79ULLRN2hmu2rvU8lZb0ClihXkDCBN8p89j6o2I+y1aIcUtxvY9Srg5w2tVBe
 Ue+OSB3lA4rdnuSjxZiaPf+V4rozIyNJHRjo6xNdY0zuScB4lw9Bh7IYXmj8B8OW
 k3LklToj4ZGeyCgfcTQwztAh7fFEXUb1wN+lLqCt3b9688zvMYTQlJ8ZdtK+t188
 3DNz9QjjPd4DcxLypl1VpM2Xv3AhuFfugq0oEuQq9bXs7qtj+iLmSWWdmhUNaVWb
 Qot21vJEHDii6jtoLdbVMTEZTWyr2nXUfUNFJpUgitif2UhqqgecnR16Fi05pjTv
 +Th/GvjddrQ0oe9DwVGY
 =NShN
 -----END PGP SIGNATURE-----

Merge tag 'drm-i915-get-eld' of tiwai/sound into drm-intel-next-queued

Add get_eld audio component for i915/HD-audio

Currently, the HDMI/DP audio status and ELD are notified and obtained
via the hardware-level communication over HD-audio unsolicited event
and verbs although the graphics driver holds the exactly same
information.  As we already have a notification via audio component,
this is another step forward; namely, the audio driver may fetch
directly the audio status and ELD via the new component op.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-12-11 19:28:27 +01:00
Dave Gordon 9e7d18c08a drm/i915: mark a newly-created GEM object dirty when filled with data
When creating a new (pageable) GEM object and filling it with data, we
must mark it as 'dirty', i.e. backing store is out-of-date w.r.t. the
newly-written content. This ensures that if the object is evicted under
memory pressure, its pages in the pagecache will be written to backing
store rather than discarded.

Based on an original version by Alex Dai.

Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-3-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-12-11 18:20:19 +01:00
Dave Gordon 033908aed5 drm/i915: mark GEM object pages dirty when mapped & written by the CPU
In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or the
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.

The typical sequence is:
	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
	*(va+offset) = ...
	kunmap_atomic(va);

Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache.  This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.

Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.

Separate patches deal with the cases where whole objects are (or may
be) dirtied.

v3: Mark two more pages dirty in the page-boundary-crossing
    cases of the execbuffer relocation code [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-12-11 18:11:53 +01:00
Tomas Elf c5baa566db drm/i915: Update to post-reset execlist queue clean-up
When clearing an execlist queue, instead of traversing it and unreferencing all
requests while holding the spinlock (which might lead to thread sleeping with
IRQs are turned off - bad news!), just move all requests to the retire request
list while holding spinlock and then drop spinlock and invoke the execlists
request retirement path, which already deals with the intricacies of
purging/dereferencing execlist queue requests.

This patch can be considered v3 of:

	commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
	Author: Tomas Elf <tomas.elf@intel.com>
	drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues

This patch assumes v2 of the above patch is part of the baseline, reverts v2
and adds changes on top to turn it into v3.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1445619757-19822-1-git-send-email-tomas.elf@intel.com
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <dave.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-12-11 17:39:48 +01:00
Wayne Boyer bf6ce93a73 drm/i915: Remove VLV A0 hack
Do some further clean up based on the initial review of
drm/i915: Separate cherryview from valleyview.

In this case remove a hack for VLV A0.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449514270-15171-4-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2015-12-10 11:07:30 +01:00
Wayne Boyer 666a45379e drm/i915: Separate cherryview from valleyview
The cherryview device shares many characteristics with the valleyview
device.  When support was added to the driver for cherryview, the
corresponding device info structure included .is_valleyview = 1.
This is not correct and leads to some confusion.

This patch changes .is_valleyview to .is_cherryview in the cherryview
device info structure and simplifies the IS_CHERRYVIEW macro.
Then where appropriate, instances of IS_VALLEYVIEW are replaced with
IS_VALLEYVIEW || IS_CHERRYVIEW or equivalent.

v2: Use IS_VALLEYVIEW || IS_CHERRYVIEW instead of defining a new macro.
    Also add followup patches to fix issues discovered during the first
    review. (Ville)
v3: Fix some style issues and one gen check. Remove CRT related changes
    as CRT is not supported on CHV. (Imre, Ville)
v4: Make a few more optimizations. (Ville)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449692975-14803-1-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
2015-12-10 11:07:24 +01:00