Commit Graph

283 Commits

Author SHA1 Message Date
Daniel Vetter b45305fce5 drm/i915: Implement workaround for broken CS tlb on i830/845
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 17:27:02 +01:00
Mika Kuoppala 498d2ac15c drm/i915: Add intel_ring_handle_seqno wrap
If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.

v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-06 13:14:34 +01:00
Ville Syrjälä 633cf8f505 drm/i915: Don't allow ring tail to reach the same cacheline as head
From BSpec:
"If the Ring Buffer Head Pointer and the Tail Pointer are on the same
cacheline, the Head Pointer must not be greater than the Tail
Pointer."

The easiest way to enforce this is to reduce the reported ring space.

References:
Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"

v2: Include the exact BSpec references in the description

v3: s/64/I915_RING_FREE_SPACE, and add the BSpec information to the code

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 18:31:20 +01:00
Chris Wilson 3e9605018a drm/i915: Rearrange code to only have a single method for waiting upon the ring
Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:53 +01:00
Chris Wilson 9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00
Jesse Barnes 9a28977181 drm/i915: TLB invalidation with MI_FLUSH_DW requires a post-sync op v3
So store into the scratch space of the HWS to make sure the invalidate
occurs.

v2: use GTT address space for store, clean up #defines (Chris)
v3: use correct #define in blt ring flush (Chris)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Antti Koskipää <antti.koskipaa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1063252
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:36 +01:00
Chris Wilson d7d4eeddb8 drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers
With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.

v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-17 21:06:59 +02:00
Chris Wilson b2eadbc85b drm/i915: Lazily apply the SNB+ seqno w/a
Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirment process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-10 11:11:32 +02:00
Chris Wilson a7b9761d0a drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson 69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Daniel Vetter cc889e0f6c drm/i915: disable flushing_list/gpu_write_list
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-20 13:54:28 +02:00
Ben Widawsky 12b0286f49 drm/i915: possibly invalidate TLB before context switch
From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf

[DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's
responsibility to invalidate the TLBs at least once after the previous
context switch after any GTT mappings changed (including new GTT
entries).  This can be done by a pipelined PIPE_CONTROL with TLB inv bit
set immediately before MI_SET_CONTEXT.

On GEN7 the invalidation mode is explicitly set, but this appears to be
lacking for GEN6. Since I don't know the history on this, I've decided
to dynamically read the value at ring init time, and use that value
throughout.

v2: better comment (daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:19 +02:00
Ben Widawsky e055684168 drm/i915: context switch implementation
Implement the context switch code as well as the interfaces to do the
context switch. This patch also doesn't match 1:1 with the RFC patches.
The main difference is that from Daniel's responses the last context
object is now stored instead of the last context. This aids in allows us
to free the context data structure, and context object independently.

There is room for optimization: this code will pin the context object
until the next context is active. The optimal way to do it is to
actually pin the object, move it to the active list, do the context
switch, and then unpin it. This allows the eviction code to actually
evict the context object if needed.

The context switch code is missing workarounds, they will be implemented
in future patches.

v2: actually do obj->dirty=1 in switch (daniel)
Modified comment around above
Remove flags to context switch (daniel)
Move mi_set_context code to i915_gem_context.c (daniel)
Remove seqno , use lazy request instead (daniel)

v3: use i915_gem_request_next_seqno instead of
      outstanding_lazy_request (Daniel)
remove id's from trace events (Daniel)
Put the context BO in the instruction domain (Daniel)
Don't unref the BO is context switch fails (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:17 +02:00
Ben Widawsky 40521054fd drm/i915: context basic create & destroy
Invent an abstraction for a hw context which is passed around through
the core functions. The main bit a hw context holds is the buffer object
which backs the context. The rest of the members are just helper
functions. Specifically the ring member, which could likely go away if
we decide to never implement whatever other hw context support exists.

Of note here is the introduction of the 64k alignment constraint for the
BO. If contexts become heavily used, we should consider tweaking this
down to 4k. Until the contexts are merged and tested a bit though, I
think 64k is a nice start (based on docs).

Since we don't yet switch contexts, there is really not much complexity
here. Creation/destruction works pretty much as one would expect. An idr
is used to generate the context id numbers which are unique per file
descriptor.

v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:16 +02:00
Chris Wilson b4519513e8 drm/i915: Introduce for_each_ring() macro
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.

Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.

Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.

v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-19 22:39:53 +02:00
Daniel Vetter 4225d0f219 drm/i915: fixup __iomem mixups in ringbuffer.c
Two things:
- ring->virtual start is an __iomem pointer, treat it accordingly.
- dev_priv->status_page.page_addr is now always a cpu addr, no pointer
  casting needed for that.

Take the opportunity to remove the unnecessary drm indirection when
setting up the ringbuffer iomapping.

v2: Add a compiler barrier before reading the hw status page.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:31 +02:00
Daniel Vetter 09422b2e72 drm/i915: move LP_RING&friends to i915_dma.c
Wohoo!

Now we only need to move all the gem/kms stuff that accidentally
landed in i915_dma.c out of it, and this will be our legacy dri1
grave-yard.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:28 +02:00
Chris Wilson 6d171cb4c2 drm/i915: Remove unused ring->irq_seqno
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:23 +02:00
Ben Widawsky 9574b3fe29 drm/i915: kill waiting_seqno
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.

v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:21 +02:00
Chris Wilson 7338aefa5c drm/i915: Use a global lock for modifying global irq flags
We were attempting to use a per-ring spinlock whilst modifying global
IRQ flags. A recipe for rare missed interrupts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:09 +02:00
Daniel Vetter 6a848ccb80 drm/i915: rip out ring->irq_mask
We only ever enable/disable one interrupt (namely user_interrupts and
pipe_notify), so we don't need to track the interrupt masking state.

Also rename irq_enable to irq_enable_mask, now that it won't collide -
beforehand both a irq_mask and irq_enable_mask would have looked a bit
strange.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-13 12:40:57 +02:00
Ben Widawsky 25c063004a drm/i915: open code gen6+ ring irqs
We can now open-code the get/put irq functions as they were just
abstracting single register definitions.

It would be nice to merge this in with the IRQ handling code... but that
is too much work for me at present. In addition I could probably
collapse this in to a lot of the Ironlake stuff, but I don't think it's
worth the potential regressions.

This patch itself should not effect functionality.

CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-09 18:04:06 +02:00
Chris Wilson a71d8d9452 drm/i915: Record the tail at each request and use it to estimate the head
By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-15 14:26:03 +01:00
Daniel Vetter 96154f2fab drm/i915: switch ring->id to be a real id
... and add a helpr function for the places where we want a flag.

This way we can use ring->id to index into arrays.

v2: Resurrect the missing beautification-space Chris Wilson noted.
I'm moving this space around because I'll reuse ring_str in the next
patch.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 17:32:58 +01:00
Ben Widawsky c8c99b0f0d drm/i915: Dumb down the semaphore logic
While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.

I don't expect this code to make semaphores better or worse, but you
never know...

v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.

v3:
Ignored one of Keith's suggestions.

v4:
Removed some bloat per Daniel's recommendation.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-21 14:52:41 -07:00
Akshay Joshi 0206e353a0 Drivers: i915: Fix all space related issues.
Various issues involved with the space character were generating
warnings in the checkpatch.pl file. This patch removes most of those
warnings.

Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-19 18:01:47 -07:00
Chris Wilson a94919eadd drm/i915/ringbuffer: Idling requires waiting for the ring to be empty
...which is measured by the size and not the amount of space remaining.

Waiting upon size-8, did one of two things. In the common case with more
than 8 bytes available to write into the ring, it would return
immediately. Otherwise, it would timeout given the impossible condition
of waiting for more space than is available in the ring, leading to
warnings such as:

[drm:intel_cleanup_ring_buffer] *ERROR* failed to quiesce render ring
whilst cleaning up: -16

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-12 10:35:45 -07:00
Ben Widawsky b7287d8054 drm/i915: proper use of forcewake
Moved the macros around to properly do reads and writes for the given
GPU. This is to address special requirements for gen6 (SNB) reads and
writes.

Registers in the range 0-0x40000 on gen6 platforms require special
handling. Instead of relying on the callers to pick the registers
correctly, move the logic into the read and write functions.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-05-10 13:56:45 -07:00
Ben Widawsky 96f298aa9c drm/1915: ringbuffer wait for idle function
Added a new function which waits for the ringbuffer space to be equal to
(total - 8). This is the empty condition of the ringbuffer, and
equivalent to head==tail.

Also modified two users of this functionality elsewhere in the code.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-05-10 13:56:40 -07:00
Chris Wilson 47ae63e0c2 Merge branch 'drm-intel-fixes' into drm-intel-next
Apply the trivial conflicting regression fixes, but keep GPU semaphores
enabled.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
2011-03-07 12:35:15 +00:00
Chris Wilson 9135583464 drm/i915: Do not overflow the MMADDR write FIFO
Whilst the GT is powered down (rc6), writes to MMADDR are placed in a
FIFO by the System Agent. This is a limited resource, only 64 entries, of
which 20 are reserved for Display and PCH writes, and so we must take
care not to queue up too many writes. To avoid this, there is counter
which we can poll to ensure there are sufficient free entries in the
fifo.

"Issuing a write to a full FIFO is not supported; at worst it could
result in corruption or a system hang."

Reported-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-06 09:07:46 +00:00
Chris Wilson db53a30261 drm/i915: Refine tracepoints
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 14:59:18 +00:00
Chris Wilson bdd92c9ad2 Merge branch 'drm-intel-fixes' into drm-intel-next
Merge important suspend and resume regression fixes and resolve the
small conflict.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
2011-01-24 23:45:32 +00:00
Chris Wilson c7dca47bd6 drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space
During suspend, Linus found that his machine would hang for 3 seconds,
and identified that intel_ring_buffer_wait() was the culprit:

"Because from looking at the code, I get the notion that
"intel_read_status_page()" may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty."

As the reported HEAD position is only updated every time it crosses a
64k boundary, whilst draining the ring it is indeed likely to remain one
value. If that value matches the last known HEAD position, we never read
the true value from the register and so trigger a timeout.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-20 17:26:57 +00:00
Chris Wilson e8616b6ced drm/i915: Initialise ring vfuncs for old DRI paths
We weren't setting up the vfunc table when initialising the old DRI
ringbuffer, leading to such OOPSes as:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 10c441067 PUD 1185e5067 PMD 0
Oops: 0010 [#1] PREEMPT SMP
last sysfs file: /sys/class/dmi/id/chassis_asset_tag
CPU 3
Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit
cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6
nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev
usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e
usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm
snd_seq snd_timer snd_seq_device processor parport_pc thermal snd
thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon
rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801
rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse
intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod
ata_piix libata scsi_mod unix
Jan 18 15:49:29 lithui kernel:
Pid: 3605, comm: Xorg Not tainted 2.6.36.2 #5 P5KPL-CM/System Product
Name
RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
RSP: 0018:ffff8801150d1d40  EFLAGS: 00010202
RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704
RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800
RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444
R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800
R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8
FS:  00007f1038d456e0(0000) GS:ffff880001780000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task
ffff88011b016e40)
Stack:
ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000
<0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008
<0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8
Call Trace:
[<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915]
[<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915]
[<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm]
[<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915]
[<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120
[<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0
[<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550
[<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90
[<ffffffff810edb19>] sys_ioctl+0x99/0xa0
[<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<(null)>] (null)
RSP <ffff8801150d1d40>
CR2: 0000000000000000

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Herbert Xu <herbert@gondor.apana.org.au>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2011-01-20 11:20:53 +00:00
Chris Wilson 311bd68e02 drm/i915: Trivial sparse fixes
Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-19 12:39:38 +00:00
Chris Wilson 0dc79fb2a3 drm/i915: Make the ring IMR handling private
As the IMR for the USER interrupts are not modified elsewhere, we can
separate the spinlock used for these from that of hpd and pipestats.
Those two IMR are manipulated under an IRQ and so need heavier locking.

Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:58 +00:00
Chris Wilson 01a03331e5 drm/i915/ringbuffer: Simplify the ring irq refcounting
... and move it under the spinlock to gain the appropriate memory
barriers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:57 +00:00
Chris Wilson 9862e600ce drm/i915/debugfs: Show the per-ring IMR
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:56 +00:00
Chris Wilson 0f46832fab drm/i915: Mask USER interrupts on gen6 (until required)
Otherwise we may consume 20% of the CPU just handling IRQs whilst
rendering. Ouch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:56 +00:00
Chris Wilson b72f3acb71 drm/i915: Handle ringbuffer stalls when flushing
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:55 +00:00
Chris Wilson 55249baaa5 drm/i915: Workaround erratum on i830 for TAIL pointer within last 2 cachelines
On i830 if the tail pointer is set to within 2 cachelines of the end of
the buffer, the chip may hang. So instead if the tail were to land in
that location, we pad the end of the buffer with NOPs, and start again
at the beginning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:35:41 +00:00
Chris Wilson b13c2b96bf drm/i915/ringbuffer: Make IRQ refcnting atomic
In order to enforce the correct memory barriers for irq get/put, we need
to perform the actual counting using atomic operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-14 11:34:46 +00:00
Chris Wilson 8d5203ca62 Merge branch 'drm-intel-fixes' into drm-intel-next 2010-12-09 20:22:04 +00:00
Chris Wilson 8c0a6bfef1 drm/i915/ringbuffer: Handle wrapping of the autoreported HEAD
If the tail advances beyond the autoreport HEAD value, then we need to
fallback to an uncached read of the HEAD register in order to ascertain
the correct amount of remaining space in the ringbuffer.

Reported-by: Fang, Xun <xunx.fang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32259
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-09 12:53:19 +00:00
Chris Wilson 1ec14ad313 drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB
The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 00:37:38 +00:00
Chris Wilson c4e7a41467 drm/i915/ringbuffer: Handle cliprects in the caller
This makes the various rings more consistent by removing the anomalous
handing of the rendering ring execbuffer dispatch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-30 14:17:51 +00:00
Chris Wilson 05394f3975 drm/i915: Use drm_i915_gem_object as the preferred type
A glorified s/obj_priv/obj/ with a net reduction of over a 100 lines and
many characters!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-23 20:19:10 +00:00
Zou Nan hai cae5852dca drm/i915/ringbuffer: set FORCE_WAKE bit before reading ring register
Before reading ring register, set FORCE_WAKE bit to prevent GT core
power down to low power state, otherwise we may read stale values.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: added a udelay which seemed to do the trick on my SNB]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-11 17:45:54 +00:00
Chris Wilson 5d97eb69bd drm/i915: Only add the lazy request if we end up waiting for it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-10 20:41:16 +00:00
Chris Wilson 5588978882 drm/i915: SNB BLT workaround
On some stepping of SNB cpu, the first command to be parsed in BLT
command streamer should be MI_BATCHBUFFER_START otherwise the GPU
may hang.

(cherry picked from commit 8d19215be8)

Conflicts:

	drivers/gpu/drm/i915/intel_ringbuffer.c
	drivers/gpu/drm/i915/intel_ringbuffer.h

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-02 10:48:48 +00:00
Zou Nan hai 8d19215be8 drm/i915: SNB BLT workaround
On some stepping of SNB cpu, the first command to be parsed in BLT
command streamer should be MI_BATCHBUFFER_START otherwise the GPU
may hang.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: rebased for -next]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-02 10:44:23 +00:00
Chris Wilson 3cce469cab drm/i915: Propagate error from failing to queue a request
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-27 23:31:03 +01:00
Chris Wilson b2223497b4 drm/i915: Remove the confusing global waiting/irq seqno
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-27 23:30:59 +01:00
Chris Wilson c2c347a9ee drm/i915/debugfs: Include info for the other rings
The render ring is not alone any more! And the other rings are just as
troublesome...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-27 23:29:39 +01:00
Chris Wilson e1f99ce6ca drm/i915: Propagate errors from writing to ringbuffer
Preparing the ringbuffer for adding new commands can fail (a timeout
whilst waiting for the GPU to catch up and free some space). So check
for any potential error before overwriting HEAD with new commands, and
propagate that error back to the user where possible.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-27 23:26:34 +01:00
Chris Wilson 78501eac34 drm/i915/ringbuffer: Drop the redundant dev from the vfunc interface
The ringbuffer keeps a pointer to the parent device, so we can use that
instead of passing around the pointer on the stack.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-27 12:18:21 +01:00
Chris Wilson 641934069d drm/i915: Move gpu_write_list to per-ring
... to prevent flush processing of an idle (or even absent) ring.

This fixes a regression during suspend from 87acb0a5.

Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Tested-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-24 20:22:51 +01:00
Chris Wilson 297b0c5be3 drm/i915/ringbuffer: Write the value passed in to the tail register
This should fix the error along the reset path were we tried to clear the
tail register by setting it to 0, but were in fact setting it to the
current value and complaining when it did not reset to 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-22 17:57:43 +01:00
Chris Wilson 549f736582 drm/i915: Enable SandyBridge blitter ring
Based on an original patch by Zhenyu Wang, this initializes the BLT ring for
SandyBridge and enables support for user execbuffers.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-21 19:08:39 +01:00
Chris Wilson e36c1cd729 drm/i915/ringbuffer: Remove broken intel_fill_struct()
... before someone tries to use it. The code both calls
intel_ring_begin/advance() and open-codes the bookkeeping performed by
those two functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-10-21 19:00:02 +01:00
Chris Wilson a56ba56c27 Revert "drm/i915: Drop ring->lazy_request"
With multiple rings generating requests independently, the outstanding
requests must also be track independently.

Reported-by: Wang Jinjin <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-28 11:30:52 +01:00
Daniel Vetter 447da18742 drm/i915: kill ring->setup_status_page
It's the same code, essentially, so kill all copies safe one unified
version.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-25 12:23:16 +01:00
Daniel Vetter 79f321b7e6 drm/i915: kill ring->get_active_head
All functions are extremely similar, so fold them into one generic
implementation.

This function isn't used anyway, because there's not yet a bsd ring
error state dumper.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-25 12:23:15 +01:00
Chris Wilson f787a5f59e drm/i915: Only hold a process-local lock whilst throttling.
Avoid cause latencies in other clients by not taking the global struct
mutex and moving the per-client request manipulation a local per-client
mutex. For example, this allows a compositor to schedule a page-flip
(through X) whilst an OpenGL application is monopolising the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-24 21:03:00 +01:00
Chris Wilson 780f0ca3e0 drm/i915/ringbuffer: Fix sign of ring space.
As we presume space is signed when computing and looking for wrap along,
make it so.

Reported-by: Owain G. Ainsworth <zerooa@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-24 14:19:55 +01:00
Chris Wilson 5c12a07e80 drm/i915: Drop ring->lazy_request
We are not currently using it as intended, so remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-22 11:58:55 +01:00
Chris Wilson ab6f8e3250 drm/i915/ringbuffer: whitespace cleanup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:06 +01:00
Daniel Vetter a9db5c8fdd drm/i915: drop alignment ringbuffer parameter
Always PAGE_SIZE and only complicates the code.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:05 +01:00
Daniel Vetter 7f2ab69913 drm/i915: use new macros to access the ring ctl register
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:04 +01:00
Daniel Vetter 570ef60859 drm/i915: use new macros to access the ring head register
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:03 +01:00
Daniel Vetter 6c0e1c556e drm/i915: use new macros to access the ring start register
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:02 +01:00
Daniel Vetter 870e86ddc2 drm/i915: use new macros to access the ring tail register
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:02 +01:00
Daniel Vetter 333e9fe94d drm/i915: add relative ring register macros
Documentation explicitly mentions that the ring registers are
designed to have the same offsets relative to a base registers.

Use this to fight the code beaurocratic in intel_ringbuffer.c.

No code changes in this patch, just the new definitions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:20:01 +01:00
Chris Wilson a3f07cd53e drm/i915/ringbuffer: Implement advance using set_tail
As noted by Zhenyu, we can now simply replace the existing advance hook
by calling the new set_tail function pointer directly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:57 +01:00
Xiang, Haihao d46eefa297 drm/i915: add set_tail hook in struct intel_ring_buffer
This is prepared for video codec ring buffer on Sandybridge. It is
needed to read/write more than one register to move the tail pointer of
the video codec ring on Sandybridge.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:56 +01:00
Xiang, Haihao 5c1143bbec drm/i915: do not export the instances of struct intel_ring_buffer
Introduce intel_init_render_ring_buffer(), intel_init_bsd_ring_buffer
for ring initialization.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:55 +01:00
Chris Wilson 9220434a87 drm/i915: Only emit a flush request on the active ring.
When flushing the GPU domains,we emit a flush on *both* rings, even
though they share a unified cache. Only emit the flush on the currently
active ring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-21 11:19:51 +01:00
Chris Wilson 2b6efaa476 drm/i915: Remove unused intel_ringbuffer->ring_flag
This can always be re-added should somebody find a use...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-14 21:13:00 +01:00
Daniel Vetter a6910434e1 drm/i915: only one interrupt per batchbuffer is not enough!
Previously I thought that one interrupt per batchbuffer should be
enough. Now tedious benchmarking showed this to be wrong.

Therefore track whether any commands have been isssued with a future
seqno (like pipelined fencing changes or flushes). If this is the case
emit a request before issueing the batchbuffer.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-08 10:23:35 +01:00
Chris Wilson 6f392d5486 drm/i915: Use a common seqno for all rings.
This will be used by the eviction logic to maintain fairness between the
rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-09 11:24:32 -07:00
Chris Wilson e898cd221d drm/i915: Inline ringbuffer_emit()
As the function has been reduced to a store plus increment, the body is
now smaller than the call so inline it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-09 11:24:31 -07:00
Zou Nan hai 8187a2b70e drm/i915: introduce intel_ring_buffer structure (V2)
Introduces a more complete intel_ring_buffer structure with callbacks
for setup and management of a particular ringbuffer, and converts the
render ring buffer consumers to use it.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
[anholt: Fixed up whitespace fail and rebased against prep patches]
Signed-off-by: Eric Anholt <eric@anholt.net>
2010-05-26 13:24:49 -07:00