OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Daniel Vetter	b45305fce5	drm/i915: Implement workaround for broken CS tlb on i830/845 Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-17 17:27:02 +01:00
Mika Kuoppala	498d2ac15c	drm/i915: Add intel_ring_handle_seqno wrap If there are pre-wrap values in semaphore-mbox registers after wrap, syncing against some after-wrap request will complete immediately. Fix this by emitting ring commands to set mbox registers to zero when the wrap happens. v2: Use __intel_ring_begin to emit ring commands, from Chris Wilson. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add a small comment to handle_seqno_wrap.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-06 13:14:34 +01:00
Ville Syrjälä	633cf8f505	drm/i915: Don't allow ring tail to reach the same cacheline as head From BSpec: "If the Ring Buffer Head Pointer and the Tail Pointer are on the same cacheline, the Head Pointer must not be greater than the Tail Pointer." The easiest way to enforce this is to reduce the reported ring space. References: Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use" Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use" Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use" v2: Include the exact BSpec references in the description v3: s/64/I915_RING_FREE_SPACE, and add the BSpec information to the code Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-03 18:31:20 +01:00
Chris Wilson	3e9605018a	drm/i915: Rearrange code to only have a single method for waiting upon the ring Replace the wait for the ring to be clear with the more common wait for the ring to be idle. The principle advantage is one less exported intel_ring_wait function, and the removal of a hardcoded value. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:53 +01:00
Chris Wilson	9d7730914f	drm/i915: Preallocate next seqno before touching the ring Based on the work by Mika Kuoppala, we realised that we need to handle seqno wraparound prior to committing our changes to the ring. The most obvious point then is to grab the seqno inside intel_ring_begin(), and then to reuse that seqno for all ring operations until the next request. As intel_ring_begin() can fail, the callers must already be prepared to handle such failure and so we can safely add further checks. This patch looks like it should be split up into the interface changes and the tweaks to move seqno wrapping from the execbuffer into the core seqno increment. However, I found no easy way to break it into incremental steps without introducing further broken behaviour. v2: Mika found a silly mistake and a subtle error in the existing code; inside i915_gem_retire_requests() we were resetting the sync_seqno of the target ring based on the seqno from this ring - which are only related by the order of their allocation, not retirement. Hence we were applying the optimisation that the rings were synchronised too early, fortunately the only real casualty there is the handling of seqno wrapping. v3: Do not forget to reset the sync_seqno upon module reinitialisation, ala resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861 Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:52 +01:00
Jesse Barnes	9a28977181	drm/i915: TLB invalidation with MI_FLUSH_DW requires a post-sync op v3 So store into the scratch space of the HWS to make sure the invalidate occurs. v2: use GTT address space for store, clean up #defines (Chris) v3: use correct #define in blt ring flush (Chris) Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Antti Koskipää <antti.koskipaa@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1063252 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:36 +01:00
Chris Wilson	d7d4eeddb8	drm/i915: Allow DRM_ROOT_ONLY\|DRM_MASTER to submit privileged batchbuffers With the introduction of per-process GTT space, the hardware designers thought it wise to also limit the ability to write to MMIO space to only a "secure" batch buffer. The ability to rewrite registers is the only way to program the hardware to perform certain operations like scanline waits (required for tear-free windowed updates). So we either have a choice of adding an interface to perform those synchronized updates inside the kernel, or we permit certain processes the ability to write to the "safe" registers from within its command stream. This patch exposes the ability to submit a SECURE batch buffer to DRM_ROOT_ONLY\|DRM_MASTER processes. v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security bit (bit 13, accidentally not set). Also add a comment explaining why secure batches need a global gtt binding. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) [danvet: added hsw fixup.] Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-17 21:06:59 +02:00
Chris Wilson	b2eadbc85b	drm/i915: Lazily apply the SNB+ seqno w/a Avoid the forcewake overhead when simply retiring requests, as often the last seen seqno is good enough to satisfy the retirment process and will be promptly re-run in any case. Only ensure that we force the coherent seqno read when we are explicitly waiting upon a completion event to be sure that none go missing, and also for when we are reporting seqno values in case of error or debugging. This greatly reduces the load for userspace using the busy-ioctl to track active buffers, for instance halving the CPU used by X in pushing the pixels from a software render (flash). The effect will be even more magnified with userptr and so providing a zero-copy upload path in that instance, or in similar instances where X is simply compositing DRI buffers. v2: Reverse the polarity of the tachyon stream. Daniel suggested that 'force' was too generic for the parameter name and that 'lazy_coherency' better encapsulated the semantics of it being an optimization and its purpose. Also notice that gen6_get_seqno() is only used by gen6/7 chipsets and so the test for IS_GEN6 \|\| IS_GEN7 is redundant in that function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-10 11:11:32 +02:00
Chris Wilson	a7b9761d0a	drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Chris Wilson	69c2fc8913	drm/i915: Remove the per-ring write list This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Daniel Vetter	cc889e0f6c	drm/i915: disable flushing_list/gpu_write_list This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-20 13:54:28 +02:00
Ben Widawsky	12b0286f49	drm/i915: possibly invalidate TLB before context switch From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf [DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's responsibility to invalidate the TLBs at least once after the previous context switch after any GTT mappings changed (including new GTT entries). This can be done by a pipelined PIPE_CONTROL with TLB inv bit set immediately before MI_SET_CONTEXT. On GEN7 the invalidation mode is explicitly set, but this appears to be lacking for GEN6. Since I don't know the history on this, I've decided to dynamically read the value at ring init time, and use that value throughout. v2: better comment (daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:19 +02:00
Ben Widawsky	e055684168	drm/i915: context switch implementation Implement the context switch code as well as the interfaces to do the context switch. This patch also doesn't match 1:1 with the RFC patches. The main difference is that from Daniel's responses the last context object is now stored instead of the last context. This aids in allows us to free the context data structure, and context object independently. There is room for optimization: this code will pin the context object until the next context is active. The optimal way to do it is to actually pin the object, move it to the active list, do the context switch, and then unpin it. This allows the eviction code to actually evict the context object if needed. The context switch code is missing workarounds, they will be implemented in future patches. v2: actually do obj->dirty=1 in switch (daniel) Modified comment around above Remove flags to context switch (daniel) Move mi_set_context code to i915_gem_context.c (daniel) Remove seqno , use lazy request instead (daniel) v3: use i915_gem_request_next_seqno instead of outstanding_lazy_request (Daniel) remove id's from trace events (Daniel) Put the context BO in the instruction domain (Daniel) Don't unref the BO is context switch fails (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:17 +02:00
Ben Widawsky	40521054fd	drm/i915: context basic create & destroy Invent an abstraction for a hw context which is passed around through the core functions. The main bit a hw context holds is the buffer object which backs the context. The rest of the members are just helper functions. Specifically the ring member, which could likely go away if we decide to never implement whatever other hw context support exists. Of note here is the introduction of the 64k alignment constraint for the BO. If contexts become heavily used, we should consider tweaking this down to 4k. Until the contexts are merged and tested a bit though, I think 64k is a nice start (based on docs). Since we don't yet switch contexts, there is really not much complexity here. Creation/destruction works pretty much as one would expect. An idr is used to generate the context id numbers which are unique per file descriptor. v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben) convert a BUG_ON to WARN_ON, default destruction is still fatal (ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:16 +02:00
Chris Wilson	b4519513e8	drm/i915: Introduce for_each_ring() macro In many places we wish to iterate over the rings associated with the GPU, so refactor them to use a common macro. Along the way, there are a few code removals that should be side-effect free and some rearrangement which should only have a cosmetic impact, such as error-state. Note that this slightly changes the semantics in the hangcheck code: We now always cycle through all enabled rings instead of short-circuiting the logic. v2: Pull in a couple of suggestions from Ben and Daniel for intel_ring_initialized() and not removing the warning (just moving them to a new home, closer to the error). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Added note to commit message about the small behaviour change, suggested by Ben Widawsky.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-19 22:39:53 +02:00
Daniel Vetter	4225d0f219	drm/i915: fixup __iomem mixups in ringbuffer.c Two things: - ring->virtual start is an __iomem pointer, treat it accordingly. - dev_priv->status_page.page_addr is now always a cpu addr, no pointer casting needed for that. Take the opportunity to remove the unnecessary drm indirection when setting up the ringbuffer iomapping. v2: Add a compiler barrier before reading the hw status page. Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:31 +02:00
Daniel Vetter	09422b2e72	drm/i915: move LP_RING&friends to i915_dma.c Wohoo! Now we only need to move all the gem/kms stuff that accidentally landed in i915_dma.c out of it, and this will be our legacy dri1 grave-yard. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:28 +02:00
Chris Wilson	6d171cb4c2	drm/i915: Remove unused ring->irq_seqno Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:23 +02:00
Ben Widawsky	9574b3fe29	drm/i915: kill waiting_seqno The waiting_seqno is not terribly useful, and as such we can remove it so that we'll be able to extract lockless code. v2: Keep the information for error_state (Chris) Check if ring is initialized in hangcheck (Chris) Capture the waiting ring (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: add some bikeshed to clarify a comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:21 +02:00
Chris Wilson	7338aefa5c	drm/i915: Use a global lock for modifying global irq flags We were attempting to use a per-ring spinlock whilst modifying global IRQ flags. A recipe for rare missed interrupts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:09 +02:00
Daniel Vetter	6a848ccb80	drm/i915: rip out ring->irq_mask We only ever enable/disable one interrupt (namely user_interrupts and pipe_notify), so we don't need to track the interrupt masking state. Also rename irq_enable to irq_enable_mask, now that it won't collide - beforehand both a irq_mask and irq_enable_mask would have looked a bit strange. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-13 12:40:57 +02:00
Ben Widawsky	25c063004a	drm/i915: open code gen6+ ring irqs We can now open-code the get/put irq functions as they were just abstracting single register definitions. It would be nice to merge this in with the IRQ handling code... but that is too much work for me at present. In addition I could probably collapse this in to a lot of the Ironlake stuff, but I don't think it's worth the potential regressions. This patch itself should not effect functionality. CC: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-09 18:04:06 +02:00
Chris Wilson	a71d8d9452	drm/i915: Record the tail at each request and use it to estimate the head By recording the location of every request in the ringbuffer, we know that in order to retire the request the GPU must have finished reading it and so the GPU head is now beyond the tail of the request. We can therefore provide a conservative estimate of where the GPU is reading from in order to avoid having to read back the ring buffer registers when polling for space upon starting a new write into the ringbuffer. A secondary effect is that this allows us to convert intel_ring_buffer_wait() to use i915_wait_request() and so consolidate upon the single function to handle the complicated task of waiting upon the GPU. A necessary precaution is that we need to make that wait uninterruptible to match the existing conditions as all the callers of intel_ring_begin() have not been audited to handle ERESTARTSYS correctly. By using a conservative estimate for the head, and always processing all outstanding requests first, we prevent a race condition between using the estimate and direct reads of I915_RING_HEAD which could result in the value of the head going backwards, and the tail overflowing once again. We are also careful to mark any request that we skip over in order to free space in ring as consumed which provides a self-consistency check. Given sufficient abuse, such as a set of unthrottled GPU bound cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on Sandy Bridge (i5-2520m): firefox-paintball 18927ms -> 15646ms: 1.21x speedup firefox-fishtank 12563ms -> 11278ms: 1.11x speedup which is a mild consolation for the performance those traces achieved from exploiting the buggy autoreported head. v2: Add a few more comments and make request->tail a conservative estimate as suggested by Daniel Vetter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: resolve conflicts with retirement defering and the lack of the autoreport head removal (that will go in through -fixes).] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-15 14:26:03 +01:00
Daniel Vetter	96154f2fab	drm/i915: switch ring->id to be a real id ... and add a helpr function for the places where we want a flag. This way we can use ring->id to index into arrays. v2: Resurrect the missing beautification-space Chris Wilson noted. I'm moving this space around because I'll reuse ring_str in the next patch. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 17:32:58 +01:00
Ben Widawsky	c8c99b0f0d	drm/i915: Dumb down the semaphore logic While I think the previous code is correct, it was hard to follow and hard to debug. Since we already have a ring abstraction, might as well use it to handle the semaphore updates and compares. I don't expect this code to make semaphores better or worse, but you never know... v2: Remove magic per Keith's suggestions. Ran Daniel's gem_ring_sync_loop test on this. v3: Ignored one of Keith's suggestions. v4: Removed some bloat per Daniel's recommendation. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-09-21 14:52:41 -07:00
Akshay Joshi	0206e353a0	Drivers: i915: Fix all space related issues. Various issues involved with the space character were generating warnings in the checkpatch.pl file. This patch removes most of those warnings. Signed-off-by: Akshay Joshi <me@akshayjoshi.com> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-09-19 18:01:47 -07:00
Chris Wilson	a94919eadd	drm/i915/ringbuffer: Idling requires waiting for the ring to be empty ...which is measured by the size and not the amount of space remaining. Waiting upon size-8, did one of two things. In the common case with more than 8 bytes available to write into the ring, it would return immediately. Otherwise, it would timeout given the impossible condition of waiting for more space than is available in the ring, leading to warnings such as: [drm:intel_cleanup_ring_buffer] ERROR failed to quiesce render ring whilst cleaning up: -16 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Keith Packard <keithp@keithp.com>	2011-07-12 10:35:45 -07:00
Ben Widawsky	b7287d8054	drm/i915: proper use of forcewake Moved the macros around to properly do reads and writes for the given GPU. This is to address special requirements for gen6 (SNB) reads and writes. Registers in the range 0-0x40000 on gen6 platforms require special handling. Instead of relying on the callers to pick the registers correctly, move the logic into the read and write functions. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-05-10 13:56:45 -07:00
Ben Widawsky	96f298aa9c	drm/1915: ringbuffer wait for idle function Added a new function which waits for the ringbuffer space to be equal to (total - 8). This is the empty condition of the ringbuffer, and equivalent to head==tail. Also modified two users of this functionality elsewhere in the code. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-05-10 13:56:40 -07:00
Chris Wilson	47ae63e0c2	Merge branch 'drm-intel-fixes' into drm-intel-next Apply the trivial conflicting regression fixes, but keep GPU semaphores enabled. Conflicts: drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem_execbuffer.c	2011-03-07 12:35:15 +00:00
Chris Wilson	9135583464	drm/i915: Do not overflow the MMADDR write FIFO Whilst the GT is powered down (rc6), writes to MMADDR are placed in a FIFO by the System Agent. This is a limited resource, only 64 entries, of which 20 are reserved for Display and PCH writes, and so we must take care not to queue up too many writes. To avoid this, there is counter which we can poll to ensure there are sufficient free entries in the fifo. "Issuing a write to a full FIFO is not supported; at worst it could result in corruption or a system hang." Reported-and-Tested-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-03-06 09:07:46 +00:00
Chris Wilson	db53a30261	drm/i915: Refine tracepoints A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-02-07 14:59:18 +00:00
Chris Wilson	bdd92c9ad2	Merge branch 'drm-intel-fixes' into drm-intel-next Merge important suspend and resume regression fixes and resolve the small conflict. Conflicts: drivers/gpu/drm/i915/i915_dma.c	2011-01-24 23:45:32 +00:00
Chris Wilson	c7dca47bd6	drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space During suspend, Linus found that his machine would hang for 3 seconds, and identified that intel_ring_buffer_wait() was the culprit: "Because from looking at the code, I get the notion that "intel_read_status_page()" may not be exact. But what happens if that inexact value matches our cached ring->actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty." As the reported HEAD position is only updated every time it crosses a 64k boundary, whilst draining the ring it is indeed likely to remain one value. If that value matches the last known HEAD position, we never read the true value from the register and so trigger a timeout. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-20 17:26:57 +00:00
Chris Wilson	e8616b6ced	drm/i915: Initialise ring vfuncs for old DRI paths We weren't setting up the vfunc table when initialising the old DRI ringbuffer, leading to such OOPSes as: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<(null)>] (null) PGD 10c441067 PUD 1185e5067 PMD 0 Oops: 0010 [#1] PREEMPT SMP last sysfs file: /sys/class/dmi/id/chassis_asset_tag CPU 3 Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device processor parport_pc thermal snd thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801 rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod ata_piix libata scsi_mod unix Jan 18 15:49:29 lithui kernel: Pid: 3605, comm: Xorg Not tainted 2.6.36.2 #5 P5KPL-CM/System Product Name RIP: 0010:[<0000000000000000>] [<(null)>] (null) RSP: 0018:ffff8801150d1d40 EFLAGS: 00010202 RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704 RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800 RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444 R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800 R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8 FS: 00007f1038d456e0(0000) GS:ffff880001780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task ffff88011b016e40) Stack: ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000 <0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008 <0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8 Call Trace: [<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915] [<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915] [<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm] [<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915] [<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120 [<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0 [<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550 [<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90 [<ffffffff810edb19>] sys_ioctl+0x99/0xa0 [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [<(null)>] (null) RSP <ffff8801150d1d40> CR2: 0000000000000000 Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Herbert Xu <herbert@gondor.apana.org.au> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org	2011-01-20 11:20:53 +00:00
Chris Wilson	311bd68e02	drm/i915: Trivial sparse fixes Move code around and invoke iomem annotation in a few more places in order to silence sparse. Still a few more iomem annotations to go... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-19 12:39:38 +00:00
Chris Wilson	0dc79fb2a3	drm/i915: Make the ring IMR handling private As the IMR for the USER interrupts are not modified elsewhere, we can separate the spinlock used for these from that of hpd and pipestats. Those two IMR are manipulated under an IRQ and so need heavier locking. Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:58 +00:00
Chris Wilson	01a03331e5	drm/i915/ringbuffer: Simplify the ring irq refcounting ... and move it under the spinlock to gain the appropriate memory barriers. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:57 +00:00
Chris Wilson	9862e600ce	drm/i915/debugfs: Show the per-ring IMR Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:56 +00:00
Chris Wilson	0f46832fab	drm/i915: Mask USER interrupts on gen6 (until required) Otherwise we may consume 20% of the CPU just handling IRQs whilst rendering. Ouch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:56 +00:00
Chris Wilson	b72f3acb71	drm/i915: Handle ringbuffer stalls when flushing Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:43:55 +00:00
Chris Wilson	55249baaa5	drm/i915: Workaround erratum on i830 for TAIL pointer within last 2 cachelines On i830 if the tail pointer is set to within 2 cachelines of the end of the buffer, the chip may hang. So instead if the tail were to land in that location, we pad the end of the buffer with NOPs, and start again at the beginning. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2011-01-11 20:35:41 +00:00
Chris Wilson	b13c2b96bf	drm/i915/ringbuffer: Make IRQ refcnting atomic In order to enforce the correct memory barriers for irq get/put, we need to perform the actual counting using atomic operations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-14 11:34:46 +00:00
Chris Wilson	8d5203ca62	Merge branch 'drm-intel-fixes' into drm-intel-next	2010-12-09 20:22:04 +00:00
Chris Wilson	8c0a6bfef1	drm/i915/ringbuffer: Handle wrapping of the autoreported HEAD If the tail advances beyond the autoreport HEAD value, then we need to fallback to an uncached read of the HEAD register in order to ascertain the correct amount of remaining space in the ringbuffer. Reported-by: Fang, Xun <xunx.fang@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32259 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-09 12:53:19 +00:00
Chris Wilson	1ec14ad313	drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB The bulk of the change is to convert the growing list of rings into an array so that the relationship between the rings and the semaphore sync registers can be easily computed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-12-05 00:37:38 +00:00
Chris Wilson	c4e7a41467	drm/i915/ringbuffer: Handle cliprects in the caller This makes the various rings more consistent by removing the anomalous handing of the rendering ring execbuffer dispatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-30 14:17:51 +00:00
Chris Wilson	05394f3975	drm/i915: Use drm_i915_gem_object as the preferred type A glorified s/obj_priv/obj/ with a net reduction of over a 100 lines and many characters! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-23 20:19:10 +00:00
Zou Nan hai	cae5852dca	drm/i915/ringbuffer: set FORCE_WAKE bit before reading ring register Before reading ring register, set FORCE_WAKE bit to prevent GT core power down to low power state, otherwise we may read stale values. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> [ickle: added a udelay which seemed to do the trick on my SNB] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-11 17:45:54 +00:00
Chris Wilson	5d97eb69bd	drm/i915: Only add the lazy request if we end up waiting for it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-10 20:41:16 +00:00
Chris Wilson	5588978882	drm/i915: SNB BLT workaround On some stepping of SNB cpu, the first command to be parsed in BLT command streamer should be MI_BATCHBUFFER_START otherwise the GPU may hang. (cherry picked from commit `8d19215be8`) Conflicts: drivers/gpu/drm/i915/intel_ringbuffer.c drivers/gpu/drm/i915/intel_ringbuffer.h Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Cc: stable@kernel.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-02 10:48:48 +00:00
Zou Nan hai	8d19215be8	drm/i915: SNB BLT workaround On some stepping of SNB cpu, the first command to be parsed in BLT command streamer should be MI_BATCHBUFFER_START otherwise the GPU may hang. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> [ickle: rebased for -next] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-11-02 10:44:23 +00:00
Chris Wilson	3cce469cab	drm/i915: Propagate error from failing to queue a request Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-27 23:31:03 +01:00
Chris Wilson	b2223497b4	drm/i915: Remove the confusing global waiting/irq seqno Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-27 23:30:59 +01:00
Chris Wilson	c2c347a9ee	drm/i915/debugfs: Include info for the other rings The render ring is not alone any more! And the other rings are just as troublesome... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-27 23:29:39 +01:00
Chris Wilson	e1f99ce6ca	drm/i915: Propagate errors from writing to ringbuffer Preparing the ringbuffer for adding new commands can fail (a timeout whilst waiting for the GPU to catch up and free some space). So check for any potential error before overwriting HEAD with new commands, and propagate that error back to the user where possible. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-27 23:26:34 +01:00
Chris Wilson	78501eac34	drm/i915/ringbuffer: Drop the redundant dev from the vfunc interface The ringbuffer keeps a pointer to the parent device, so we can use that instead of passing around the pointer on the stack. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-27 12:18:21 +01:00
Chris Wilson	641934069d	drm/i915: Move gpu_write_list to per-ring ... to prevent flush processing of an idle (or even absent) ring. This fixes a regression during suspend from `87acb0a5`. Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net> Tested-by: Peter Clifton <pcjc2@cam.ac.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-24 20:22:51 +01:00
Chris Wilson	297b0c5be3	drm/i915/ringbuffer: Write the value passed in to the tail register This should fix the error along the reset path were we tried to clear the tail register by setting it to 0, but were in fact setting it to the current value and complaining when it did not reset to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-22 17:57:43 +01:00
Chris Wilson	549f736582	drm/i915: Enable SandyBridge blitter ring Based on an original patch by Zhenyu Wang, this initializes the BLT ring for SandyBridge and enables support for user execbuffers. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-21 19:08:39 +01:00
Chris Wilson	e36c1cd729	drm/i915/ringbuffer: Remove broken intel_fill_struct() ... before someone tries to use it. The code both calls intel_ring_begin/advance() and open-codes the bookkeeping performed by those two functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-10-21 19:00:02 +01:00
Chris Wilson	a56ba56c27	Revert "drm/i915: Drop ring->lazy_request" With multiple rings generating requests independently, the outstanding requests must also be track independently. Reported-by: Wang Jinjin <jinjin.wang@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-28 11:30:52 +01:00
Daniel Vetter	447da18742	drm/i915: kill ring->setup_status_page It's the same code, essentially, so kill all copies safe one unified version. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-25 12:23:16 +01:00
Daniel Vetter	79f321b7e6	drm/i915: kill ring->get_active_head All functions are extremely similar, so fold them into one generic implementation. This function isn't used anyway, because there's not yet a bsd ring error state dumper. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-25 12:23:15 +01:00
Chris Wilson	f787a5f59e	drm/i915: Only hold a process-local lock whilst throttling. Avoid cause latencies in other clients by not taking the global struct mutex and moving the per-client request manipulation a local per-client mutex. For example, this allows a compositor to schedule a page-flip (through X) whilst an OpenGL application is monopolising the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-24 21:03:00 +01:00
Chris Wilson	780f0ca3e0	drm/i915/ringbuffer: Fix sign of ring space. As we presume space is signed when computing and looking for wrap along, make it so. Reported-by: Owain G. Ainsworth <zerooa@googlemail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-24 14:19:55 +01:00
Chris Wilson	5c12a07e80	drm/i915: Drop ring->lazy_request We are not currently using it as intended, so remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-22 11:58:55 +01:00
Chris Wilson	ab6f8e3250	drm/i915/ringbuffer: whitespace cleanup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:06 +01:00
Daniel Vetter	a9db5c8fdd	drm/i915: drop alignment ringbuffer parameter Always PAGE_SIZE and only complicates the code. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:05 +01:00
Daniel Vetter	7f2ab69913	drm/i915: use new macros to access the ring ctl register Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:04 +01:00
Daniel Vetter	570ef60859	drm/i915: use new macros to access the ring head register Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:03 +01:00
Daniel Vetter	6c0e1c556e	drm/i915: use new macros to access the ring start register Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:02 +01:00
Daniel Vetter	870e86ddc2	drm/i915: use new macros to access the ring tail register Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:02 +01:00
Daniel Vetter	333e9fe94d	drm/i915: add relative ring register macros Documentation explicitly mentions that the ring registers are designed to have the same offsets relative to a base registers. Use this to fight the code beaurocratic in intel_ringbuffer.c. No code changes in this patch, just the new definitions. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:20:01 +01:00
Chris Wilson	a3f07cd53e	drm/i915/ringbuffer: Implement advance using set_tail As noted by Zhenyu, we can now simply replace the existing advance hook by calling the new set_tail function pointer directly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:19:57 +01:00
Xiang, Haihao	d46eefa297	drm/i915: add set_tail hook in struct intel_ring_buffer This is prepared for video codec ring buffer on Sandybridge. It is needed to read/write more than one register to move the tail pointer of the video codec ring on Sandybridge. Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:19:56 +01:00
Xiang, Haihao	5c1143bbec	drm/i915: do not export the instances of struct intel_ring_buffer Introduce intel_init_render_ring_buffer(), intel_init_bsd_ring_buffer for ring initialization. Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:19:55 +01:00
Chris Wilson	9220434a87	drm/i915: Only emit a flush request on the active ring. When flushing the GPU domains,we emit a flush on both rings, even though they share a unified cache. Only emit the flush on the currently active ring. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-21 11:19:51 +01:00
Chris Wilson	2b6efaa476	drm/i915: Remove unused intel_ringbuffer->ring_flag This can always be re-added should somebody find a use... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-14 21:13:00 +01:00
Daniel Vetter	a6910434e1	drm/i915: only one interrupt per batchbuffer is not enough! Previously I thought that one interrupt per batchbuffer should be enough. Now tedious benchmarking showed this to be wrong. Therefore track whether any commands have been isssued with a future seqno (like pipelined fencing changes or flushes). If this is the case emit a request before issueing the batchbuffer. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2010-09-08 10:23:35 +01:00
Chris Wilson	6f392d5486	drm/i915: Use a common seqno for all rings. This will be used by the eviction logic to maintain fairness between the rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>	2010-08-09 11:24:32 -07:00
Chris Wilson	e898cd221d	drm/i915: Inline ringbuffer_emit() As the function has been reduced to a store plus increment, the body is now smaller than the call so inline it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>	2010-08-09 11:24:31 -07:00
Zou Nan hai	8187a2b70e	drm/i915: introduce intel_ring_buffer structure (V2) Introduces a more complete intel_ring_buffer structure with callbacks for setup and management of a particular ringbuffer, and converts the render ring buffer consumers to use it. Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com> [anholt: Fixed up whitespace fail and rebased against prep patches] Signed-off-by: Eric Anholt <eric@anholt.net>	2010-05-26 13:24:49 -07:00

... 2 3 4 5 6

283 Commits