Pretty astonishing how far apart these two members landed ... Especially since
I've already removed almost 200 lines in between.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also, move dev_priv->counter there, it's only used in i915_dma.c
And also move the dri1 dungeon at the end of dev_priv where no one
cares about it.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And give the structs slightly more generic names. I've decided to keep
the short rps/ips prefix, since that's just easier and less churn.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
dev_priv has grown way too big, and grouping memebers into substructs
and moving them out of line helps re-gain some overview.
Unfortunatley I couldn't just call the substruct save and drop the prefix, since
that will make most member names clash with registers #defines. Changes in
i915_drv.h done by hand, everything else changed with
s/\<save\([A-Z]*\)/regfile.save\1/ in vim.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we no longer pretend to have flexibility in matching any
north display block with any pch, we can ditch this.
v2: Fix the embarassing rebase fail that Paulo Zanoni spotted.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some subsequent commits will need to know what generation we're running
on to do different pte encoding for the ppgtt. Since it's not much
hassle or overhead to store it in the ppgtt structure, do that.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After all relevant pipes are disabled and after we've updated all the
state with the staged state, but before we call the per-crtc
->mode_set functions there's a very natural point to set up any
shared/global resources like
- shared plls (obviously only the setup, the enabling needs to be
separately handling with a separate refcount)
- global watermark state like the DSPARB on gmch platforms
- workaround bits that depend upon the exact global output
configuration
- enabling the right set of refclocks
- enabling/disabling manual power wells.
Now for a lot of these things we can't move them into this function
yet, most often because we only compute the required information in
the per-crtc ->mode_set callback. Which is too late. But due to a
bunch of reasons (check-only atomic modeset, fastboot&hw state checks,
...) we need to separate the computation of that state from the actual
hw frobbery anyway. So we can move things into this new callback step-
by-step.
Others can't be moved here (or implemented at all) because our code
lacks the smarts to properly update them. E.g. the DSPARB can only be
updated when all pipes are disabled, so if we decide to change it's
value, we need to disable _all_ pipes. The infrastructure for that is
already in place (with the various pipe masks that driver the modeset
logic). But again we need to move a few things out of ->mode_set
first before we can even implement the correct decision making.
In any case, we need to start somewhere, so let's start with the
callback: Some small follow-up patches will make immediate good use of
it.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Before Haswell we used to have the CPU pipes and the PCH transcoders.
We had the same amount of pipes and transcoders, and there was a 1:1
mapping between them. After Haswell what we used to call CPU pipe was
split into CPU pipe and CPU transcoder. So now we have 3 CPU pipes (A,
B and C), 4 CPU transcoders (A, B, C and EDP) and 1 PCH transcoder
(only used for VGA).
For all the outputs except for EDP we have an 1:1 mapping on the CPU
pipes and CPU transcoders, so if you're using CPU pipe A you have to
use CPU transcoder A. When have an eDP output you have to use
transcoder EDP and you can attach this CPU transcoder to any of the 3
CPU pipes. When using VGA you need to select a pair of matching CPU
pipes/transcoders (A/A, B/B, C/C) and you also need to enable/use the
PCH transcoder.
For now we're just creating the cpu_transcoder definitions and setting
cpu_transcoder to TRANSCODER_EDP on DDI eDP code, but none of the
registers was ported to use transcoder instead of pipe. The goal is to
keep the code backwards-compatible since on all cases except when
using eDP we must have pipe == cpu_transcoder.
V2: Comment the haswell_crtc_off chunk, suggested by Damien Lespiau
and Daniel Vetter.
We currently need the haswell_crtc_off chunk because TRANSCODER_EDP
can be used by any CRTC, so when you stop using it you have to stop
saying you're using it, otherwise you may have at some point 2 CRTCs
claiming they're using TRANSCODER_EDP (a disabled CRTC and an enabled
one), then the HW state readout code will get completely confused.
In other words:
Imagine the following case:
xrandr --output eDP1 --auto --crtc 0
xrandr --output eDP1 --off
xrandr --output eDP1 --auto --crtc 2
After the last command you could get a "pipe A assertion failure
(expected off, current on)" because CRTC 0 still claims it's using
TRANSCODER_EDP, so the HW state readout function will read it
(through PIPECONF) and expect it to be off, when it's actually on
because it's being used by CRTC 2.
So when we make "intel_crtc->cpu_transcoder = intel_crtc->pipe" we
make sure we're pointing to our own original CRTC which is certainly
not used by any other CRTC.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Get rid of saved int_lvds_connector and int_edp_connector in
drm_i915_private.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on earlier work by Chris Wilson <chris@chris-wilson.co.uk>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q
aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY
NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg
tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D
NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS
0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU=
=+861
-----END PGP SIGNATURE-----
Merge tag 'v3.7-rc2' into drm-intel-next-queued
Linux 3.7-rc2
Backmerge to solve two ugly conflicts:
- uapi. We've already added new ioctl definitions for -next. Do I need to say more?
- wc support gtt ptes. We've had to revert this for snb+ for 3.7 and
also fix a few other things in the code. Now we know how to make it
work on snb+, but to avoid losing the other fixes do the backmerge
first before re-enabling wc gtt ptes on snb+.
And a few other minor things, among them git getting confused in
intel_dp.c and seemingly causing a conflict out of nothing ...
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
drivers/gpu/drm/i915/intel_modes.c
include/drm/i915_drm.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some BIOSes may forcibly suspend RC6 during their operation which
trigger a warning as we find the hardware in a perplexing state upon
first use. So far that appears to be the worst symptom as fortuituously
we use the same values as the BIOS for programming the FORCEWAKE register.
Reported-by: Oleksij Rempel <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On the worst scenario, users with new hardwares and old kernel from
enabling times can get black screens. So, from now on, this
perliminary_hw_support module parameter shall be used by all upcoming
platforms that are still under enabling. The second option would be to
merge the pci ids after basic modeset works, but that makes testing
and development while bringing up hw a rather tedious afair.
Although it is uncomfortable for developers use this extra variable it
brings more stability for end users.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
[danvet: dropped the i915_ param prefix, i915.i915_ is just tedious.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a special mechanism for communicating with the PCU already
being used for the ring frequency stuff. As we'll be needing this for
other commands, extract it now to make future code less error prone and
the current code more reusable.
I'm not entirely sure if this code matches 1:1 with the previous code
behaviorally. Functionally however, it should be the same.
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Fixup compile fail reported by Wu Fengguang.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Note that just because we have n == MAX elements left, does not imply
that there are only MAX elements left in the scatterlist and so we may
not be on the last chain, and the nth element may in fact be a chain ptr.
This is exercised by the improved hangman tests and the gem_exec_big
test in i-g-t.
This regression has been introduced in
commit 9da3da660d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 1 15:20:22 2012 +0100
drm/i915: Replace the array of pages with a scatterlist
v2: KISS, replace the direct lookup with a for_each_sg() [danvet]
v3: Try to be clever again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The intention was to allow the caller to avoid a failure to queue a
request having already written commands to the ring. However, this is a
moot point as the i915_add_request() can fail for other reasons than a
mere allocation failure and those failure cases are more likely than
ENOMEM. So the overlay code already had to handle i915_add_request()
failures, and due to
commit 3bb73aba1e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:40:59 2012 +0100
drm/i915: Allow late allocation of request for i915_add_request()
the error handling code in intel_overlay.c was subject to causing
double-frees, as found by coverity.
Rather than further complicate i915_add_request() and callers, realise
the battle is lost and adapt intel_overlay.c to take advantage of the
late allocation of requests.
v2: Handle callers passing in a NULL seqno.
v3: Ditto. This time for sure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Problems with the previous code:
- HDMI just uses WRPLL1 for everything, so dual head cases might not
work sometimes.
- At encoder->mode_set we just write the PLL register without doing
any kind of check (e.g., check if the PLL is already being used).
- There is no way to fail and return error codes at
encoder->mode_set.
- We write to PORT_CLK_SEL at mode_set and we never disable it.
- Machines hang due to wrong clock enable/disable sequence.
So here we rewrite the code, making it a little more like the
pre-Haswell PLL mode set code:
- Check PLL availability at ironlake_crtc_mode_set.
- Try to use both WRPLLs.
- Check if PLLs are used before actually trying to use them, and
properly fail with error messages.
- Enable/disable PORT_CLK_SEL at the right place.
- Add some WARNs to check for bugs.
The next improvement will be to try to reuse PLLs if the timings
match, but this is content for another patch and it's already
documented with a TODO comment.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
round_jiffies() aligns the wakeup time to the nearest second in order to
batch wakeups and reduce system load, which is useful for unimportant
coarse timers like our hangcheck.
v2: round_jiffies_relative() returns the relative jiffie value, whereas
we need the absolute value for the timer.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Magic numbers are bad mmmkay. In this case in particular the value is
especially weird because the docs say multiple things. We'll need this
value for sysfs, so extracting it is useful for that as well.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.
One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.
The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.
v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.
For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire of the object.
v2: Bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a quick reference I'll detail the motivation and design of the new code a
bit here (mostly stitched together from patchbomb announcements and commits
introducing the new concepts).
The crtc helper code has the fundamental assumption that encoders and crtcs can
be enabled/disabled in any order, as long as we take care of depencies (which
means that enabled encoders need an enabled crtc to feed them data,
essentially).
Our hw works differently. We already have tons of ugly cases where crtc code
enables encoder hw (or encoder->mode_set enables stuff that should only be
enabled in enocder->commit) to work around these issues. But on the disable
side we can't pull off similar tricks - there we actually need to rework the
modeset sequence that controls all this. And this is also the real motivation
why I've finally undertaken this rewrite: eDP on my shiny new Ivybridge
Ultrabook is broken, and it's broken due to the wrong disable sequence ...
The new code introduces a few interfaces and concepts:
- Add new encoder->enable/disable functions which are directly called from the
crtc->enable/disable function. This ensures that the encoder's can be
enabled/disabled at a very specific in the modeset sequence, controlled by our
platform specific code (instead of the crtc helper code calling them at a time
it deems convenient).
- Rework the dpms code - our code has mostly 1:1 connector:encoder mappings and
does support cloning on only a few encoders, so we can simplify things quite a
bit.
- Also only ever disable/enable the entire output pipeline. This ensures that
we obey the right sequence of enabling/disabling things, trying to be clever
here mostly just complicates the code and results in bugs. For cloneable
encoders this requires a bit of special handling to ensure that outputs can
still be disabled individually, but it simplifies the common case.
- Add infrastructure to read out the current hw state. No amount of careful
ordering will help us if we brick the hw on the initial modeset setup. Which
could happen if we just randomly disable things, oblivious to the state set up
by the bios. Hence we need to be able to read that out. As a benefit, we grow a
few generic functions useful to cross-check our modeset code with actual hw
state.
With all this in place, we can copy&paste the crtc helper code into the
drm/i915 driver and start to rework it:
- As detailed above, the new code only disables/enables an entire output pipe.
As a preparation for global mode-changes (e.g. reassigning shared resources) it
keeps track of which pipes need to be touched by a set of bitmasks.
- To ensure that we correctly disable the current display pipes, we need to
know the currently active connector/encoder/crtc linking. The old crtc helper
simply overwrote these links with the new setup, the new code stages the new
links in ->new_* pointers. Those get commited to the real linking pointers once
the old output configuration has been torn down, before the ->mode_set
callbacks are called.
- Finally the code adds tons of self-consistency checks by employing the new hw
state readout functions to cross-check the actual hw state with what the
datastructure think it should be. These checks are done both after every
modeset and after the hw state has been read out and sanitized at boot/resume
time. All these checks greatly helped in tracking down regressions and bugs in
the new code.
With this new basis, a lot of cleanups and improvements to the code are now
possible (besides the DP fixes that ultimately made me write this), but not yet
done:
- I think we should create struct intel_mode and use it as the adjusted mode
everywhere to store little pieces like needs_tvclock, pipe dithering values or
dp link parameters. That would still be a layering violation, but at least we
wouldn't need to recompute these kinds of things in intel_display.c. Especially
the port bpc computation needed for selecting the pipe bpc and dithering
settings in intel_display.c is rather gross.
- In a related rework we could implement ->mode_valid in terms of ->mode_fixup
in a generic way - I've hunted down too many bugs where ->mode_valid did the
right thing, but ->mode_fixup didn't. Or vice versa, resulting in funny bugs
for user-supplied modes.
- Ditch the idea to rework the hdp handling in the common crtc helper code and
just move things to i915.ko. Which would rid us of the ->detect crtc helper
dependencies.
- LVDS wire pair and pll enabling is all done in the crtc->mode_set function
currently. We should be able to move this to the crtc_enable callbacks (or in
the case of the LVDS wire pair enabling, into some encoder callback).
Last, but not least, this new code should also help in enabling a few neat
features: The hw state readout code prepares (but there are still big pieces
missing) for fastboot, i.e. avoiding the inital modeset at boot-up and just
taking over the configuration left behind by the bios. We also should be able
to extend the configuration checks in the beginning of the modeset sequence and
make better decisions about shared resources (which is the entire point behind
the atomic/global modeset ioctl).
Tested-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Damien Lespiau <damien.lespiau@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Acked-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... instead of resetting a few things and hoping that this will work
out.
To properly disable the output pipelines at the initial modeset after
resume or boot up we need to have an accurate picture of which outputs
are enabled and connected to which crtcs. Otherwise we risk disabling
things at the wrong time, which can lead to hangs (or at least royally
confused panels), both requiring a walk to the reset button to fix.
Hence read out the hw state with the freshly introduce get_hw_state
functions and then sanitize it afterwards.
For a full modeset readout (which would allow us to avoid the initial
modeset at boot up) a few things are still missing:
- Reading out the mode from the pipe, especially the dotclock
computation is quite some fun.
- Reading out the parameters for the stolen memory framebuffer and
wrapping it up.
- Reading out the pch pll connections - luckily the disable code
simply bails out if the crtc doesn't have a pch pll attached (even
for configurations that would need one).
This patch here turned up tons of smelly stuff around resume: We
restore tons of register in seemingly random way (well, not quite, but
we're not too careful either), which leaves the hw in a rather
ill-defined state: E.g. the port registers are sometimes
unconditionally restore (lvds, crt), leaving us with an active
encoder/connector but no active pipe connected to it. Luckily the hw
state sanitizer detects this madness and fixes things up a bit.
v2: When checking whether an encoder with active connectors has a crtc
wire up to it, check for both the crtc _and_ it's active state.
v3:
- Extract intel_sanitize_encoder.
- Manually disable active encoders without an active pipe.
v4: Correclty fix up the pipe<->plane mapping on machines where we
switch pipes/planes. Noticed by Chris Wilson, who also provided the
fixup.
v5: Spelling fix in a comment, noticed by Paulo Zanoni
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because that's what we're essentially calling. This is the first step
in untangling the crtc_helper induced dpms handling mess we have - at
the crtc level we only have 2 states and the magic is just in
selecting which one (and atm there isn't even much magic, but on
recent platforms where not even the crt output has more than 2 states
we could do better).
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like with the equivalent change for gen6+ rps state, this helps in
clarifying the code (and in fixing a few places that have fallen through
the cracks in the locking review).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Using the extracted INSTDONE reading, and our new register definitions,
update our hangcheck detection and error collection to use it. This
primarily means changing == to memcmp, and changing = to memcpy.
Hopefully this will give more info on error dump, and provide more
accurate hangcheck detection (both are actually TBD).
Also, remove the reading of instdone1 from the ring error collection
function, and just crap everything in capture_error_state (that could be
split into a separate patch if it wasn't so trivial).
v2: Now assuming i915_get_extra_instdone does the memset we can clean up the
code a bit (Jani)
v3: use ARRAY_SIZE as requested earlier by Jani (didn't change sizeof)
Updated commit msg
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ERR_INT can generate interrupts. However since most of the conditions seem
quite fatal the patch opts to simply report it in error state instead of
adding more complexity to the interrupt handler for little gain (the
bits are sticky anyway).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and move a few others only used by i915_dma.c into the dri1
dungeon.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's only ever a pointer to the global mchdev_lock, and we don't use
it at all.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way it's easier so see what belongs together, and what is used
by the ilk ips code. Also add some comments that explain the locking.
Note that (cur|min|max)_delay need to be duplicated, because
they're also used by the ips code.
v2: Missed one place that the dev_priv->ips change caught ...
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Handy for lazy people like me, or when people forget to add the output
of lspci -nn.
v2: Chris Wilson noticed that we have this duplicated already in the
i915_capabilites debugfs file. But there \n as separator looks better,
which would be a bit verbose in dmesg. Abuse the preprocessor to
extract this all.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We believe to have squashed all issues around the gen6+ rps interrupt
generation and why the gpu sometimes got stuck. With that cleared up,
there's no user left for the sanitize_pm infrastructure, so let's just
rip it out.
Note that 'intel_reg_write 0xa014 0x13070000' is the w/a if we find
ourselves stuck again.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to able to mix CPU and GPU rendering/activity
efficiently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific cacheing control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.
v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Originally I had a macro specifically for DPF support, and Daniel, with
good reason asked me to change it to this. It's not the way I would have
gone (and indeed I didn't), but for now there is no distinction as all
platforms with L3 also have DPF.
Note: The good reasons are that dpf is a l3$ feature (at least on
currrent hw), hence I don't expect one to go without the other.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: added note]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.
v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we guarantee to emit a flush before emitting the breadcrumb or
the next batchbuffer, there is no further need for the flushing list.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.
By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.
v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The interface's immediate purpose is to do synchronous timestamp queries
as required by GL_TIMESTAMP. The GPU has a register for reading the
timestamp but because that would normally require root access through
libpciaccess, the IOCTL can provide this service instead.
Currently the implementation whitelists only the render ring timestamp
register, because that is the only thing we need to expose at this time.
v2: make size implicit based on the register offset
Add a generation check
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: fixup the ioctl numerb:]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Having had to dive into the bspec to understand what each stage of the
workaround meant, and how that the ring broadcasting IDLE corresponded
with the GT powering down the ring (i.e. rc6) add comments to aide
the next reader.
And since the register "is used to control all aspects of PSMI and power
saving functions" that makes it quite interesting to inspect with
regards to RC6 hangs, so add it to the error-state.
v2: Rediscover the piece of magic, set the RNCID to 0 before waiting for
the ring to wake up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already have this pattern at quite a few places, and moving part of
the modeset helper stuff into the driver will add more.
v2: Don't clobber the crtc struct name with the macro parameter ...
v3: Convert two more places noticed by Paulo Zanoni.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly. Which works everywhere but when the gpu dies,
which this patch fixes.
So essentially interruptible == false means 'wait for the gpu or die
trying'.'
This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.
To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.
v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in
commit b4aca0106c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Apr 25 20:50:12 2012 -0700
drm/i915: extract some common olr+wedge code
although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.
But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.
v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.
v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.
v5: Fix EAGAIN mispell noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously we had has_pch_split to tell us whether we had a PCH or not
and we also had dev_priv->pch_type to tell us which kind of PCH it
was, but it could only be used if we were 100% sure we did have a PCH.
Now that PCH_NONE was added to dev_priv->pch_type we don't need
has_pch_split anymore: we can just check for pch_type != PCH_NONE.
The HAS_PCH_{IBX,CPT,LPT} macros use dev_priv->pch_type, so they can
only be called after intel_detect_pch. The HAS_PCH_SPLIT macro looks
at dev_priv->info->has_pch_split, which is available earlier.
Since the goal is to implement HAS_PCH_SPLIT using dev_priv->pch_type
instead of dev_priv->info->has_pch_split, we need to make sure that
intel_detect_pch is called before any calls to HAS_PCH_SPLIT are made.
So we moved the intel_detect_pch call to an earlier stage.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And rely on the fact that it's 0 to assume that machines without a PCH
will have PCH_NONE as dev_priv->pch_type.
Just today I finally realized that HAS_PCH_IBX is true for machines
without a PCH. IMHO this is totally counter-intuitive and I don't
think it's a good idea to assume that we're going to check for
HAS_PCH_IBX only after we check for HAS_PCH_SPLIT.
I believe that in the future we'll have more PCH types and checks
like:
if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
will become more and more common. There's a good chance that we may
break non-PCH machines by adding these checks in code that runs on all
machines. I also believe that the HAS_PCH_SPLIT check will become less
common as we add more and more different PCH types. We'll probably
start replacing checks like:
if (HAS_PCH_SPLIT(dev))
foo();
else
bar();
with:
if (HAS_PCH_NEW(dev))
baz();
else if (HAS_PCH_OLD(dev) || HAS_PCH_IBX(dev))
foo();
else
bar();
and this may break gen 2/3/4.
As far as we have investigated, this patch will affect the behavior of
intel_hdmi_dpms and intel_dp_link_down on gen 4. In both functions the
code inside the HAS_PCH_IBX check is for IBX-specific workarounds, so
we should be safe. If we start bisecting gen 2/3/4 bugs to this commit
we should consider replacing the HAS_PCH_IBX checks with something
else.
V2: Improve commit message, list possible side effects and solution.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
While creating the new enable/disable_gt_powersave functions in
commit 8090c6b9da
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sun Jun 24 16:42:32 2012 +0200
drm/i915: wrap up gt powersave enabling functions
I've botched up the handling of ironlake_disable_rc6. Fix this up by
calling it at the right place. Note though that ironlake_disable_rc6
does a bit more than just disabling rc6 - it also tears down all the
allocated context objects.
Hence we need to move intel_teardown_rc6 out and directly call it from
intel_modeset_cleanup.
Also properly mark ironlake_enable_rc6 as static and kill the un-used
declaration in i915_drv.h.
Note: In review a question popped out why disable_rc6 also tears down
the backing object and why we should move that out - it's simply for
consistency with gen6+ rps code, which does it that way.
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tidy up the routines for interacting with the GT (in particular the
forcewake dance) which are scattered throughout the code in a single
structure.
v2: use wait_for_atomic for polling.
v3: *really* use wait_for_atomic for polling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJP53AxAAoJEHm+PkMAQRiGs2QH/RaqkXz96fwjhDcyiKpDqA3c
kGuS5mz5cOhnqKSmR88HFm6pwuhLux/qSJzeAmoQy1MC8a0ACx7AnANW0lfN3/qe
/HGYz8h60yCL/fhn8/bUYtdt9xsoDqoDcq/ooFl9mcsJGWbC6WeMSZU5dAUYqviE
qFrp5zjY07FG53CRGT0hFpezQNwNL+VLH30CF9LD+fJLPVEYum2zBNGXWM42rcw5
fxzGL/6SO8YqA/Upic1ht6HAd6s5LOrlST7qvnyXUMvRXN5z/Y92ueYJZefkS1Om
ohuLIKM2bv9/dJS67H8N2baSKGCzBdfSe5/5WaHdLYW9MiVju0wRl6HPJtAMrkk=
=H8t8
-----END PGP SIGNATURE-----
Merge tag 'v3.5-rc4' into drm-intel-next-queued
I want to merge the "no more fake agp on gen6+" patches into
drm-intel-next (well, the last pieces). But a patch in 3.5-rc4 also
adds a new use of dev->agp. Hence the backmarge to sort this out, for
otherwise drm-intel-next merged into Linus' tree would conflict in the
relevant code, things would compile but nicely OOPS at driver load :(
Conflicts in this merge are just simple cases of "both branches
changed/added lines at the same place". The only tricky part is to
keep the order correct wrt the unwind code in case of errors in
intel_ringbuffer.c (and the MI_DISPLAY_FLIP #defines in i915_reg.h
together, obviously).
Conflicts:
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/intel_ringbuffer.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It doesn't hurt and it at least prevents us from OOPSing left and
right at quite a few places. This also allows us to simplify the code
a bit by folding the only line of context_open into the callsite.
We obviuosly also need to run the cleanup code unconditionally, too.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add the interfaces to allow user space to create and destroy contexts.
Contexts are destroyed automatically if the file descriptor for the dri
device is closed.
Following convention as usual here causes checkpatch warnings.
v2: with is_initialized, no longer need to init at create
drop the context switch on create (daniel)
v3: Use interruptible lock (Chris)
return -ENODEV in !GEM case (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Implement the context switch code as well as the interfaces to do the
context switch. This patch also doesn't match 1:1 with the RFC patches.
The main difference is that from Daniel's responses the last context
object is now stored instead of the last context. This aids in allows us
to free the context data structure, and context object independently.
There is room for optimization: this code will pin the context object
until the next context is active. The optimal way to do it is to
actually pin the object, move it to the active list, do the context
switch, and then unpin it. This allows the eviction code to actually
evict the context object if needed.
The context switch code is missing workarounds, they will be implemented
in future patches.
v2: actually do obj->dirty=1 in switch (daniel)
Modified comment around above
Remove flags to context switch (daniel)
Move mi_set_context code to i915_gem_context.c (daniel)
Remove seqno , use lazy request instead (daniel)
v3: use i915_gem_request_next_seqno instead of
outstanding_lazy_request (Daniel)
remove id's from trace events (Daniel)
Put the context BO in the instruction domain (Daniel)
Don't unref the BO is context switch fails (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Invent an abstraction for a hw context which is passed around through
the core functions. The main bit a hw context holds is the buffer object
which backs the context. The rest of the members are just helper
functions. Specifically the ring member, which could likely go away if
we decide to never implement whatever other hw context support exists.
Of note here is the introduction of the 64k alignment constraint for the
BO. If contexts become heavily used, we should consider tweaking this
down to 4k. Until the contexts are merged and tested a bit though, I
think 64k is a nice start (based on docs).
Since we don't yet switch contexts, there is really not much complexity
here. Creation/destruction works pretty much as one would expect. An idr
is used to generate the context id numbers which are unique per file
descriptor.
v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
char/agp: add another Ironlake host bridge
drm/i915: fix up ivb plane 3 pageflips
drm/i915: hold forcewake around ring hw init
drm/i915: Mark the ringbuffers as being in the GTT domain
drm/i915/crt: Do not rely upon the HPD presence pin
drm/i915: Reset last_retired_head when resetting ring
Empirical evidence suggests that we need to: On at least one ivb
machine when running the hangman i-g-t test, the rings don't properly
initialize properly - the RING_START registers seems to be stuck at
all zeros.
Holding forcewake around this register init sequences makes chip reset
reliable again. Note that this is not the first such issue:
commit f01db988ef
Author: Sean Paul <seanpaul@chromium.org>
Date: Fri Mar 16 12:43:22 2012 -0400
drm/i915: Add wait_for in init_ring_common
added delay loops to make RING_START and RING_CTL initialization
reliable on the blt ring at boot-up. So I guess it won't hurt if we do
this unconditionally for all force_wake needing gpus.
To avoid copy&pasting of the HAS_FORCE_WAKE check I've added a new
intel_info bit for that.
v2: Fixup missing commas in static struct and properly handling the
error case in init_ring_common, both noticed by Jani Nikula.
Cc: stable@vger.kernel.org
Reported-and-tested-by: Yang Guang <guang.a.yang@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50522
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need the latest dma-buf code from Dave Airlie so that we can pimp
the backing storage handling code in drm/i915 with Chris Wilson's
unbound tracking and stolen mem backed gem object code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.
v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On IVB hardware we are given an interrupt whenever a L3 parity error
occurs in the L3 cache. The L3 cache is used by internal GPU clients
only. This is a very rare occurrence (in fact to test this I need to
use specially instrumented silicon).
When a row in the L3 cache detects a parity error the HW generates an
interrupt. The interrupt is masked in GTIMR until we get a chance to
read some registers and alert userspace via a uevent. With this
information userspace can use a sysfs interface (follow-up patch) to
remap those rows.
Way above my level of understanding, but if a given row fails, it is
statistically more likely to fail again than a row which has not failed.
Therefore it is desirable for an operating system to maintain a lifelong
list of failing rows and always remap any bad rows on driver load.
Hardware limits the number of rows that are remappable per bank/subbank,
and should more than that many rows detect parity errors, software
should maintain a list of the most frequent errors, and remap those
rows.
V2: Drop WARN_ON(IS_GEN6) (Jesse)
DRM_DEBUG row/bank/subbank on errror (Jesse)
Comment updates (Jesse)
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.
Of course we must update the callers to use the new name as well.
This leaves room within our namespace for a *real* wait request function
at some point.
Note to maintainer: this patch is optional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.
The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.
The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.
v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)
v3: Drop lock after getting seqno to avoid ugly dance (Chris)
v4: check for 0 timeout after olr check to allow polling (Chris)
v5: Updated the comment. (Chris)
v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)
v7: Use DRM_AUTH for the ioctl. (Eugeni)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds handle->fd and fd->handle support to i915, this is to allow
for offloading of rendering in one direction and outputs in the other.
v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.
Note that we have squat i-g-t testcoverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.
Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.
Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).
Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.
v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
disdinction. We should be able to merge these 2 paths, but that's
material for another patch.
v4: fix the error reporting bugs pointed out by ickle.
v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.
Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.
Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.
v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The line time can be programmed according to the number of horizontal
pixels vs effective pixel rate ratio.
v2: improve comment as per Chris Wilson suggestion
v3: incorporate latest changes in specs.
v4: move into wm update routine, also mention that the same routine can
program IPS watermarks. We do not have their enablement code yet, nor
handle the required clock settings at the moment, so this patch won't
program those values for now.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Only half of them even cared, and it's always the same one.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- reduce the irq disabled section, even for a debugfs file this was
way too long.
- always disable irqs when taking the lock.
v2: Thou shalt not mistake locking for reference counting, so:
- reference count the error_state to protect from concurent freeeing.
This will be only really used in the next patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gpu reset is a very important piece of our infrastructure.
Unfortunately we only really it test by actually hanging the gpu,
which often has bad side-effects for the entire system. And the gpu
hang handling code is one of the rather complicated pieces of code we
have, consisting of
- hang detection
- error capture
- actual gpu reset
- reset of all the gem bookkeeping
- reinitialition of the entire gpu
This patch adds a debugfs to selectively stopping rings by ceasing to
update the hw tail pointer, which will result in the gpu no longer
updating it's head pointer and eventually to the hangcheck firing.
This way we can exercise the gpu hang code under controlled conditions
without a dying gpu taking down the entire systems.
Patch motivated by me forgetting to properly reinitialize ppgtt after
a gpu reset.
Usage:
echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
echo 0xffffffff > i915_ring_stop # stops all, future-proof version
then run whatever testload is desired. i915_ring_stop automatically
resets after a gpu hang is detected to avoid hanging the gpu to fast
and declaring it wedged.
v2: Incorporate feedback from Chris Wilson.
v3: Add the missing cleanup.
v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
Eugeni Dodonov.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every time we use the device after a period of idleness, check that the
power management setup is still sane. This is to workaround a bug
whereby it seems that we begin suppressing power management interrupts,
preventing SandyBridge+ from going into turbo mode.
This patch does have a side-effect. It removes the mark-busy for just
moving the cursor - we don't want to increase the render clock just for
the sprite, though we may want to bump the display frequency. I'd argue
that we do not, and certainly don't want to take the struct_mutex here
due to the large latencies that introduces.
References: https://bugs.freedesktop.org/show_bug.cgi?id=44006
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To get the fun stuff out of the way, the legacy hws is allocated by
userspace when the gpu needs a gfx hws. And there's no reference-counting
going on, so userspace can simply screw everyone over.
At least it's not as horrible as i810, where the ringbuffer is allocated
by userspace ...
We can't fix this disaster, but we can at least tidy up the code a
bit to make things clearer:
- Drop the drm ioremap indirection.
- Add a new new read_legacy_status_page to paper over the differences
between the legacy gfx hws and the physical hws shared with the
new ringbuffer code.
- Add a pointer in dev_priv->dri1 for the cpu addresses - that one is
an iomem remapping as opposed to all other hw status pages. This is
just prep work to make sparse happy.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wohoo!
Now we only need to move all the gem/kms stuff that accidentally
landed in i915_dma.c out of it, and this will be our legacy dri1
grave-yard.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and hide it in i915_dma.c.
This way all the legacy stuff dealing with READ_BREADCRUMB and
LP_RING and friends is in i915_dma.c.
v2: Rebase on top of Chris Wilson's rework irq handling code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's just get this out of the way.
v2: Rebase against ENODEV changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assigned in setparam, used never.
I didn't bother to dig through the archives to figure out what
this was supposed to do.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and shove allow_batchbuffer in there. More dragons will
follow suit.
There's the curious case that we allow this for KMS ...
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_dma.c contains most of the old dri1 horror-show, so move
the remaining bits there, too. The code has been removed and
the only thing left are some stubs to ensure that userspace
doesn't try to use this stuff. vblank_pipe_set only returns 0
without any side-effects, so we can even stub it out with
the canonical drm_noop.
v2: Rebase against ENODEV changes.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
vblank_pipe was intended to be used for tracking DRI1 state. However,
the vblank_pipe reported to DRI1 is fixed to umask both pipes, and the
dev_priv->vblank_pipe unused and superfluous.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.
v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.
Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)
v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.
The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.
v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On later gen3, you are able to select the meaning of the FlipPending
status bit in IIR and change it to FlipDone. This was sometimes done by
the BIOS leading to confusion on just how pageflipping worked on gen3.
Simplify the implementation by using the legacy meaning for all gen3
machines.
Note: this makes all gen3 machines equally broken...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And remove the cargo-culted copy from the valleyview irq handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Always true these days. It has been added originally to work
around some issues with the agp layer in 2.6.29:
commit ac5c4e7618
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Dec 19 15:38:34 2008 +1000
drm/i915: GEM on PAE has problems - disable it for now.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We slightly modify the initialisation sequence to move the
initialisation of the memory managers earlier and in particular before
probing outputs and detecting any existing output configuration. This is
essential if we wish to track preallocated objects and preserve them
whilst initialising GEM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterrutible.
Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiescent non-interruptible. But the existence of the deferred free
list casted some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptible.
We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.
In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PCH PLLs aren't required for outputs on the CPU, so we shouldn't just
treat them as part of the pipe.
So split the code out and manage PCH PLLs separately, allocating them
when needed or trying to re-use existing PCH PLL setups when the timings
match.
v2: add num_pch_pll field to dev_priv (Daniel)
don't NULL the pch_pll pointer in disable or DPMS will fail (Jesse)
put register offsets in pll struct (Chris)
v3: Decouple enable/disable of PLLs from get/put.
v4: Track temporary PLL disabling during modeset
v5: Tidy PLL initialisation by only checking for num_pch_pll == 0 (Eugeni)
v6: Avoid mishandling allocation failure by embedding the small array of
PLLs into the device struct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44309
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (up to v2)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3+)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.
Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.
v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Almost all of the errors related __iomem problems.
Most of the changes here are trivial, however there is plenty of chance
for yank/paste errors.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.
Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This should contain all the changes which require no thought to make
sparse happy.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After a gpu reset we need to re-init some of the hw state we only
initialize when modeset is enabled, like rc6, hw contexts or render/GT
core clock gating and workaround register settings.
Note that this patch has a small change in the resume code:
- rc6 on gen6+ is only restored for the modeset case (for more
consistency with other callsites). This is no problem because recent
kernels refuse to load drm/i915 without kms on gen6+
- rc6/emon on ilk is only restored for the modeset case. This is no
problem because rc6 is disabled by default on ilk, and ums on ilk
has never really been a supported option outside of horrible rhel
backports.
v2: Chris Wilson noticed that we not only fail to restore the clock
gating settings after gpu reset.
v3: Move the call to modeset_init_hw in _reset out of the
struct_mutext protected area - other callers don't hold it, too.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Merge rc6 information into the power group for our device. Until now the
i915 driver has not had any sysfs entries (aside from the connector
stuff enabled by drm core). Since it seems like we're likely to have
more in the future I created a new file for sysfs stubs, as well as the
rc6 sysfs functions which don't really belong elsewhere (perhaps
i915_suspend, but most of the stuff is in intel_display,c).
displays rc6 modes enabled (as a hex mask):
cat /sys/class/drm/card0/power/rc6_enable
displays #ms GPU has been in rc6 since boot:
cat /sys/class/drm/card0/power/rc6_residency_ms
displays #ms GPU has been in deep rc6 since boot:
cat /sys/class/drm/card0/power/rc6p_residency_ms
displays #ms GPU has been in deepest rc6 since boot:
cat /sys/class/drm/card0/power/rc6pp_residency_ms
Important note: I've seen on SNB that even when RC6 is *not* enabled the
rc6 register seems to have a random value in it. I can only guess at the
reason reason for this. Those writing tools that utilize this value need
to be careful and probably want to scrutinize the value very carefully.
v2: use common rc6 residency units to milliseconds for the other RC6 types
v3: don't create sysfs files for GEN <= 5
add a rc6_enable to show a mask of enabled rc6 types
use unmerge instead of remove for sysfs group
squash intel_enable_rc6() extraction into this patch
v4: rename sysfs files (Chris)
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>f
CC: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: squash in the 64bit division fix by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).
v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)
To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
- I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
- We don't want to enable workarounds for early silicon unless we have
to.
- I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
- This should be how the code already worked, unless I am
misunderstanding your meaning.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
ids are not yet added, and there's still quite a few patches to merge
(mostly modesetting). To make QA easier I've decided to merge this stuff
in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
is a real frankenstein chip, but also hsw is stretching our driver's
code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
there are more patches needed (and some already queued up), but I wanted
to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
a few months already and a lot of i-g-t tests have been created for it.
Now it's finally ready to be merged. Note that one patch in this series
touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
driver we actually need to support by bailing out of unsupported case.
The driver now refuses to load without kms on gen6+ and disallows a few
ioctls that userspace never used in certain cases. More of this will
definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
in this way).
- Small fixlets and adjustements and some minor things to help debugging.
Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
drm/i915: VCS is not the last ring
drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
drm/i915: make quirks more verbose
drm/i915: dump the DMA fetch addr register on pre-gen6
drm/i915/sdvo: Include YRPB as an additional TV output type
drm/i915: disallow gem init ioctl on ilk
drm/i915: refuse to load on gen6+ without kms
drm/i915: extract gt interrupt handler
drm/i915: use render gen to switch ring irq functions
drm/i915: rip out old HWSTAM missed irq WA for vlv
drm/i915: open code gen6+ ring irqs
drm/i915: ring irq cleanups
drm/i915: add SFUSE_STRAP registers for digital port detection
drm/i915: add WM_LINETIME registers
drm/i915: add WRPLL clocks
drm/i915: add LCPLL control registers
drm/i915: add SSC offsets for SBI access
drm/i915: add port clock selection support for HSW
drm/i915: add S PLL control
drm/i915: add PIXCLK_GATE register
...
Conflicts:
drivers/char/agp/intel-agp.h
drivers/char/agp/intel-gtt.c
drivers/gpu/drm/i915/i915_debugfs.c
There are 5 DDI ports on Haswell. Port A is always enabled, and is the one
connected to eDP, and Port E is the one that can be connected to the PCH
using FDI protocol. Ports B, C, D and E can be used for digital outputs.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds product definitions for desktop, mobile and server boards.
v2: split into a separate patch, add .has_pch_split feature.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The macro is becoming too complex and with VLV upon us it can lead to
confusion. So transforming this into a feature check instead.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
[danvet: fixed conflict with is_valleyview addition.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Totally unexpected that this regressed. Luckily it sounds like we just
need to have dmar disable on the igfx, not the entire system. At least
that's what a few days of testing between Tony Vroon and me indicates.
Reported-by: Tony Vroon <tony@linx.net>
Cc: Tony Vroon <tony@linx.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43024
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This allows to select which rc6 modes are to be used via kernel parameter,
via a bitmask parameter. E.g.:
- to enable rc6, i915_enable_rc6=1
- to enable rc6 and deep rc6, i915_enable_rc6=3
- to enable rc6 and deepest rc6, use i915_enable_rc6=5
- to enable rc6, deep and deepest rc6, use i915_enable_rc6=7
Please keep in mind that the deepest RC6 state really should NOT be used
by default, as it could potentially worsen the issues with deep RC6. So do
enable it only when you know what you are doing. However, having it around
could help solving possible future rc6-related issues and their debugging
on user machines.
Note that this changes behavior - previously, value of 1 would enable both
RC6 and deep RC6. Now it should only enable RC6 and deep/deepest RC6
stages must be enabled manually.
v2: address Chris Wilson comments and clean up the code.
References: https://bugs.freedesktop.org/show_bug.cgi?id=42579
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView handles force wake differently than previous chipsets, so add
a couple of new functions for it. But leave it disabled by default
until we test it (need a chip with the Punit enabled first).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ValleyView puts some display related registers like the PLL controls and
dividers behind the DPIO bus. Add simple indirect register access
routines to get to those registers.
v2: move new wait_for macro to intel_drv.h (Ben)
fix DPIO_PKT double write (Ben)
add debugfs file
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For use by the rest of the ValleyView code.
v2: fix desktop variant to not set is_mobile (Ben)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This memory is always allocated, and it is always a fixed size, so just
allocate it along with the rest of the driver state.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no GMBUS "disabled" port 0, nor "reserved" port 7.
For the other 6 ports there is a fixed 1:1 mapping between pin pairs and
gmbus ports, which means every real gmbus port has a gpio pin.
Given these realizations, clean up gmbus initialization.
Tested on Sandybridge (gen 6, PCH == CougarPoint) hardware.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of letting other modules directly access the ->gmbus array,
introduce intel_gmbus_get_adapter() for looking up an i2c_adapter
for a given gmbus port identifier. This will enable later refactoring
of the gmbus port list.
Note: Before requesting an adapter for a given gmbus port number, the
driver must first check its validity using i2c_intel_gmbus_is_port_valid().
If this check fails, a call to intel_gmbus_get_adapter() will WARN_ON and
return NULL. This is relevant for parts of the driver that read a port
from VBIOS, which might be improperly initialized and contain an invalid
port. In these cases, the driver must fall back to using a safer default
port.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... because this is what it actually doesn now that we have the global
gtt vs. ppgtt split.
Also move it to the other global gtt functions in i915_gem_gtt.c
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Again, Valleyview modes these around, so make the mmio base more
explicit to consolidate the base address computations to one
HAS_PCH_SPLIT check.
v2: Fix up the PCH_SPLIT braino ... it actually works that way round.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's only used by the main read/write functions, so we can keep it with
them.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a new module optoin lvds_channel to specify the LVDS channel mode
explicitly instead of probing the LVDS register value set by BIOS.
This will be helpful when VBT is broken or incompatible with the
current code.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42842
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently i915 driver checks [PCH_]LVDS register bits to decide
whether to set up the dual-link or the single-link mode. This relies
implicitly on that BIOS initializes the register properly at boot.
However, BIOS doesn't initialize it always. When the machine is
booted with the closed lid, BIOS skips the LVDS reg initialization.
This ends up in blank output on a machine with a dual-link LVDS when
you open the lid after the boot.
This patch adds a workaround for that problem by checking the initial
LVDS register value in VBT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37742
Tested-By: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no gloabl gtt binding (which needs uc writes for the
ptes).
This patch just puts the required tracking in place.
v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.
v3: Adapted to Chris' latest llc-reloc patches.
v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A machine may need to invert the panel backlight brightness value. This
patch adds the infrastructure for a quirk to do so.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we can simplify the setup and teardown a bit.
Because we don't actually allocate anything anymore for the force_bit
case, we can now convert that into a boolean.
Also and the functionality supported by the bit-banging together with
what gmbus can do, so that this doesn't randomly change any more.
v2: Chris Wilson noticed that I've mixed up && and & ...
v3: Clarify an if block as suggested by Eugeni Dodonov.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and directly call the newly exported i2c bit-banging functions.
The code is still pretty convoluted because we only set up the gpio
i2c stuff when actually falling back, resulting in more complexity
than necessary. This will be fixed up in the next patch.
v2: Use exported i2c_bit_algo vtable instead of exported functions.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we set up the gpio fallback, we always have a 1:1 relationship
with an intel_gmbus. Exploit that to store all gpio related data in
there, too. This is a preparation step to merge the tw i2c adapters
controlling the same bus into one.
Just mundane code-munging in this patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This way we can free up the bus->adaptor.algo_data pointer and make it
available for use with the bitbanging fallback algo.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gcc seems to get uber-anal recently about these things.
Clarification from Dan Carpenter:
"Sorry, I should have said that it's not a gcc warning, it's a smatch
thing. But also it's not uber-anal. It's the exact level of anality
which is required to make the == -1 test work. You can compare
unsigned int and longs to -1 and it works but for smaller types it
doesn't."
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So that we can tally the request against the command sequence in the
ringbuffer, or merely jump to the interesting locations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Being able to tally the list of outstanding requests with the sequence
of commands in the ringbuffer is often useful evidence with respect to
driver corruption.
Note that since this is the umpteenth per-ring data structure to be added
to the error state, I've coallesced the nearby loops (the ringbuffer and
batchbuffer) into a single structure along with the list of requests. A
later task would be to refactor the ring register state into the same
structure.
v2: Fix pretty printing of requests so that they are parsed correctly by
intel_error_decode and use the 0x%08x format for seqno for consistency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.
A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.
By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.
Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
firefox-paintball 18927ms -> 15646ms: 1.21x speedup
firefox-fishtank 12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.
v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GMBUS has several ports and each has it's own corresponding
I2C adpater. When multiple I2C adapters call gmbus_xfer() at
the same time there is a race condition in using the underlying
GMBUS controller. Fixing this by adding a mutex lock when calling
gmbus_xfer().
v2: Moved gmbus_mutex below intel_gmbus and added comments.
Rebased to drm-intel-next-queued.
Signed-off-by: Yufeng Shen <miletus@chromium.org>
[danvet: Shortened the gmbus_mutex comment a bit and add the patch
revision comment to the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When HDMI-DVI converter is used, it's not only necessary to turn off
audio, but also to disable HDMI_MODE_SELECT and video infoframe. Since
the DVI mode is mainly tied to audio functionality from end user POV,
add a new "force-dvi" audio mode:
xrandr --output HDMI1 --set audio force-dvi
Note that most users won't need to set this and happily rely on the EDID
based DVI auto detection.
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we reserve seqnos only when we emit the request to the ring
(by bumping dev_priv->next_seqno), but start using it much earlier for
ring->oustanding_lazy_request. When 2 threads compete for the gpu and
run on two different rings (e.g. ddx on blitter vs. compositor)
hilarity ensued, especially when we get constantly interrupted while
reserving buffers.
Breakage seems to have been introduced in
commit 6f392d5486
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 7 11:01:22 2010 +0100
drm/i915: Use a common seqno for all rings.
This patch fixes up the seqno reservation logic by moving it into
i915_gem_next_request_seqno. The ring->add_request functions now
superflously still return the new seqno through a pointer, that will
be refactored in the next patch.
Note that with this change we now unconditionally allocate a seqno,
even when ->add_request might fail because the rings are full and the
gpu died. But this does not open up a new can of worms because we can
already leave behind an outstanding_request_seqno if e.g. the caller
gets interrupted with a signal while stalling for the gpu in the
eviciton paths. And with the bugfix we only ever have one seqno
allocated per ring (and only that ring), so there are no ordering
issues with multiple outstanding seqnos on the same ring.
v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
clear that we only have one seqno counter for all rings. Suggested by
Chris Wilson.
v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
instead of ring->oustanding_lazy_request to make the follow-up
refactoring more clearly correct. Also improve the commit message
with issues discussed on irc.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we don't have a sufficient number of free entries in the FIFO, we
proceed to do a write anyway. With this check we should have a clue if
that write actually failed or not.
After some discussion with Daniel Vetter regarding his original
complaint, we agreed upon this.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Back-merge from drm-fixes into drm-intel-next to sort out two things:
- interlaced support: -fixes contains a bugfix to correctly clear
interlaced configuration bits in case the bios sets up an interlaced
mode and we want to set up the progressive mode (current kernels
don't support interlaced). The actual feature work to support
interlaced depends upon (and conflicts with) this bugfix.
- forcewake voodoo to workaround missed IRQ issues: -fixes only enabled
this for ivybridge, but some recent bug reports indicate that we
need this on Sandybridge, too. But in a slightly different flavour
and with other fixes and reworks on top. Additionally there are some
forcewake cleanup patches heading to -next that would conflict with
currrent -fixes.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We want to unconditionally enable ppgtt for two reasons:
- Windows uses this on snb and later.
- We need the basic hw support to work before we can think about real
per-process address spaces and other cool features we want.
But Chris Wilson was complaining all over irc and intel-gfx that this
will blow up if we don't have a module option to disable it. Hence add
one, to prevent this.
ppgtt support seems to slightly change the timings and make crashy
things slightly more or less crashy. Now in my testing and the testing
this got on troublesome snb machines, it seems to have improved things
only. But on ivb it makes quite a few crashes happen much more often,
see
https://bugs.freedesktop.org/show_bug.cgi?id=41353
Luckily Eugeni Dodonov seems to have a set of workarounds that fix
this issue.
v2: Don't try to enable ppgtt on pre-snb.
v3: Pimp commit message and make Chris Wilson less grumpy by adding a
module option.
v4: New try at making Chris Wilson happy.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.
Objects are still unconditionally put into the global gtt.
v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This just adds the setup and teardown code for the ppgtt PDE and the
last-level pagetables, which are fixed for the entire lifetime, at
least for the moment.
v2: Kill the stray debug printk noted by and improve the pte
definitions as suggested by Chris Wilson.
v3: Clean up the aperture stealing code as noted by Ben Widawsky.
v4: Paint the init code in a more pleasing colour as suggest by Chris
Wilson.
v5: Explain the magic numbers noticed by Ben Widawsky.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson and me have again stared at funny error states and it's
been pretty clear from the start that something was seriously amiss.
The seqnos last seen by the cpu were a few hundred behind those that
the gpu could have possibly emitted last before it died ...
Chris now tracked it down (hopefully, definit verdict's still out),
but in hindsight we'd have found the bug by simply dumping the cpu
side tracking of the ring head and tail registers.
Fix this and prevent an identical time-waster in the future.
Because the hangs always involved semaphores in one way or another,
we've tried to dump the mbox registers, but couldn't find any
inconsistencies. Still, dump them too.
Reviewed-and-wanted-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have to do this manually. Somebody had a Great Idea.
I've measured speed-ups just a few percent above the noise level
(below 5% for the best case), but no slowdows. Chris Wilson measured
quite a bit more (10-20% above the usual snb variance) on a more
recent and better tuned version of sna, but also recorded a few
slow-downs on benchmarks know for uglier amounts of snb-induced
variance.
v2: Incorporate Ben Widawsky's preliminary review comments and
elaborate a bit about the performance impact in the changelog.
v3: Add a comment as to why we don't need to check the 3rd memory
channel.
v4: Fixup whitespace.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was pretty handy when figuring out what exactly went wrong with
ppgtt and it might also be useful when we stop filling the entire gart
with scratch page entries.
Also add the gen6+ DONE reg while at it.
v2: Chris Wilson suggested to allocate the error_state with kzalloc
for better paranoia. Also kill existing spurious clears of the
error_state while at it.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDADLK upon fence starvation, the batchbuffer can fail with only
one fence required...
Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Based on a patch by Ben Widawsky, but with different colors
for the bikeshed.
In contrast to Ben's patch this one doesn't add the fault regs.
Afaics they're for the optional page fault support which
- we're not enabling
- and which seems to be unsupported by the hw team. Recent bspec
lacks tons of information about this that the public docs released
half a year back still contain.
Also dump ring HEAD/TAIL registers - I've recently seen a few
error_state where just guessing these is not good enough.
v2: Also dump INSTPM for every ring.
v3: Fix a few really silly goof-ups spotted by Chris Wilson.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The code already got unwieldy and we want to dump more per-ring
registers.
Only functional change is that we now also capture the video
ring registers on ilk.
v2: fixup a refactor fumble spotted by Chris Wilson.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The problem this patch solves is that the forcewake accounting
necessary for register reads is protected by dev->struct_mutex. But the
hangcheck and error_capture code need to access registers without
grabbing this mutex because we hold it while waiting for the gpu.
So a new lock is required. Because currently the error_state capture
is called from the error irq handler and the hangcheck code runs from
a timer, it needs to be an irqsafe spinlock (note that the registers
used by the irq handler (neglecting the error handling part) only uses
registers that don't need the forcewake dance).
We could tune this down to a normal spinlock when we rework the
error_state capture and hangcheck code to run from a workqueue. But
we don't have any read in a fastpath that needs forcewake, so I've
decided to not care much about overhead.
This prevents tests/gem_hangcheck_forcewake from i-g-t from killing my
snb on recent kernels - something must have slightly changed the
timings. On previous kernels it only trigger a WARN about the broken
locking.
v2: Drop the previous patch for the register writes.
v3: Improve the commit message per Chris Wilson's suggestions.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
LLC is not SNB/IVB-specific, so we should check for it in a more generic
way.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some decent history digging indicates that this was to be used for the
GLX_MESA_allocate_memory extension but never actually implemented for
any released i915 userspace code.
So just rip it out.
v2: Fixup the Makefile.
Acked-by: Dave Airlie <airlied@gmail.com>
Cc: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The video sprites support various video surface formats natively and can
handle scaling as well. So add support for them using the new DRM core
sprite support functions.
v2: use drm specific fourcc header and defines
v3: address Daniel's comments:
- don't take struct mutex around register access (only needed for
regs in the GT power well)
- don't hold struct mutex across vblank waits
- fix up update_plane API (pass obj instead of GTT offset)
- add interlaced defines for sprite regs
- drop unnecessary 'reg' variables
- comment double buffered reg flushing
Also fix w/h confusion when writing the scaling reg.
v4: more fixes, address more comments from Daniel, and include Hai's fix
- prevent divide by zero in scaling calculation (Hai Lan)
- update to Ville's new DRM_FORMAT_* types
- fix sprite watermark handling (calc based on CRTC size, separate
from normal display wm)
- remove private refcounts now that the fb cleanups handles things
v5: add linear surface support
v6: remove color key clearing & setting from update_plane
For this version, I tested DPMS since it came up in the last review;
DPMS off/on works ok when a video player is working under X, but for
power saving we'll probably want to do something smarter. I'll leave
that for a separate patch on top. Likewise with the refcounting/fb
layer handling, which are really separate cleanups.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
We learned that the ECOBUS register was inside the GT power well, and
so *did* need force wake to be read, so it gets removed from the list
of 'doesn't need force wake' registers.
That means the code reading ECOBUS after forcing the mt_force_wake
function to be called needs to use I915_READ_NOTRACE; it doesn't need
to do more force wake fun as it's already done it manually.
This also adds a comment explaining why the MT forcewake testing code
only needs to call mt_forcewake_get/put and not disable RC6 manually
-- the ECOBUS read will return 0 if the device is in RC6 and isn't
using MT forcewake, causing the test to work correctly.
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Otherwise hangcheck spuriously fires when running blitter/bsd-only
workloads.
Contrary to a similar patch by Ben Widawsky this does not check
INSTDONE of the other rings. Chris Wilson implied that in a failure to
detect a hang, most likely because INSTDONE was fluctuating. Thus only
check ACTHD, which as far as I know is rather reliable. Also, blitter
and bsd rings can't launch complex tasks from a single instruction
(like 3D_PRIM on the render with complex or even infinite shaders).
This fixes spurious gpu hang detection when running
tests/gem_hangcheck_forcewake on snb/ivb.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
This adds a default setting for semaphores parameter, and enables
semaphores by default on IVB.
For now, as semaphores interaction with VTd causes random issues on
SNB, we do not enable them by default. But they can still be enabled
via the semaphores=1 kernel parameter.
v2: enables semaphores on SNB when IO remapping is disabled, with base
on Keith Packard patch.
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Ben Widawsky <ben@bwidawsk.net>
CC: Keith Packard <keithp@keithp.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
RC6 should always work on IVB, and should work on SNB whenever IO
remapping is disabled. RC6 never works on Ironlake. Make the default
value for the parameter follow these guidelines. Setting the value
to either 0 or 1 will force the specified behavior.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38567
Cc: Ted Phelps <phelps@gnusto.com>
Cc: Peter <pab1612@gmail.com>
Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Cc: Andrew Lutomirski <luto@mit.edu>
This prevents an in-kernel division by zero which happens when we are
asking for i915_chipset_val too quickly, or within a race condition
between the power monitoring thread and userspace accesses via debugfs.
The issue can be reproduced easily via the following command:
while ``; do cat /sys/kernel/debug/dri/0/i915_emon_status; done
This is particularly dangerous because it can be triggered by
a non-privileged user by just reading the debugfs entry.
This issue was also found independently by Konstantin Belousov
<kostikbel@gmail.com>, who proposed a similar patch.
Reported-by: Konstantin Belousov <kostikbel@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
On IVB C0+ with newer BIOSes, the forcewake handshake has changed. There's
now a bitfield for different driver components to keep the GT powered
on. On Linux, we centralize forcewake handling in one place, so we
still just need a single bit, but we need to use the new registers if MT
forcewake is enabled.
This needs testing on affected machines. Please reply with your
tested-by if you had problems after a BIOS upgrade and this patch fixes
them.
v2: force MT mode. shift by 16
v3: set MT force wake bits then check ECOBUS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42923
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Robert Hooker <robert.hooker@canonical.com>
Tested-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Testing i915_panel_use_ssc for the default value was broken, so the
driver would never autodetect the correct value.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Michel Alexandre Salim <salimma@fedoraproject.org>
Tested-by: Michel Alexandre Salim <salimma@fedoraproject.org>
Cc: stable@kernel.org
In preparation of to support 32 fences on Ivybdrigde.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
With the tracing code in there they are far too big to inline.
.text savings compared to a non force inline kernel:
i915_restore_display 4393 12036 +7643
i915_save_display 4295 11459 +7164
i915_handle_error 2979 6666 +3687
i915_driver_irq_handler 2923 5086 +2163
i915_ringbuffer_info 458 1661 +1203
i915_save_vga - 1200 +1200
i915_driver_irq_uninstall 453 1624 +1171
i915_driver_irq_postinstall 913 2078 +1165
ironlake_enable_drps 719 1872 +1153
i915_restore_vga - 1142 +1142
intel_display_capture_error_state 784 2030 +1246
intel_init_emon 719 2016 +1297
and more ...
[AK: these are older numbers, with the new SNB forcewake checks
it will be even worse]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Well almost anyway. IVB has 3 planes, pipes, transcoders, and FDI
interfaces, but only 2 pipe PLLs. So two of the pipes must use the same
pipe timings (e.g. 2 DP plus one other, or two HDMI with the same mode
and one other, etc.).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-By: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-By: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
I have no evidence for this byte being used this way, and lots of
counterexamples. Restore the struct to its empirical definition and
patch up gmbus setup to match.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
This value doesn't come directly from the VBT, and so is rather
specific to the particular DP output.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Store the panel power sequencing delays in the dp private structure,
rather than the global device structure. Who knows, maybe we'll get
more than one eDP device in the future.
From the eDP spec, we need the following numbers:
T1 + T3 Power on to Aux Channel operation (panel_power_up_delay)
This marks how long it takes the panel to boot up and
get ready to receive aux channel communications.
T8 Video signal to backlight on (backlight_on_delay)
Once a valid video signal is being sent to the device,
it can take a while before the panel is actuall
showing useful data. This delay allows the panel
to get something reasonable up before the backlight
is turned on.
T9 Backlight off to video off (backlight_off_delay)
Turning the backlight off can take a moment, so
this delay makes sure there is still valid video
data on the screen.
T10 Video off to power off (panel_power_down_delay)
Presumably this delay allows the panel to perform
an orderly shutdown of the display.
T11 + T12 Power off to power on (panel_power_cycle_delay)
So, once you turn the panel off, you have to wait a
while before you can turn it back on. This delay is
usually the longest in the entire sequence.
Neither the VBIOS source code nor the hardware documentation has a
clear mapping between the delay values they provide and those required
by the eDP spec. The VBIOS code actually uses two different labels for
the delay values in the five words of the relevant VBT table.
**** MORE LATER ***
Look at both the current hardware register settings and the VBT
specified panel power sequencing timings. Use the maximum of the two
delays, to make sure things work reliably. If there is no VBT data,
then those values will be initialized to zero, so we'll just use the
values as programmed in the hardware. Note that the BIOS just fetches
delays from the VBT table to place in the hardware registers, so we
should get the same values from both places, except for rounding.
VBT doesn't provide any values for T1 or T2, so we'll always just use
the hardware value for that.
The panel power up delay is thus T1 + T2 + T3, which should be
sufficient in all cases.
The panel power down delay is T1 + T2 + T12, using T1+T2 as a proxy
for T11, which isn't available anywhere.
For the backlight delays, the eDP spec says T6 + T8 is the delay from the
end of link training to backlight on and T9 is the delay from
backlight off until video off. The hardware provides a 'backlight on'
delay, which I'm taking to be T6 + T8 while the VBT provides something
called 'T7', which I'm assuming is s
On the macbook air I'm testing with, this yields a power-up delay of
over 200ms and a power-down delay of over 600ms. It all works now, but
we're frobbing these power controls several times during mode setting,
making the whole process take an awfully long time.
Signed-off-by: Keith Packard <keithp@keithp.com>
The reference clock configuration must be done before any mode setting
can occur as all outputs must be disabled to change
anything. Initialize the clocks after turning everything off during
the initialization process.
Also, re-initialize the refclk at resume time.
Signed-off-by: Keith Packard <keithp@keithp.com>
This tells the driver whether a CK505 clock source is available on
pre-PCH hardware. If so, it should be used as the non-SSC source,
leaving the internal clock for use as the SSC source.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Chris Wison <chris@chris-wilson.co.uk>
Add ELD support for Intel Eaglelake, IbexPeak/Ironlake,
SandyBridge/CougarPoint and IvyBridge/PantherPoint chips.
ELD (EDID-Like Data) describes to the HDMI/DP audio driver the audio
capabilities of the plugged monitor. It's built and passed to audio
driver in 2 steps:
(1) at get_modes time, parse EDID and save ELD to drm_connector.eld[]
(2) at mode_set time, write drm_connector.eld[] to the Transcoder's hw
ELD buffer and set the ELD_valid bit to inform HDMI/DP audio driver
This patch is tested OK on G45/HDMI, IbexPeak/HDMI and IvyBridge/HDMI+DP.
Test scheme: plug in the HDMI/DP monitor, and run
cat /proc/asound/card0/eld*
to check if the monitor name, HDMI/DP type, etc. show up correctly.
Minor imperfection: the GEN5_AUD_CNTL_ST/DIP_Port_Select field always
reads 0 (reserved). Without knowing the port number, I worked it around
by setting the ELD_valid bit for ALL the three ports. It's tested to not
be a problem, because the audio driver will find invalid ELD data and
hence rightfully abort, even when it sees the ELD_valid indicator.
Thanks to Zhenyu and Pierre-Louis for a lot of valuable help and testing.
CC: Zhao Yakui <yakui.zhao@intel.com>
CC: Wang Zhenyu <zhenyu.z.wang@intel.com>
CC: Jeremy Bush <contractfrombelow@gmail.com>
CC: Christopher White <c.white@pulseforce.com>
CC: Pierre-Louis Bossart <pierre-louis.bossart@intel.com>
CC: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Various issues involved with the space character were generating
warnings in the checkpatch.pl file. This patch removes most of those
warnings.
Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by: Michel Alexandre Salim <salimma@fedoraproject.org>
Tested-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
At least on a Lenovo X220 the HPD bits of this are enabled at boot but
cleared after resume, which means plug interrupts stop working.
This also happens to fix DP displays re-lighting on resume. I'm quite
certain that's an accident: the first DP link train inevitably fails on
that machine, and it's only serendipity that we're getting multiple plug
interrupts and the second train works. But I shall take my victories
where I get them.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Tested-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Align unfenced buffers on older hardware to the power-of-two object
size. The docs suggest that it should be possible to align only to a
power-of-two tile height, but using the already computed fence size is
easier and always correct. We also have to make sure that we unbind
misaligned buffers upon tiling changes.
In order to prevent a repetition of this bug, we change the interface
to the alignment computation routines to force the caller to provide
the requested alignment and size of the GTT binding rather than assume
the current values on the object.
Reported-and-tested-by: Sitosfe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
We've tried several times to make this machine 'just work', but every
patch that does causes many other machines to fail. This adds a quirk
which special cases this hardware and forces ssc to be
disabled. There's no way to override this from the command line; that
would be a significantly more invasive change.
This patch fixes#36656 on fdo bugzilla:
https://bugs.freedesktop.org/show_bug.cgi?id=36656
Signed-off-by: Keith Packard <keithp@keithp.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=36656
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The read back of the available FIFO entries is vital for system
stability, but extremely costly. However, we only need a guide so as to
avoid eating into the reserved entries and since we are the only
consumer we can cache the read of the count from the last write.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Upon review, all path share the same dependencies for updating the
registers and so we can benefit from sharing the code and checking
early.
This removes the unsightly intel_wait_for_vblank() from the lowlevel
functions and upon further analysis the only path that will require a
wait is if we are performing an instantaneous transition between two
valid FBC configurations. The page-flip path itself will have disabled
FBC registers and will have waited for at least one vblank before
finishing the flip and attempting to re-enable FBC. This wait can be
accomplished simply by delaying the enable until after we are sure that
a vblank will have passed, which we are already doing to make sure that
the display is settled before enabling FBC.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
In order to accommodate the requirements of re-enabling FBC after
page-flipping, but to avoid doing so and incurring the cost of a wait
for vblank in the middle of a page-flip sequence, we defer the actual
enablement by 50ms. If any request to disable FBC arrive within that
interval, the enablement is cancelled and we are saved from blocking on
the wait.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
As the enable/disable routines will be gain additional complexity in
future patches, it is necessary that all callers do not bypass the
generic interface by calling into the chipset routines directly. to do
this we make the chipset routines static, so there is no choice.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Updating the planes is device specific, so create a new display callback
and use it in pipe_set_base. (In fact we could go even further, valid
display plane bits have changed with each generation, as has tiled
buffer handling.)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
This lets us make the various IRQ functions static and helps avoid
problems like the one fixed in "drm/i915: Use chipset-specific irq
installers" where one of the exported functions was called rather than
the chipset specific version.
This also fixes a UMS-mode bug -- the correct irq functions for IRL
and later chips were only getting loaded in the KMS path.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Lots of register access in these functions, some of which requires the
struct mutex.
These functions now hold the struct mutex across the calls to
i915_save_display and i915_restore_display, and so the internal mutex
calls in those functions have been removed. To ensure that no-one else
was calling them (and hence violating the new required locking
invarient), those functions have been made static.
gen6_enable_rps locks the struct mutex, and so i915_restore_state
unlocks the mutex around calls to that function.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Provide a parameter to disable hanghcheck. This is useful mostly for
developers trying to debug known problems, and probably should not be
touched by normal users.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
This makes things a little clearer and prevents us from running old code
on a new chipset that may not be supported.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewied-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
We need to perform a few operations in order to move the object into the
display plane (where it can be accessed coherently by the display
engine) that are important for future safety to forbid whilst pinned. As a
result, we want to need to perform some of the operations before pinning,
but some are required once we have been bound into the GTT. So combine
the pinning performed by all the callers with set_to_display_plane(), so
this complication is contained within the single function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[anholt v2: Don't forget that when going from cached to uncached, we
haven't been tracking the write domain from the CPU perspective, since
we haven't needed it for GPU coherency.]
[ickle v3: We also need to make sure we relinquish any fences on older
chipsets and clear the GTT for sane domain tracking.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... reincarnated from i915_gem_object_flush_gpu(). The semantic
difference is that after calling finish_gpu() the object no longer
resides in any GPU domain, and so will cause the GPU caches to be
invalidated if it is ever used again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make the audio property creation routine common and share the single
property between the connectors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Superseded by the tracking the render generation in the chipset
capabiltiies struct.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
FBC has too many corner cases that we don't currently deal with, so
disable it by default so we can enable more important features like RC6,
which conflicts in some configurations.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31742
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Ibex Peak and CougarPoint already require a different setting (added
here), and future chips will likely follow that precedent.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
This helps contain the mess to init_display() instead.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Add new interrupt handling functions for Ivy Bridge.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Ivy Bridge has a similar split display controller to Sandy Bridge, so
use HAS_PCH_SPLIT. And gen7 also has the pipe control instruction, so
use HAS_PIPE_CONTROL as well.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Note: IS_GEN* are for render related checks. Display and other checks
should use IS_MOBILE, IS_$CHIPSET or test for specific features.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
This makes the Ironlake+ code trivial and generally simplifies things.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Set the IRQ handling functions in driver load so they'll just be used
directly, rather than branching over most of the code in the chipset
functions.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Rather than branching in ironlake_pch_enable, add a new train_fdi
function to the display function pointer struct and use it instead.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
The render P-state handling code requires reading from a GT register.
This means that FORCEWAKE must be written to, a resource which is shared
and should be protected by struct_mutex. Hence we can not manipulate
that register from within the interrupt handling and so must delegate
the task to a workqueue.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
Provide a reference count to track the forcewake state of the GPU and
give a safe mechanism for userspace to wake the GT. This also potentially
saves a UC read if the GT is known to be awake already.
The reference count is atomic, but the register access and hardware wake
sequence is protected by struct_mutex.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Moved the macros around to properly do reads and writes for the given
GPU. This is to address special requirements for gen6 (SNB) reads and
writes.
Registers in the range 0-0x40000 on gen6 platforms require special
handling. Instead of relying on the callers to pick the registers
correctly, move the logic into the read and write functions.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the outputs are active and continuing to access the GATT when we
teardown the PTEs, then there is a potential for us to hang the GPU.
The hang tends to be a PGTBL_ER with either an invalid host access or
an invalid display plane fetch.
v2: Reorder IRQ initialisation to defer until after GEM is setup.
Reported-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> (855GM)
Tested-by: Pekka Enberg <penberg@kernel.org>
# note that this doesn't fix the underlying problem of the
PGTBL_ER and pipe underruns being reported immediately upon
init on his 965GM MacBook
Reported-and-tested-by: Rick Bramley <richard.bramley@hp.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35635
Reported-and-tested-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36048
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
... to clarify just how we use it inside the driver and remove the
confusion of the poorly matching agp_type names. We still need to
translate through agp_type for interface into the fake AGP driver.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
This path, which shouldn't be *that* complicated, is now so littered
with per-chipset tweaks that it's hard to trace the order of what
happens. HAS_PCH_SPLIT() is the most radical change across chipsets,
so it seems like a natural split to simplify the code.
This first commit just copies the existing code without changing
anything.
v2: updated to track removal of call to intel_enable_plane from i9xx_crtc_mode_set
Signed-off-by: Eric Anholt <eric@anholt.net>
Hella-acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit a7a75c8f70.
There are two different variations on how Intel hardware addresses the
"Hardware Status Page". One as a location in physical memory and the
other as an offset into the virtual memory of the GPU, used in more
recent chipsets. (The HWS itself is a cacheable region of memory which
the GPU can write to without requiring CPU synchronisation, used for
updating various details of hardware state, such as the position of
the GPU head in the ringbuffer, the last breadcrumb seqno, etc).
These two types of addresses were updated in different locations of code
- one inline with the ringbuffer initialisation, and the other during
device initialisation. (The HWS page is logically associated with
the rings, and there is one HWS page per ring.) During resume, only the
ringbuffers were being re-initialised along with the virtual HWS page,
leaving the older physical address HWS untouched. This then caused a
hang on the older gen3/4 (915GM, 945GM, 965GM) the first time we tried
to synchronise the GPU as the breadcrumbs were never being updated.
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jan Niehusmann <jan@gondor.com>
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Reported-and-tested-by: Michael "brot" Groh <brot@minad.de>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'intel/drm-intel-next' of ../drm-next: (755 commits)
drm/i915: Only wait on a pending flip if we intend to write to the buffer
drm/i915/dp: Sanity check eDP existence
drm/i915: Rebind the buffer if its alignment constraints changes with tiling
drm/i915: Disable GPU semaphores by default
drm/i915: Do not overflow the MMADDR write FIFO
Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"
drm/i915: Don't save/restore hardware status page address register
drm/i915: don't store the reg value for HWS_PGA
drm/i915: fix memory corruption with GM965 and >4GB RAM
Linux 2.6.38-rc7
Revert "TPM: Long default timeout fix"
drm/i915: Re-enable GPU semaphores for SandyBridge mobile
drm/i915: Replace vblank PM QoS with "Interrupt-Based AGPBUSY#"
Revert "drm/i915: Use PM QoS to prevent C-State starvation of gen3 GPU"
drm/i915: Allow relocation deltas outside of target bo
drm/i915: Silence an innocuous compiler warning for an unused variable
fs/block_dev.c: fix new kernel-doc warning
ACPI: Fix build for CONFIG_NET unset
mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
x86: Use u32 instead of long to set reset vector back to 0
...
Conflicts:
drivers/gpu/drm/i915/i915_gem.c
Early gen3 and gen2 chipset do not have the relaxed per-surface tiling
constraints of the later chipsets, so we need to check that the GTT
alignment is correct for the new tiling. If it is not, we need to
rebind.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)
However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.
Reported-by: Andi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Whilst the GT is powered down (rc6), writes to MMADDR are placed in a
FIFO by the System Agent. This is a limited resource, only 64 entries, of
which 20 are reserved for Display and PCH writes, and so we must take
care not to queue up too many writes. To avoid this, there is counter
which we can poll to ensure there are sufficient free entries in the
fifo.
"Issuing a write to a full FIFO is not supported; at worst it could
result in corruption or a system hang."
Reported-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's cleaned before saving and re-initialized after restoring.
So don't need to save/restore it. And also new chip has new address
for hardware status page register, don't write to old address.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using PM latency request turns out to be very fragile and only works for
some systems, depending upon the ACPI implementation. However, I've
stumbled across a promising bit in INSTPM: "Interrupt-Based AGPBUSY#".
This reverts commit b0b544cd37.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to prevent "crushed blacks" on TVs, the range of the RGB output
may be limited to 16-235. This used to be available through Xorg under
the "Broadcast RGB" option, so reintroduce support for KMS.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34543
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Seems like we are forever to be cursed with buggy firmware, so allow the
user to explicitly set the panel connection status.
Of secondary utility for cases where I run laptops with the lid closed,
but still want to configure the LVDS.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 633f2ea266 and the
attempted fix dcbe6f2b3d.
There is a single clock source used for both SSC (some LVDS and DP) and
non-SSC (VGA, DVI) outputs. So we need to be careful to only enable SSC
as necessary. However, fiddling with DREFCLK was causing DP links to be
dropped and we do not have a fix ready, so revert.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Grab the latest stabilisation bits from -fixes and some suspend and
resume fixes from linus.
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_irq.c
The automatic powersaving feature is once again causing havoc, with 100%
reliable hangs on boot and resume on affected machines.
Reported-by: Francesco Allertsen <fallertsen@gmail.com>
Reported-by: Gui Rui <chaos.proton@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=28582
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We had some conversions over to the _PIPE macros, but didn't get
everything. So hide the per-pipe regs with an _ (still used in a few
places for legacy) and add a few _PIPE based macros, then make sure
everyone uses them.
[update: remove usage of non-existent no-op macro]
[update 2: keep modesetting suspend/resume code, update to new reg names]
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[ickle: stylistic cleanups for checkpatch and taste]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is just an idea that might or might not be a good idea,
it basically adds two ioctls to create a dumb and map a dumb buffer
suitable for scanout. The handle can be passed to the KMS ioctls to create
a framebuffer.
It looks to me like it would be useful in the following cases:
a) in development drivers - we can always provide a shadowfb fallback.
b) libkms users - we can clean up libkms a lot and avoid linking
to libdrm_*.
c) plymouth via libkms is a lot easier.
Userspace bits would be just calls + mmaps. We could probably
mark these handles somehow as not being suitable for acceleartion
so as top stop people who are dumber than dumb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Judging by comments in the BIOS, if the SDVO LVDS option h40 is enabled,
then we are supposed to query the real panel type via Int15. We don't do
this and so for the Sony Vaio VGC-JS210J which has otherwise default
values, we choose the wrong mode.
This patch adds a driver option, i915.vbt_sdvo_panel_type, which can be
used to override the value in the VBT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33691
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead of reporting EIO upfront in the entrance of an ioctl that may or
may not attempt to use the GPU, defer the actual detection of an invalid
ioctl to when we issue a GPU instruction. This allows us to continue to
use bo in video memory (via pread/pwrite and mmap) after the GPU has hung.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than power cycling the panel when there are no bits to display,
use the VDD AUX bit to power the panel up just enough for DP AUX
transactions to work. This prevents a bit of unnecessary ugliness as
mode sets occur on the panel.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the plane->mode config to the point of use rather than repeatedly
querying the same information.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can only utilize the stolen portion of the GTT if we are in sole
charge of the hardware. This is only true if using GEM and KMS,
otherwise VESA continues to access stolen memory.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For CRT and SDVO/HDMI, we need to use a normal, non-SSC, clock and so we
must clear any enabling bits left-over from earlier outputs. And also
seems to correct the LVDS panel on the Lenovo U160.
However, at one point, it did cause an "ERROR failed to disable
trancoder". So prolonged testing on top of Jesse's refactored and
error-checking CRTC logic is desired.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
945 class hardware has an interesting quirk in which the vblank
interrupt is not raised if the CPU is in a low power state. (We also
suspect that the memory bus is clocked to the CPU/c-state and not the
GPU so there are secondary starvation issues.) In order to prevent the
most obvious issue of the low of the vblank interrupt (stuttering
compositing that only updates when the mouse is moving) is to install a
PM QoS request to prevent low c-states whilst the GPU is active.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Admittedly, trusting ACPI or the BIOS at all to be correct is littered
with numerous examples where it is wrong. Maybe, just maybe, we will
have better luck using the ACPI OpRegion lid status...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to workaround the issue with LVDS not working on the Lenovo
U160 apparently due to using the wrong SSC frequency, add an option to
disable SSC.
Suggested-by: Lukács, Árpád <lukacs.arpad@gmail.com>
Bugzillla: https://bugs.freedesktop.org/show_bug.cgi?id=32748
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
As the mappable portion of the aperture is always a small subset at the
start of the GTT, it is allocated preferentially by drm_mm. This is
useful in case we ever need to map an object later. However, if you have
a large object that can consume the entire mappable region of the
GTT this prevents the batchbuffer from fitting and so causing an error.
Instead allocate all those that require a mapping up front in order to
improve the likelihood of finding sufficient space to bind them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cleanup several aspects of the rc6 code:
- misnamed intel_disable_clock_gating function (was only about rc6)
- remove commented call to intel_disable_clock_gating
- rc6 enabling code belongs in its own function (allows us to move the
actual clock gating enable call back into restore_state)
- allocate power & render contexts up front, only free on unload
(avoids ugly lazy init at rc6 enable time)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[ickle: checkpatch cleanup]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By tracking the current status of the backlight we can prevent recording
the value of the current backlight when we have disabled it. And so
prevent restoring it to 'off' after an unbalanced sequence of
intel_lvds_disable/enable.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=22672
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add an interrupt handler for switching graphics frequencies and handling
PM interrupts. This should allow for increased performance when busy
and lower power consumption when idle.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
v2: Change IS_IRONLAKE to IS_GEN5 to adapt to 2.6.37
This patch adds new functions for use by the drm core:
.get_vblank_timestamp() provides a precise timestamp
for the end of the most recent (or current) vblank
interval of a given crtc, as needed for the DRI2
implementation of the OML_sync_control extension.
It is a thin wrapper around the drm function
drm_calc_vbltimestamp_from_scanoutpos() which does
almost all the work.
.get_scanout_position() provides the current horizontal
and vertical video scanout position and "in vblank"
status of a given crtc, as needed by the drm for use by
drm_calc_vbltimestamp_from_scanoutpos().
The patch modifies the pageflip completion routine
to use these precise vblank timestamps as the timestamps
for pageflip completion events.
This code has been only tested on a HP-Mini Netbook with
Atom processor and Intel 945GME gpu. The codepath for
(IS_G4X(dev) || IS_GEN5(dev) || IS_GEN6(dev)) gpu's
has not been tested so far due to lack of hardware.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once we have read the value out of the GT power well, we need to remove
the FORCE WAKE bit to allow the system to auto-power down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we provide a list of all objects that will be accessed from the
batchbuffer, we can build a lut of the handles associated with those
objects for this invocation and use that to avoid the overhead of
looking up those objects again for every relocation.
The cost of building and searching a small hash table is much less than
that of acquiring a spinlock, searching a radix tree and manipulating an
atomic refcnt per relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With this change, every batchbuffer can use all available fences (save
pinned and scanout, of course) without ever stalling the gpu!
In theory. Currently the actual pipelined update of the register is
disabled due to some stability issues. However, just the deferred update
is a significant win.
Based on a series of patches by Daniel Vetter.
The premise is that before every access to a buffer through the GTT we
have to declare whether we need a register or not. If the access is by
the GPU, a pipelined update to the register is made via the ringbuffer,
and we track the last seqno of the batches that access it. If by the
CPU we wait for the last GPU access and update the register (either
to clear or to set it for the current buffer).
One advantage of being able to pipeline changes is that we can defer the
actual updating of the fence register until we first need to access the
object through the GTT, i.e. we can eliminate the stall on set_tiling.
This is important as the userspace bo cache does not track the tiling
status of active buffers which generate frequent stalls on gen3 when
enabling tiling for an already bound buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This makes the various rings more consistent by removing the anomalous
handing of the rendering ring execbuffer dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Besides the minimal improvement in reducing the execbuffer overhead, the
real benefit is clarifying a few routines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A number of dragons have been seen lurking within the execbuffer code.
The first step is then to isolate them from the rest and begin to
scrutinise them in depth. Suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Simply remove our accounting of objects inside the aperture, keeping
only track of what is in the aperture and its current usage. This
removes the over-complication of BUGs that were attempting to keep the
accounting correct and also removes the overhead of the accounting on
the hot-paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pipe control object is allocated by the device for the sole use of the
render ringbuffer. Move this detail from the general code to the render
ring buffer initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having seen the effects of erroneous fencing on the batchbuffer, a
useful sanity check is to record the fence registers at the time of an
error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This still uses the agp functions to actually reinstate the mappings
(with a gross hack to make agp cooperate), but it wires everything
up correctly for the switchover.
The call to agp_rebind_memory can be dropped because all non-kms drivers
do all their rebinding on EnterVT.
v2: Be more paranoid and flush the chipset cache after restoring gtt
mappings.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is required to restore gtt mappings on resume when agp is gone.
The right way to do this would be to make sturct drm_mm_node embeddable
and use the allocation list maintained by the drm memory manager. But
that's a bigger project. Getting rid of the per bo agp_mem will save
more memory than this wastes, anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The GATT is a write-only set of registers, reading from them in the
manner of i915_gtt_to_phys() is supposed to be undefined. However a
simple solution exists as we allocate linear memory from the stolen
area, we can simply add the block offset to the base register. As a
side-effect we recover all the unused stolen GTT entries and so enlarge
our aperture.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We were reading our 64-bit value in I915_READ64 and returning 32 bits
of it. The restoration of fence regs at resume then had a zero end
value, and the fence had no effect.
Version 2: Split register access functions into per-size versions
Sharing code between different sizes seemed reasonable when we only
needed a single copy, but as 64-bit access requires its own version,
it makes sense to just split them out for each size.
Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
[ickle: use a macro to create the various read/write routines]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When trying to diagnose mysterious errors on resume, capture the
display register contents as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pinned buffers are useful for diagnosing errors in setting up state
for the chipset, which may not necessarily be 'active' at the time of
the error, e.g. the cursor buffer object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
An old and oft reported bug, is that of the GPU hanging on a
MI_WAIT_FOR_EVENT following a mode switch. The cause is that the GPU is
waiting on a scanline counter on an inactive pipe, and so waits for a
very long time until eventually the user reboots his machine.
We can prevent this either by moving the WAIT into the kernel and
thereby incurring considerable cost on every swapbuffers, or by waiting
for the GPU to retire the last batch that accesses the framebuffer
before installing a new one. As mode switches are much rarer than swap
buffers, this looks like an easy choice.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28964
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29252
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Before reading ring register, set FORCE_WAKE bit to prevent GT core
power down to low power state, otherwise we may read stale values.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: added a udelay which seemed to do the trick on my SNB]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use POSTING_READ to flush the write to the register before
proceeding, we do not care what the return value is and similar we do
not care for the read to be recorded whilst tracing register
read/writes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This will be used later to hide the frequently written registers
from debug traces in order to increase the signal-to-noise.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add two tracepoints at I915_WRITE/READ for tracing down all the
register write and read.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
a00b10c360 "Only enforce fence limits inside the GTT" also
added a fenceable/mappable disdinction when binding/pinning buffers.
This only complicates the code with no pratical gain:
- In execbuffer this matters on for g33/pineview, as this is the only
chip that needs fences and has an unmappable gtt area. But fences
are only possible in the mappable part of the gtt, so need_fence
implies need_mappable. And need_mappable is only set independantly
with relocations which implies (for sane userspace) that the buffer
is untiled.
- The overlay code is only really used on i8xx, which doesn't have
unmappable gtt. And it doesn't support tiled buffers, currently.
- For all other buffers it's a bug to pass in a tiled bo.
In short, this disdinction doesn't have any practical gain.
I've also reverted mapping the overlay and context pages as possibly
unmappable. It's not worth being overtly clever here, all the big
gains from unmappable are for execbuf bos.
Also add a comment for a clever optimization that confused me
while reading the original patch by Chris Wilson.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take two passes to evict everything whilst searching for sufficient free
space to bind the batchbuffer. After searching for sufficient free space
using LRU eviction, evict everything that is purgeable and try again.
Only then if there is insufficient free space (or the GTT is too badly
fragmented) evict everything from the aperture and try one last time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
So long as we adhere to the fence registers rules for alignment and no
overlaps (including with unfenced accesses to linear memory) and account
for the tiled access in our size allocation, we do not have to allocate
the full fenced region for the object. This allows us to fight the bloat
tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside
the GTT we still suffer the additional alignment constraints, so it doesn't
magic allow us to render larger scenes without stalls -- we need the
expanded GTT and fence pipelining to overcome those...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By using read_cache_page() for individual pages during pwrite/pread we
can eliminate an unnecessary large allocation (and immediate free) of
obj->pages. Also this eliminates any potential nesting of get/put pages,
simplifying the code and preparing the path for greater things.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since we rarely use the mmap_offset and it is easily computable from the
obj->map_list.hash, remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Eliminate the racy device unload by embedding a shrinker into each
device. Smaller, simpler code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This holds error state from the main graphics arbiter mainly involving
the DMA engine and address translation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
More precisely: For those that _need_ to be mappable. Also add two
BUG_ONs in fault and pin to check the consistency of the mappable
flag.
Changes in v2:
- Add tracking of gtt mappable space (to notice mappable/unmappable
balancing issues).
- Improve the mappable working set tracking by tracking fault and pin
separately.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
At least the part that's currently enabled by the BIOS.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Like before add a parameter mappable (also to gem_object_pin) and
set it depending upon the context. Only bos that are brought into
the gtt due to an execbuffer call can be put into the unmappable
part of the gtt, everything else (especially pinned objects) need
to be put into the mappable part of the gtt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add a mappable parameter to i915_gem_evict_something to distinguish
the two cases (non-restricted vs. mappable gtt allocations). No
functional changes because the mappable limit is set to the end of
the gtt currently.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Preparing the ringbuffer for adding new commands can fail (a timeout
whilst waiting for the GPU to catch up and free some space). So check
for any potential error before overwriting HEAD with new commands, and
propagate that error back to the user where possible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The ringbuffer keeps a pointer to the parent device, so we can use that
instead of passing around the pointer on the stack.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... to prevent flush processing of an idle (or even absent) ring.
This fixes a regression during suspend from 87acb0a5.
Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Tested-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on an original patch by Zhenyu Wang, this initializes the BLT ring for
SandyBridge and enables support for user execbuffers.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To handle retirements, we need per-ring tracking of active objects.
To handle evictions, we need global tracking of active objects.
As we enable more rings, rebuilding the global list from the individual
per-ring lists quickly grows tiresome and overly complicated. Tracking the
active objects in two lists is the lesser of two evils.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cache the first 4 bytes of DPCD data in the eDP case. It's unlikely to
change and can save us some trouble at link training time.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to use some of these values in eDP configurations, so be sure to
fetch them and store them in the i915 private structure.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The _DSM method on the integrated graphics device can tell us which
connectors are muxable, so add support for making the call and parsing
out the connector info.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[ickle: fix compiler warnings for using uninitialized 'result' and
downgrade error message for non-switchable devices]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The issue is that we may become stuck executing a long running shader
and continually attempt to reset the GPU. (Or maybe we tickle some bug
and need to break the vicious cycle.) So if we are detect a second hang
within 5 seconds, give up trying to programme the GPU and report it
wedged.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the GPU is reset, the fence registers are invalidated, so release
the objects and clear them out.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Only drm/i915 does the bookkeeping that makes the information useful,
and the information maintained is driver specific, so move it out of the
core and into its single user.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
This check only appears to succeed when using GMBUS, so we need to skip
it if we have fallen back to using GPIO bit banging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Besides a couple of bugs when writing more than a single byte along the
GMBUS, SDVO was completely failing whilst trying to use GMBUS, so use
bit banging instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Daniel Vetter pointed out that in this case is would be clearer and
cleaner to use a spinlock instead of a mutex to protect the per-file
request list manipulation. Make it so.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Owain Ainsworth reported an issue between the interaction of the
hangcheck and userspace immediately (and permanently) falling back to
s/w rasterisation. In order to break the mutex and begin resetting the
GPU, we must abort the current operation (usually within the wait) and
climb sufficiently far back up the call chain to drop the mutex. In his
implementation, Owain has a loop within the ioctl handler to detect the
hang and then sleep until the error handler has run. I've chosen to
return to userspace and report an EAGAIN which should trigger the
userspace ioctl handler to repeat the call (simply because it felt less
invasive...). Before hitting a wedged GPU, we then wait upon completion
of the error handler.
Reported-by: Owain G. Ainsworth <zerooa@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid cause latencies in other clients by not taking the global struct
mutex and moving the per-client request manipulation a local per-client
mutex. For example, this allows a compositor to schedule a page-flip
(through X) whilst an OpenGL application is monopolising the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
First step, lets have a look at the values for troublesome panels and
see if they may be used to improve our link training.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to drain the pending flips prior to disabling the pipe during
modeset, and these need to be done in an uninterruptible fashion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is already performed with the pipelined flush, so by the time we
schedule the flush in the page-flip, the ring is NULL and we OOPs
instead.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Keep a list of pinned objects and display it via debugfs. Now all
objects that exist in the GTT are always tracked on one of the
active, flushing, inactive or pinned lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we have queued a page flip on the current fb and then request a mode
change, wait until the page flip completes before performing the new
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track if the gpu requires the fence for the execution of a batch buffer
and so only wait upon the retirement of the object's last rendering
seqno if the fence is in use by the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>