So here's the Broadwell pull request. From a kernel driver pov there's
two areas with big changes in Broadwell:
- Completely new enumerated interrupt bits. On the plus side it now looks
fairly unform and sane.
- Completely new pagetable layout.
To ensure minimal impact on existing platforms we've refactored both the
irq and low-level gtt handling code a lot in anticipation of the bdw push.
So now bdw enabling in these areas just plugs in a bunch of vfuncs.
Otherwise it's all fairly harmless adjusting of switch cases and
if-ladders to shovel bdw into the right blocks. So minimized impact on
existing platforms. I've also merged the bdw-stage1 branch into our
-nightly integration branch for the past week to make sure we don't break
anything.
Note that there's still quite a flurry or patches floating around, but
I've figured I'll push this out. I plan to keep the bdw fixes separate
from my usual -fixes stream so that you can reject them easily in case it
still looks like too much churn. Also, bdw is for now hidden behind the
preliminary hw enabling module option. So there's no real pressure to get
follow-up patches all into 3.13.
* tag 'bdw-stage1-2013-11-08-v2' of git://people.freedesktop.org/~danvet/drm-intel: (75 commits)
drm/i915: Mask the vblank interrupt on bdw by default
drm/i915: Wire up cpu fifo underrun reporting support for bdw
drm/i915: Optimize gen8_enable|disable_vblank functions
drm/i915: Wire up pipe CRC support for bdw
drm/i915: Wire up PCH interrupts for bdw
drm/i915: Wire up port A aux channel
drm/i915: Fix up the bdw pipe interrupt enable lists
drm/i915: Optimize pipe irq handling on bdw
drm/i915/bdw: Take render error interrupt out of the mask
drm/i915/bdw: Add BDW PCH check first
drm/i915: Use hsw_crt_get_config on BDW
drm/i915/bdw: Change dp aux timeout to 600us on DDIA
drm/i915/bdw: Enable trickle feed on Broadwell
drm/i915/bdw: WaSingleSubspanDispatchOnAALinesAndPoints
drm/i915/bdw: conservative SBE VUE cache mode
drm/i915/bdw: Limit SDE poly depth FIFO to 2
drm/i915/bdw: Sampler power bypass disable
ddrm/i915/bdw: Disable centroid pixel perf optimization
drm/i915/bdw: BWGTLB clock gate disable
drm/i915/bdw: Implement edp PSR workarounds
...
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
HW engineers have listened and given us again a real interrupt with
masking and status regs. Yay!
For consistency with other platforms call the #define FIFO_UNDERRUN.
Eventually we also might need to have some enable/disable functions
for bdw display interrupts, but for now open-coding seems to be good
enough.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's cache the IMR value like on other platforms. This is needed to
implement the underrun reporting since then we'll have two places that
change the same register at runtime.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The layout of the CRC registers is the same as on hsw, only the
interrupt handling has changed a bit. So trivial to wire up, yay!
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Gives us hotplug, gmbus, dp aux and south errors (underrun
reporting!).
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Useful for dp aux to work better. Also stop enabling the port A
hotplug event - eDP panels are expected to fire that interupt and
we're not really ready to deal with them. This is consistent with how
we handle port A on ilk-hsw.
The more important bit is that we must delay the enabling of hotplug
interrupts until all the encoders are fully set up. But we need irq
support earlier than that, hence hotplug interrupts can only be
enabled in the ->hpd_irq_setup callback.
v2: Drop the _HOTPLUG, it isn't (Ville).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- Pipe underrun can't just be enabled, we need some support code like
on ilk-hsw to make this happen. So drop it for now.
- CRC error is a special mode of the CRC hardware that we don't use,
so again drop it. Real CRC support for bdw will be added later.
- All the other error bits are about faults, so rename the #define and
adjust the output.
v2: Use pipe_name as pointed out by Ville. Ville's comment was on a
previous patch, but it was easier to squash in here.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have a per-pipe bit in the master irq control register, so use it.
This allows us to drop the masks for aggregate interrupt bits and be a
bit more explicit in the code. It also removes one indentation level.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The interrupt handling implementation remains the same as previous
generations with the 4 types of registers, status, identity, mask, and
enable. However the layout of where the bits go have changed entirely.
To address these changes, all of the interrupt vfuncs needed special
gen8 code.
The way it works is there is a top level status register now which
informs the interrupt service routine which unit caused the interrupt,
and therefore which interrupt registers to read to process the
interrupt. For display the division is quite logical, a set of interrupt
registers for each pipe, and in addition to those, a set each for "misc"
and port.
For GT the things get a bit hairy, as seen by the code. Each of the GT
units has it's own bits defined. They all look *very similar* and
resides in 16 bits of a GT register. As an example, RCS and BCS share
register 0. To compact the code a bit, at a slight expense to
complexity, this is exactly how the code works as well. 2 structures are
added to the ring buffer so that our ring buffer interrupt handling code
knows which ring shares the interrupt registers, and a shift value (ie.
the top or bottom 16 bits of the register).
The above allows us to kept the interrupt register caching scheme, the
per interrupt enables, and the code to mask and unmask interrupts
relatively clean (again at the cost of some more complexity).
Most of the GT units mentioned above are command streamers, and so the
symmetry should work quite well for even the yet to be implemented rings
which Broadwell adds.
v2: Fixes up a couple of bugs, and is more verbose about errors in the
Broadwell interrupt handler.
v3: fix DE_MISC IER offset
v4: Simplify interrupts:
I totally misread the docs the first time I implemented interrupts, and
so this should greatly simplify the mess. Unlike GEN6, we never touch
the regular mask registers in irq_get/put.
v5: Rebased on to of recent pch hotplug setup changes.
v6: Fixup on top of moving num_pipes to intel_info.
v7: Rebased on top of Egbert Eich's hpd irq handling rework. Also
wired up ibx_hpd_irq_setup for gen8.
v8: Rebase on top of Jani's asle handling rework.
v9: Rebase on top of Ben's VECS enabling for Haswell, where he
unfortunately went OCD on the gt irq #defines. Not that they're still
not yet fully consistent:
- Used the GT_RENDER_ #defines + bdw shifts.
- Dropped the shift from the L3_PARITY stuff, seemed clearer.
- s/irq_refcount/irq_refcount.gt/
v10: Squash in VECS enabling patches and the gen8_gt_irq_handler
refactoring from Zhao Yakui <yakui.zhao@intel.com>
v11: Rebase on top of the interrupt cleanups in upstream.
v12: Rebase on top of Ben's DPF changes in upstream.
v13: Drop bdw from the HAS_L3_DPF feature flag for now, it's unclear what
exactly needs to be done. Requested by Ben.
v14: Fix the patch.
- Drop the mask of reserved bits and assorted logic, it doesn't match
the spec.
- Do the posting read inconditionally instead of commenting it out.
- Add a GEN8_MASTER_IRQ_CONTROL definition and use it.
- Fix up the GEN8_PIPE interrupt defines and give the GEN8_ prefixes -
we actually will need to use them.
- Enclose macros in do {} while (0) (checkpatch).
- Clear DE_MISC interrupt bits only after having processed them.
- Fix whitespace fail (checkpatch).
- Fix overtly long lines where appropriate (checkpatch).
- Don't use typedef'ed private_t (maintainer-scripts).
- Align the function parameter list correctly.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v4)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
bikeshed
Bit a bit -fixes pull request in the merge window than usual dua to two
feauture-y things:
- Display CRCs are now enabled on all platforms, including the odd DP case
on gm45/vlv. Since this is a testing-only feature it should ever hurt,
but I figured it'll help with regression-testing -fixes. So I left it
in and didn't postpone it to 3.14.
- Display power well refactoring from Imre. Would have caused major pain
conflict with the bdw stage 1 patches if I'd postpone this to -next.
It's only an relatively small interface rework, so shouldn't cause pain.
It's also been in my tree since almost 3 weeks already.
That accounts for about two thirds of the pull, otherwise just bugfixes:
- vlv backlight fix from Jesse/Jani
- vlv vblank timestamp fix from Jesse
- improved edp detection through vbt from Ville (fixes a vlv issue)
- eDP vdd fix from Paulo
- fixes for dvo lvds on i830M
- a few smaller things all over
Note: This contains a backmerge of v3.12. Since the -internal branch
always applied on top of -nightly I need that unified base to merge bdw
patches. So you'll get a conflict with radeon connector props when pulling
this (and nouveau/master will also conflict a bit when Ben doesn't
rebase). The backmerge itself only had conflicts in drm/i915.
There's also a tiny conflict between Jani's backlight fix and your sysfs
lifetime fix in drm-next.
* tag 'drm-intel-fixes-2013-11-07' of git://people.freedesktop.org/~danvet/drm-intel: (940 commits)
drm/i915/vlv: use per-pipe backlight controls v2
drm/i915: make backlight functions take a connector
drm/i915: move opregion asle request handling to a work queue
drm/i915/vlv: use PIPE_START_VBLANK interrupts on VLV
drm/i915: Make intel_dp_is_edp() less specific
drm/i915: Give names to the VBT child device type bits
drm/i915/vlv: enable HDA display audio for Valleyview2
drm/i915/dvo: call ->mode_set callback only when the port is running
drm/i915: avoid unclaimed registers when capturing the error state
drm/i915: Enable DP port CRC for the "auto" source on g4x/vlv
drm/i915: scramble reset support for DP port CRC on vlv
drm/i915: scramble reset support for DP port CRC on g4x
drm/i916: add "auto" pipe CRC source
...
Conflicts:
MAINTAINERS
drivers/gpu/drm/i915/intel_panel.c
drivers/gpu/drm/nouveau/core/subdev/mc/base.c
drivers/gpu/drm/radeon/atombios_encoders.c
drivers/gpu/drm/radeon/radeon_connectors.c
This fixes a mismatch between our vblank enable code and our IRQ
handler. Also, since vblank start events come in before page flips
reliably, it also fixes the kms_flip plain-flip test on my BYT system.
Spotted-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the ktime_get() clock readouts and potential preempt_disable()
calls from drm core into kms driver to make it compatible with the
api changes in the drm core.
The intel-kms driver needs to take the uncore.lock inside
i915_get_crtc_scanoutpos() and intel_pipe_in_vblank().
This is incompatible with the preempt_disable() on a
PREEMPT_RT patched kernel, as regular spin locks must not
be taken within a preempt_disable'd section. Lock contention
on the uncore.lock also introduced too much uncertainty in vblank
timestamps.
Push the ktime_get() timestamping for scanoutpos queries and
potential preempt_disable_rt() into i915_get_crtc_scanoutpos(),
so these problems can be avoided:
1. First lock the uncore.lock (might sleep on a PREEMPT_RT kernel).
2. preempt_disable_rt() (will be added by the rt-linux folks).
3. ktime_get() a timestamp before scanout pos query.
4. Do all mmio reads as fast as possible without grabbing any new locks!
5. ktime_get() a post-query timestamp.
6. preempt_enable_rt()
7. Unlock the uncore.lock.
This reduces timestamp uncertainty on a low-end HP Atom Mini netbook
with Intel GMA-950 nicely:
Before: 3-8 usecs with spikes > 20 usecs, triggering query retries.
After : Typically 1 usec (98% of all samples), occassionally 2 usecs
(2% of all samples), with maximum of 3 usecs (a handful).
v2: Fix formatting of new multi-line code comments.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Use a for_each_loop and add the corresponding #defines.
- Drop the _ILK postfix on the existing DE_PIPE_VBLANK macro for
consistency with everything else.
- Also use macros (and add the missing one for plane flips) for the
ivb display interrupt handler.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Drop the useless parens that Ville spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Request by Ville in his review of the CRC stuff. This converts
everything but ilk_display_irq_handler since that needs a bit more
than a simple search&replace to look nice.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise QA will report this as a real hang when running igt
ZZ_missed_irq.
v2: Actually test the right stuff and really shut up the DRM_ERROR
output ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70747
Tested-by: lu hua <huax.lu@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- CRC support from Damien and He Shuang. Long term this should allow us to
test an awful lot modesetting corner cases automatically. So for me as
the maintainer this is really big.
- HDMI audio fix from Jani.
- VLV dpll computation code refactoring from Ville.
- Fixups for the gpu booster from last time around (Chris).
- Some cleanups in the context code from Ben.
- More watermark work from Ville (we'll be getting there ...).
- vblank timestamp improvements from Ville.
- CONFIG_FB=n support, including drm core changes to make the fbdev
helpers optional.
- DP link training improvements (Jani).
- mmio vtable from Ben, prep work for future hw.
* tag 'drm-intel-next-2013-10-18' of git://people.freedesktop.org/~danvet/drm-intel: (132 commits)
drm/i915/dp: don't mention eDP bpp clamping if it doesn't affect bpp
drm/i915: remove dead code in ironlake_crtc_mode_set
drm/i915: crc support for hsw
drm/i915: fix CRC debugfs setup
drm/i915: wait one vblank when disabling CRCs
drm/i915: use ->get_vblank_counter for the crc frame counter
drm/i915: wire up CRC interrupt for ilk/snb
drm/i915: add CRC #defines for ilk/snb
drm/i915: extract display_pipe_crc_update
drm/i915: don't Oops in debugfs for I915_FBDEV=n
drm/i915: set HDMI pixel clock in audio configuration
drm/i915: pass mode to ELD write vfuncs
cpufreq: Add dummy cpufreq_cpu_get/put for CONFIG_CPU_FREQ=n
drm/i915: check gem bo size when creating framebuffers
drm/i915: Use unsigned long for obj->user_pin_count
drm/i915: prevent tiling changes on framebuffer backing storage
drm/i915: grab dev->struct_mutex around framebuffer_init
drm/i915: vlv: fix VGA hotplug after modeset
drm: add support for additional stereo 3D modes
drm/i915: preserve dispaly init order on ByT
...
So drm was abusing device lifetimes, by having embedded device structures
in the minor and connector it meant that the lifetime of the internal drm
objects (drm_minor and drm_connector) were tied to the lifetime of the device
files in sysfs, so if something kept those files opened the current code
would kfree the objects and things would go downhill from there.
Now in reality there is no need for these lifetimes to be so intertwined,
especailly with hotplugging of devices where we wish to remove the sysfs
and userspace facing pieces before we can unwind the internal objects due
to open userspace files or mmaps, so split the objects out so the struct
device is no longer embedded and do what fbdev does and just allocate
and remove the sysfs inodes separately.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Daniel pointed out that it was hard to get anything lockless to work
correctly, so don't even try for this non critical piece of code and
just use a spin lock.
v2: Make intel_pipe_crc->opened a bool
v3: Use assert_spin_locked() instead of a comment (Daniel Vetter)
v4: Use spin_lock_irq() in the debugfs functions (they can only be
called from process context),
Use spin_lock() in the pipe_crc_update() function that can only be
called from an interrupt handler,
Use wait_event_interruptible_lock_irq() when waiting for data in the
cicular buffer to ensure proper locking around the condition we are
waiting for. (Daniel Vetter)
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- Give them an _irq_handler postfix, like all the other irq stuff.
- Shuffle the DEBUG_FS=n dummy functions around a bit. This is prep
work to extract all the crc debug stuff into intel_display_testing.c
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And throw in a tiny for_each_pipe refactoring for gen2.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Should work down to gen2. The #defines for the interrupt sources are
already there in PIPESTAT and are the same on all gmch platforms for
gen2 up to vlv.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
hw designers decided to change the CRC registers and coalesce them all
into one. Otherwise nothing changed. I've opted for a new hsw_ version
to grab the crc sample since hsw+1 will have the same crc registers,
but different interrupt source registers. So this little helper
function will come handy there.
Also refactor the display error handler with a neat pipe loop.
v2: Use for_each_pipe.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Suggested by Ville.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We enable the interrupt unconditionally and only control it
through the enable bit in the CRC control register.
v2: Extract per-platform helpers to compute the register values.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ringbuffer update logic should always be the same, but different
platforms have different amounts of CRC registers. Hence extract it.
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also use #ifdef to keep consistent with all other such cases.
Cc: Damien Lespiau <damien.lespiau@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
seq_file is not quite the right interface for these ones. We have a
circular buffer with a new entry per vblank on one side and a process
wanting to dequeue the CRC with a read().
It's quite racy to wait for vblank in user land and then try to read a
pipe_crc file, sometimes the CRC interrupt hasn't been fired and we end
up with an EOF.
So, let's have the read on the pipe_crc file block until the interrupt
gives us a new entry. At that point we can wake the reading process.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This shouldn't happen as the buffer is freed after disable pipe CRCs,
but better be safe than sorry.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are a few good properties to a circular buffer, for instance it
has a number of entries (before we were always dumping the full buffer).
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are several points in the display pipeline where CRCs can be
computed on the bits flowing there. For instance, it's usually possible
to compute the CRCs of the primary plane, the sprite plane or the CRCs
of the bits after the panel fitter (collectively called pipe CRCs).
v2: Quite a bit of rework here and there (Damien)
Signed-off-by: Shuang He <shuang.he@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Fix intermediate compile file reported by Wu Fengguang's
kernel builder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Gen2 doesn't have a hardware frame counter that can be read out. Just
provide a stub .get_vblank_counter() that always returns 0 instead of
trying to read non-existing registers.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Gen2 doesn't have the pixelcount register that gen3 and gen4 have.
Instead we must use the scanline counter like we do for ctg+.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The DSL register increments at the start of horizontal sync, so it
manages to miss the entire active portion of the current line.
Improve the get_scanoutpos accuracy a bit when the scanout position is
close to the start or end of vblank. We can do that by double checking
the DSL value against the vblank status bit from ISR.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: mario.kleiner.de@gmail.com
Tested-by: mario.kleiner.de@gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The reported scanout position must be relative to the end of vblank.
Currently we manage to fumble that in a few ways.
First we don't consider the case when vtotal != vbl_end. While that
isn't very common (happens maybe only w/ old panel fitting hardware),
we can fix it easily enough.
The second issue is that on pre-CTG hardware we convert the pixel count
to horizontal/vertical components at the very beginning, and then forget
to adjust the horizontal component to be relative to vbl_end. So instead
we should keep our numbers in the pixel count domain while we're
adjusting the position to be relative to vbl_end. Then when we do the
conversion in the end, both vertical _and_ horizontal components will
come out correct.
v2: Change position to int from u32 to avoid sign issues
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: mario.kleiner.de@gmail.com
Tested-by: mario.kleiner.de@gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have all the information we need in the mode structure, so going and
reading it from the hardware is pointless, and slower.
We never populated ->get_vblank_timestamp() in the UMS case, and as that
is the only way we'd ever call ->get_scanout_position(), we can
completely ignore UMS in i915_get_crtc_scanoutpos().
Also reorganize intel_irq_init() a bit to clarify the KMS vs. UMS
situation.
v2: Drop UMS code
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: mario.kleiner.de@gmail.com
Tested-by: mario.kleiner.de@gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The old style frame counter increments at the start of active video.
However for i915_get_vblank_counter() we want a counter that increments
at the start of vblank.
Fortunately the low frame counter register also contains the pixel
counter for the current frame. We can can compare that against the
vblank start pixel count to determine if we need to increment the
frame counter by 1 to get the correct answer.
Also reorganize the function pointer assignments in intel_irq_init() a
bit to avoid confusing people.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We lost the ability to capture the first error for a stuck ring in the
recent hangcheck robustification. Whilst both error states are
interesting (why does the GPU not recover is also essential to debug),
our primary goal is to fix the initial hang and so we need to capture
the first error state upon taking hangcheck action.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After applying wait-boost we often find ourselves stuck at higher clocks
than required. The current threshold value requires the GPU to be
continuously and completely idle for 313ms before it is dropped by one
bin. Conversely, we require the GPU to be busy for an average of 90% over
a 84ms period before we upclock. So the current thresholds almost never
downclock the GPU, and respond very slowly to sudden demands for more
power. It is easy to observe that we currently lock into the wrong bin
and both underperform in benchmarks and consume more power than optimal
(just by repeating the task and measuring the different results).
An alternative approach, as discussed in the bspec, is to use a
continuous threshold for upclocking, and an average value for downclocking.
This is good for quickly detecting and reacting to state changes within a
frame, however it fails with the common throttling method of waiting
upon the outstanding frame - at least it is difficult to choose a
threshold that works well at 15,000fps and at 60fps. So continue to use
average busy/idle loads to determine frequency change.
v2: Use 3 power zones to keep frequencies low in steady-state mostly
idle (e.g. scrolling, interactive 2D drawing), and frequencies high
for demanding games. In between those end-states, we use a
fast-reclocking algorithm to converge more quickly on the desired bin.
v3: Bug fixes - make sure we reset adj after switching power zones.
v4: Tune - drop the continuous busy thresholds as it prevents us from
choosing the right frequency for glxgears style swap benchmarks. Instead
the goal is to be able to find the right clocks irrespective of the
wait-boost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since, we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.
Rather than continue to fix up the timeouts to work around the interface
impedence in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.
v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirments of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.
v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.
v4: s/EGAIN/EAGAIN/ (Imre)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We only wish to know the value of seqno when emitting the tracepoint, so
move the query from a parameter to the macro to inside the conditional
macro body so that the query is only evaluated when required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSQMORAAoJEHm+PkMAQRiGj14H/1bjhtfNjPdX7MVQAzA+WpwX
s7h1IQu2Si9S5S1lBiM2sBTOssVcmfheO9x4yqm7JNOD1RnssWKOM3q+zVOLstwd
GD3gluJPeraD5EyYSqEJ9ILPQ3gbxb4wOlT0Z291TW6E8XhLRr0RTOJPksRsgvLH
Ckm9uJh6ArS6ZXfXiaDQfd+xHAQJkUfW6nMSA0g9ZO9C6KIDRvcbUmrY3m4HhfIk
mK0TXCBs+AXGDIjTEB8JgIQL/5y1Qn0c4R+2uTU/4YWwyLvJTV1e44kGoleukMMT
6Pw/TNlUEN161dbSaqCyF3sfXHDYQ5valycI2PDgitMtPSxbzsU1VDizS8+daRg=
=lEmF
-----END PGP SIGNATURE-----
Merge tag 'v3.12-rc2' into drm-intel-next
Backmerge Linux 3.12-rc2 to prep for a bunch of -next patches:
- Header cleanup in intel_drv.h, both changed in -fixes and my current
-next pile.
- Cursor handling cleanup for -next which depends upon the cursor
handling fix merged into -rc2.
All just trivial conflicts of the "changed adjacent lines" type:
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_drv.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We currently disable the ERR_INT interrupts while running the IRQ
handler because we fear that if we do an unclaimed register access
from inside the IRQ handler we'll keep triggering the IRQ handler
forever.
The problem is that since we always disable the ERR_INT interrupts at
the IRQ handler, when we get a FIFO underrun we'll always print both
messages:
- "uncleared fifo underrun on pipe A"
- "Pipe A FIFO underrun"
Because the "was_enabled" variable from
ivybridge_set_fifo_underrun_reporting will always be false (since we
disable ERR int at the IRQ handler!).
Instead of actually fixing ivybridge_set_fifo_underrun_reporting,
let's just remove the "disable ERR_INT during the IRQ handler" code.
As far as we know we shouldn't really be triggering ERR_INT interrupts
from the IRQ handler, so if we ever get stuck in the endless loop of
interrupts we can git-bisect and revert (and we can even bisect and
revert this patch in case I'm just wrong). As a bonus, our IRQ handler
is now simpler and a few nanoseconds faster.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.
This patch provides clarify in the code for future use cases which might
want to actually query whether or not L3 exists.
v2: Add /* DPF == dynamic parity feature */
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.
v2: Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity
v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reduces dmesg noise when there's a glitch on the hpd line, or there
are more than one connectors on the same hpd line and only one of them
changes.
While at it, switch to use the friendly status names instead of numbers.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
My g33 here seems to be shockingly good at hitting them all. This time
around kms_flip/flip-vs-panning-vs-hang blows up:
intel_crtc_wait_for_pending_flips correctly checks for gpu hangs and
if a gpu hang is pending aborts the wait for outstanding flips so that
the setcrtc call will succeed and release the crtc mutex. And the gpu
hang handler needs that lock in intel_display_handle_reset to be able
to complete outstanding flips.
The problem is that we can race in two ways:
- Waiters on the dev_priv->pending_flip_queue aren't woken up after
we've the reset as pending, but before we actually start the reset
work. This means that the waiter doesn't notice the pending reset
and hence will keep on hogging the locks.
Like with dev->struct_mutex and the ring->irq_queue wait queues we
there need to wake up everyone that potentially holds a lock which
the reset handler needs.
- intel_display_handle_reset was called _after_ we've already
signalled the completion of the reset work. Which means a waiter
could sneak in, grab the lock and never release it (since the
pageflips won't ever get released).
Similar to resetting the gem state all the reset work must complete
before we update the reset counter. Contrary to the gem reset we
don't need to have a second explicit wake up call since that will
have happened already when completing the pageflips. We also don't
have any issues that the completion happens while the reset state is
still pending - wait_for_pending_flips is only there to ensure we
display the right frame. After a gpu hang&reset events such
guarantees are out the window anyway. This is in contrast to the gem
code where too-early wake-up would result in unnecessary restarting
of ioctls.
Also, since we've gotten these various deadlocks and ordering
constraints wrong so often throw copious amounts of comments at the
code.
This deadlock regression has been introduced in the commit which added
the pageflip reset logic to the gpu hang work:
commit 96a02917a0
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Mon Feb 18 19:08:49 2013 +0200
drm/i915: Finish page flips and update primary planes after a GPU reset
v2:
- Add comments to explain how the wake_up serves as memory barriers
for the atomic_t reset counter.
- Improve the comments a bit as suggested by Chris Wilson.
- Extract the wake_up calls before/after the reset into a little
i915_error_wake_up and unconditionally wake up the
pending_flip_queue waiters, again as suggested by Chris Wilson.
v3: Throw copious amounts of comments at i915_error_wake_up as
suggested by Chris Wilson.
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>