Commit Graph

50 Commits

Author SHA1 Message Date
Boris Brezillon 65101d8c91 drm/vc4: Expose performance counters to userspace
The V3D engine has various hardware counters which might be interesting
to userspace performance analysis tools.

Expose new ioctls to create/destroy a performance monitor object and
query the counter values of this perfmance monitor.

Note that a perfomance monitor is given an ID that is only valid on the
file descriptor it has been allocated from. A performance monitor can be
attached to a CL submission and the driver will enable HW counters for
this request and update the performance monitor values at the end of the
job.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180112090926.12538-1-boris.brezillon@free-electrons.com
2018-02-10 22:23:26 +00:00
Noralf Trønnes b8f429a77f drm/vc4: Use drm_fb_cma_fbdev_init/fini()
Use drm_fb_cma_fbdev_init() and drm_fb_cma_fbdev_fini() which relies on
the fact that drm_device holds a pointer to the drm_fb_helper structure.
This means that the driver doesn't have to keep track of that.
Also use the drm_fb_helper functions directly.

Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115142001.45358-18-noralf@tronnes.org
2017-12-08 14:47:43 +01:00
Boris Brezillon b9f19259b8 drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl
This ioctl will allow us to purge inactive userspace buffers when the
system is running out of contiguous memory.

For now, the purge logic is rather dumb in that it does not try to
release only the amount of BO needed to meet the last CMA alloc request
but instead purges all objects placed in the purgeable pool as soon as
we experience a CMA allocation failure.

Note that the in-kernel BO cache is always purged before the purgeable
cache because those objects are known to be unused while objects marked
as purgeable by a userspace application/library might have to be
restored when they are marked back as unpurgeable, which can be
expensive.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019125748.3152-1-boris.brezillon@free-electrons.com
2017-10-19 10:34:49 -07:00
Eric Anholt f30994622b drm/vc4: Add an ioctl for labeling GEM BOs for summary stats
This has proven immensely useful for debugging memory leaks and
overallocation (which is a rather serious concern on the platform,
given that we typically run at about 256MB of CMA out of up to 1GB
total memory, with framebuffers that are about 8MB ecah).

The state of the art without this is to dump debug logs from every GL
application, guess as to kernel allocations based on bo_stats, and try
to merge that all together into a global picture of memory allocation
state.  With this, you can add a couple of calls to the debug build of
the 3D driver and get a pretty detailed view of GPU memory usage from
/debug/dri/0/bo_stats (or when we debug print to dmesg on allocation
failure).

The Mesa side currently labels at the gallium resource level (so you
see that a 1920x20 pixmap has been created, presumably for the window
system panel), but we could extend that to be even more useful with
glObjectLabel() names being sent all the way down to the kernel.

(partial) example of sorted debugfs output with Mesa labeling all
resources:

               kernel BO cache:  16392kb BOs (3)
       tiling shadow 1920x1080:   8160kb BOs (1)
       resource 1920x1080@32/0:   8160kb BOs (1)
scanout resource 1920x1080@32/0:   8100kb BOs (1)
                        kernel:   8100kb BOs (1)

v2: Use strndup_user(), use lockdep assertion instead of just a
    comment, fix an array[-1] reference, extend comment about name
    freeing.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725182718.31468-2-eric@anholt.net
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-28 16:04:53 -07:00
Eric Anholt 55a0b9d70a drm/vc4: Remove dead vc4_event_pending().
It is no longer used as of commit 34c8ea400f ("drm/vc4: Mimic
drm_atomic_helper_commit() behavior")

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170621185002.28563-4-eric@anholt.net
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-06-22 11:14:39 -07:00
Eric Anholt 83753117f1 drm/vc4: Add get/set tiling ioctls.
This allows mesa to set the tiling format for a BO and have that
tiling format be respected by mesa on the other side of an
import/export (and by vc4 scanout in the kernel), without defining a
protocol to pass the tiling through userspace.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-2-eric@anholt.net
Acked-by: Dave Airlie <airlied@redhat.com>
2017-06-15 16:02:45 -07:00
Boris Brezillon 9a8d5e4a53 drm/vc4: Fix comment in vc4_drv.h
Fixes a copy&paste error.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/1495550187-525-1-git-send-email-boris.brezillon@free-electrons.com
2017-05-31 10:50:54 -07:00
Masahiro Yamada b7e8e25b37 drm/vc4: fix include notation and remove -Iinclude/drm flag
Include <drm/*.h> instead of relative path from include/drm, then
remove the -Iinclude/drm compiler flag.

While we are here, use <...> instead of "..." for include/linux/*.h
and include/sound/*.h headers too.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1495081793-9707-2-git-send-email-yamada.masahiro@socionext.com
2017-05-22 09:36:01 +02:00
Daniel Vetter 1bf6ad622b drm/vblank: drop the mode argument from drm_calc_vbltimestamp_from_scanoutpos
If we restrict this helper to only kms drivers (which is the case) we
can look up the correct mode easily ourselves. But it's a bit tricky:

- All legacy drivers look at crtc->hwmode. But that is updated already
  at the beginning of the modeset helper, which means when we disable
  a pipe. Hence the final timestamps might be a bit off. But since
  this is an existing bug I'm not going to change it, but just try to
  be bug-for-bug compatible with the current code. This only applies
  to radeon&amdgpu.

- i915 tries to get it perfect by updating crtc->hwmode when the pipe
  is off (i.e. vblank->enabled = false).

- All other atomic drivers look at crtc->state->adjusted_mode. Those
  that look at state->requested_mode simply don't adjust their mode,
  so it's the same. That has two problems: Accessing crtc->state from
  interrupt handling code is unsafe, and it's updated before we shut
  down the pipe. For nonblocking modesets it's even worse.

For atomic drivers try to implement what i915 does. To do that we add
a new hwmode field to the vblank structure, and update it from
drm_calc_timestamping_constants(). For atomic drivers that's called
from the right spot by the helper library already, so all fine. But
for safety let's enforce that.

For legacy driver this function is only called at the end (oh the
fun), which is broken, so again let's not bother and just stay
bug-for-bug compatible.

The  benefit is that we can use drm_calc_vbltimestamp_from_scanoutpos
directly to implement ->get_vblank_timestamp in every driver, deleting
a lot of code.

v2: Completely new approach, trying to mimick the i915 solution.

v3: Fixup kerneldoc.

v4: Drop the WARN_ON to check that the vblank is off, atomic helpers
currently unconditionally call this. Recomputing the same stuff should
be harmless.

v5: Fix typos and move misplaced hunks to the right patches (Neil).

v6: Undo hunk movement (kbuild).

Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-4-daniel.vetter@ffwll.ch
2017-05-10 10:21:31 +02:00
Daniel Vetter 3fcdcb2709 drm/vblank: Switch to bool in_vblank_irq in get_vblank_timestamp
It's overkill to have a flag parameter which is essentially used just
as a boolean. This takes care of core + adjusting drivers.

Adjusting the scanout position callback is a bit harder, since radeon
also supplies it's own driver-private flags in there.

v2: Fixup misplaced hunks (Neil).

v3: kbuild says v1 was better ...

Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-2-daniel.vetter@ffwll.ch
2017-05-10 10:21:17 +02:00
Daniel Vetter d673c02c4b drm/vblank: Switch drm_driver->get_vblank_timestamp to return a bool
There's really no reason for anything more:
- Calling this while the crtc vblank stuff isn't set up is a driver
  bug. Those places alrready DRM_ERROR.
- Calling this when the crtc is off is either a driver bug (calling
  drm_crtc_handle_vblank at the wrong time) or a core bug (for
  anything else). Again, we DRM_ERROR.
- EINVAL is checked at higher levels already, and if we'd use struct
  drm_crtc * instead of (dev, pipe) it would be real obvious that
  those are again core bugs.

The only valid failure mode is crap hardware that couldn't sample a
useful timestamp, to ask the core to just grab a not-so-accurate
timestamp. Bool is perfectly fine for that.

v2: Also fix up the one caller, I lost that in the shuffling (Jani).

v3: Fixup commit message (Neil).

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-1-daniel.vetter@ffwll.ch
2017-05-10 10:21:08 +02:00
Eric Anholt b72a2816e3 drm/vc4: Turn the V3D clock on at runtime.
For the Raspberry Pi's bindings, the power domain also implicitly
turns on the clock and deasserts reset, but for the new Cygnus port we
start representing the clock in the devicetree.

v2: Document the clock-names property, check for -ENOENT for no clock
    in DT.
v3: Drop NULL checks around clk calls which embed NULL checks.
v4: Drop clk-names (feedback by Rob Herring)

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Herring <robh@kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170428224223.21904-1-eric@anholt.net
2017-05-08 12:24:06 -07:00
Eric Anholt 553c942f8b drm/vc4: Allow using more than 256MB of CMA memory.
Until now, we've had to limit Raspberry Pi to 256MB of CMA memory to
keep from triggering the hardware addressing bug between the tile
binner and the tile alloc memory (where the top 4 bits come from the
tile state data array's address).

To work around that and allow more memory to be reserved for graphics,
allocate a single BO to store tile state data arrays and tile
alloc/overflow memory while the GPU is active, and make sure that that
one BO doesn't happen to cross a 256MB boundary.  With that in place,
we can allocate textures and shaders anywhere in system memory (still
contiguous, of course).

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327231025.19391-1-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-04-18 14:32:20 -07:00
Eric Anholt cdec4d3613 drm/vc4: Expose dma-buf fences for V3D rendering.
This is needed for proper synchronization with display on another DRM
device (pl111 or tinydrm) with buffers produced by vc4 V3D.  Fixes the
new igt vc4_dmabuf_poll testcase, and rendering of one of the glmark2
desktop tests on pl111+vc4.

This doesn't yet introduce waits on another device's fences before
vc4's rendering/display, because I don't have testcases for them.

v2: Reuse dma_fence_free(), retitle commit message to clarify that
    it's not a full dma-buf fencing implementation yet.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412191202.22740-6-eric@anholt.net
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-04-13 11:00:28 -07:00
Shawn Guo 0d5f46fa4c drm: vc4: use vblank hooks in struct drm_crtc_funcs
The vblank hooks in struct drm_driver are deprecated and only meant for
legacy drivers.  For modern drivers with DRIVER_MODESET flag, the hooks
in struct drm_crtc_funcs should be used instead.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/1486458995-31018-23-git-send-email-shawnguo@kernel.org
2017-02-09 16:12:47 +08:00
Eric Anholt 4078f57571 drm/vc4: Add DSI driver
The DSI0 and DSI1 blocks on the 2835 are related hardware blocks.
Some registers move around, and the featureset is slightly different,
as DSI1 (the 4-lane DSI) is a later version of the hardware block.
This driver doesn't yet enable DSI0, since we don't have any hardware
to test against, but it does put a lot of the register definitions and
code in place.

v2: Use the clk_hw interfaces, don't set CLK_IS_BASIC (from review by
    Stephen Boyd)

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1)
Link: http://patchwork.freedesktop.org/patch/msgid/20170131192912.11316-1-eric@anholt.net
2017-02-01 12:51:23 -08:00
Noralf Trønnes 55d6616585 drm/vc4: Remove vc4_debugfs_cleanup()
drm_debugfs_cleanup() now removes all minor->debugfs_list entries
automatically, so the drm_driver.debugfs_cleanup callback is not
needed.

Cc: eric@anholt.net
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170126225621.12314-17-noralf@tronnes.org
2017-01-30 09:48:48 +01:00
Shawn Guo c77b9abf3c drm: vc4: use crtc helper drm_crtc_from_index()
Use drm_crtc_from_index() to find drm_crtc for given index, so that we
do not need to maintain a pointer array in struct vc4_dev.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1483961145-18453-7-git-send-email-shawnguo@kernel.org
2017-01-18 09:21:13 -05:00
Laurent Pinchart 9338203c4f drm: Don't include <drm/drm_encoder.h> in <drm/drm_crtc.h>
<drm/drm_crtc.h> used to define most of the in-kernel KMS API. It has
now been split into separate files for each object type, but still
includes most other KMS headers to avoid breaking driver compilation.

As a step towards fixing that problem, remove the inclusion of
<drm/drm_encoder.h> from <drm/drm_crtc.h> and include it instead where
appropriate. Also remove the forward declarations of the drm_encoder and
drm_encoder_helper_funcs structures from <drm/drm_crtc.h> as they're not
needed in the header.

<drm/drm_encoder.h> now has to include <drm/drm_mode.h> and contain a
forward declaration of struct drm_encoder in order to allow including it
as the first header in a compilation unit.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # For vmwgfx
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1481709550-29226-2-git-send-email-laurent.pinchart+renesas@ideasonboard.com
2016-12-18 16:29:29 +05:30
Boris Brezillon e4b81f8c74 drm/vc4: Add support for the VEC (Video Encoder) IP
The VEC IP is a TV DAC, providing support for PAL and NTSC standards.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-12-09 15:26:31 -08:00
Boris Brezillon ab8df60e3a drm/vc4: Fix ->clock_select setting for the VEC encoder
PV_CONTROL_CLK_SELECT_VEC is actually 2 and not 0. Fix the definition and
rework the vc4_set_crtc_possible_masks() to cover the full range of the
PV_CONTROL_CLK_SELECT field.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-12-09 15:26:29 -08:00
Derek Foreman 26fc78f6fe drm/vc4: Fix race between page flip completion event and clean-up
There was a small window where a userspace program could submit
a pageflip after receiving a pageflip completion event yet still
receive EBUSY.

Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2016-11-29 15:39:45 -08:00
Jonas Pfeil c778cc5df9 drm/vc4: Add fragment shader threading support
FS threading brings performance improvements of 0-20% in glmark2.

The validation code checks for thread switch signals and ensures that
the registers of the other thread are not touched, and that our clamps
are not live across thread switches.  It also checks that the
threading and branching instructions do not interfere.

(Original patch by Jonas, changes by anholt for style cleanup,
removing validation the kernel doesn't need to do, and adding the flag
for userspace).

v2: Minor style fixes from checkpatch.

Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-11-16 13:25:26 -08:00
Eric Anholt 7edabee06a drm/vc4: Fix races when the CS reads from render targets.
With the introduction of bin/render pipelining, the previous job may
not be completed when we start binning the next one.  If the previous
job wrote our VBO, IB, or CS textures, then the binning stage might
get stale or uninitialized results.

Fixes the major rendering failure in glmark2 -b terrain.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: ca26d28bba ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: stable@vger.kernel.org
2016-10-06 11:53:50 -07:00
Masahiro Yamada 57b9f56944 drm/vc4: cleanup with list_first_entry_or_null()
The combo of list_empty() check and return list_first_entry()
can be replaced with list_first_entry_or_null().

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-10-06 11:53:35 -07:00
Eric Anholt 9326e6f255 drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
Overflow memory handling is tricky: While it's still referenced by the
BPO registers, we want to keep it from being freed.  When we are
putting a new set of overflow memory in the registers, we need to
assign the old one to the last rendering job using it.

We were looking at "what's currently running in the binner", but since
the bin/render submission split, we may end up with the binner
completing and having no new job while the renderer is still
processing.  So, if we don't find a bin job at all, look at the
highest-seqno (last) render job to attach our overflow to.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: ca26d28bba ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: stable@vger.kernel.org
2016-08-19 19:17:34 -07:00
Dave Airlie 2d635fded2 This pull request brings in vc4 shader validation for branching,
allowing GLSL shaders with non-unrolled loops.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXiWHaAAoJELXWKTbR/J7olkkP/3buVvzBdxQmT7yMeBL2Up+F
 MZ+W/pr7+gWXT/rvyY8zbEZV1txQtZDnvu8cC3h9E66TVYpTx3dYypECwiRKFt5f
 HXhrkiniu65zBPhtYxloswcr686SpFOhr9rwy6cJbCCvuZQZVpQ41Fu7BinaYJc2
 uGllnNmRkB7tPG/6BYQ/MAWvR7AjTaaAmcdhNZDyEHqXHiUu8NPprL0BQIwS1GMn
 65pQjni/WEek6/WxncxszaDWfsLMRJPKiuPV4kAsoQDluB3ortk3k1VVNJ+n3aXw
 GjlMNbU2ganEGWJS16e7HXXz3I19SQHn5mnL74X3GGL8r2x9AIZ64Zk7Ru659NBq
 CW/NsSWM/TfU6T4iflbZjSVlmtbvdX5T0rDFb+lMFNXG8EvmRIc3fkbSeh6gF9sF
 hc3UDS3ScQBAm8QOpDiVAsXNBGJUTsvxdDYgVUj1mjPtbD1ZDmEWYdb2fzs/n7ja
 qU/CW9Ap3tModkTjaD58RoenTkbTXymhxmRd1JHoxVztkAM00fNxtGP6fTxuGLOW
 snOoE9Ahdq/IzV/yZrqvWRCLTDenQ/ZxcP+fYleJ6M3qLxfHkgZX/chy/n21Q80G
 Qe40uqBWreTCz6SzCEOXeaJ9RDbZfSomVfWuLzm5IGMhOmh1rqkpdgAUKQW7OvxB
 e/LnruAmY/K7Yin87gK6
 =Qh0U
 -----END PGP SIGNATURE-----

Merge tag 'drm-vc4-next-2016-07-15' of https://github.com/anholt/linux into drm-next

This pull request brings in vc4 shader validation for branching,
allowing GLSL shaders with non-unrolled loops.

* tag 'drm-vc4-next-2016-07-15' of https://github.com/anholt/linux:
  drm/vc4: Fix a "the the" typo in a comment.
  drm/vc4: Fix definition of QPU_R_MS_REV_FLAGS
  drm/vc4: Add a getparam to signal support for branches.
  drm/vc4: Add support for branching in shader validation.
  drm/vc4: Add a bitmap of branch targets during shader validation.
  drm/vc4: Move validation's current/max ip into the validation struct.
  drm/vc4: Add a getparam ioctl for getting the V3D identity regs.
2016-07-16 11:25:11 +10:00
Eric Anholt 6d45c81d22 drm/vc4: Add support for branching in shader validation.
We're already checking that branch instructions are between the start
of the shader and the proper PROG_END sequence.  The other thing we
need to make branching safe is to verify that the shader doesn't read
past the end of the uniforms stream.

To do that, we require that at any basic block reading uniforms have
the following instructions:

load_imm temp, <next offset within uniform stream>
add unif_addr, temp, unif

The instructions are generated by userspace, and the kernel verifies
that the load_imm is of the expected offset, and that the add adds it
to a uniform.  We track which uniform in the stream that is, and at
draw call time fix up the uniform stream to have the address of the
start of the shader's uniforms at that location.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-07-15 15:19:50 -07:00
Dave Airlie 35b8a74924 This pull request brings in new vc4 plane formats for Android, precise
vblank timestamping, and a couple of small cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXhUULAAoJELXWKTbR/J7oZ1oQAKnTTJlnCbaSNrWbRBUuaMtO
 S2RQKyMI2LOpf4XE13cNHm0IaYNOw6hJIXlnxeuzHbQSGvOrHnabluZuvAfLa3OQ
 4bb8K0elWsPRbtmx2L8DjncLCmmmOsbczrTwo3efq70XZs4PyaCp2vHpCU8kI7HJ
 xO/fk6E8sYDPFCxZxb82C7LTSjQBlPn58dD+cEFg1kGILOhyR1eZwRj8e4netjKU
 gN/059R6VPPor+I8oIMrqoHAxAlZUGnVPsK/1rgR5cfxlC1NPKU71eI7xiBLNclt
 86BcvU/LuwAYt81UirnGTQU/m47dCMK4mmpzD/9EgVW/KTUPLPubML+u9fL67+B4
 BJ7T7N4t7APblqkO/iI9DcTAkZNmydq4sdsfDdXcVZ3umDTNpmb/PQuq+rFEc1CZ
 qsBw13kJ4bHBYUpZvrxuCesSRLpXhUiSOg/uVnRHTdiALZqQQyzQlnw6Q4GCMKiz
 R99/k4/7/cAgQkGElwLeHR8Aw0r26m+X4pnLN9lrafjQjlH04CcuvFwlak28jnit
 Oi0eZ0VD9RTHwPzUIoe4M32Q/7oAe7y5b7gCNPbM/0s56WQv+cIHi7CdgdIYtUWs
 BIyeBTzpLvzkEvrEMYoipgpv3scbudMssTd3bo8gsdOu0tllBzi59Poo5hl6pQbs
 lseJ7LavmK5sPDPsIZb3
 =aF3C
 -----END PGP SIGNATURE-----

Merge tag 'drm-vc4-next-2016-07-12' of https://github.com/anholt/linux into drm-next

This pull request brings in new vc4 plane formats for Android, precise
vblank timestamping, and a couple of small cleanups.

* tag 'drm-vc4-next-2016-07-12' of https://github.com/anholt/linux:
  drm/vc4: remove redundant ret status check
  drm/vc4: Implement precise vblank timestamping.
  drm/vc4: Bind the HVS before we bind the individual CRTCs.
  gpu: drm: vc4_hdmi: add missing of_node_put after calling of_parse_phandle
  drm: vc4: enable XBGR8888 and ABGR8888 pixel formats
  drm/vc4: clean up error exit path on failed dpi_connector allocation
2016-07-15 13:56:11 +10:00
Mario Kleiner 1bf59f1dcb drm/vc4: Implement precise vblank timestamping.
Precise vblank timestamping is implemented via the
usual scanout position based method. On VC4 the
pixelvalves PV do not have a scanout position
register. Only the hardware video scaler HVS has a
similar register which describes which scanline for
the output is currently composited and stored in the
HVS fifo for later consumption by the PV.

This causes a problem in that the HVS runs at a much
faster clock (system clock / audio gate) than the PV
which runs at video mode dot clock, so the unless the
fifo between HVS and PV is full, the HVS will progress
faster in its observable read line position than video
scan rate, so the HVS position reading can't be directly
translated into a scanout position for timestamp correction.

Additionally when the PV is in vblank, it doesn't consume
from the fifo, so the fifo gets full very quickly and then
the HVS stops compositing until the PV enters active scanout
and starts consuming scanlines from the fifo again, making
new space for the HVS to composite.

Therefore a simple translation of HVS read position into
elapsed time since (or to) start of active scanout does
not work, but for the most interesting cases we can still
get useful and sufficiently accurate results:

1. The PV enters active scanout of a new frame with the
   fifo of the HVS completely full, and the HVS can refill
   any fifo line which gets consumed and thereby freed up by
   the PV during active scanout very quickly. Therefore the
   PV and HVS work effectively in lock-step during active
   scanout with the fifo never having more than 1 scanline
   freed up by the PV before it gets refilled. The PV's
   real scanout position is therefore trailing the HVS
   compositing position as scanoutpos = hvspos - fifosize
   and we can get the true scanoutpos as HVS readpos minus
   fifo size, so precise timestamping works while in active
   scanout, except for the last few scanlines of the frame,
   when the HVS reaches end of frame, stops compositing and
   the PV catches up and drains the fifo. This special case
   would only introduce minor errors though.

2. If we are in vblank, then we can only guess something
   reasonable. If called from vblank irq, we assume the irq is
   usually dispatched with minimum delay, so we can take a
   timestamp taken at entry into the vblank irq handler as a
   baseline and then add a full vblank duration until the
   guessed start of active scanout. As irq dispatch is usually
   pretty low latency this works with relatively low jitter and
   good results.

   If we aren't called from vblank then we could be anywhere
   within the vblank interval, so we return a neutral result,
   simply the current system timestamp, and hope for the best.

Measurement shows the generated timestamps to be rather precise,
and at least never off more than 1 vblank duration worst-case.

Limitations: Doesn't work well yet for interlaced video modes,
             therefore disabled in interlaced mode for now.

v2: Use the DISPBASE registers to determine the FIFO size (changes
    by anholt)

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v2)
2016-07-11 17:17:34 -07:00
Daniel Vetter 2f196b7c4b drm/atomic: Add drm_atomic_crtc_state_for_each_plane_state
... and use it in msm&vc4. Again just want to encapsulate
drm_atomic_state internals a bit.

The const threading is a bit awkward in vc4 since C sucks, but I still
think it's worth to enforce this. Eventually I want to make all the
obj->state pointers const too, but that's a lot more work ...

v2: Provide safe macro to wrap up the unsafe helper better, suggested
by Maarten.

v3: Fixup subject (Maarten) and spelling fixes (Eric Engestrom).

Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464877304-4213-1-git-send-email-daniel.vetter@ffwll.ch
2016-06-02 16:59:05 +02:00
Eric Anholt 08302c35b5 drm/vc4: Add DPI driver
The DPI interface involves taking a ton of our GPIOs to be used as
outputs, and routing display signals over them in parallel.

v2: Use display_info.bus_formats[] to replace our custom DT
    properties.
v3: Rebase on V3D documentation changes.
v4: Fix rebase detritus from V3D documentation changes.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Herring <robh@kernel.org>
2016-04-14 12:22:53 -07:00
Varad Gautam ca26d28bba drm/vc4: improve throughput by pipelining binning and rendering jobs
The hardware provides us with separate threads for binning and
rendering, and the existing model waits for them both to complete
before submitting the next job.

Splitting the binning and rendering submissions reduces idle time and
gives us approx 20-30% speedup with some x11perf tests such as -line10
and -tilerect1.  Improves openarena performance by 1.01897% +/-
0.247857% (n=16).

Thanks to anholt for suggesting this.

v2: Rebase on the spurious resets fix (change by anholt).

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-03-13 17:05:05 -07:00
Dave Airlie 9b61c0fcdf Merge drm-fixes into drm-next.
Nouveau wanted this to avoid some worse conflicts when I merge that.
2016-03-14 09:46:02 +10:00
Eric Anholt 36cb6253f9 drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
This gets us functional GPU reset again, like we had until a refactor
at merge time.  Tested with a little patch to stuff in a broken binner
job every 100 frames.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-16 12:21:01 -08:00
Eric Anholt 001bdb55d9 drm/vc4: Enable runtime PM.
This may actually get us a feature that the closed driver didn't have:
turning off the GPU in between rendering jobs, while the V3D device is
still opened by the client.

There may be some tuning to be applied here to use autosuspend so that
we don't bounce the device's power so much, but in steady-state
GPU-bound rendering we keep the power on (since we keep multiple jobs
outstanding) and even if we power cycle on every job we can still
manage at least 680 fps.

More importantly, though, runtime PM will allow us to power off the
device to do a GPU reset.

v2: Switch #ifdef to CONFIG_PM not CONFIG_PM_SLEEP (caught by kbuild
    test robot)

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-16 12:21:00 -08:00
Eric Anholt c4ce60dc30 drm/vc4: Fix spurious GPU resets due to BO reuse.
We were tracking the "where are the head pointers pointing" globally,
so if another job reused the same BOs and execution was at the same
point as last time we checked, we'd stop and trigger a reset even
though the GPU had made progress.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-16 12:21:00 -08:00
Eric Anholt 21af94cf1a drm/vc4: Add support for scaling of display planes.
This implements a simple policy for choosing scaling modes
(trapezoidal for decimation, PPF for magnification), and a single PPF
filter (Mitchell/Netravali's recommendation).

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-16 11:24:08 -08:00
Eric Anholt d8dbf44f13 drm/vc4: Make the CRTCs cooperate on allocating display lists.
So far, we've only ever lit up one CRTC, so this has been fine.  To
extend to more displays or more planes, we need to make sure we don't
run our display lists into each other.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-16 11:24:08 -08:00
Daniel Vetter 32a3dbeb2b drm/vc4: Nuke preclose hook
Again since the drm core takes care of event unlinking/disarming this
is now just needless code.

v2: Fixup misplaced hunk.

Cc: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453756616-28942-14-git-send-email-daniel.vetter@ffwll.ch
2016-02-08 09:55:53 +01:00
Eric Anholt 214613656b drm/vc4: Add an interface for capturing the GPU state after a hang.
This can be parsed with vc4-gpu-tools tools for trying to figure out
what was going on.

v2: Use __u32-style types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:49:49 -08:00
Eric Anholt b501bacc60 drm/vc4: Add support for async pageflips.
An async pageflip stores the modeset to be done and executes it once
the BOs are ready to be displayed.  This gets us about 3x performance
in full screen rendering with pageflipping.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:10:03 -08:00
Eric Anholt d5b1a78a77 drm/vc4: Add support for drawing 3D frames.
The user submission is basically a pointer to a command list and a
pointer to uniforms.  We copy those in to the kernel, validate and
relocate them, and store the result in a GPU BO which we queue for
execution.

v2: Drop support for NV shader recs (not necessary for GL), simplify
    vc4_use_bo(), improve bin flush/semaphore checks, use __u32 style
    types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:05:10 -08:00
Eric Anholt d3f5168a08 drm/vc4: Bind and initialize the V3D engine.
This is the component of the GPU that does 3D rendering.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:05:10 -08:00
Eric Anholt 463873d570 drm/vc4: Add an API for creating GPU shaders in GEM BOs.
Since we have no MMU, the kernel needs to validate that the submitted
shader code won't make any accesses to memory that the user doesn't
control, which involves banning some operations (general purpose DMA
writes), and tracking where we need to write out pointers for other
operations (texture sampling).  Once it's validated, we return a GEM
BO containing the shader, which doesn't allow mapping for write or
exporting to other subsystems.

v2: Use __u32-style types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:05:09 -08:00
Eric Anholt d5bc60f6ad drm/vc4: Add create and map BO ioctls.
While there exist dumb APIs for creating and mapping BOs, one of the
rules is that drivers doing 3D acceleration have to provide their own
APIs for buffer allocation (besides, the pitch/height parameters of
the dumb alloc don't really make sense for a lot of 3D allocations).

v2: Use __u32-style types, use "drm.h" instead of <drm/drm.h>.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:04:57 -08:00
Eric Anholt c826a6e106 drm/vc4: Add a BO cache.
We need to allocate new BOs in the kernel as part of each frame, but
the CMA allocator is way too slow for that.  As an optimization, keep
track of recently-freed BOs and reuse them, with a 1 second timeout to
fully free them back to the system.

This improves 3D performance by about 15%.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:01:56 -08:00
Dave Airlie 1f43710a8e Merge tag 'drm-vc4-next-2015-10-21' of http://github.com/anholt/linux into drm-next
This pull request introduces the vc4 driver, for kernel modesetting on
the Raspberry Pi (bcm2835/bcm2836 architectures).  It currently
supports a display plane and cursor on the HDMI output.  The driver
doesn't do 3D, power management, or overlay planes yet.

[airlied: fixup the enable/disable vblank APIs]

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

* tag 'drm-vc4-next-2015-10-21' of http://github.com/anholt/linux:
  drm/vc4: Allow vblank to be disabled
  drm/vc4: Use the fbdev_cma helpers
  drm/vc4: Add KMS support for Raspberry Pi.
  drm/vc4: Add devicetree bindings for VC4.
2015-10-22 10:31:17 +10:00
Derek Foreman 48666d5631 drm/vc4: Use the fbdev_cma helpers
Keep the fbdev_cma pointer around so we can use it on hotplog and close
to ensure the frame buffer console is in a useful state.

Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2015-10-21 10:33:12 +01:00
Eric Anholt c8b75bca92 drm/vc4: Add KMS support for Raspberry Pi.
This is enough for fbcon and bringing up X using
xf86-video-modesetting.  It doesn't support the 3D accelerator or
power management yet.

v2: Drop FB_HELPER select thanks to Archit's patches.  Do manual init
    ordering instead of using the .load hook.  Structure registration
    more like tegra's, but still using the typical "component" code.
    Drop no-op hooks for atomic_begin and mode_fixup() now that
    they're optional.  Drop sentinel in Makefile.  Fix minor style
    nits I noticed on another reread.

v3: Use the new bcm2835 clk driver to manage pixel/HSM clocks instead
    of having a fixed video mode.  Use exynos-style component driver
    matching instead of devicetree nodes to list the component driver
    instances.  Rename compatibility strings to say bcm2835, and
    distinguish pv0/1/2.  Clean up some h/vsync code, and add in
    interlaced mode setup.  Fix up probe/bind error paths.  Use
    bitops.h macros for vc4_regs.h

v4: Include i2c.h, allow building under COMPILE_TEST, drop msleep now
    that other bugs have been fixed, add timeouts to cpu_relax()
    loops, rename hpd-gpio to hpd-gpios.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-10-21 10:33:12 +01:00