Commit Graph

17 Commits

Author SHA1 Message Date
Eric Anholt 4c70ac7639 drm/vc4: Add a pad field to align drm_vc4_submit_cl to 64 bits.
I had originally asked Stefan Schake to drop the pad field from the
syncobj changes that just landed, because I couldn't come up with a
reason to align to 64 bits.

Talking with Dave Airlie about the new v3d driver's submit ioctl, we
came up with a reason: sizeof() on 64-bit platforms may align to 64
bits, in which case the userspace will be submitting the aligned size
and the final 32 bits won't be zero-padded by the kernel.  If
userspace doesn't zero-fill, then a future ABI change adding a 32-bit
field at the end could potentially cause the kernel to read undefined
data from old userspace (our userspace happens to use structure
initialization that zero-fills, but as a general rule we try not to
rely on that in the kernel).

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430235927.28712-1-eric@anholt.net
Reviewed-by: Stefan Schake <stschake@gmail.com>
2018-05-03 15:20:09 -07:00
Stefan Schake e84fcb95e0 drm/vc4: Export fence through syncobj
Allow specifying a syncobj on render job submission where we store the
fence for the job. This gives userland flexible access to the fence.

v2: Use 0 as invalid syncobj to drop flag (Eric)
    Don't reintroduce the padding (Eric)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1524607427-12876-3-git-send-email-stschake@gmail.com
2018-04-30 16:04:23 -07:00
Stefan Schake 818f5c8f4c drm/vc4: Syncobj import support
Allow userland to specify a syncobj that is waited on before a render job
starts processing.

v2: Use 0 as invalid syncobj to drop flag (Eric)
    Drop extra newline (Eric)

Signed-off-by: Stefan Schake <stschake@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1524607427-12876-2-git-send-email-stschake@gmail.com
2018-04-30 16:04:14 -07:00
Boris Brezillon 65101d8c91 drm/vc4: Expose performance counters to userspace
The V3D engine has various hardware counters which might be interesting
to userspace performance analysis tools.

Expose new ioctls to create/destroy a performance monitor object and
query the counter values of this perfmance monitor.

Note that a perfomance monitor is given an ID that is only valid on the
file descriptor it has been allocated from. A performance monitor can be
attached to a CL submission and the driver will enable HW counters for
this request and update the performance monitor values at the end of the
job.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180112090926.12538-1-boris.brezillon@free-electrons.com
2018-02-10 22:23:26 +00:00
Boris Brezillon b9f19259b8 drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl
This ioctl will allow us to purge inactive userspace buffers when the
system is running out of contiguous memory.

For now, the purge logic is rather dumb in that it does not try to
release only the amount of BO needed to meet the last CMA alloc request
but instead purges all objects placed in the purgeable pool as soon as
we experience a CMA allocation failure.

Note that the in-kernel BO cache is always purged before the purgeable
cache because those objects are known to be unused while objects marked
as purgeable by a userspace application/library might have to be
restored when they are marked back as unpurgeable, which can be
expensive.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019125748.3152-1-boris.brezillon@free-electrons.com
2017-10-19 10:34:49 -07:00
Eric Anholt 3be8eddd9d drm/vc4: Add exec flags to allow forcing a specific X/Y tile walk order.
This is useful to allow GL to provide defined results for overlapping
glBlitFramebuffer, which X11 in turn uses to accelerate uncomposited
window movement without first blitting to a temporary.  x11perf
-copywinwin100 goes from 1850/sec to 4850/sec.

v2: Default to the same behavior as before when the flags aren't
    passed. (suggested by Boris)

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725162733.28007-2-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2017-08-08 13:26:44 -07:00
Eric Anholt f30994622b drm/vc4: Add an ioctl for labeling GEM BOs for summary stats
This has proven immensely useful for debugging memory leaks and
overallocation (which is a rather serious concern on the platform,
given that we typically run at about 256MB of CMA out of up to 1GB
total memory, with framebuffers that are about 8MB ecah).

The state of the art without this is to dump debug logs from every GL
application, guess as to kernel allocations based on bo_stats, and try
to merge that all together into a global picture of memory allocation
state.  With this, you can add a couple of calls to the debug build of
the 3D driver and get a pretty detailed view of GPU memory usage from
/debug/dri/0/bo_stats (or when we debug print to dmesg on allocation
failure).

The Mesa side currently labels at the gallium resource level (so you
see that a 1920x20 pixmap has been created, presumably for the window
system panel), but we could extend that to be even more useful with
glObjectLabel() names being sent all the way down to the kernel.

(partial) example of sorted debugfs output with Mesa labeling all
resources:

               kernel BO cache:  16392kb BOs (3)
       tiling shadow 1920x1080:   8160kb BOs (1)
       resource 1920x1080@32/0:   8160kb BOs (1)
scanout resource 1920x1080@32/0:   8100kb BOs (1)
                        kernel:   8100kb BOs (1)

v2: Use strndup_user(), use lockdep assertion instead of just a
    comment, fix an array[-1] reference, extend comment about name
    freeing.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725182718.31468-2-eric@anholt.net
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-28 16:04:53 -07:00
Eric Anholt 83753117f1 drm/vc4: Add get/set tiling ioctls.
This allows mesa to set the tiling format for a BO and have that
tiling format be respected by mesa on the other side of an
import/export (and by vc4 scanout in the kernel), without defining a
protocol to pass the tiling through userspace.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-2-eric@anholt.net
Acked-by: Dave Airlie <airlied@redhat.com>
2017-06-15 16:02:45 -07:00
Jonas Pfeil c778cc5df9 drm/vc4: Add fragment shader threading support
FS threading brings performance improvements of 0-20% in glmark2.

The validation code checks for thread switch signals and ensures that
the registers of the other thread are not touched, and that our clamps
are not live across thread switches.  It also checks that the
threading and branching instructions do not interfere.

(Original patch by Jonas, changes by anholt for style cleanup,
removing validation the kernel doesn't need to do, and adding the flag
for userspace).

v2: Minor style fixes from checkpatch.

Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
2016-11-16 13:25:26 -08:00
Eric Anholt 7154d76fed drm/vc4: Add support for rendering with ETC1 textures.
The validation for it ends up being quite simple, but I hadn't got
around to it before merging the driver.  For backwards compatibility,
we also need to add a flag so that the userspace GL driver can easily
tell if the kernel will allow ETC1 textures (on an old kernel, it will
continue to convert to RGBA8)

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-11-03 18:55:46 -07:00
Eric Anholt 7363cee5b4 drm/vc4: Add a getparam to signal support for branches.
Userspace needs to know if it can create shaders that do branching.
Otherwise, for backwards compatibility with old kernels it needs to
lower if statements to conditional assignments.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-07-15 15:19:51 -07:00
Eric Anholt af713795c5 drm/vc4: Add a getparam ioctl for getting the V3D identity regs.
As I extend the driver to support different V3D revisions, userspace
needs to know what version it's targeting.  This is most easily
detected using the V3D identity registers.

v2: Make sure V3D is runtime PM on when reading the registers.
v3: Switch to a 64-bit param value (suggested by Rob Clark in review)

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Reviewed-by: Rob Clark <robdclark@gmail.com> (v3, over irc)
2016-07-14 08:08:35 -07:00
Emil Velikov 6a982350f8 drm/vc4: add extern C guard for the UAPI header
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-05-13 14:06:18 +01:00
Eric Anholt 214613656b drm/vc4: Add an interface for capturing the GPU state after a hang.
This can be parsed with vc4-gpu-tools tools for trying to figure out
what was going on.

v2: Use __u32-style types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:49:49 -08:00
Eric Anholt d5b1a78a77 drm/vc4: Add support for drawing 3D frames.
The user submission is basically a pointer to a command list and a
pointer to uniforms.  We copy those in to the kernel, validate and
relocate them, and store the result in a GPU BO which we queue for
execution.

v2: Drop support for NV shader recs (not necessary for GL), simplify
    vc4_use_bo(), improve bin flush/semaphore checks, use __u32 style
    types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:05:10 -08:00
Eric Anholt 463873d570 drm/vc4: Add an API for creating GPU shaders in GEM BOs.
Since we have no MMU, the kernel needs to validate that the submitted
shader code won't make any accesses to memory that the user doesn't
control, which involves banning some operations (general purpose DMA
writes), and tracking where we need to write out pointers for other
operations (texture sampling).  Once it's validated, we return a GEM
BO containing the shader, which doesn't allow mapping for write or
exporting to other subsystems.

v2: Use __u32-style types.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:05:09 -08:00
Eric Anholt d5bc60f6ad drm/vc4: Add create and map BO ioctls.
While there exist dumb APIs for creating and mapping BOs, one of the
rules is that drivers doing 3D acceleration have to provide their own
APIs for buffer allocation (besides, the pitch/height parameters of
the dumb alloc don't really make sense for a lot of 3D allocations).

v2: Use __u32-style types, use "drm.h" instead of <drm/drm.h>.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-12-07 20:04:57 -08:00