I had originally asked Stefan Schake to drop the pad field from the
syncobj changes that just landed, because I couldn't come up with a
reason to align to 64 bits.
Talking with Dave Airlie about the new v3d driver's submit ioctl, we
came up with a reason: sizeof() on 64-bit platforms may align to 64
bits, in which case the userspace will be submitting the aligned size
and the final 32 bits won't be zero-padded by the kernel. If
userspace doesn't zero-fill, then a future ABI change adding a 32-bit
field at the end could potentially cause the kernel to read undefined
data from old userspace (our userspace happens to use structure
initialization that zero-fills, but as a general rule we try not to
rely on that in the kernel).
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430235927.28712-1-eric@anholt.net
Reviewed-by: Stefan Schake <stschake@gmail.com>
Allow specifying a syncobj on render job submission where we store the
fence for the job. This gives userland flexible access to the fence.
v2: Use 0 as invalid syncobj to drop flag (Eric)
Don't reintroduce the padding (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1524607427-12876-3-git-send-email-stschake@gmail.com
Allow userland to specify a syncobj that is waited on before a render job
starts processing.
v2: Use 0 as invalid syncobj to drop flag (Eric)
Drop extra newline (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1524607427-12876-2-git-send-email-stschake@gmail.com
The V3D engine has various hardware counters which might be interesting
to userspace performance analysis tools.
Expose new ioctls to create/destroy a performance monitor object and
query the counter values of this perfmance monitor.
Note that a perfomance monitor is given an ID that is only valid on the
file descriptor it has been allocated from. A performance monitor can be
attached to a CL submission and the driver will enable HW counters for
this request and update the performance monitor values at the end of the
job.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180112090926.12538-1-boris.brezillon@free-electrons.com
This ioctl will allow us to purge inactive userspace buffers when the
system is running out of contiguous memory.
For now, the purge logic is rather dumb in that it does not try to
release only the amount of BO needed to meet the last CMA alloc request
but instead purges all objects placed in the purgeable pool as soon as
we experience a CMA allocation failure.
Note that the in-kernel BO cache is always purged before the purgeable
cache because those objects are known to be unused while objects marked
as purgeable by a userspace application/library might have to be
restored when they are marked back as unpurgeable, which can be
expensive.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019125748.3152-1-boris.brezillon@free-electrons.com
This is useful to allow GL to provide defined results for overlapping
glBlitFramebuffer, which X11 in turn uses to accelerate uncomposited
window movement without first blitting to a temporary. x11perf
-copywinwin100 goes from 1850/sec to 4850/sec.
v2: Default to the same behavior as before when the flags aren't
passed. (suggested by Boris)
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725162733.28007-2-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
This has proven immensely useful for debugging memory leaks and
overallocation (which is a rather serious concern on the platform,
given that we typically run at about 256MB of CMA out of up to 1GB
total memory, with framebuffers that are about 8MB ecah).
The state of the art without this is to dump debug logs from every GL
application, guess as to kernel allocations based on bo_stats, and try
to merge that all together into a global picture of memory allocation
state. With this, you can add a couple of calls to the debug build of
the 3D driver and get a pretty detailed view of GPU memory usage from
/debug/dri/0/bo_stats (or when we debug print to dmesg on allocation
failure).
The Mesa side currently labels at the gallium resource level (so you
see that a 1920x20 pixmap has been created, presumably for the window
system panel), but we could extend that to be even more useful with
glObjectLabel() names being sent all the way down to the kernel.
(partial) example of sorted debugfs output with Mesa labeling all
resources:
kernel BO cache: 16392kb BOs (3)
tiling shadow 1920x1080: 8160kb BOs (1)
resource 1920x1080@32/0: 8160kb BOs (1)
scanout resource 1920x1080@32/0: 8100kb BOs (1)
kernel: 8100kb BOs (1)
v2: Use strndup_user(), use lockdep assertion instead of just a
comment, fix an array[-1] reference, extend comment about name
freeing.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725182718.31468-2-eric@anholt.net
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This allows mesa to set the tiling format for a BO and have that
tiling format be respected by mesa on the other side of an
import/export (and by vc4 scanout in the kernel), without defining a
protocol to pass the tiling through userspace.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-2-eric@anholt.net
Acked-by: Dave Airlie <airlied@redhat.com>
FS threading brings performance improvements of 0-20% in glmark2.
The validation code checks for thread switch signals and ensures that
the registers of the other thread are not touched, and that our clamps
are not live across thread switches. It also checks that the
threading and branching instructions do not interfere.
(Original patch by Jonas, changes by anholt for style cleanup,
removing validation the kernel doesn't need to do, and adding the flag
for userspace).
v2: Minor style fixes from checkpatch.
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
The validation for it ends up being quite simple, but I hadn't got
around to it before merging the driver. For backwards compatibility,
we also need to add a flag so that the userspace GL driver can easily
tell if the kernel will allow ETC1 textures (on an old kernel, it will
continue to convert to RGBA8)
Signed-off-by: Eric Anholt <eric@anholt.net>
Userspace needs to know if it can create shaders that do branching.
Otherwise, for backwards compatibility with old kernels it needs to
lower if statements to conditional assignments.
Signed-off-by: Eric Anholt <eric@anholt.net>
As I extend the driver to support different V3D revisions, userspace
needs to know what version it's targeting. This is most easily
detected using the V3D identity registers.
v2: Make sure V3D is runtime PM on when reading the registers.
v3: Switch to a 64-bit param value (suggested by Rob Clark in review)
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Reviewed-by: Rob Clark <robdclark@gmail.com> (v3, over irc)
This can be parsed with vc4-gpu-tools tools for trying to figure out
what was going on.
v2: Use __u32-style types.
Signed-off-by: Eric Anholt <eric@anholt.net>
The user submission is basically a pointer to a command list and a
pointer to uniforms. We copy those in to the kernel, validate and
relocate them, and store the result in a GPU BO which we queue for
execution.
v2: Drop support for NV shader recs (not necessary for GL), simplify
vc4_use_bo(), improve bin flush/semaphore checks, use __u32 style
types.
Signed-off-by: Eric Anholt <eric@anholt.net>
Since we have no MMU, the kernel needs to validate that the submitted
shader code won't make any accesses to memory that the user doesn't
control, which involves banning some operations (general purpose DMA
writes), and tracking where we need to write out pointers for other
operations (texture sampling). Once it's validated, we return a GEM
BO containing the shader, which doesn't allow mapping for write or
exporting to other subsystems.
v2: Use __u32-style types.
Signed-off-by: Eric Anholt <eric@anholt.net>
While there exist dumb APIs for creating and mapping BOs, one of the
rules is that drivers doing 3D acceleration have to provide their own
APIs for buffer allocation (besides, the pitch/height parameters of
the dumb alloc don't really make sense for a lot of 3D allocations).
v2: Use __u32-style types, use "drm.h" instead of <drm/drm.h>.
Signed-off-by: Eric Anholt <eric@anholt.net>