There are 3 big benefits to suballocating a single big DMA buffer
for command submission:
1. Avoid hammering CMA. The old way of allocating and freeing a DMA
buffer for each submission was hitting some of the real slow
pathes in CMA, as this allocator was not designed for a concurrent
small buffers load.
2. Less TLB flushes on IOMMUv2. If a new command buffer is mapped into
the GPU address space the MMU TLBs need to be flushed. By having
one big buffer statically mapped to the GPU, a lot of those flushes
can be avoided.
3. No funky workarounds for GC3000. The FE TLB flush on GC3000 isn't
reliable. To work around that we tried to lay out the cmdbufs in
the GPU address space in a way to avoid this issue. This hasn't
always worked if the address space is crowded. A single statically
mapped buffer avoids the erratum completely.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This will get more complex with the following changes, so move it
into its own place.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
With MMUv2 all buffers need to be mapped through the MMU once it
is enabled. Align the buffer size to 4K, as the MMU is only able to
map page aligned buffers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Split out into a new externally visible function, as the IOMMUv2
code needs this functionality, too.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Split out into a new externally visible function, as the IOMMUv2
code needs this functionality, too.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Fence contexts are created on the fly (for example) by the GPU scheduler used
in the amdgpu driver as a result of an userspace request. Because of this
userspace could in theory force a wrap around of the 32bit context number
if it doesn't behave well.
Avoid this by increasing the context number to 64bits. This way even when
userspace manages to allocate a billion contexts per second it takes more
than 500 years for the context number to wrap around.
v2: fix printf formats as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1464786612-5010-2-git-send-email-deathsimple@vodafone.de
Currently, we scan the list of mappings each time we want to operate on
the vram_mapping struct. Rather than repeatedly scanning these, look
them up once in the submission path, and then use _reference and
_unreference methods as necessary to manage this object.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Add tracking of the current execution state (iow, active GPU pipe).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Export further minor feature bitmasks and the varyings count from
the GPU specifications registers to userspace.
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This adds the etnaviv DRM driver and hooks it up in Makefiles
and Kconfig.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>