Re-arrange things a bit so that we can get work requested after a bo
fence passes, like pageflip, done before retiring bo's. Without any
sort of bo cache in userspace, some games can trigger hundred's of
transient bo's, which can cause retire to take a long time (5-10ms).
Obviously we want a bo cache.. but this cleanup will make things a
bit easier for atomic as well and makes things a bit cleaner.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: David Brown <davidb@codeaurora.org>
Occasionally we seem to miss an IRQ from the ME (microengine). I'm not
entirely sure the root cause, but for now we can unwedge things by
retiring from the hangcheck timer.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If gpu locks up with the rptr shortly beyond the wrap-around point in
the ringbuffer, because the rptr was not reset (but wptr is, by virtue
of resetting rb->cur), we could end up in a scenario where we think
there is not enough space in the ringbuffer for the next cmds. And
since the CP won't reset rptr until after processing an IB, this leaves
things in a sort of deadlock.
So reset rptr too. And a bit more spiffing up of hangcheck to make
things easier to debug.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The userspace API already had everything needed to handle read vs write
synchronization. This patch actually bothers to hook it up properly, so
that we don't need to (for example) stall on userspace read access to a
buffer that gpu is also still reading.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A basic, no-frills recovery mechanism in case the gpu gets wedged. We
could try to be a bit more fancy and restart the next submit after the
one that got wedged, but for now keep it simple. This is enough to
recover things if, for example, the gpu hangs mid way through a piglit
run.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add initial support for a3xx 3d core.
So far, with hardware that I've seen to date, we can have:
+ zero, one, or two z180 2d cores
+ a3xx or a2xx 3d core, which share a common CP (the firmware
for the CP seems to implement some different PM4 packet types
but the basics of cmdstream submission are the same)
Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
+ msm_gpu
+ adreno_gpu
+ a3xx_gpu
+ a2xx_gpu
+ z180_gpu
This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.
Note that there is no cmdstream validation required. All memory access
from the GPU is via IOMMU/MMU. So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.
Signed-off-by: Rob Clark <robdclark@gmail.com>