OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Boris Brezillon	65101d8c91	drm/vc4: Expose performance counters to userspace The V3D engine has various hardware counters which might be interesting to userspace performance analysis tools. Expose new ioctls to create/destroy a performance monitor object and query the counter values of this perfmance monitor. Note that a perfomance monitor is given an ID that is only valid on the file descriptor it has been allocated from. A performance monitor can be attached to a CL submission and the driver will enable HW counters for this request and update the performance monitor values at the end of the job. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20180112090926.12538-1-boris.brezillon@free-electrons.com	2018-02-10 22:23:26 +00:00
Noralf Trønnes	b8f429a77f	drm/vc4: Use drm_fb_cma_fbdev_init/fini() Use drm_fb_cma_fbdev_init() and drm_fb_cma_fbdev_fini() which relies on the fact that drm_device holds a pointer to the drm_fb_helper structure. This means that the driver doesn't have to keep track of that. Also use the drm_fb_helper functions directly. Cc: Eric Anholt <eric@anholt.net> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20171115142001.45358-18-noralf@tronnes.org	2017-12-08 14:47:43 +01:00
Boris Brezillon	b9f19259b8	drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl This ioctl will allow us to purge inactive userspace buffers when the system is running out of contiguous memory. For now, the purge logic is rather dumb in that it does not try to release only the amount of BO needed to meet the last CMA alloc request but instead purges all objects placed in the purgeable pool as soon as we experience a CMA allocation failure. Note that the in-kernel BO cache is always purged before the purgeable cache because those objects are known to be unused while objects marked as purgeable by a userspace application/library might have to be restored when they are marked back as unpurgeable, which can be expensive. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20171019125748.3152-1-boris.brezillon@free-electrons.com	2017-10-19 10:34:49 -07:00
Eric Anholt	f30994622b	drm/vc4: Add an ioctl for labeling GEM BOs for summary stats This has proven immensely useful for debugging memory leaks and overallocation (which is a rather serious concern on the platform, given that we typically run at about 256MB of CMA out of up to 1GB total memory, with framebuffers that are about 8MB ecah). The state of the art without this is to dump debug logs from every GL application, guess as to kernel allocations based on bo_stats, and try to merge that all together into a global picture of memory allocation state. With this, you can add a couple of calls to the debug build of the 3D driver and get a pretty detailed view of GPU memory usage from /debug/dri/0/bo_stats (or when we debug print to dmesg on allocation failure). The Mesa side currently labels at the gallium resource level (so you see that a 1920x20 pixmap has been created, presumably for the window system panel), but we could extend that to be even more useful with glObjectLabel() names being sent all the way down to the kernel. (partial) example of sorted debugfs output with Mesa labeling all resources: kernel BO cache: 16392kb BOs (3) tiling shadow 1920x1080: 8160kb BOs (1) resource 1920x1080@32/0: 8160kb BOs (1) scanout resource 1920x1080@32/0: 8100kb BOs (1) kernel: 8100kb BOs (1) v2: Use strndup_user(), use lockdep assertion instead of just a comment, fix an array[-1] reference, extend comment about name freeing. Signed-off-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/20170725182718.31468-2-eric@anholt.net Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-07-28 16:04:53 -07:00
Eric Anholt	55a0b9d70a	drm/vc4: Remove dead vc4_event_pending(). It is no longer used as of commit `34c8ea400f` ("drm/vc4: Mimic drm_atomic_helper_commit() behavior") Signed-off-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170621185002.28563-4-eric@anholt.net Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2017-06-22 11:14:39 -07:00
Eric Anholt	83753117f1	drm/vc4: Add get/set tiling ioctls. This allows mesa to set the tiling format for a BO and have that tiling format be respected by mesa on the other side of an import/export (and by vc4 scanout in the kernel), without defining a protocol to pass the tiling through userspace. Signed-off-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-2-eric@anholt.net Acked-by: Dave Airlie <airlied@redhat.com>	2017-06-15 16:02:45 -07:00
Boris Brezillon	9a8d5e4a53	drm/vc4: Fix comment in vc4_drv.h Fixes a copy&paste error. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/1495550187-525-1-git-send-email-boris.brezillon@free-electrons.com	2017-05-31 10:50:54 -07:00
Masahiro Yamada	b7e8e25b37	drm/vc4: fix include notation and remove -Iinclude/drm flag Include <drm/.h> instead of relative path from include/drm, then remove the -Iinclude/drm compiler flag. While we are here, use <...> instead of "..." for include/linux/.h and include/sound/*.h headers too. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1495081793-9707-2-git-send-email-yamada.masahiro@socionext.com	2017-05-22 09:36:01 +02:00
Daniel Vetter	1bf6ad622b	drm/vblank: drop the mode argument from drm_calc_vbltimestamp_from_scanoutpos If we restrict this helper to only kms drivers (which is the case) we can look up the correct mode easily ourselves. But it's a bit tricky: - All legacy drivers look at crtc->hwmode. But that is updated already at the beginning of the modeset helper, which means when we disable a pipe. Hence the final timestamps might be a bit off. But since this is an existing bug I'm not going to change it, but just try to be bug-for-bug compatible with the current code. This only applies to radeon&amdgpu. - i915 tries to get it perfect by updating crtc->hwmode when the pipe is off (i.e. vblank->enabled = false). - All other atomic drivers look at crtc->state->adjusted_mode. Those that look at state->requested_mode simply don't adjust their mode, so it's the same. That has two problems: Accessing crtc->state from interrupt handling code is unsafe, and it's updated before we shut down the pipe. For nonblocking modesets it's even worse. For atomic drivers try to implement what i915 does. To do that we add a new hwmode field to the vblank structure, and update it from drm_calc_timestamping_constants(). For atomic drivers that's called from the right spot by the helper library already, so all fine. But for safety let's enforce that. For legacy driver this function is only called at the end (oh the fun), which is broken, so again let's not bother and just stay bug-for-bug compatible. The benefit is that we can use drm_calc_vbltimestamp_from_scanoutpos directly to implement ->get_vblank_timestamp in every driver, deleting a lot of code. v2: Completely new approach, trying to mimick the i915 solution. v3: Fixup kerneldoc. v4: Drop the WARN_ON to check that the vblank is off, atomic helpers currently unconditionally call this. Recomputing the same stuff should be harmless. v5: Fix typos and move misplaced hunks to the right patches (Neil). v6: Undo hunk movement (kbuild). Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Cc: Eric Anholt <eric@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-4-daniel.vetter@ffwll.ch	2017-05-10 10:21:31 +02:00
Daniel Vetter	3fcdcb2709	drm/vblank: Switch to bool in_vblank_irq in get_vblank_timestamp It's overkill to have a flag parameter which is essentially used just as a boolean. This takes care of core + adjusting drivers. Adjusting the scanout position callback is a bit harder, since radeon also supplies it's own driver-private flags in there. v2: Fixup misplaced hunks (Neil). v3: kbuild says v1 was better ... Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Cc: Eric Anholt <eric@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-2-daniel.vetter@ffwll.ch	2017-05-10 10:21:17 +02:00
Daniel Vetter	d673c02c4b	drm/vblank: Switch drm_driver->get_vblank_timestamp to return a bool There's really no reason for anything more: - Calling this while the crtc vblank stuff isn't set up is a driver bug. Those places alrready DRM_ERROR. - Calling this when the crtc is off is either a driver bug (calling drm_crtc_handle_vblank at the wrong time) or a core bug (for anything else). Again, we DRM_ERROR. - EINVAL is checked at higher levels already, and if we'd use struct drm_crtc * instead of (dev, pipe) it would be real obvious that those are again core bugs. The only valid failure mode is crap hardware that couldn't sample a useful timestamp, to ask the core to just grab a not-so-accurate timestamp. Bool is perfectly fine for that. v2: Also fix up the one caller, I lost that in the shuffling (Jani). v3: Fixup commit message (Neil). Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Cc: Eric Anholt <eric@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170509140329.24114-1-daniel.vetter@ffwll.ch	2017-05-10 10:21:08 +02:00
Eric Anholt	b72a2816e3	drm/vc4: Turn the V3D clock on at runtime. For the Raspberry Pi's bindings, the power domain also implicitly turns on the clock and deasserts reset, but for the new Cygnus port we start representing the clock in the devicetree. v2: Document the clock-names property, check for -ENOENT for no clock in DT. v3: Drop NULL checks around clk calls which embed NULL checks. v4: Drop clk-names (feedback by Rob Herring) Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Rob Herring <robh@kernel.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170428224223.21904-1-eric@anholt.net	2017-05-08 12:24:06 -07:00
Eric Anholt	553c942f8b	drm/vc4: Allow using more than 256MB of CMA memory. Until now, we've had to limit Raspberry Pi to 256MB of CMA memory to keep from triggering the hardware addressing bug between the tile binner and the tile alloc memory (where the top 4 bits come from the tile state data array's address). To work around that and allow more memory to be reserved for graphics, allocate a single BO to store tile state data arrays and tile alloc/overflow memory while the GPU is active, and make sure that that one BO doesn't happen to cross a 256MB boundary. With that in place, we can allocate textures and shaders anywhere in system memory (still contiguous, of course). Signed-off-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170327231025.19391-1-eric@anholt.net Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>	2017-04-18 14:32:20 -07:00
Eric Anholt	cdec4d3613	drm/vc4: Expose dma-buf fences for V3D rendering. This is needed for proper synchronization with display on another DRM device (pl111 or tinydrm) with buffers produced by vc4 V3D. Fixes the new igt vc4_dmabuf_poll testcase, and rendering of one of the glmark2 desktop tests on pl111+vc4. This doesn't yet introduce waits on another device's fences before vc4's rendering/display, because I don't have testcases for them. v2: Reuse dma_fence_free(), retitle commit message to clarify that it's not a full dma-buf fencing implementation yet. Signed-off-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170412191202.22740-6-eric@anholt.net Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-04-13 11:00:28 -07:00
Shawn Guo	0d5f46fa4c	drm: vc4: use vblank hooks in struct drm_crtc_funcs The vblank hooks in struct drm_driver are deprecated and only meant for legacy drivers. For modern drivers with DRIVER_MODESET flag, the hooks in struct drm_crtc_funcs should be used instead. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net> Link: http://patchwork.freedesktop.org/patch/msgid/1486458995-31018-23-git-send-email-shawnguo@kernel.org	2017-02-09 16:12:47 +08:00
Eric Anholt	4078f57571	drm/vc4: Add DSI driver The DSI0 and DSI1 blocks on the 2835 are related hardware blocks. Some registers move around, and the featureset is slightly different, as DSI1 (the 4-lane DSI) is a later version of the hardware block. This driver doesn't yet enable DSI0, since we don't have any hardware to test against, but it does put a lot of the register definitions and code in place. v2: Use the clk_hw interfaces, don't set CLK_IS_BASIC (from review by Stephen Boyd) Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1) Link: http://patchwork.freedesktop.org/patch/msgid/20170131192912.11316-1-eric@anholt.net	2017-02-01 12:51:23 -08:00
Noralf Trønnes	55d6616585	drm/vc4: Remove vc4_debugfs_cleanup() drm_debugfs_cleanup() now removes all minor->debugfs_list entries automatically, so the drm_driver.debugfs_cleanup callback is not needed. Cc: eric@anholt.net Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170126225621.12314-17-noralf@tronnes.org	2017-01-30 09:48:48 +01:00
Shawn Guo	c77b9abf3c	drm: vc4: use crtc helper drm_crtc_from_index() Use drm_crtc_from_index() to find drm_crtc for given index, so that we do not need to maintain a pointer array in struct vc4_dev. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: Eric Anholt <eric@anholt.net> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: http://patchwork.freedesktop.org/patch/msgid/1483961145-18453-7-git-send-email-shawnguo@kernel.org	2017-01-18 09:21:13 -05:00
Laurent Pinchart	9338203c4f	drm: Don't include <drm/drm_encoder.h> in <drm/drm_crtc.h> <drm/drm_crtc.h> used to define most of the in-kernel KMS API. It has now been split into separate files for each object type, but still includes most other KMS headers to avoid breaking driver compilation. As a step towards fixing that problem, remove the inclusion of <drm/drm_encoder.h> from <drm/drm_crtc.h> and include it instead where appropriate. Also remove the forward declarations of the drm_encoder and drm_encoder_helper_funcs structures from <drm/drm_crtc.h> as they're not needed in the header. <drm/drm_encoder.h> now has to include <drm/drm_mode.h> and contain a forward declaration of struct drm_encoder in order to allow including it as the first header in a compilation unit. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Sinclair Yeh <syeh@vmware.com> # For vmwgfx Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Archit Taneja <architt@codeaurora.org> Link: http://patchwork.freedesktop.org/patch/msgid/1481709550-29226-2-git-send-email-laurent.pinchart+renesas@ideasonboard.com	2016-12-18 16:29:29 +05:30
Boris Brezillon	e4b81f8c74	drm/vc4: Add support for the VEC (Video Encoder) IP The VEC IP is a TV DAC, providing support for PAL and NTSC standards. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Eric Anholt <eric@anholt.net>	2016-12-09 15:26:31 -08:00
Boris Brezillon	ab8df60e3a	drm/vc4: Fix ->clock_select setting for the VEC encoder PV_CONTROL_CLK_SELECT_VEC is actually 2 and not 0. Fix the definition and rework the vc4_set_crtc_possible_masks() to cover the full range of the PV_CONTROL_CLK_SELECT field. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Eric Anholt <eric@anholt.net>	2016-12-09 15:26:29 -08:00
Derek Foreman	26fc78f6fe	drm/vc4: Fix race between page flip completion event and clean-up There was a small window where a userspace program could submit a pageflip after receiving a pageflip completion event yet still receive EBUSY. Signed-off-by: Derek Foreman <derekf@osg.samsung.com> Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Daniel Stone <daniels@collabora.com>	2016-11-29 15:39:45 -08:00
Jonas Pfeil	c778cc5df9	drm/vc4: Add fragment shader threading support FS threading brings performance improvements of 0-20% in glmark2. The validation code checks for thread switch signals and ensures that the registers of the other thread are not touched, and that our clamps are not live across thread switches. It also checks that the threading and branching instructions do not interfere. (Original patch by Jonas, changes by anholt for style cleanup, removing validation the kernel doesn't need to do, and adding the flag for userspace). v2: Minor style fixes from checkpatch. Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de> Signed-off-by: Eric Anholt <eric@anholt.net>	2016-11-16 13:25:26 -08:00
Eric Anholt	7edabee06a	drm/vc4: Fix races when the CS reads from render targets. With the introduction of bin/render pipelining, the previous job may not be completed when we start binning the next one. If the previous job wrote our VBO, IB, or CS textures, then the binning stage might get stale or uninitialized results. Fixes the major rendering failure in glmark2 -b terrain. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `ca26d28bba` ("drm/vc4: improve throughput by pipelining binning and rendering jobs") Cc: stable@vger.kernel.org	2016-10-06 11:53:50 -07:00
Masahiro Yamada	57b9f56944	drm/vc4: cleanup with list_first_entry_or_null() The combo of list_empty() check and return list_first_entry() can be replaced with list_first_entry_or_null(). Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Eric Anholt <eric@anholt.net>	2016-10-06 11:53:35 -07:00
Eric Anholt	9326e6f255	drm/vc4: Fix overflow mem unreferencing when the binner runs dry. Overflow memory handling is tricky: While it's still referenced by the BPO registers, we want to keep it from being freed. When we are putting a new set of overflow memory in the registers, we need to assign the old one to the last rendering job using it. We were looking at "what's currently running in the binner", but since the bin/render submission split, we may end up with the binner completing and having no new job while the renderer is still processing. So, if we don't find a bin job at all, look at the highest-seqno (last) render job to attach our overflow to. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `ca26d28bba` ("drm/vc4: improve throughput by pipelining binning and rendering jobs") Cc: stable@vger.kernel.org	2016-08-19 19:17:34 -07:00
Dave Airlie	2d635fded2	This pull request brings in vc4 shader validation for branching, allowing GLSL shaders with non-unrolled loops. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJXiWHaAAoJELXWKTbR/J7olkkP/3buVvzBdxQmT7yMeBL2Up+F MZ+W/pr7+gWXT/rvyY8zbEZV1txQtZDnvu8cC3h9E66TVYpTx3dYypECwiRKFt5f HXhrkiniu65zBPhtYxloswcr686SpFOhr9rwy6cJbCCvuZQZVpQ41Fu7BinaYJc2 uGllnNmRkB7tPG/6BYQ/MAWvR7AjTaaAmcdhNZDyEHqXHiUu8NPprL0BQIwS1GMn 65pQjni/WEek6/WxncxszaDWfsLMRJPKiuPV4kAsoQDluB3ortk3k1VVNJ+n3aXw GjlMNbU2ganEGWJS16e7HXXz3I19SQHn5mnL74X3GGL8r2x9AIZ64Zk7Ru659NBq CW/NsSWM/TfU6T4iflbZjSVlmtbvdX5T0rDFb+lMFNXG8EvmRIc3fkbSeh6gF9sF hc3UDS3ScQBAm8QOpDiVAsXNBGJUTsvxdDYgVUj1mjPtbD1ZDmEWYdb2fzs/n7ja qU/CW9Ap3tModkTjaD58RoenTkbTXymhxmRd1JHoxVztkAM00fNxtGP6fTxuGLOW snOoE9Ahdq/IzV/yZrqvWRCLTDenQ/ZxcP+fYleJ6M3qLxfHkgZX/chy/n21Q80G Qe40uqBWreTCz6SzCEOXeaJ9RDbZfSomVfWuLzm5IGMhOmh1rqkpdgAUKQW7OvxB e/LnruAmY/K7Yin87gK6 =Qh0U -----END PGP SIGNATURE----- Merge tag 'drm-vc4-next-2016-07-15' of https://github.com/anholt/linux into drm-next This pull request brings in vc4 shader validation for branching, allowing GLSL shaders with non-unrolled loops. * tag 'drm-vc4-next-2016-07-15' of https://github.com/anholt/linux: drm/vc4: Fix a "the the" typo in a comment. drm/vc4: Fix definition of QPU_R_MS_REV_FLAGS drm/vc4: Add a getparam to signal support for branches. drm/vc4: Add support for branching in shader validation. drm/vc4: Add a bitmap of branch targets during shader validation. drm/vc4: Move validation's current/max ip into the validation struct. drm/vc4: Add a getparam ioctl for getting the V3D identity regs.	2016-07-16 11:25:11 +10:00
Eric Anholt	6d45c81d22	drm/vc4: Add support for branching in shader validation. We're already checking that branch instructions are between the start of the shader and the proper PROG_END sequence. The other thing we need to make branching safe is to verify that the shader doesn't read past the end of the uniforms stream. To do that, we require that at any basic block reading uniforms have the following instructions: load_imm temp, <next offset within uniform stream> add unif_addr, temp, unif The instructions are generated by userspace, and the kernel verifies that the load_imm is of the expected offset, and that the add adds it to a uniform. We track which uniform in the stream that is, and at draw call time fix up the uniform stream to have the address of the start of the shader's uniforms at that location. Signed-off-by: Eric Anholt <eric@anholt.net>	2016-07-15 15:19:50 -07:00
Dave Airlie	35b8a74924	This pull request brings in new vc4 plane formats for Android, precise vblank timestamping, and a couple of small cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJXhUULAAoJELXWKTbR/J7oZ1oQAKnTTJlnCbaSNrWbRBUuaMtO S2RQKyMI2LOpf4XE13cNHm0IaYNOw6hJIXlnxeuzHbQSGvOrHnabluZuvAfLa3OQ 4bb8K0elWsPRbtmx2L8DjncLCmmmOsbczrTwo3efq70XZs4PyaCp2vHpCU8kI7HJ xO/fk6E8sYDPFCxZxb82C7LTSjQBlPn58dD+cEFg1kGILOhyR1eZwRj8e4netjKU gN/059R6VPPor+I8oIMrqoHAxAlZUGnVPsK/1rgR5cfxlC1NPKU71eI7xiBLNclt 86BcvU/LuwAYt81UirnGTQU/m47dCMK4mmpzD/9EgVW/KTUPLPubML+u9fL67+B4 BJ7T7N4t7APblqkO/iI9DcTAkZNmydq4sdsfDdXcVZ3umDTNpmb/PQuq+rFEc1CZ qsBw13kJ4bHBYUpZvrxuCesSRLpXhUiSOg/uVnRHTdiALZqQQyzQlnw6Q4GCMKiz R99/k4/7/cAgQkGElwLeHR8Aw0r26m+X4pnLN9lrafjQjlH04CcuvFwlak28jnit Oi0eZ0VD9RTHwPzUIoe4M32Q/7oAe7y5b7gCNPbM/0s56WQv+cIHi7CdgdIYtUWs BIyeBTzpLvzkEvrEMYoipgpv3scbudMssTd3bo8gsdOu0tllBzi59Poo5hl6pQbs lseJ7LavmK5sPDPsIZb3 =aF3C -----END PGP SIGNATURE----- Merge tag 'drm-vc4-next-2016-07-12' of https://github.com/anholt/linux into drm-next This pull request brings in new vc4 plane formats for Android, precise vblank timestamping, and a couple of small cleanups. * tag 'drm-vc4-next-2016-07-12' of https://github.com/anholt/linux: drm/vc4: remove redundant ret status check drm/vc4: Implement precise vblank timestamping. drm/vc4: Bind the HVS before we bind the individual CRTCs. gpu: drm: vc4_hdmi: add missing of_node_put after calling of_parse_phandle drm: vc4: enable XBGR8888 and ABGR8888 pixel formats drm/vc4: clean up error exit path on failed dpi_connector allocation	2016-07-15 13:56:11 +10:00
Mario Kleiner	1bf59f1dcb	drm/vc4: Implement precise vblank timestamping. Precise vblank timestamping is implemented via the usual scanout position based method. On VC4 the pixelvalves PV do not have a scanout position register. Only the hardware video scaler HVS has a similar register which describes which scanline for the output is currently composited and stored in the HVS fifo for later consumption by the PV. This causes a problem in that the HVS runs at a much faster clock (system clock / audio gate) than the PV which runs at video mode dot clock, so the unless the fifo between HVS and PV is full, the HVS will progress faster in its observable read line position than video scan rate, so the HVS position reading can't be directly translated into a scanout position for timestamp correction. Additionally when the PV is in vblank, it doesn't consume from the fifo, so the fifo gets full very quickly and then the HVS stops compositing until the PV enters active scanout and starts consuming scanlines from the fifo again, making new space for the HVS to composite. Therefore a simple translation of HVS read position into elapsed time since (or to) start of active scanout does not work, but for the most interesting cases we can still get useful and sufficiently accurate results: 1. The PV enters active scanout of a new frame with the fifo of the HVS completely full, and the HVS can refill any fifo line which gets consumed and thereby freed up by the PV during active scanout very quickly. Therefore the PV and HVS work effectively in lock-step during active scanout with the fifo never having more than 1 scanline freed up by the PV before it gets refilled. The PV's real scanout position is therefore trailing the HVS compositing position as scanoutpos = hvspos - fifosize and we can get the true scanoutpos as HVS readpos minus fifo size, so precise timestamping works while in active scanout, except for the last few scanlines of the frame, when the HVS reaches end of frame, stops compositing and the PV catches up and drains the fifo. This special case would only introduce minor errors though. 2. If we are in vblank, then we can only guess something reasonable. If called from vblank irq, we assume the irq is usually dispatched with minimum delay, so we can take a timestamp taken at entry into the vblank irq handler as a baseline and then add a full vblank duration until the guessed start of active scanout. As irq dispatch is usually pretty low latency this works with relatively low jitter and good results. If we aren't called from vblank then we could be anywhere within the vblank interval, so we return a neutral result, simply the current system timestamp, and hope for the best. Measurement shows the generated timestamps to be rather precise, and at least never off more than 1 vblank duration worst-case. Limitations: Doesn't work well yet for interlaced video modes, therefore disabled in interlaced mode for now. v2: Use the DISPBASE registers to determine the FIFO size (changes by anholt) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v2)	2016-07-11 17:17:34 -07:00
Daniel Vetter	2f196b7c4b	drm/atomic: Add drm_atomic_crtc_state_for_each_plane_state ... and use it in msm&vc4. Again just want to encapsulate drm_atomic_state internals a bit. The const threading is a bit awkward in vc4 since C sucks, but I still think it's worth to enforce this. Eventually I want to make all the obj->state pointers const too, but that's a lot more work ... v2: Provide safe macro to wrap up the unsafe helper better, suggested by Maarten. v3: Fixup subject (Maarten) and spelling fixes (Eric Engestrom). Cc: Eric Anholt <eric@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1464877304-4213-1-git-send-email-daniel.vetter@ffwll.ch	2016-06-02 16:59:05 +02:00
Eric Anholt	08302c35b5	drm/vc4: Add DPI driver The DPI interface involves taking a ton of our GPIOs to be used as outputs, and routing display signals over them in parallel. v2: Use display_info.bus_formats[] to replace our custom DT properties. v3: Rebase on V3D documentation changes. v4: Fix rebase detritus from V3D documentation changes. Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Rob Herring <robh@kernel.org>	2016-04-14 12:22:53 -07:00
Varad Gautam	ca26d28bba	drm/vc4: improve throughput by pipelining binning and rendering jobs The hardware provides us with separate threads for binning and rendering, and the existing model waits for them both to complete before submitting the next job. Splitting the binning and rendering submissions reduces idle time and gives us approx 20-30% speedup with some x11perf tests such as -line10 and -tilerect1. Improves openarena performance by 1.01897% +/- 0.247857% (n=16). Thanks to anholt for suggesting this. v2: Rebase on the spurious resets fix (change by anholt). Signed-off-by: Varad Gautam <varadgautam@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net>	2016-03-13 17:05:05 -07:00
Dave Airlie	9b61c0fcdf	Merge drm-fixes into drm-next. Nouveau wanted this to avoid some worse conflicts when I merge that.	2016-03-14 09:46:02 +10:00
Eric Anholt	36cb6253f9	drm/vc4: Use runtime PM to power cycle the device when the GPU hangs. This gets us functional GPU reset again, like we had until a refactor at merge time. Tested with a little patch to stuff in a broken binner job every 100 frames. Signed-off-by: Eric Anholt <eric@anholt.net>	2016-02-16 12:21:01 -08:00
Eric Anholt	001bdb55d9	drm/vc4: Enable runtime PM. This may actually get us a feature that the closed driver didn't have: turning off the GPU in between rendering jobs, while the V3D device is still opened by the client. There may be some tuning to be applied here to use autosuspend so that we don't bounce the device's power so much, but in steady-state GPU-bound rendering we keep the power on (since we keep multiple jobs outstanding) and even if we power cycle on every job we can still manage at least 680 fps. More importantly, though, runtime PM will allow us to power off the device to do a GPU reset. v2: Switch #ifdef to CONFIG_PM not CONFIG_PM_SLEEP (caught by kbuild test robot) Signed-off-by: Eric Anholt <eric@anholt.net>	2016-02-16 12:21:00 -08:00
Eric Anholt	c4ce60dc30	drm/vc4: Fix spurious GPU resets due to BO reuse. We were tracking the "where are the head pointers pointing" globally, so if another job reused the same BOs and execution was at the same point as last time we checked, we'd stop and trigger a reset even though the GPU had made progress. Signed-off-by: Eric Anholt <eric@anholt.net>	2016-02-16 12:21:00 -08:00
Eric Anholt	21af94cf1a	drm/vc4: Add support for scaling of display planes. This implements a simple policy for choosing scaling modes (trapezoidal for decimation, PPF for magnification), and a single PPF filter (Mitchell/Netravali's recommendation). Signed-off-by: Eric Anholt <eric@anholt.net>	2016-02-16 11:24:08 -08:00
Eric Anholt	d8dbf44f13	drm/vc4: Make the CRTCs cooperate on allocating display lists. So far, we've only ever lit up one CRTC, so this has been fine. To extend to more displays or more planes, we need to make sure we don't run our display lists into each other. Signed-off-by: Eric Anholt <eric@anholt.net>	2016-02-16 11:24:08 -08:00
Daniel Vetter	32a3dbeb2b	drm/vc4: Nuke preclose hook Again since the drm core takes care of event unlinking/disarming this is now just needless code. v2: Fixup misplaced hunk. Cc: Eric Anholt <eric@anholt.net> Acked-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1453756616-28942-14-git-send-email-daniel.vetter@ffwll.ch	2016-02-08 09:55:53 +01:00
Eric Anholt	214613656b	drm/vc4: Add an interface for capturing the GPU state after a hang. This can be parsed with vc4-gpu-tools tools for trying to figure out what was going on. v2: Use __u32-style types. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:49:49 -08:00
Eric Anholt	b501bacc60	drm/vc4: Add support for async pageflips. An async pageflip stores the modeset to be done and executes it once the BOs are ready to be displayed. This gets us about 3x performance in full screen rendering with pageflipping. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:10:03 -08:00
Eric Anholt	d5b1a78a77	drm/vc4: Add support for drawing 3D frames. The user submission is basically a pointer to a command list and a pointer to uniforms. We copy those in to the kernel, validate and relocate them, and store the result in a GPU BO which we queue for execution. v2: Drop support for NV shader recs (not necessary for GL), simplify vc4_use_bo(), improve bin flush/semaphore checks, use __u32 style types. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:05:10 -08:00
Eric Anholt	d3f5168a08	drm/vc4: Bind and initialize the V3D engine. This is the component of the GPU that does 3D rendering. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:05:10 -08:00
Eric Anholt	463873d570	drm/vc4: Add an API for creating GPU shaders in GEM BOs. Since we have no MMU, the kernel needs to validate that the submitted shader code won't make any accesses to memory that the user doesn't control, which involves banning some operations (general purpose DMA writes), and tracking where we need to write out pointers for other operations (texture sampling). Once it's validated, we return a GEM BO containing the shader, which doesn't allow mapping for write or exporting to other subsystems. v2: Use __u32-style types. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:05:09 -08:00
Eric Anholt	d5bc60f6ad	drm/vc4: Add create and map BO ioctls. While there exist dumb APIs for creating and mapping BOs, one of the rules is that drivers doing 3D acceleration have to provide their own APIs for buffer allocation (besides, the pitch/height parameters of the dumb alloc don't really make sense for a lot of 3D allocations). v2: Use __u32-style types, use "drm.h" instead of <drm/drm.h>. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:04:57 -08:00
Eric Anholt	c826a6e106	drm/vc4: Add a BO cache. We need to allocate new BOs in the kernel as part of each frame, but the CMA allocator is way too slow for that. As an optimization, keep track of recently-freed BOs and reuse them, with a 1 second timeout to fully free them back to the system. This improves 3D performance by about 15%. Signed-off-by: Eric Anholt <eric@anholt.net>	2015-12-07 20:01:56 -08:00
Dave Airlie	1f43710a8e	Merge tag 'drm-vc4-next-2015-10-21' of http://github.com/anholt/linux into drm-next This pull request introduces the vc4 driver, for kernel modesetting on the Raspberry Pi (bcm2835/bcm2836 architectures). It currently supports a display plane and cursor on the HDMI output. The driver doesn't do 3D, power management, or overlay planes yet. [airlied: fixup the enable/disable vblank APIs] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> * tag 'drm-vc4-next-2015-10-21' of http://github.com/anholt/linux: drm/vc4: Allow vblank to be disabled drm/vc4: Use the fbdev_cma helpers drm/vc4: Add KMS support for Raspberry Pi. drm/vc4: Add devicetree bindings for VC4.	2015-10-22 10:31:17 +10:00
Derek Foreman	48666d5631	drm/vc4: Use the fbdev_cma helpers Keep the fbdev_cma pointer around so we can use it on hotplog and close to ensure the frame buffer console is in a useful state. Signed-off-by: Derek Foreman <derekf@osg.samsung.com> Signed-off-by: Eric Anholt <eric@anholt.net>	2015-10-21 10:33:12 +01:00
Eric Anholt	c8b75bca92	drm/vc4: Add KMS support for Raspberry Pi. This is enough for fbcon and bringing up X using xf86-video-modesetting. It doesn't support the 3D accelerator or power management yet. v2: Drop FB_HELPER select thanks to Archit's patches. Do manual init ordering instead of using the .load hook. Structure registration more like tegra's, but still using the typical "component" code. Drop no-op hooks for atomic_begin and mode_fixup() now that they're optional. Drop sentinel in Makefile. Fix minor style nits I noticed on another reread. v3: Use the new bcm2835 clk driver to manage pixel/HSM clocks instead of having a fixed video mode. Use exynos-style component driver matching instead of devicetree nodes to list the component driver instances. Rename compatibility strings to say bcm2835, and distinguish pv0/1/2. Clean up some h/vsync code, and add in interlaced mode setup. Fix up probe/bind error paths. Use bitops.h macros for vc4_regs.h v4: Include i2c.h, allow building under COMPILE_TEST, drop msleep now that other bugs have been fixed, add timeouts to cpu_relax() loops, rename hpd-gpio to hpd-gpios. Signed-off-by: Eric Anholt <eric@anholt.net> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-10-21 10:33:12 +01:00

50 Commits