Instead of validating all page tables when one was evicted,
track which one needs a validation.
v2: simplify amdgpu_vm_ready as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check memory allocation failure and return -ENOMEM in such a case.
'num_post_dep_syncobjs' still has to be set to 0 before the test in order
to have it initialized if 'amdgpu_cs_parser_fini()' is called to free
resources.
The calling graph would be, in such a case!
failure in amdgpu_cs_process_syncobj_out_dep()
---> error code returned by amdgpu_cs_dependencies()
--> amdgpu_cs_parser_fini() is called
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The function has far more in common with drm_syncobj_find than with
any in the get/put functions.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Check memory allocation failure and return -ENOMEM in such a case.
'num_post_dep_syncobjs' still has to be set to 0 before the test in order
to have it initialized if 'amdgpu_cs_parser_fini()' is called to free
resources.
The calling graph would be, in such a case!
failure in amdgpu_cs_process_syncobj_out_dep()
---> error code returned by amdgpu_cs_dependencies()
--> amdgpu_cs_parser_fini() is called
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
That better describes what happens here with the BO.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split that into vm_bo_base and bo_va to allow other uses as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the CSA bo_va from the VM to the fpriv structure.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Looks like a better place for this.
v2: use atomic64_t members instead
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This should save us a bunch of command submission overhead.
v2: move the LRU move to the right place to avoid the move for the root BO
and handle the shadow BOs as well. This turned out to be a bug fix because
the move needs to happen before the kmap.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change "prefered" to "preferred"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_*_reference() and drm_*_unreference() functions are just
compatibility alias for drm_*_get() and drm_*_put() and should not be
used by new code. So convert all users of compatibility functions to use
the new APIs.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Cihangir Akturk <cakturk@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of open coding the conversion from u64 to pointers.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
situations, visible VRAM would experience unrestricted swapping and
performance would drop.
Add a separate counter specifically for moves involving visible VRAM, and
check it before moving BOs there.
v2: Only perform calculations for separate counter if visible VRAM is
smaller than total VRAM. (Michel Dänzer)
v3: [Michel Dänzer]
* Use BO's location rather than the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag to determine whether to account a move for visible VRAM in most
cases.
* Use a single
if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
block in amdgpu_cs_get_threshold_for_moves.
Fixes: 95844d20ae (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2))
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
the drm_file parameter is unused, so remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The function is called only once inside the .c file.
v2: update the commit message (Michel)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.
Sync objects are managed via the drm syncobj ioctls.
The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.
This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.
In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.
NOTE: this interface addition needs a version bump to expose
it to userspace.
TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)
v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis
v6: lookup post deps earlier, and just replace fences
in post deps stage (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This just splits out the fence depenency checking into it's
own function to make it easier to add semaphore dependencies.
v2: rebase onto other changes.
v1-Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New radeon and amdgpu features for 4.13:
- Lots of Vega10 bug fixes
- Preliminary Raven support
- KIQ support for compute rings
- MEC queue management rework from Andres
- Audio support for DCE6
- SR-IOV improvements
- Improved module parameters for controlling radeon vs amdgpu support
for SI and CIK
- Bug fixes
- General code cleanups
[airlied: dropped drmP.h header from one file was needed and build broke]
* 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux: (362 commits)
drm/amdgpu: Fix compiler warnings
drm/amdgpu: vm_update_ptes remove code duplication
drm/amd/amdgpu: Port VCN over to new SOC15 macros
drm/amd/amdgpu: Port PSP v10.0 over to new SOC15 macros
drm/amd/amdgpu: Port PSP v3.1 over to new SOC15 macros
drm/amd/amdgpu: Port NBIO v7.0 driver over to new SOC15 macros
drm/amd/amdgpu: Port NBIO v6.1 driver over to new SOC15 macros
drm/amd/amdgpu: Port UVD 7.0 over to new SOC15 macros
drm/amd/amdgpu: Port MMHUB over to new SOC15 macros
drm/amd/amdgpu: Cleanup gfxhub read-modify-write patterns
drm/amd/amdgpu: Port GFXHUB over to new SOC15 macros
drm/amd/amdgpu: Add offset variant to SOC15 macros
drm/amd/powerplay: add avfs control for Vega10
drm/amdgpu: add virtual display support for raven
drm/amdgpu/gfx9: fix compute ring doorbell index
drm/amd/amdgpu: Rename KIQ ring to avoid spaces
drm/amd/amdgpu: gfx9 tidy ups (v2)
drm/amdgpu: add contiguous flag in ucode bo create
drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
drm/amdgpu: export test ib debugfs interface
...
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add amdgpu_queue_mgr, a mechanism that allows disjointing usermode's
ring ids from the kernel's ring ids.
The queue manager maintains a per-file descriptor map of user ring ids
to amdgpu_ring pointers. Once a map is created it is permanent (this is
required to maintain FIFO execution guarantees for a context's ring).
Different queue map policies can be configured for each HW IP.
Currently all HW IPs use the identity mapper, i.e. kernel ring id is
equal to the user ring id.
The purpose of this mechanism is to distribute the load across multiple
queues more effectively for HW IPs that support multiple rings.
Userspace clients are unable to check whether a specific resource is in
use by a different client. Therefore, it is up to the kernel driver to
make the optimal choice.
v2: remove amdgpu_queue_mapper_funcs
v3: made amdgpu_queue_mgr per context instead of per-fd
v4: add context_put on error paths
v5: rebase and include new IPs UVD_ENC & VCN_*
v6: drop unused amdgpu_ring_is_valid_index (Alex)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
below ioctl will return -ENODEV:
amdgpu_cs_ioctl
amdgpu_cs_wait_ioctl
amdgpu_cs_wait_fences_ioctl
amdgpu_gem_va_ioctl
amdgpu_info_ioctl
v2: only for map and replace cases in amdgpu_gem_va_ioctl
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Now that drm_[cm]alloc* helpers are simple one line wrappers around
kvmalloc_array and drm_free_large is just kvfree alias we can drop
them and replace by their native forms.
This shouldn't introduce any functional change.
Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
build robot
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
the case could happen when gpu reset:
1. when gpu reset, cs can be continue until sw queue is full, then push job will wait with holding pd reservation.
2. gpu_reset routine will also need pd reservation to restore page table from their shadow.
3. cs is waiting for gpu_reset complete, but gpu reset is waiting for cs releases reservation.
v2: handle amdgpu_cs_submit error path.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2: remove **array method, directly fence_put after fence wait.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <chrstian.koenig@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This only makes a difference for 32-bit systems. The idea is to have a
fixed virtual address space size with 4-level page tables and to
minimize differences between 32 and 64-bit systems.
v2: Update commit message.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1) Adapt to vulkan:
Now use double SWITCH BUFFER to replace the 128 nops w/a,
because when vulkan introduced, umd can insert 7 ~ 16 IBs
per submit which makes 256 DW size cannot hold the whole
DMAframe (if we still insert those 128 nops), CP team suggests
use double SWITCH_BUFFERs, instead of tricky 128 NOPs w/a.
2) To fix the CE VM fault issue when MCBP introduced:
Need one more COND_EXEC wrapping IB part (original one us
for VM switch part).
this change can fix vm fault issue caused by below scenario
without this change:
>CE passed original COND_EXEC (no MCBP issued this moment),
proceed as normal.
>DE catch up to this COND_EXEC, but this time MCBP issued,
thus DE treats all following packages as NOP. The following
VM switch packages now looks just as NOP to DE, so DE
dosen't do VM flush at all.
>Now CE proceeds to the first IBc, and triggers VM fault,
because DE didn't do VM flush for this DMAframe.
3) change estimated alloc size for gfx9.
with new DMAframe scheme, we need modify emit_frame_size
for gfx9
4) No need to insert 128 nops after gfx8 vm flush anymore
because there was double SWITCH_BUFFER append to vm flush,
and for gfx7 we already use double SWITCH_BUFFER following
after vm_flush so no change needed for it.
5) Change emit_frame_size for gfx8
v2: squash in BUG removal from Monk
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1,the check is only appliable for SRIOV GFX engine.
2,use chunk_ib instead of ib.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Ken Wang <Qingqing.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
to prevent submit two or more IBs with PREEMPT flags.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
should use chunk_ib instead of ib, otherwise the logic
is incorrect.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Ken Wang <Qingqing.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update all levels of the page directory.
V2:
a. sub level pdes always are written to incorrect place.
b. sub levels need to update regardless of parent updates.
Signed-off-by: Christian König <christian.koenig@amd.com> (V1)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (V1)
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> (V2)
Acked-by: Alex Deucher <alexander.deucher@amd.com> (V2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No functional change, but the base for multi level page tables.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Decribes better what this is used for.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We will add the fence to freed buffer objects in a later commit, to ensure
that the underlying memory can only be re-used after all references in
page tables have been cleared.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Till GFX8 we can only enable PRT support globally, but with the next hardware
generation we can do this on a per page basis.
Keep the interface consistent by adding PRT mappings and enable
support globally on current hardware when the first mapping is made.
v2: disable PRT support delayed and on all error paths
v3: PRT and other permissions are mutal exclusive,
PRT mappings don't need a BO.
v4: update PRT mappings durign CS as well, make va_flags 64bit
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we don't reset the chunk info in the error path, the subsequent
fini path will double free.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
=VMy8
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8
Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
Like ttm_bo_validate(), ttm_bo_init() might need to move BO and
the number of bytes moved by TTM should be reported. This can help
the throttle buffer migration mechanism to make a better decision.
v2: fix computation
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Difference families may have different numbers of rings. Use
the variable rather than a hardcoded number.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make sure the CSA is mapped.
v2: agd: rebase.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Return success when the ring is properly initialized, otherwise return
failure.
Tonga SRIOV VF doesn't have UVD and VCE engines, the initialization of
these IPs is bypassed. The system crashes if application submit IB to
their rings which are not ready to use. It could be a common issue if
IP having ring buffer is disabled for some reason on specific ASIC, so
it should check the ring being ready to use.
Bug: amdgpu_test crashes system on Tonga VF.
Signed-off-by: Ding Pixel <Pixel.Ding@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes a rare NULL pointer dereference in amdgpu_ttm_bind.
The issue was found by Nicolai Haehnle.
The patch was tested by Nicolai Haehnle.
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In fence waiting, it never return -EDEADLK yet, so drop this function
here.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- better atomic state debugging from Rob
- fence prep from gustavo
- sumits flushed out his backlog of pending dma-buf/fence patches from
various people
- drm_mm leak debugging plus trying to appease Kconfig (Chris)
- a few misc things all over
* tag 'topic/drm-misc-2016-11-10' of git://anongit.freedesktop.org/drm-intel: (35 commits)
drm: Make DRM_DEBUG_MM depend on STACKTRACE_SUPPORT
drm/i915: Restrict DRM_DEBUG_MM automatic selection
drm: Restrict stackdepot usage to builtin drm.ko
drm/msm: module param to dump state on error irq
drm/msm/mdp5: add atomic_print_state support
drm/atomic: add debugfs file to dump out atomic state
drm/atomic: add new drm_debug bit to dump atomic state
drm: add helpers to go from plane state to drm_rect
drm: add helper for printing to log or seq_file
drm: helper macros to print composite types
reservation: revert "wait only with non-zero timeout specified (v3)" v2
drm/ttm: fix ttm_bo_wait
dma-buf/fence: revert "don't wait when specified timeout is zero" (v2)
dma-buf/fence: make timeout handling in fence_default_wait consistent (v2)
drm/amdgpu: add the interface of waiting multiple fences (v4)
dma-buf: return index of the first signaled fence (v2)
MAINTAINERS: update Sync File Framework files
dma-buf/sw_sync: put fence reference from the fence creation
dma-buf/sw_sync: mark sync_timeline_create() static
drm: Add stackdepot include for DRM_DEBUG_MM
...
v2: agd: rebase and squash in all the previous optimizations and
changes so everything compiles.
v3: squash in Slava's 32bit build fix
v4: rebase on drm-next (fence -> dma_fence),
squash in Monk's ioctl update patch
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Reviewed-by: Jammy Zhou <Jammy.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
[sumits: fix checkpatch warnings]
Link: http://patchwork.freedesktop.org/patch/msgid/1478290570-30982-2-git-send-email-alexander.deucher@amd.com
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
=2KUF
-----END PGP SIGNATURE-----
Backmerge tag 'v4.9-rc4' into drm-next
Linux 4.9-rc4
This is needed for nouveau development.
Pull request already again to get the s/fence/dma_fence/ stuff in and
allow everyone to resync. Otherwise really just misc stuff all over, and a
new bridge driver.
* tag 'topic/drm-misc-2016-10-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/bridge: fix platform_no_drv_owner.cocci warnings
drm/bridge: fix semicolon.cocci warnings
drm: Print some debug/error info during DP dual mode detect
drm: mark drm_of_component_match_add dummy inline
drm/bridge: add Silicon Image SiI8620 driver
dt-bindings: add Silicon Image SiI8620 bridge bindings
video: add header file for Mobile High-Definition Link (MHL) interface
drm: convert DT component matching to component_match_add_release()
dma-buf: Rename struct fence to dma_fence
dma-buf/fence: add an lockdep_assert_held()
drm/dp: Factor out helper to distinguish between branch and sink devices
drm/edid: Only print the bad edid when aborting
drm/msm: add missing header dependencies
drm/msm/adreno: move function declarations to header file
drm/i2c/tda998x: mark symbol static where possible
doc: add missing docbook parameter for fence-array
drm: RIP mode_config->rotation_property
drm/msm/mdp5: Advertize 180 degree rotation
drm/msm/mdp5: Use per-plane rotation property
This way we can use parse_cs and still keep VM mode enabled.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's constant, so it doesn't make to much sense to keep it
with the variable data.
v2: update vce and uvd phys mode ring structures as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Saves a bunch of CPU cycles when swapping things back in and
allows us to split the VM headers into a separate file.
v2: rename parameters
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's completely pointless to have two pointers to the
device in the same structure.
v2: rename function to amdgpu_ttm_adev, fix typos
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a flag noting that a BO must be created using linear VRAM
and set this flag on all in kernel users where appropriate.
Hopefully I haven't missed anything.
v2: add it in a few more places, fix CPU mapping.
v3: rename to VRAM_CONTIGUOUS, fix typo in CS code.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only allocate address space when we really need it.
v2: fix a typo, add correct function description,
stop leaking the node in the error case.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to validate the offset to make sure that we don't write after the BO.
Additional to that a page should be enough and can make address space
handling much easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get a few warnings when building kernel with W=1:
drivers/gpu/drm/amd/amdgpu/cz_smc.c:51:5: warning: no previous prototype for 'cz_send_msg_to_smc_async' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/cz_smc.c:143:5: warning: no previous prototype for 'cz_write_smc_sram_dword' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/iceland_smc.c:124:6: warning: no previous prototype for 'iceland_start_smc' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:3926:6: warning: no previous prototype for 'gfx_v8_0_rlc_stop' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:94:6: warning: no previous prototype for 'amdgpu_job_free_cb' [-Wmissing-prototypes]
....
In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't really need the GTT table any more most of the time. So bind it
only on demand.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v1:
for gfx8, use CONTEXT_CONTROL package to dynamically
skip preamble CEIB and other load_xxx command in sequence.
v2:
support GFX7 as well.
remove cntxcntl in compute ring funcs because CPC doesn't
support this packet.
v3: fix reduntant judgement in cntxcntl.
v4: some cleanups, don't change cs_submit()
v5: keep old MESA supported & bump up KMS version.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Ack-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
job->ctx actually is a fence_context of the entity
it belongs to, naming it as ctx is too vague, and
we'll need add amdgpu_ctx into the job structure
later.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As last resort try to evict BOs from the current working set into other
memory domains. This effectively prevents command submission failures when
VM page tables have been swapped out.
v2: fix typos
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All other errors can't be fixed by using a different memory domain.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The old mechanism used a per-submission limit that didn't take previous
submissions within the same time frame into account. It also filled VRAM
slowly when VRAM usage dropped due to a big eviction or buffer deallocation.
This new method establishes a configurable MBps limit that is obeyed when
VRAM usage is very high. When VRAM usage is not very high, it gives
the driver the freedom to fill it quickly. The result is more consistent
performance.
It can't keep the BO move rate low if lots of evictions are happening due
to VRAM fragmentation, or if a big buffer is being migrated.
The amdgpu.moverate parameter can be used to set a non-default limit.
Measurements can be done to find out which amdgpu.moverate setting gives
the best results.
Mainly APUs and cards with small VRAM will benefit from this. For F1 2015,
anything with 2 GB VRAM or less will benefit.
Some benchmark results - F1 2015 (Tonga 2GB):
Limit MinFPS AvgFPS
Old code: 14 32.6
128 MB/s: 28 41
64 MB/s: 15.5 43
32 MB/s: 28.7 43.4
8 MB/s: 27.8 44.4
8 MB/s: 21.9 42.8 (different run)
Random drops in Min FPS can still occur (due to fragmented VRAM?), but
the average FPS is much better. 8 MB/s is probably a good limit for this
game & the current VRAM management. The random FPS drops are still to be
tackled.
v2: use a spinlock
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make it more obvious what we are doing here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's useful for debugging.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We return the fence as part of the job structur anyway,
no need to do this twice.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Keep the time we don't have a fence associated with the resource smaller.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Same problem as with the VM page tables. The user fence address must be
determined before the job is scheduled, not when the IB is executed.
This fixes a security problem where user fences could be used to overwrite
any part of VRAM.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's just overhead to do so and allocating a VMID
when we don't need one is actually a bit dangerous.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't need to validate them again if the eviction counter didn't changed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When we pipeline evictions the page directory could already be
moving somewhere else when grab_id is called.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove the job reference counting and just properly destroy it from a
work item which blocks on any potential running timeout handler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The driver shouldn't mess with the scheduler internals.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_gem_object_lookup() has never required the drm_device for its file
local translation of the user handle to the GEM object. Let's remove the
unused parameter and save some space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Fixup kerneldoc too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We leaked the BO in the error pass, additional to that we only have
one user fence for all IBs in a job.
v2: remove white space changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
They are the same for all IBs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We only have one context for all IBs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use of the ctx pointer is not safe, because they are likely already
be assigned to another ctx when doing comparing.
v2: recreate from scratch, avoid all unnecessary changes.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ib.vm is a legacy way to get vm, after scheduler
implemented vm should be get from job, and all ibs
from one job share the same vm, no need to keep ib.vm
just move vm field to job.
this patch as well add job as paramter to ib_schedule
so it can get vm from job->vm.
v2: agd: sqaush in:
drm/amdgpu: check if ring emit_vm_flush exists in vm flush
No vm flush on engines that don't support VM.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=95195
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not needed any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
this is to fix fatal page fault error that occured if:
job is signaled/released after its timeout work is already
put to the global queue (in this case the cancel_delayed_work
will return false), which will lead to NX-protection error
page fault during job_timeout_func.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add two callbacks to scheduler to maintain jobs, and invoked for
job timeout calculations. Now TDR measures time gap from
job is processed by hw.
v2:
fix typo
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Consolidate job initialization in one place rather than
duplicating it in multiple places.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Christian König <christian.koenig@amd.com.
Signed-off-by: Dave Airlie <airlied@redhat.com>
That avoids lock inversion between the BO reservation lock
and the anon_vma lock.
v2:
* Changed amdgpu_bo_list_entry.user_pages to an array of pointers
* Lock mmap_sem only for get_user_pages
* Added invalidation of unbound userpointer BOs
* Fixed memory leak and page reference leak
v3 (chk):
* Revert locking mmap_sem only for_get user_pages
* Revert adding invalidation of unbound userpointer BOs
* Sanitize and fix error handling
v4 (chk):
* Init userpages pointer everywhere.
* Fix error handling when get_user_pages() fails.
* Add invalidation of unbound userpointer BOs again.
v5 (chk):
* Add maximum number of tries.
v6 (chk):
* Fix error handling when we run out of tries.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> (v4)
Acked-by: Alex Deucher <alexander.deucher@amd.com>
We need them together with the next patch.
v2: Don't take bo reference twice
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to keep that for every IB.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a job_alloc_with_ib helper and proper job submission.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can't submit to multiple rings at the same time anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't keep that around twice.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
And use them in the CS instead of allocating IBs and jobs separately.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Specifying no IBs on command submission is invalid, stop crashing
badly when somebody tries it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucer@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of when we try to bind it check the usermm when
we try to use it in the IOCTLs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>