439 Commits

Author SHA1 Message Date
Felix Kuehling 9ce2b991f7 drm/amdgpu: Cast to uint64_t before left shift
Avoid potential integer overflows with left shift in huge-page mapping
code by casting the operand to uint64_t first.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28 14:38:33 -05:00
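The overflow that 9ce2b991f7 guards against can be illustrated with a small user-space sketch. The helper names and the 32-bit operand type are assumptions for illustration, not the driver's actual code. The point is only that a shift is evaluated in the operand's type, so widening has to happen before the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shifts in 32 bits, then widens. High bits are lost. */
static uint64_t shift_then_widen(uint32_t count, unsigned int shift)
{
    assert(shift < 32);
    return (uint32_t)(count << shift);
}

/* Hypothetical helper: widens first, so the shift is done in 64 bits. */
static uint64_t widen_then_shift(uint32_t count, unsigned int shift)
{
    assert(shift < 32);
    return (uint64_t)count << shift;
}
```

For small inputs the two helpers agree; they diverge once the shifted value no longer fits in 32 bits.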
Christian König c1a17777eb drm/amdgpu: fix huge page handling on Vega10
We accidentally set the huge flag on the parent instead of the children.
This caused some VM faults under memory pressure.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2018-11-13 10:21:00 -05:00
Christian König 4faaaa7623 drm/amdgpu: fix VM leaf walking
Make sure we don't try to go down further after the leaf walk already
ended. This fixes a crash with a new VM test.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-26 13:27:06 -05:00
Christian König 0af5c656fd drm/amdgpu: fix amdgpu_vm_fini
We should not remove mappings in rbtree_postorder_for_each_entry_safe
because that rebalances the tree.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-25 14:04:40 -05:00
Christian König 769f846e14 drm/amdgpu: fix parameter documentation for amdgpu_vm_free_pts
The function was modified without updating the documentation.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-19 12:35:01 -05:00
Christian König cb90b97bb3 drm/amdgpu: add amdgpu_vm_entries_mask v2
We can't get the mask for the root directory from the number of entries.

So add a new function to avoid that problem.

v2: fix typo in mask

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-19 12:34:41 -05:00
Alex Deucher 741deade2a drm/amdgpu: simplify Raven, Raven2, and Picasso handling
Treat them all as Raven rather than adding a new picasso
asic type.  This simplifies a lot of code and also handles the
case of rv2 chips with the 0x15d8 pci id.  It also fixes dmcu
fw handling for picasso.

Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14 09:38:03 -05:00
Likun Gao 5f4e2085ee drm/amdgpu: add picasso support for vm
Add vm support for picasso.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14 09:35:00 -05:00
Christian König 646b902598 drm/amdgpu: use a single linked list for amdgpu_vm_bo_base
Instead of the double linked list. Gets the size of amdgpu_vm_pt down to
64 bytes again.

We could even reduce it down to 32 bytes, but that would require some
rather extreme hacks.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:12 -05:00
Christian König e83dfe4d86 drm/amdgpu: remove amdgpu_bo_list_entry.robj (v2)
We can get that just by casting tv.bo.

v2: squash in kfd fix (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:12 -05:00
Christian König 0c70dd4985 drm/amdgpu: allow fragment processing for invalid PTEs
That should improve the PRT performance on Vega quite a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:11 -05:00
Christian König 1b1d5c43db drm/amdgpu: use the maximum possible fragment size on Vega/Raven
The fragment size controls only the L1 on Vega/Raven, and larger fragments
no longer add any extra overhead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:10 -05:00
Christian König dfcd99f627 drm/amdgpu: meld together VM fragment and huge page handling
This optimizes the generating of PTEs by walking the hierarchy only once
for a range and making changes as necessary.

It allows for both huge (2MB) as well as giant (1GB) pages to be used on
Vega and Raven.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:10 -05:00
Christian König dfa70550f5 drm/amdgpu: use leaf iterator for filling PTs
Less overhead and is the starting point for further cleanups and
improvements.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:10 -05:00
Christian König d4085ea9bc drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_pds v2
Less code and easier to maintain.

v2: rename the function as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:09 -05:00
Christian König 229a37f834 drm/amdgpu: use dfs iterator to free PDs/PTs
Allows us to free all PDs/PTs without recursion.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:09 -05:00
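Freeing a tree without recursion, as 229a37f834 describes, can be sketched in user space as a post-order walk that relies on parent pointers. The node layout (first child / next sibling) and all names below are assumptions for illustration, not the driver's data structures or iterator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: parent pointer plus first-child / next-sibling links. */
struct pt_node {
    struct pt_node *parent;
    struct pt_node *first_child;
    struct pt_node *next_sibling;
};

/* Allocate a node and link it in as the first child of parent (may be NULL). */
static struct pt_node *pt_node_new(struct pt_node *parent)
{
    struct pt_node *n = calloc(1, sizeof(*n));

    assert(n != NULL);
    n->parent = parent;
    if (parent) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
    return n;
}

/* Descend along first_child links until a node without children is reached. */
static struct pt_node *leftmost_leaf(struct pt_node *n)
{
    while (n->first_child)
        n = n->first_child;
    return n;
}

/* Free every node below and including root, children before parents.
 * Uses no recursion and no explicit stack. Returns the number freed. */
static unsigned int pt_free_all(struct pt_node *root)
{
    struct pt_node *cur, *next;
    unsigned int freed = 0;

    if (!root)
        return 0;
    cur = leftmost_leaf(root);
    while (cur) {
        /* Pick the successor before cur is freed. */
        if (cur == root)
            next = NULL;
        else if (cur->next_sibling)
            next = leftmost_leaf(cur->next_sibling);
        else
            next = cur->parent;
        free(cur);
        freed++;
        cur = next;
    }
    return freed;
}
```

Because a parent is only visited after its last child, the walk never dereferences a freed node.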
Christian König d72a6887ee drm/amdgpu: use leaf iterator for allocating PD/PT
Less code and allows for easier error handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:08 -05:00
Christian König 73633e3223 drm/amdgpu: add some VM PD/PT iterators v2
Both a leaf as well as dfs iterator to walk over all the PDs/PTs.

v2: update comments and fix for_each_amdgpu_vm_pt_dfs_safe

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-13 15:14:08 -05:00
Oak Zeng 240cd9a642 drm/amdgpu: Move fault hash table to amdgpu vm
Instead of sharing one fault hash table per device, make it
per VM. This avoids inter-process lock issues when the fault
hash table is full.

Change-Id: I5d1281b7c41eddc8e26113e010516557588d3708
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Christian Konig <Christian.Koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-12 16:28:53 -05:00
Andrey Grodzovsky d8de8260a4 drm/amdgpu: Fix SDMA TO after GPU reset v3
After GPU reset amdgpu_vm_clear_bo triggers VM flush
but job->vm_pd_addr is not set causing SDMA TO.

v2:
Per advice from Christian König, avoid flushing the VM for jobs where
job->vm_pd_addr wasn't explicitly set.

v3:
Shortcut vm_flush_needed early.

Fixes cbd5285 drm/amdgpu: move setting the GART addr into TTM.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-11 16:30:48 -05:00
Christian König 1c860a022f drm/amdgpu: add amdgpu_vm_update_func
Add helper to call the update function for both BO and shadow.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-11 16:30:32 -05:00
Christian König ba79fde47b drm/amdgpu: add amdgpu_vm_pt_parent helper
Add a function to get the parent of a PD/PT.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-11 16:30:16 -05:00
Christian König fbbf794cbd drm/amdgpu: set bulk_moveable to false when a per VM is released
Otherwise we might run into a use after free during bulk move.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:45:32 -05:00
Masanari Iida 989edc699f drm/amdgpu: Fix warnings while make xmldocs
This patch fixes following warnings.

./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3011:
warning: Excess function parameter 'dev' description
in 'amdgpu_vm_get_task_info'

./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3012:
warning: Function parameter or member 'adev' not
described in 'amdgpu_vm_get_task_info'

./drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3012:
warning: Excess function parameter 'dev' description
in 'amdgpu_vm_get_task_info'

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:44:57 -05:00
Felix Kuehling 03e9dee11d drm/amdgpu: Fix compute VM BO params after rebase v2
The intent of two commits was lost in the last rebase:

810955b drm/amdgpu: Fix acquiring VM on large-BAR systems
b5d21aa drm/amdgpu: Don't use shadow BO for compute context

This commit restores the original behaviour:
* Don't set AMDGPU_GEM_CREATE_NO_CPU_ACCESS for page directories
  to allow them to be reused for compute VMs
* Don't create shadow BOs for page tables in compute VMs

v2: move more logic into amdgpu_vm_bo_param

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Kent Russell <Kent.Russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:44:48 -05:00
Christian König 3d5fe658b5 drm/amdgpu: manually map the shadow BOs again
Otherwise we won't be able to use the AGP aperture.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:41:46 -05:00
Christian König ad9a5b78f5 drm/amdgpu: correctly sign extend 48bit addresses v3
Correctly sign extend the GMC addresses to 48bit.

v2: sign extending turned out easier than thought.
v3: clean up the defines and move them into amdgpu_gmc.h as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:41:24 -05:00
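Sign extending a 48-bit address into a 64-bit value, the operation ad9a5b78f5 refers to, can be sketched generically as follows. This is a standard bit trick under assumed names, not the defines the commit moved into amdgpu_gmc.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign extend the low `bits` bits of addr to 64 bits. */
static uint64_t sign_extend(uint64_t addr, unsigned int bits)
{
    uint64_t sign_bit;

    assert(bits >= 1 && bits <= 63);
    sign_bit = 1ULL << (bits - 1);
    /* Keep only the low bits, then propagate the top one upwards. */
    addr &= (1ULL << bits) - 1;
    return (addr ^ sign_bit) - sign_bit;
}

/* Hypothetical wrapper for the 48-bit case. */
static uint64_t sign_extend_48(uint64_t addr)
{
    return sign_extend(addr, 48);
}
```

Addresses with bit 47 clear are returned unchanged; addresses with bit 47 set get bits 48 to 63 filled with ones.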
Christian König bcdc9fd634 drm/amdgpu: improve VM state machine documentation v2
Since we get a lot of questions about the VM state machine, try to improve the
documentation by adding functions for each state move.

v2: fix typo in amdgpu_vm_bo_invalidated, use amdgpu_vm_bo_relocated in
    one more place as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:41:03 -05:00
Christian König c12a2ee5d0 drm/amdgpu: separate per VM BOs from normal in the moved state
Allows us to avoid taking the spinlock in more places.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:40:16 -05:00
Christian König c460f8a6f5 drm/amdgpu: move size calculations to the front of the file again
amdgpu_vm_bo_* functions should come much later.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10 22:40:09 -05:00
Christian König 5d35ed4832 drm/amdgpu: fix idle state and bulk_moveable flag
Add BOs to the idle state again and correctly clear the flag when
new BOs are added.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-02 10:16:44 -05:00
Christian König 17cc525206 drm/amdgpu: Revert "kmap PDs/PTs in amdgpu_vm_update_directories"
This reverts commit a7f91061c6.

Felix pointed out that we need to have the BOs mapped even before
amdgpu_vm_update_directories is called.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-30 09:03:45 -05:00
Philip Yang dcaaff4eed drm/amdgpu: remove redundant memset
kvmalloc_array with the __GFP_ZERO flag ensures that the returned memory
is already zeroed. Calling memset on it again afterwards is unnecessary,
and in this case buggy because we only clear the first entry.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-29 12:36:06 -05:00
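The pattern removed by dcaaff4eed can be reproduced in user space. The sketch below uses calloc as a stand-in for a zeroing allocation; the struct and function names are assumptions for illustration. It shows why a memset sized with sizeof(*ptr) touches only the first array element.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array element. */
struct entry {
    unsigned long addr;
    unsigned long flags;
};

/* calloc, like an allocator asked to zero its result, returns zeroed memory. */
static struct entry *alloc_entries(size_t count)
{
    return calloc(count, sizeof(struct entry));
}

/* Buggy pattern: sizeof(*entries) is the size of ONE element, not the array. */
static void clear_first_only(struct entry *entries)
{
    assert(entries != NULL);
    memset(entries, 0, sizeof(*entries));
}

/* Returns 1 if every byte of the first `count` entries is zero. */
static int all_zero(const struct entry *entries, size_t count)
{
    const unsigned char *p = (const unsigned char *)entries;
    size_t i;

    for (i = 0; i < count * sizeof(*entries); i++)
        if (p[i])
            return 0;
    return 1;
}
```

On freshly allocated memory the extra memset is merely redundant; on memory that holds data it silently leaves every element after the first untouched.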
Michel Dänzer d78c1fa0c9 Revert "drm/amdgpu: move PD/PT bos on LRU again"
This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.

It triggered various badness on my development machine when running the
piglit gpu profile with radeonsi on Bonaire, looks like memory
corruption due to insufficiently protected list manipulations.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-29 12:35:43 -05:00
Oak Zeng bf47afbabf drm/amdkfd: Release an acquired process vm
For compute vm acquired from amdgpu, vm.pasid is managed
by kfd. Decouple pasid from such vm on process destroy
to avoid duplicate pasid release.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-29 12:35:00 -05:00
Oak Zeng 1685b01a85 drm/amdgpu: Set pasid for compute vm (v2)
To turn an amdgpu vm into a compute vm, the old pasid is freed and
replaced with a pasid managed by kfd. Kfd can't reuse the original pasid
allocated by amdgpu because kfd uses a different pasid policy than amdgpu.
For example, all graphics devices share the same pasid in a process.

v2: rebase (Alex)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-29 12:34:49 -05:00
Emily Deng 7ef0b43545 drm/amdgpu: Need to set moved to true when evict bo
Fix the VMC page fault that occurs with the following sequence:
1. amdgpu_gem_create_ioctl
2. ttm_bo_swapout->amdgpu_vm_bo_invalidate: amdgpu_vm_bo_base_init has not
   been called yet, so list_add_tail(&base->bo_list, &bo->va) has not run.
   Even though the bo was evicted, bo_base->moved is not set.
3. drm_gem_open_ioctl->amdgpu_vm_bo_base_init: this only calls
   list_move_tail(&base->vm_status, &vm->evicted) and does not set
   bo_base->moved.
4. amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: because bo_base->moved is
   not set, amdgpu_vm_bo_insert_map calls
   list_move(&bo_va->base.vm_status, &vm->moved).
5. amdgpu_cs_ioctl does not validate the swapped-out bo, because it is only
   on the moved list and not on the evicted list, so a VMC page fault occurs.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-28 11:55:38 -05:00
Christian König 284dec4317 drm/amdgpu: enable GTT PD/PT for raven v3
Should work on Vega10 as well, but with an obvious performance hit.

Older APUs can be enabled as well, but will probably be more work.

v2: fix error checking
v3: use more general check

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 15:19:49 -05:00
Christian König 24a8d289d5 drm/amdgpu: add amdgpu_gmc_get_pde_for_bo helper v2
Helper to get the PDE for a PD/PT.

v2: improve documentation

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 15:19:42 -05:00
Christian König e21eb2613d drm/amdgpu: add helper for VM PD/PT allocation parameters v3
Add a helper function to figure them out only once.

v2: fix typo with memset
v3: rebase on kfd changes (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 15:19:12 -05:00
Christian König 248f2b8ef2 drm/amdgpu: remove extra root PD alignment
Just another leftover from radeon.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 15:12:23 -05:00
Felix Kuehling 43370c4ce5 drm/amdgpu: Adjust the VM size based on system memory size v2
Set the VM size based on system memory size between the ASIC-specific
limits given by min_vm_size and max_bits. GFXv9 GPUs will keep their
default VM size of 256TB (48 bit). Only older GPUs will adjust VM size
depending on system memory size.

This makes more VM space available for ROCm applications on GFXv8 GPUs
that want to map all available VRAM and system memory in their SVM
address space.

v2:
* Clarify comment
* Round up memory size before >> 30
* Round up automatic vm_size to power of two

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 15:10:42 -05:00
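The rounding and clamping steps listed in the v2 notes of 43370c4ce5 can be sketched as plain arithmetic. The function names, the GiB unit, and the order of the clamps are assumptions for illustration; the commit message does not state how the driver scales system memory, so no scaling factor is applied here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert bytes to GiB, rounding up instead of truncating with >> 30. */
static uint64_t bytes_to_gib_round_up(uint64_t bytes)
{
    const uint64_t gib = 1ULL << 30;

    assert(bytes <= UINT64_MAX - (gib - 1));
    return (bytes + gib - 1) >> 30;
}

/* Round v up to the next power of two (v itself if already one). */
static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;

    assert(v <= (1ULL << 63));
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical policy: derive a VM size in GiB from system memory, keep it
 * at or above min_gib, round to a power of two, and cap it at 2^max_bits
 * bytes. */
static uint64_t pick_vm_size_gib(uint64_t sys_mem_bytes, uint64_t min_gib,
                                 unsigned int max_bits)
{
    uint64_t max_gib, size;

    assert(max_bits >= 30 && max_bits <= 63);
    max_gib = 1ULL << (max_bits - 30);
    size = bytes_to_gib_round_up(sys_mem_bytes);
    if (size < min_gib)
        size = min_gib;
    size = round_up_pow2(size);
    if (size > max_gib)
        size = max_gib;
    return size;
}
```

Rounding up before the shift matters for memory sizes that are not an exact multiple of 1 GiB, which would otherwise lose their fractional part.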
Huang Rui 07e6d3f03b drm/amdgpu: move PD/PT bos on LRU again
Now that the new bulk moving functionality is ready, the overhead of moving
PD/PT bos on the LRU is fixed, so move them on the LRU again.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:22 -05:00
Huang Rui f921661bd4 drm/amdgpu: use bulk moves for efficient VM LRU handling (v6)
This continues the bulk moving work based on the proposal by Christian.

Background:
The amdgpu driver moves all PD/PT and PerVM BOs onto the idle list, then moves
each of them to the end of the LRU list one by one. That moves a large number
of BOs to the end of the LRU and seriously hurts performance.

Christian then provided a workaround that avoids moving PD/PT BOs on the LRU,
in the patch below:
Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae ("drm/amdgpu: band aid
validating VM PTs")

However, the final solution should bulk move all PD/PT and PerVM BOs on the LRU
instead of moving them one by one.

Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to be
validated we move all BOs together to the end of the LRU without dropping the
lock for the LRU.

While doing so we note the beginning and end of this block in the LRU list.

Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to do,
we don't move every BO one by one, but instead cut the LRU list into pieces so
that we bulk move everything to the end in just one operation.

Test data:
+--------------+-----------------+-----------+---------------------------------------+
|              |The Talos        |Clpeak(OCL)|BusSpeedReadback(OCL)                  |
|              |Principle(Vulkan)|           |                                       |
+------------------------------------------------------------------------------------+
|              |                 |           |0.319 ms(1k) 0.314 ms(2K) 0.308 ms(4K) |
| Original     |  147.7 FPS      |  76.86 us |0.307 ms(8K) 0.310 ms(16K)             |
+------------------------------------------------------------------------------------+
| Original + WA|                 |           |0.254 ms(1K) 0.241 ms(2K)              |
|(don't move   |  162.1 FPS      |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)|
|PT BOs on LRU)|                 |           |                                       |
+------------------------------------------------------------------------------------+
| Bulk move    |  163.1 FPS      |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K) |
|              |                 |           |0.214 ms(8K) 0.225 ms(16K)             |
+--------------+-----------------+-----------+---------------------------------------+

After testing with the three benchmarks above, covering both Vulkan and OpenCL,
we can see a visible improvement over the original, and results even better
than the original with the workaround.

v2: move all BOs, including those on the idle, relocated, and moved lists, to
the end of the LRU and put them together.
v3: remove unused parameter and use list_for_each_entry instead of the one with
save entry.
v4: move the amdgpu_vm_move_to_lru_tail after command submission; at that time,
all BOs will be back on the idle list.
v5: remove amdgpu_vm_move_to_lru_tail_by_list(), use bulk_moveable instead of
validated, and move ttm_bo_bulk_move_lru_tail() also into
amdgpu_vm_move_to_lru_tail().
v6: clean up and fix return value.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:22 -05:00
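The core idea in f921661bd4, remembering the first and last element of a block and moving the whole block to the LRU tail in one operation, can be sketched with a circular doubly linked list. The list type and function names below are assumptions for illustration and are neither the kernel's list API nor the TTM implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list with a sentinel head. */
struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_push_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Move the contiguous block first..last to the tail in O(1), regardless of
 * how many nodes the block contains. Order inside the block is preserved. */
static void list_bulk_move_tail(struct list_node *head,
                                struct list_node *first,
                                struct list_node *last)
{
    assert(head != NULL && first != NULL && last != NULL);
    if (last->next == head)
        return; /* block already ends at the tail */

    /* Cut the block out of its current position. */
    first->prev->next = last->next;
    last->next->prev = first->prev;

    /* Splice it back in just before the head. */
    first->prev = head->prev;
    head->prev->next = first;
    last->next = head;
    head->prev = last;
}
```

Only four neighbouring links are rewritten, so the cost does not grow with the number of BOs in the block, which is what makes the bulk move cheaper than moving entries one by one.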
Christian König 9a2779528e drm/ttm: revise ttm_bo_move_to_lru_tail to support bulk moves
When moving a BO to the end of the LRU, we need to remember the BO positions.
Make sure all moved BOs are between "first" and "last", so that they can be
bulk moved together.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:21 -05:00
Christian König 262b9c392e drm/amdgpu: validate the VM root PD from the VM code
Preparation for following changes. This validates the root PD twice,
but the overhead of that should be minimal.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:17 -05:00
Christian König 3798e9a6e6 drm/amdgpu: use new scheduler load balancing for VMs
Instead of the fixed round robin, let the scheduler balance the load
of page table updates.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:45 -05:00
Christian König 1cadf2b368 drm/amdgpu: fix VM clearing for the root PD
We need to figure out the address after validating the BO, not before.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:09:39 -05:00
Dave Airlie 940fbcb73f Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for 4.19:
- Fix UVD 7.2 instance handling
- Fix UVD 7.2 harvesting
- GPU scheduler fix for when a process is killed
- TTM cleanups
- amdgpu CS bo_list fixes
- Powerplay fixes for polaris12 and CZ/ST
- DC fixes for link training certain HMDs
- DC fix for vega10 blank screen in certain cases

From: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180801222906.1016-1-alexander.deucher@amd.com
2018-08-08 06:22:23 +10:00
Christian König 8ab19ea619 drm/amdgpu: add new amdgpu_vm_bo_trace_cs() function v2
This allows us to trace all VM ranges which should be valid inside a CS.

v2: dump mappings without BO as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> (v1)
Reviewed-by: Huang Rui <ray.huang@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-31 16:58:17 -05:00