Aligned the amd_sriov_msg_pf2vf_info_header and amd_sriov_msg_pf2vf_info_header's
definition with libgv.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Frank.Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the kiq handling into amdgpu_virt.c and drop the fallback.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In baremetal, also need to reserve csa for preemption.
so move the csa related code out of sriov.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no functional changes,
Use function arguments for SRIOV special variables which
is hardcode in those functions.
so we can share those functions in baremetal.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
driver didn't use this address so far.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the CSA area to the top of the VA space to avoid clashing with
HMM/ATC in the lower range on GFX9.
v2: wrong sign noticed by Roger, rebase on CSA_VADDR cleanup, handle VA
hole on GFX9 as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
instead of doing it in each GFX ip's sw_fini
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
since now gpu reset is unified with gpu_recover
for both bare-metal and SR-IOV:
1)rename in_sriov_reset to in_gpu_reset
2)move lock_reset from adev->virt to adev
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1,new imple names amdgpu_gpu_recover which gives more hint
on what it does compared with gpu_reset
2,gpu_recover unify bare-metal and SR-IOV, only the asic reset
part is implemented differently
3,gpu_recover will increase hang job karma and mark its entity/context
as guilty if exceeds limit
V2:
4,in scheduler main routine the job from guilty context will be immedialy
fake signaled after it poped from queue and its fence be set with
"-ECANCELED" error
5,in scheduler recovery routine all jobs from the guilty entity would be
dropped
6,in run_job() routine the real IB submission would be skipped if @skip parameter
equales true or there was VRAM lost occured.
V3:
7,replace deprecated gpu reset, use new gpu recover
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver can use this interface to check if there's a function level
reset done in hypervisor. It's helpful when IRQ handler for reset
is not ready, or special handling is required.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MMIO space can be blocked on virtualised device. Add this
function to check if MMIO is blocked or not.
Todo: need a reliable method such like communation
with hypervisor.
v2:
- add comments inline
Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SR-IOV need to exchange some data between PF&VF through shared VRAM
PF will copy some necessary firmware and information to the shared
VRAM. It also requires some information from VF. PF will send a
key through mailbox2 to help guest calculate checksum so that it can
verify whether the data is correct.
So check the data on the specified offset of the shared VRAM, if the
checksum is right, read values from it and write some VF information
next to the data from PF.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The error handling for virtual functions assumed a single
vf per VM and didn't properly account for bare metal. Make
the error arrays per device and add locking.
Reviewed-by: Gavin Wan <gavin.wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the CSA bo_va from the VM to the fpriv structure.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the
trans_error function does nothing.
The error information includes error_code (16bit), error_flags(16bit)
and error_data(64bit). Since there are not many errors, we keep the
errors in an array and transfer all errors to Host before amdgpu
initialization function (amdgpu_device_init) exit.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
that way we can know which job cause hang and
can do per sched reset/recovery instead of all
sched.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The usage of kiq should not depend on the virtualization.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by:Andres Rodriquez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add two functions to allocate & free MM table memory.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new structure for MM table for multi media scheduler of sriov.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
no need to use a delay work since we don't know how
much time hypervisor takes on FLR, so just polling
and waiting in a work.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
this lock is used for sriov_gpu_reset, only get this mutex
can run into sriov_gpu_reset.
we have couple source triggers gpu_reset for SRIOV:
1) submit timedout and trigger reset voluntarily
2) invalid instruction detected by ENGINE and trigger reset voluntarily
2) hypervisor found world switch hang and trigger flr and notify guest to
do reset.
all need take care and we need a mutex to protect the consistency of
reset routine.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
implement SRIOV gpu_reset for future use.
it wil be called from:
1) job timeout
2) privl access or instruction error interrupt
3) hypervisor detect VF hang
v2: agd: rebase on upstream
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
META-DATA is used in GFX cmd submit, we have two
format suit for META-DATA-init, one is legacy and another
is for chained-ib preempt, which is used in vulkan
UMD.
v2: drop use CP version number to judge if chain-ib
supports or not, we wait for it mature
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VI has asic specific virt support, which including mailbox and
golden registers init.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add high level interfaces that is not relate to specific asic. So
asic files just need to implement the interfaces to support
virtualization.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For virtualization, it is must for driver to use KIQ to access
registers when it is out of GPU full access mode.
v2: agd: rebase
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new flag to define gpu runtime that is out of full gpu access.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement emit_rreg/wreg function for kiq ring.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for SRIOV usage, CSA is only used per device and each
VM will map on it.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
and implement CSA functions in this file
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use acronym to rename fields to make easy to spell out.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
move virtual machine related structure to amdgpu_virt.h
easy for developer to maintain for virualization stuffs
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>