Commit Graph

7471 Commits

Author SHA1 Message Date
Aaron Liu 8fb2e01a1e drm/amdgpu: enable TMZ bit in FRAME_CONTROL for gfx10
This patch enables TMZ bit in FRAME_CONTROL for gfx10.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Aaron Liu b231531c50 drm/amdgpu: enable TMZ bit in sdma copy pkt for sdma v5
Enable sdma TMZ mode via setting TMZ bit in sdma copy pkt
for sdma v5.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Aaron Liu b7c163fe91 drm/amdgpu: enable TMZ bit in sdma copy pkt for sdma v4
Enable sdma TMZ mode via setting TMZ bit in sdma copy pkt
for sdma v4

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Aaron Liu c9dc9cfe18 drm/amdgpu: expand amdgpu_copy_buffer interface with tmz parameter
This patch expands amdgpu_copy_buffer interface with tmz parameter.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Aaron Liu be7538ff74 drm/amdgpu: expand sdma copy_buffer interface with tmz parameter
This patch expands sdma copy_buffer interface with tmz parameter.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Stephen Rothwell 04379e9b04 drm/amdgpu: fix up for amdgpu_tmz.c and removal of drm/drmP.h
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:29 -04:00
Alex Deucher 4cd24494cc drm/amdgpu: set TMZ bits in PTEs for secure BO (v4)
If a buffer object is secure, i.e. created with
AMDGPU_GEM_CREATE_ENCRYPTED, then the TMZ bit of
the PTEs that belong the buffer object should be
set.

v1: design and draft the skeletion of TMZ bits setting on PTEs (Alex)
v2: return failure once create secure BO on non-TMZ platform  (Ray)
v3: amdgpu_bo_encrypted() only checks the BO (Luben)
v4: move TMZ flag setting into amdgpu_vm_bo_update  (Christian)

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2020-04-28 16:20:29 -04:00
Huang Rui cb5fae143d drm/amdgpu: job is secure iff CS is secure (v5)
Mark a job as secure, if and only if the command
submission flag has the secure flag set.

v2: fix the null job pointer while in vmid 0
submission.
v3: Context --> Command submission.
v4: filling cs parser with cs->in.flags
v5: move the job secure flag setting out of amdgpu_cs_submit()

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui 8350361d2d drm/amdgpu: expand the context control interface with trust flag
This patch expands the context control function to support trusted flag while we
want to set command buffer in trusted mode.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui 155748c912 drm/amdgpu: expand the emit tmz interface with trusted flag
This patch expands the emit_tmz function to support trusted flag while we want
to set command buffer in trusted mode.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui eda982a672 drm/amdgpu: add tmz bit in frame control packet
This patch adds tmz bit in frame control pm4 packet, and it will used in future.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui 01a8dcec1a drm/amdgpu: add function to check tmz capability (v4)
Add a function to check tmz capability with kernel parameter and ASIC type.

v2: use a per device tmz variable instead of global amdgpu_tmz.
v3: refine the comments for the function. (Luben)
v4: add amdgpu_tmz.c/h for future use.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui ae60305ac0 drm/amdgpu: add amdgpu_tmz data structure
This patch to add amdgpu_tmz structure which stores all tmz related fields.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Huang Rui d7ccb38df5 drm/amdgpu: add tmz feature parameter (v2)
This patch adds tmz parameter to enable/disable
the feature in the amdgpu kernel module. Nomally,
by default, it should be auto (rely on the
hardware capability).

But right now, it need to set "off" to avoid
breaking other developers' work because it's not
totally completed.

Will set "auto" till the feature is stable and
completely verified.

v2: add "auto" option for future use.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 16:20:28 -04:00
Alex Deucher c5efd80f48 drm/amdgpu: define the TMZ bit for the PTE
Define the TMZ (encryption) bit in the page table entry (PTE) for
Raven and newer asics.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
2020-04-28 16:20:28 -04:00
Marek Olšák ff532461a4 drm/amdgpu: bump version for invalidate L2 before SDMA IBs
This fixes GPU hangs due to cache coherency issues.
Bump the driver version. Split out from the original patch.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 11:50:46 -04:00
Marek Olšák 652a6a858f drm/amdgpu: invalidate L2 before SDMA IBs (v2)
This fixes GPU hangs due to cache coherency issues.

v2: Split the version bump to a separate patch

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 11:50:46 -04:00
James Zhu cd4df4e6ed drm/amdgpu/vcn2.5: wait for tiles off after unpause
Wait for tiles off after unpause to fix transcode timeout issue.
It is a work around.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 11:44:30 -04:00
Joseph Greathouse c6d1ec4134 drm/amdkfd: Put ASIC revision into HSA capability
In order to surface the ASIC revision to user level, we want
to put it into the HSA topology. This can be because different
ASIC revisions may require user-level software to do different
things (e.g. patch code for things that are changed in later
hardware revisions).

The ASIC revision from the hardware is maximum of 4 bits at this
time, so put it into 4 of the open bits in the HSA capability.
Then user-level software can use this capability information to
know -- for each ASIC -- what revision-based things must be done.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28 11:04:56 -04:00
Jason Yan b6e79d9a31 drm/amdgpu: remove conversion to bool in amdgpu_device.c
The '>' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING:
conversion to bool not needed here

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:52:16 -04:00
Guchun Chen fd90456c75 drm/amdgpu: decouple EccErrCnt query and clear operation
Due to hardware bug that when RSMU UMC index is disabled,
clear EccErrCnt at the first UMC instance will clean up all other
EccErrCnt registes from other instances at the same time. This
will break the correctable error count log in EccErrCnt register
once querying it. So decouple both to make error count query workable.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:52:10 -04:00
Guchun Chen 40e733147f drm/amdgpu: switch to SMN interface to operate RSMU index mode
This makes consistent with other regsiters' access in this module.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:52:03 -04:00
Evan Quan fde812b32c drm/amdgpu: drop redundant cg/pg ungate on runpm enter
CG/PG ungate is already performed in ip_suspend_phase1. Otherwise,
the CG/PG ungate will be performed twice. That will cause gfxoff
disablement is performed twice also on runpm enter while gfxoff
enablemnt once on rump exit. That will put gfxoff into disabled
state.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:51:56 -04:00
Evan Quan 94fa566058 drm/amdgpu: move kfd suspend after ip_suspend_phase1
This sequence change should be safe as what did in ip_suspend_phase1
is to suspend DCE only. And this is a prerequisite for coming
redundant cg/pg ungate dropping.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:51:49 -04:00
Jonathan Kim dfe31f255f drm/amdgpu: sw pstate switch should only be for vega20
Driver steered p-state switching is designed for Vega20 only.
Also simplify early return for temporary disable due to SMU FW
bug.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-27 15:51:42 -04:00
Zheng Bin d18ba57c72 drm/amdgpu: Remove unneeded semicolon
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2020-04-27 15:51:14 -04:00
Colin Ian King abb17b1edf drm/amdgpu/gmc: Use consistent variable on unlocks
Currently the error returns paths are unlocking lock kiq->ring_lock
however it seems this should be dev->gfx.kiq.ring_lock as this
is the lock that is being locked and unlocked around the ring
operations.  This looks like a bug, but it's not.  The kiq is just
a local variable pointing to the same structure.  Make it consistent.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Yintian Tao 04e4e2e955 drm/amdgpu: protect ring overrun
Wait for the oldest sequence on the ring
to be signaled in order to make sure there
will be no command overrun.

v2: fix coding stype and remove abs operation
v3: remove the initialization of variable r

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu 312a79b6ea drm/amdgpu: extent threshold of waiting FLR_COMPLETE
to 5s to satisfy WHOLE GPU reset which need 3+ seconds to
finish

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu 79bebabb88 drm/amdgpu: for nv12 always need smu ip
because nv12 SRIOV support one vf mode

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu 8efd72759e drm/amdgpu: skip sysfs node not belong to one vf mode
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu c2ce6aebf0 drm/amdgpu: provide RREG32_SOC15_NO_KIQ, will be used later
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu 2f5a0a9119 drm/amdgpu: skip cg/pg set for SRIOV
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Monk Liu 1a0f3667d8 drm/amdgpu: ignore TA ucode for SRIOV
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-24 11:42:11 -04:00
Hawking Zhang e748f07d00 drm/amdgpu: retire legacy vega10 sos version check
retired those early sos version used in vega10 bring up
phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:41:06 -04:00
Hawking Zhang 893d14cbe1 drm/amdgpu: switch to helper function to init sos ucode
call common helper function to init sos ucode, instead
of duplicate codes per ip version

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:41:00 -04:00
Hawking Zhang 1c301f4433 drm/amdgpu: add helper function to init sos ucode
driver already had psp_firmware_header struture to
deal with different layout of sos ucode. the sos
micorcode initialization could be common one.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:53 -04:00
Hawking Zhang f4503f9eb3 drm/amdgpu: switch to helper function to init asd ucode
call common helper function to initialize asd ucode

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:46 -04:00
Hawking Zhang dc7195f663 drm/amdgpu: add helper function to init asd ucode
asd is unified ucode across asic. it is not necessary
to keep its software structure to be ip specific one

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:39 -04:00
Hawking Zhang bc9fb7e93c drm/amdgpu: retire unused check_fw_loading status
The driver can't access UCODE_DATA/ADDR registers on production boards.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:28 -04:00
Hawking Zhang d4d27897db drm/amdgpu: remove unnecessary tOS version check
tOS version is available through debugfs interface

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:22 -04:00
Hawking Zhang a267614932 drm/amdgpu: retire support_vmr_ring interface
vmr ring is dedicated for sriov vf (i.e.guest driver
in sriov), which is general communication interface
between driver and psp fw accross all ip version.
it is not correct to make it as ip specific callback.
it is even worse to check specific tOS version per IP
version (like psp_v11/v12).

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:40:13 -04:00
Bernard Zhao fe158997c8 drm/amdgpu: shrink critical section in amdgpu_amdkfd_gpuvm_free_memory_of_gpu
Reduce the mem->lock`s protected code area, no need to protect pr_debug.
This also simplifies error handling.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:39:53 -04:00
limingyu 6f81b2d047 drm/amdgpu: Init data to avoid oops while reading pp_num_states.
For chip like CHIP_OLAND with si enabled(amdgpu.si_support=1),
the amdgpu will expose pp_num_states to the /sys directory.
In this moment, read the pp_num_states file will excute the
amdgpu_get_pp_num_states func. In our case, the data hasn't
been initialized, so the kernel will access some ilegal
address, trigger the segmentfault and system will reboot soon:

    uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00
    .0/pp_num_states

    Message from syslogd@uos-PC at Apr 22 09:26:20 ...
     kernel:[   82.154129] Internal error: Oops: 96000004 [#1] SMP

This patch aims to fix this problem, avoid that reading file
triggers the kernel sementfault.

Signed-off-by: limingyu <limingyu@uniontech.com>
Signed-off-by: zhoubinbin <zhoubinbin@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:06:41 -04:00
YueHaibing 00aba6da21 drm/amdgpu: remove set but not used variable 'priority'
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c: In function amdgpu_job_submit:
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:148:26: warning: variable priority set but not used [-Wunused-but-set-variable]

commit 33abcb1f5a ("drm/amdgpu: set compute queue priority at mqd_init")
left behind this, remove it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:06:41 -04:00
Randy Dunlap 408d912100 drm: amdgpu: fix kernel-doc struct warning
Fix a kernel-doc warning of missing struct field desription:

../drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:92: warning: Function parameter or member 'vm' not described in 'amdgpu_vm_eviction_lock'

Fixes: a269e44989 ("drm/amdgpu: Avoid reclaim fs while eviction lock")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:06:41 -04:00
Yintian Tao 5420819401 drm/amdgpu: request reg_val_offs each kiq read reg
According to the current kiq read register method,
there will be race condition when using KIQ to read
register if multiple clients want to read at same time
just like the expample below:
1. client-A start to read REG-0 throguh KIQ
2. client-A poll the seqno-0
3. client-B start to read REG-1 through KIQ
4. client-B poll the seqno-1
5. the kiq complete these two read operation
6. client-A to read the register at the wb buffer and
   get REG-1 value

Therefore, use amdgpu_device_wb_get() to request reg_val_offs
for each kiq read register.

v2: fix the error remove
v3: fix the print typo
v4: remove unused variables

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-23 15:06:41 -04:00
Christian König e09d40bdba drm/amdgpu: change how we update mmRLC_SPM_MC_CNTL
In pp_one_vf mode avoid the extra overhead and read/write the
registers without the KIQ.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Yintian Tao <yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Dennis Li a891d239f9 drm/amdgpu: set error query ready after all IPs late init
If set error query ready in amdgpu_ras_late_init, which will
cause some IP blocks aren't initialized, but their error query
is ready.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Evan Quan 7dd8c205ea drm/amdgpu: code cleanup around gpu reset
Make code more readable.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Evan Quan 9e94d22c00 drm/amdgpu: optimize the gpu reset for XGMI setup V2
This is basically just some code cosmetic. The current design
for XGMI setup gput reset is to operate on current device(adev)
first and then on other devices from the hive(by another 'for' loop).
But actually we can do some sort to the device list(to put current
device 1st position) and handle all the devices in a single 'for'
loop.

V2: added missing hive->hive_lock protection

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Evan Quan 52fb44cf30 drm/amdgpu: correct cancel_delayed_work_sync on gpu reset
As for XGMI setup, it should be performed on other devices
from the hive also.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Evan Quan a2f63ee8b5 drm/amdgpu: correct fbdev suspend on gpu reset
As for XGMI setup, it needs to be performed on
all the devices from the same hive.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Bernard Zhao 10f39758b8 drm/amdgpu: cleanup coding style in amdkfd a bit
Make the code a bit more readable by using a common
error handling pattern.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Christian König <christian.koenig@amd.com>.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Kevin Wang e05185b341 drm/amdgpu: clean up unused variable about ring lru
clean up unused variable:
1. ring_lru_list
2. ring_lru_list_lock

related-commit:
drm/amdgpu: remove ring lru handling

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Dennis Li 4cc1178e16 drm/amdgpu: replace DRM prefix with PCI device info for gfx/mmhub
Prefix RAS message printing in gfx/mmhub with PCI device info,
which assists the debug in multiple GPU case.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Jiawei 7aba19182e drm/amdgpu: disble vblank when unloading sriov driver
disble vblank in dce_vitual_crtc_commit(), which is skipped
under sriov before

Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Yong Zhao d69b8971e5 drm/amdgpu: Print CU information by default during initialization
This is convenient for multiple teams to obtain the information. Also,
add device info by using dev_info().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Yong Zhao e1046a1f70 drm/amdgpu: Adjust the SDMA doorbell info printing
Turn off the printing by default because it is not very useful, while
adding more details.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:49 -04:00
Jonathan Kim d84a430d9f drm/amdgpu: fix race between pstate and remote buffer map
Vega20 arbitrates pstate at hive level and not device level. Last peer to
remote buffer unmap could drop P-State while another process is still
remote buffer mapped.

With this fix, P-States still needs to be disabled for now as SMU bug
was discovered on synchronous P2P transfers.  This should be fixed in the
next FW update.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
James Zhu 4f610503f0 Revert "drm/amdgpu: Disable gfx off if VCN is busy"
This reverts commit 3fded222f4
This is work around for vcn1 only. Currently vcn1 has separate
begin_use and idle work handle.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Guchun Chen 12c17b9d62 drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
When running ras uncorrectable error injection and triggering GPU
reset on sGPU, below issue is observed. It's caused by the list
uninitialized when accessing.

[   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[   80.047300] #PF: supervisor write access in kernel mode
[   80.047351] #PF: error_code(0x0003) - permissions violation
[   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[   80.047477] Oops: 0003 [#1] SMP PTI
[   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
[   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Kent Russell 69d0c18dda drm/amdgpu: Disable FRU read on Arcturus
Update the list with supported Arcturus chips, but disable for now until
final list is confirmed.

Ideally we can poll atombios for FRU support, instead of maintaining
this list of chips, but this will enable serial number reading for
supported ASICs for the time-being.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Rajneesh Bhardwaj 53c9c89ac1 drm/amdgpu/gmc: Fix spelling mistake.
Fixes a minor typo in the file.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Kent Russell fdd21e62b0 Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"
This reverts commit c12b84d6e0.

The original patch causes a RAS event and subsequent kernel hard-hang
when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
Arcturus

dmesg output at hang time:
[drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
amdgpu 0000:67:00.0: GPU reset begin!
Evicting PASID 0x8000 queues
Started evicting pasid 0x8000
qcm fence wait loop timeout expired
The cp might be in an unrecoverable state due to an unsuccessful queues preemption
Failed to evict process queues
Failed to suspend process 0x8000
Finished evicting pasid 0x8000
Started restoring pasid 0x8000
Finished restoring pasid 0x8000
[drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
amdgpu: [powerplay] Failed to send message 0x26, response 0x0
amdgpu: [powerplay] Failed to set soft min gfxclk !
amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
amdgpu: [powerplay] Failed to send message 0x7, response 0x0
amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
Alex Deucher 079c72ad3a drm/amdgpu/gfx9: add gfxoff quirk
Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
John Clements 7f70443fd8 drm/amdgpu: set mp1 state before reload
Set MP1 state to prepare for unload before reloading SMU FW

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:46 -04:00
John Clements 40e611bdd1 drm/amdgpu: update psp fw loading sequence
Added dedicated function to check if particular fw should be skipped from loading.

Added dedicated function for SMU FW loading via PSP

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:45 -04:00
Prike Liang ced1ba9761 drm/amdgpu: fix the hw hang during perform system reboot and reset
The system reboot failed as some IP blocks enter power gate before perform
hw resource destory. Meanwhile use unify interface to set device CGPG to ungate
state can simplify the amdgpu poweroff or reset ungate guard.

Fixes: 487eca11a3 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22 18:11:45 -04:00
Dave Airlie 1aa63ddf72 drm-misc-next for 5.8:
UAPI Changes:
 
   - drm: error out with EBUSY when device has existing master
   - drm: rework SET_MASTER and DROP_MASTER perm handling
 
 Cross-subsystem Changes:
 
   - fbdev: savage: fix -Wextra build warning
   - video: omap2: Use scnprintf() for avoiding potential buffer overflow
 
 Core Changes:
 
   - Remove drm_pci.h
   - drm_pci_{alloc/free)() are now legacy
   - Introduce managed DRM resourcesA
   - Allow drivers to subclass struct drm_framebuffer
   - Introduce struct drm_afbc_framebuffer and helpers
   - fbdev: remove return value from generic fbdev setup
   - Introduce simple-encoder helper
   - vram-helpers: set fence on plane
   - dp_mst: ACT timeout improvements
   - dp_mst: Remove drm_dp_mst_has_audio()
   - TTM: ttm_trace_dma_{map/unmap}() cleanups
   - dma-buf: add flag for PCIP2P support
   - EDID: Various improvements
   - Encoder: cleanup semantics of possible_clones and possible_crtcs
   - VBLANK documentation updates
   - Writeback documentation updates
 
 Driver Changes:
 
   - Convert several drivers to i2c_new_client_device()
   - Drop explicit drm_mode_config_cleanup() calls from drivers
   - Auto-release device structures with drmm_add_final_kfree()
   - Init bfdev console after registering DRM device
   - Make various .debugfs functions return 0 unconditionally; ignore errors
   - video: Use scnprintf() to avoid buffer overflows
   - Convert drivers to simple encoders
 
   - drm/amdgpu: note that we can handle peer2peer DMA-buf
   - drm/amdgpu: add support for exporting VRAM using DMA-buf v3
   - drm/kirin: Revert change to register connectors
   - drm/lima: Add optional devfreq and cooling device support
   - drm/lima: Various improvements wrt. task handling
   - drm/panel: nt39016: Support multiple modes and 50Hz
   - drm/panel: Support Leadtek LTK050H3146W
   - drm/rockchip: Add support for afbc
   - drm/virtio: Various cleanups
   - drm/hisilicon/hibmc: Enforce 128-byte stride alignment
   - drm/qxl: Fix notify port address of cursor ring buffer
   - drm/sun4i: Improvements to format handling
   - drm/bridge: dw-hdmi: Various improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl6VfAAACgkQaA3BHVML
 eiNjBwgAtzRaqrKX3c4aL4NCBmfWzqxvKN0fVcx8tHtjhmrPTLITsHCM+wfcD2qC
 lkr/RMYJT02pNPGnX3jamQk0q/2GKGagChVZgORRsdYOOf5IqGIjvllhkg+U+7YV
 X0pHAfvGk2VyriHYj3s/cnwi9OwZ2UFjdS+f/u2Qp9jQYG/k8u9CCSnzgratY99I
 bI4jZi6JIoRkwuBpBEc9NbrduenKhcYNgPLDiYXY2TFmVz89NwITPnLyf5FWG5zd
 HsQ+dfIS9eoIxL3DTRgBZrPMvrqgiUjztB7cM4bdE0ttwTS7MW6M50/iV553qb9k
 DZ1+/pWFFyZLOPUYc3EK/QYdu8R3QA==
 =MQkd
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2020-04-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.8:

UAPI Changes:

  - drm: error out with EBUSY when device has existing master
  - drm: rework SET_MASTER and DROP_MASTER perm handling

Cross-subsystem Changes:

  - mm: export two symbols from slub/slob
  - fbdev: savage: fix -Wextra build warning
  - video: omap2: Use scnprintf() for avoiding potential buffer overflow

Core Changes:

  - Remove drm_pci.h
  - drm_pci_{alloc/free)() are now legacy
  - Introduce managed DRM resourcesA
  - Allow drivers to subclass struct drm_framebuffer
  - Introduce struct drm_afbc_framebuffer and helpers
  - fbdev: remove return value from generic fbdev setup
  - Introduce simple-encoder helper
  - vram-helpers: set fence on plane
  - dp_mst: ACT timeout improvements
  - dp_mst: Remove drm_dp_mst_has_audio()
  - TTM: ttm_trace_dma_{map/unmap}() cleanups
  - dma-buf: add flag for PCIP2P support
  - EDID: Various improvements
  - Encoder: cleanup semantics of possible_clones and possible_crtcs
  - VBLANK documentation updates
  - Writeback documentation updates

Driver Changes:

  - Convert several drivers to i2c_new_client_device()
  - Drop explicit drm_mode_config_cleanup() calls from drivers
  - Auto-release device structures with drmm_add_final_kfree()
  - Init bfdev console after registering DRM device
  - Make various .debugfs functions return 0 unconditionally; ignore errors
  - video: Use scnprintf() to avoid buffer overflows
  - Convert drivers to simple encoders

  - drm/amdgpu: note that we can handle peer2peer DMA-buf
  - drm/amdgpu: add support for exporting VRAM using DMA-buf v3
  - drm/kirin: Revert change to register connectors
  - drm/lima: Add optional devfreq and cooling device support
  - drm/lima: Various improvements wrt. task handling
  - drm/panel: nt39016: Support multiple modes and 50Hz
  - drm/panel: Support Leadtek LTK050H3146W
  - drm/rockchip: Add support for afbc
  - drm/virtio: Various cleanups
  - drm/hisilicon/hibmc: Enforce 128-byte stride alignment
  - drm/qxl: Fix notify port address of cursor ring buffer
  - drm/sun4i: Improvements to format handling
  - drm/bridge: dw-hdmi: Various improvements

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200414090738.GA16827@linux-uq9g
2020-04-22 10:41:35 +10:00
Dave Airlie 1e6adfe565 Merge tag 'amd-drm-fixes-5.7-2020-04-15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.7-2020-04-15:

amdgpu:
- gfx10 fix
- SMU7 overclocking fix
- RAS fix
- GPU reset fix
- Fix a regression in a previous s/r fix
- Add a gfxoff quirk

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200415221631.3924-1-alexander.deucher@amd.com
2020-04-16 11:24:10 +10:00
Alex Deucher 974229db7e drm/amdgpu/gfx9: add gfxoff quirk
Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-14 12:48:43 -04:00
Prike Liang b2a7e9735a drm/amdgpu: fix the hw hang during perform system reboot and reset
The system reboot failed as some IP blocks enter power gate before perform
hw resource destory. Meanwhile use unify interface to set device CGPG to ungate
state can simplify the amdgpu poweroff or reset ungate guard.

Fixes: 487eca11a3 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-14 12:48:01 -04:00
Evan Quan 028cfb2444 drm/amdgpu: fix wrong vram lost counter increment V2
Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:07:09 -04:00
Jason Yan 8e2f842063 drm/amdgpu: remove dead code in si_dpm.c
This code is dead, let's remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:42 -04:00
Aurabindo Pillai dd4fa6c1b8 drm/amd/amdgpu: remove hardcoded module name in prints
Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:40 -04:00
Aurabindo Pillai 539489fc91 drm/amd/amdgpu: add print prefix for dev_* variants
Define dev_fmt macro for informative print messages

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:37 -04:00
Aurabindo Pillai d57229b1da drm/amd/amdgpu: add prefix for pr_* prints
amdgpu uses lots of pr_* calls for printing error messages.
With this prefix, errors shall be more obvious to the end
use regarding its origin, and may help debugging.

Prefix format:

[xxx.xxxxx] amdgpu: ...

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:35 -04:00
Alex Deucher a4c2468027 drm/amdgpu/ring: simplify scheduler setup logic
Set up a GPU scheduler based on the ring flag rather
than the ring type.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:26 -04:00
Alex Deucher a783910d5c drm/amdgpu/kiq: add no_scheduler flag to KIQ
We don't want a GPU scheduler for this ring.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:24 -04:00
Alex Deucher cb3d108501 drm/amdgpu/ring: add no_scheduler flag
This allows IPs to flag whether a specific ring requires
a GPU scheduler or not.  E.g., sometimes instances of an
IP are asymmetric and have different capabilities.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:19 -04:00
Evan Quan dadce777e0 drm/amdgpu: fix wrong vram lost counter increment V2
Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:08 -04:00
Guchun Chen ed72aa21c7 drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS
Prefix RAS message printing in GFX IP with PCI device info,
which assists the debug in multiple GPU case.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:02:03 -04:00
Yintian Tao d32709dac6 drm/amdgpu: resume kiq access debugfs
If there is no GPU hang, user still can access
debugfs through kiq.

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:56 -04:00
Guchun Chen 6952e99cfd drm/amdgpu: refine ras related message print
Prefix ras related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This can clearly tell user about
GPU device information where ras is. And add some
other ras message printing to make it more clear
and friendly as well.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:50 -04:00
Guchun Chen 1f3ef0efba drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb
Uncorrectable error count printing is missed when issuing UMC
UE injection. When going to the error count log function in GPU
recover work thread, there is no chance to get correct error count
value by last error injection and print, because the error status
register is automatically cleared after reading in UMC ecc irq
callback. So add such message printing in UMC ecc irq cb to be
consistent with other RAS error interrupt cases.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:44 -04:00
Yintian Tao 95a2f91738 drm/amdgpu: restrict debugfs register access under SR-IOV
Under bare metal, there is no more else to take
care of the GPU register access through MMIO.
Under Virtualization, to access GPU register is
implemented through KIQ during run-time due to
world-switch.

Therefore, under SR-IOV user can only access
debugfs to r/w GPU registers when meets all
three conditions below.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13 12:01:04 -04:00
Linus Torvalds 21c5b3c6d7 drm fixes for 5.7-rc1 (part two)
legacy:
 - fix drm_local_map.offset type
 
 ttm:
 - temporarily disable hugepages to debug amdgpu problems.
 
 prime:
 - fix sg extraction
 
 amdgpu:
 - Various Renoir fixes
 - Fix gfx clockgating sequence on gfx10
 - RAS fixes
 - Avoid MST property creation after registration
 - Various cursor/viewport fixes
 - Fix a confusing log message about optional firmwares
 
 i915:
 - Flush all the reloc_gpu batch (Chris)
 - Ignore readonly failures when updating relocs (Chris)
 - Fill all the unused space in the GGTT (Chris)
 - Return the right vswing table (Jose)
 - Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)
 
 analogix_dp:
 - probe fix
 
 virtio:
 - oob fix in object create
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJej453AAoJEAx081l5xIa+r7YQAIX48cUROehoNDhzEHnAxJuU
 WZXNHKaMCaDPzAs6SyCtHiPFWH6trBR5McE2dXfg6qc33lnzROFNp5PLB7qb4O+q
 +3QkJ5cGd1bohT7vn3omP9FxxZeD4H4bE/zat+yUPwMWJYSUz4m6w6Ya0rIPa8HS
 d9nRL0Y6wGBhm8/E0WCB6fe5G96D1JOFGLhfbVajGlDW/I+eBiS5WEyrtlIW698K
 Q7lTNOXKEi9kFEZiW39RbKwW3YwqOEiQf1k0KbvUqctq4qLskHD3MgJpmkAiGVPH
 mSnTYPPyATILVGKcmEHR3oB9wuYsoPhNgGCVhhm1MppI8GVUzfk6uqOfdK8UNfDU
 IRAZ05AynmMMFNu/4Fw0SyR1sbj4OtAiG0hWaJ6Ou9MBzhERGXfT3+/BzeHsR4MJ
 +fVIbOArSCAeFTAkqcLVbMKjivAJjullpsj36DFn+lXmHnxB98zAkSNT5dQcDjzl
 bp6FhJXm7pWYx8SvkGRneESqLdr2WVgyZmP6u+kgzZ5pPubWSDqY1IFu1exb5Sne
 bf7HoPzQ6LyD5KgX5WdoJ5++bcvQ9G4/qDF96NY6emCMKcwnOaAzvtErxQLpFeWP
 dZwnxHXXtxY4Z4r5bFURPeWX3rWX5f/cCZ8B7mTUDSTa4hgzV8yUX3ZQBc+9Knja
 zuvnpm4j1BXFqOg0Xfsu
 =ZTAf
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
 "As expected, more fixes did turn up in the latter part of the week.

  The drm_local_map build regression fix is here, along with temporary
  disabling of the hugepage work due to some amdgpu related crashes.

  Otherwise it's just a bunch of i915, and amdgpu fixes.

  legacy:
   - fix drm_local_map.offset type

  ttm:
   - temporarily disable hugepages to debug amdgpu problems.

  prime:
   - fix sg extraction

  amdgpu:
   - Various Renoir fixes
   - Fix gfx clockgating sequence on gfx10
   - RAS fixes
   - Avoid MST property creation after registration
   - Various cursor/viewport fixes
   - Fix a confusing log message about optional firmwares

  i915:
   - Flush all the reloc_gpu batch (Chris)
   - Ignore readonly failures when updating relocs (Chris)
   - Fill all the unused space in the GGTT (Chris)
   - Return the right vswing table (Jose)
   - Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)

  analogix_dp:
   - probe fix

  virtio:
   - oob fix in object create"

* tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm: (34 commits)
  drm/ttm: Temporarily disable the huge_fault() callback
  drm/bridge: analogix_dp: Split bind() into probe() and real bind()
  drm/legacy: Fix type for drm_local_map.offset
  drm/amdgpu/display: fix warning when compiling without debugfs
  drm/amdgpu: unify fw_write_wait for new gfx9 asics
  drm/amd/powerplay: error out on forcing clock setting not supported
  drm/amdgpu: fix gfx hang during suspend with video playback (v2)
  drm/amd/display: Check for null fclk voltage when parsing clock table
  drm/amd/display: Acknowledge wm_optimized_required
  drm/amd/display: Make cursor source translation adjustment optional
  drm/amd/display: Calculate scaling ratios on every medium/full update
  drm/amd/display: Program viewport when source pos changes for DCN20 hw seq
  drm/amd/display: Fix incorrect cursor pos on scaled primary plane
  drm/amd/display: change default pipe_split policy for DCN1
  drm/amd/display: Translate cursor position by source rect
  drm/amd/display: Update stream adjust in dc_stream_adjust_vmin_vmax
  drm/amd/display: Avoid create MST prop after registration
  drm/amdgpu/psp: dont warn on missing optional TA's
  drm/amdgpu: update RAS related dmesg print
  drm/amdgpu: resolve mGPU RAS query instability
  ...
2020-04-10 12:38:28 -07:00
Likun Gao bfa5807d4d Revert "drm/amdgpu: change SH MEM alignment mode for gfx10"
This reverts commit b74fb888f4.
Revert the auto alignment mode set of SH MEM config, as it will result
to OCL Conformance Test fail.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:44:41 -04:00
John Clements 9a785c7ad1 drm/amdgpu: increased atom cmd timeout
added macro to define timeout

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:33 -04:00
Aurabindo Pillai ad36d71b3f amdgpu_kms: Remove unnecessary condition check
Execution will only reach here if the asserted condition is true.
Hence there is no need for the additional check.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Aaron Liu ba714a56fc drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM
packet with opration=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang 2eee0229f6 drm/amdgpu: support access regs outside of mmio bar
add indirect access support to registers outside of
mmio bar.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang f384ff95f6 drm/amdgpu: retire AMDGPU_REGS_KIQ flag
all the register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang ec59847e74 drm/amdgpu: retire RREG32_IDX/WREG32_IDX
those are not needed anymore

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang 3c888c1635 drm/amdgpu: retire indirect mmio reg support from cgs
not needed anymore

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang 46e840ed10 drm/amdgpu: replace indirect mmio access in non-dc code path
all the mmCUR_CONTROL instances are in mmr range and
can be accessd directly by using RREG32/WREG32

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Hawking Zhang dec0520aff drm/amdgpu: remove inproper workaround for vega10
the workaround is not needed for soc15 ASICs except
for vega10. it is even not needed with latest vega10
vbios.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:18 -04:00
Prike Liang a23ca7f76d drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.

v2: Use disable the GFX CGPG instead of RLC safe mode guard.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:17 -04:00
Kent Russell 1ea2b260eb drm/amdgpu: Re-enable FRU check for most models v5
There is at least 1 VG20 DID that does not have an FRU, and trying to read
that will cause a hang. For now, explicitly support reading the FRU for
Arcturus and for the WKS VG20 DIDs, and skip for everything else.
This re-enables serial number reporting for server cards

v2: Add ASIC check
v3: Don't default to true for pre-VG20
v4: Use DID instead of parsing the VBIOS
v5: Sqaush in overflow warning fix

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:17 -04:00
Jack Zhang b639c22c98 drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset
[PATCH 2/2]
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, sriov tdr code path will never free those
allocated memories and get memory leak.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Jack Zhang fe9824d15e drm/amdkfd Avoid destroy hqd when GPU is on reset
This reverts commit 5161bba4311f in order to split it into two
different patches, and this will make it easier to understand.

[PATCH 1/2]
porting to gfx10 from
commit 1b0bfcff46 ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")

Originally, MEC is touched
without GPU initialized first.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
John Clements 4a06686b94 drm/amdgpu: update RAS related dmesg print
prefix RAS error related dmesg print with pci device info

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
John Clements b3dbd6d3ec drm/amdgpu: resolve mGPU RAS query instability
upon receiving uncorrectable error, query every GPU node for ras errors

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Chengming Gui c419bdf5b8 drm/amd/amdgpu: Correct gfx10's CG sequence
Incorrect CG sequence will cause gfx timedout,
if we keep switching power profile mode
(enter profile mod such as PEAK will disable CG,
exit profile mode EXIT will enable CG)
when run Vulkan test case(case used for test: vkexample).

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin b2d92682ff drm/amdgpu: add SPM golden settings for Navi12
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin a900f562c8 drm/amdgpu: add SPM golden settings for Navi14
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Tianci.Yin 4189425d30 drm/amdgpu: add SPM golden settings for Navi10(v2)
Add RLC_SPM golden settings

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Oak Zeng d2155a719d drm/amdgpu: Print UTCL2 client ID on a gpuvm fault
UTCL2 client ID is useful information to get which
UTCL2 client caused the gpuvm fault. Print it out
for debug purpose

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
James Zhu 21b704d783 drm/amdgpu/vcn: add shared memory restore after wake up from sleep.
VCN shared memory needs restore after wake up during S3 test.

v2: Allocate shared memory saved_bo at sw_init and free it in sw_fini.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Aaron Ma 2a20e630f8 drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event
On ARCTURUS and RENOIR, powerplay is not supported yet.
When plug in or unplug power jack, ACPI event will issue.
Then kernel NULL pointer BUG will be triggered.
Check for NULL pointers before calling.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Alex Deucher a45a9e5e10 drm/amdgpu/psp: dont warn on missing optional TA's
Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.

The RAS TAs only exist on server boards and the HDCP and DTM
TAs only exist on client boards.  They are optional either way.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:15 -04:00
Nirmoy Das 1c6d567bdf drm/amdgpu: rework sched_list generation
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Nirmoy Das 07e14845d1 drm/amdgpu: sync ring type and drm hw_ip type
Use AMDGPU_HW_IP_* to set amdgpu_ring_type enum values

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Jack Zhang 04bef61e5d drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, sriov tdr code path will never free those allocated
memories and get memory leak.

v2:add a bugfix for kiq ring test fail

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09 10:43:14 -04:00
Aaron Liu 2960758cce drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM
packet with opration=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-08 17:52:38 -04:00
Prike Liang 487eca11a3 drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.

v2: Use disable the GFX CGPG instead of RLC safe mode guard.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-08 17:51:03 -04:00
Linus Torvalds f5e94d10e4 drm fixes for 5.7-rc1
core:
 - revert drm_mm atomic patch
 - dt binding fixes
 
 fbcon:
 - null ptr error fix
 
 i915:
 - GVT fixes
 
 nouveau:
 - runpm fix
 - svm fixes
 
 amdgpu:
 - HDCP fixes
 - gfx10 fix
 - Misc display fixes
 - BACO fixes
 
 amdkfd:
 - Fix memory leak
 
 vboxvideo:
 - remove conflicting fbs
 
 vc4:
 - mode validation fix
 
 xen:
 - fix PTR_ERR usage
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJejR7RAAoJEAx081l5xIa+heoP/jBjHdwaZsLOkZFqFr8613mR
 juL53xEn7RK7lnVSWBRqAdJznRyJqAyCtDrxW5Au3t0x6zetOv3AhS8fQBtqv/Ze
 ghSdLZdjuTd4Lm2qyS2aWU3DBNnBlYcGcTD6bxwJn3EXSSR1z8YabT2osOOhj7cz
 HwZjhW/XmIOpuhCKDEyyzeTCSLRISBdzgipIbuUXHeqUGB2jt/vpHTiqm+FgTFo5
 hrHfM2EPpUK7LDq+REfXFeIwaQvtRDB1AJ3p8JM9iLt2GvTfavjGIDKDG9XReQhC
 l8WeaMQD69ZbujmBX+XRuP7qj+vfnVYcV/RuMm8Yr09W2Ac8RZOQknvp9PlVmqVl
 xYAvSu1DSD/88/5fvaDxYR2pyCE2ui+3hbFd9WHGFEX8h/cIhGJZ+cZqDO4K0nmY
 vw5JmtSr0Eiwrv2dFn/CFYCxxGz0BdGxVOskbUCwZUPHaTLTAbpfFIaKEABW/sbR
 k5P+cQFjxUJMbCMQeZKSa7L2GZAjf/K/SKyRNMeBuCsfn8KpEUo1W4hhGijtxL//
 akwBFwVeZ0bLTELwB6mFdYGQ987vIcfYjJV0ochPlJCPWJ1OGx6jh1+gYR81W7Np
 5mVFApS2FybmBaTQYWH0E+AhJxyxEHE6spgaGCgCHTlknANmCu3lF90NFVPAK7JB
 Vvlj3gJnOzoguejWzomE
 =HNn4
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "This is a set of fixes that have queued up, I think I might have
  another pull with some more before rc1 but I'd like to dequeue what I
  have now just in case Easter is more eggciting that expected.

  The main thing in here is a fix for a longstanding nouveau power
  management issues on certain laptops, it should help runtime
  suspend/resume for a lot of people.

  There is also a reverted patch for some drm_mm behaviour in atomic
  contexts.

  Summary:

  core:
   - revert drm_mm atomic patch
   - dt binding fixes

  fbcon:
   - null ptr error fix

  i915:
   - GVT fixes

  nouveau:
   - runpm fix
   - svm fixes

  amdgpu:
   - HDCP fixes
   - gfx10 fix
   - Misc display fixes
   - BACO fixes

  amdkfd:
   - Fix memory leak

  vboxvideo:
   - remove conflicting fbs

  vc4:
   - mode validation fix

  xen:
   - fix PTR_ERR usage"

* tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm: (41 commits)
  drm/nouveau/kms/nv50-: wait for FIFO space on PIO channels
  drm/nouveau/nvif: protect waits against GPU falling off the bus
  drm/nouveau/nvif: access PTIMER through usermode class, if available
  drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
  drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges
  drm/nouveau/svm: remove useless SVM range check
  drm/nouveau/svm: check for SVM initialized before migrating
  drm/nouveau/svm: fix vma range check for migration
  drm/nouveau: remove checks for return value of debugfs functions
  drm/nouveau/ttm: evict other IO mappings when running out of BAR1 space
  drm/amdkfd: kfree the wrong pointer
  drm/amd/display: increase HDCP authentication delay
  drm/amd/display: Correctly cancel future watchdog and callback events
  drm/amd/display: Don't try hdcp1.4 when content_type is set to type1
  drm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c
  drm/amd/powerplay: drop redundant BIF doorbell interrupt operations
  drm/amd/display: Fix dcn21 num_states
  drm/amd/display: Enable BT2020 in COLOR_ENCODING property
  drm/amd/display: LFC not working on 2.0x range monitors (v2)
  drm/amd/display: Support plane level CTM
  ...
2020-04-07 20:24:34 -07:00
Alex Deucher 8f0622a19b drm/amdgpu/psp: dont warn on missing optional TA's
Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.

The RAS TAs only exist on server boards and the HDCP and DTM
TAs only exist on client boards.  They are optional either way.

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:07:41 -04:00
John Clements 2b961e6a95 drm/amdgpu: update RAS related dmesg print
prefix RAS error related dmesg print with pci device info

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:02:36 -04:00
John Clements 0b9ebd7eeb drm/amdgpu: resolve mGPU RAS query instability
upon receiving uncorrectable error, query every GPU node for ras errors

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:01:43 -04:00
Chengming Gui dec7880579 drm/amd/amdgpu: Correct gfx10's CG sequence
Incorrect CG sequence will cause gfx timedout,
if we keep switching power profile mode
(enter profile mod such as PEAK will disable CG,
exit profile mode EXIT will enable CG)
when run Vulkan test case(case used for test: vkexample).

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-07 14:01:06 -04:00
Linus Torvalds 86f26a77cb pci-v5.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl6GTQMUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vy3PhAAmqpYBRobOsG8QbmKDjoJEFtkqdvD
 z6+4zf/R+hF11RyXjMDwihIe8d+tkQ4eAaYu6Oh5PrTyanz0G0PgeCrivZeytULk
 thqQIWzDQMVA5vN/2/Vy8s5s+3HzP8z/MZOFScJ7+xA1MndXptPRTNmFUbjx+GAv
 x8/pTp0u9AF6m7itX65DxXvwkzjWamt+Ar4Yx2IcuKAU/M5RtfuZO3PpDnqn7/wk
 JFlkRoYeFB6qNnnkPdeyPHl9dALhuhzgdTyklQEnKVW3nf3xThYDhcEwdh6kBQgl
 0dH8lL5LXy7PKGN8RES4wB0Vqndw/HlsCF5O4wkkfItbnbJxGJtS139e5973m0ud
 sgWvF4yJAT2jCKhIeNz34sePQJMyWALhv0XzZCsJ0YeGHsrV1jrHELkwUT1+eIsT
 3UV0iZ6aL06zQJDyKUbbIcQzEQ/wwBC+x9VgsyL54K1quCQZ1N1Nl/dvrb4cRG9m
 m9EhJK/brDf4c0uFlOmMTSxV1t5J+z6ZSQnh1ShD/o5yBsxqN6q5brDT6LEs+jbM
 LsIkA18jJOd4OyiDs98YiFKvIfFQbQ0LEBQpJwhF0snvfBFMMbUYN/T/NYneWON/
 F0TpkFoP7PXDuq55iNaLdnObfzrpC9kdzUyWvePUvjxIl55bkf+/qtUny+H48t4L
 dNggvW052d7BHes=
 =deWu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Revert sysfs "rescan" renames that broke apps (Kelsey Skunberg)

   - Add more 32 GT/s link speed decoding and improve the implementation
     (Yicong Yang)

  Resource management:

   - Add support for sizing programmable host bridge apertures and fix a
     related alpha Nautilus regression (Ivan Kokshaysky)

  Interrupts:

   - Add boot interrupt quirk mechanism for Xeon chipsets and document
     boot interrupts (Sean V Kelley)

  PCIe native device hotplug:

   - When possible, disable in-band presence detect and use PDS
     (Alexandru Gagniuc)

   - Add DMI table for devices that don't use in-band presence detection
     but don't advertise that correctly (Stuart Hayes)

   - Fix hang when powering slots up/down via sysfs (Lukas Wunner)

   - Fix an MSI interrupt race (Stuart Hayes)

  Virtualization:

   - Add ACS quirks for Zhaoxin devices (Raymond Pang)

  Error handling:

   - Add Error Disconnect Recover (EDR) support so firmware can report
     devices disconnected via DPC and we can try to recover (Kuppuswamy
     Sathyanarayanan)

  Peer-to-peer DMA:

   - Add Intel Sky Lake-E Root Ports B, C, D to the whitelist (Andrew
     Maier)

  ASPM:

   - Reduce severity of common clock config message (Chris Packham)

   - Clear the correct bits when enabling L1 substates, so we don't go
     to the wrong state (Yicong Yang)

  Endpoint framework:

   - Replace EPF linkup ops with notifier call chain and improve locking
     (Kishon Vijay Abraham I)

   - Fix concurrent memory allocation in OB address region (Kishon Vijay
     Abraham I)

   - Move PF function number assignment to EPC core to support multiple
     function creation methods (Kishon Vijay Abraham I)

   - Fix issue with clearing configfs "start" entry (Kunihiko Hayashi)

   - Fix issue with endpoint MSI-X ignoring BAR Indicator and Table
     Offset (Kishon Vijay Abraham I)

   - Add support for testing DMA transfers (Kishon Vijay Abraham I)

   - Add support for testing > 10 endpoint devices (Kishon Vijay Abraham I)

   - Add support for tests to clear IRQ (Kishon Vijay Abraham I)

   - Add common DT schema for endpoint controllers (Kishon Vijay Abraham I)

  Amlogic Meson PCIe controller driver:

   - Add DT bindings for AXG PCIe PHY, shared MIPI/PCIe analog PHY (Remi
     Pommarel)

   - Add Amlogic AXG PCIe PHY, AXG MIPI/PCIe analog PHY drivers (Remi
     Pommarel)

  Cadence PCIe controller driver:

   - Add Root Complex/Endpoint DT schema for Cadence PCIe (Kishon Vijay
     Abraham I)

  Intel VMD host bridge driver:

   - Add two VMD Device IDs that require bus restriction mode (Sushma
     Kalakota)

  Mobiveil PCIe controller driver:

   - Refactor and modularize mobiveil driver (Hou Zhiqiang)

   - Add support for Mobiveil GPEX Gen4 host (Hou Zhiqiang)

  Microsoft Hyper-V host bridge driver:

   - Add support for Hyper-V PCI protocol version 1.3 and
     PCI_BUS_RELATIONS2 (Long Li)

   - Refactor to prepare for virtual PCI on non-x86 architectures (Boqun
     Feng)

   - Fix memory leak in hv_pci_probe()'s error path (Dexuan Cui)

  NVIDIA Tegra PCIe controller driver:

   - Use pci_parse_request_of_pci_ranges() (Rob Herring)

   - Add support for endpoint mode and related DT updates (Vidya Sagar)

   - Reduce -EPROBE_DEFER error message log level (Thierry Reding)

  Qualcomm PCIe controller driver:

   - Restrict class fixup to specific Qualcomm devices (Bjorn Andersson)

  Synopsys DesignWare PCIe controller driver:

   - Refactor core initialization code for endpoint mode (Vidya Sagar)

   - Fix endpoint MSI-X to use correct table address (Kishon Vijay
     Abraham I)

  TI DRA7xx PCIe controller driver:

   - Fix MSI IRQ handling (Vignesh Raghavendra)

  TI Keystone PCIe controller driver:

   - Allow AM654 endpoint to raise MSI-X interrupt (Kishon Vijay Abraham I)

  Miscellaneous:

   - Quirk ASMedia XHCI USB to avoid "PME# from D0" defect (Kai-Heng
     Feng)

   - Use ioremap(), not phys_to_virt(), for platform ROM to fix video
     ROM mapping with CONFIG_HIGHMEM (Mikel Rychliski)"

* tag 'pci-v5.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (96 commits)
  misc: pci_endpoint_test: remove duplicate macro PCI_ENDPOINT_TEST_STATUS
  PCI: tegra: Print -EPROBE_DEFER error message at debug level
  misc: pci_endpoint_test: Use full pci-endpoint-test name in request_irq()
  misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
  tools: PCI: Add 'e' to clear IRQ
  misc: pci_endpoint_test: Add ioctl to clear IRQ
  misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
  PCI: keystone: Allow AM654 PCIe Endpoint to raise MSI-X interrupt
  PCI: dwc: Fix dw_pcie_ep_raise_msix_irq() to get correct MSI-X table address
  PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments
  misc: pci_endpoint_test: Add support to get DMA option from userspace
  tools: PCI: Add 'd' command line option to support DMA
  misc: pci_endpoint_test: Use streaming DMA APIs for buffer allocation
  PCI: endpoint: functions/pci-epf-test: Print throughput information
  PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data
  PCI: pciehp: Fix MSI interrupt race
  PCI: pciehp: Fix indefinite wait on sysfs requests
  PCI: endpoint: Fix clearing start entry in configfs
  PCI: tegra: Add support for PCIe endpoint mode in Tegra194
  PCI: sysfs: Revert "rescan" file renames
  ...
2020-04-03 14:25:02 -07:00
Aaron Ma 5932d260a8 drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event
On ARCTURUS and RENOIR, powerplay is not supported yet.
When plug in or unplug power jack, ACPI event will issue.
Then kernel NULL pointer BUG will be triggered.
Check for NULL pointers before calling.

Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-04-03 17:24:10 -04:00
Likun Gao b74fb888f4 drm/amdgpu: change SH MEM alignment mode for gfx10
Change SH_MEM_CONFIG Alignment mode to Automatic, as:
1)OGL fn_amd_compute_shader will failed with unaligned mode.
2)The default alignment mode was defined to automatic on gfx10
specification.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-03 17:23:21 -04:00
Alex Sierra 193cce34a1 amdgpu/drm: remove psp access on navi10 for sriov
Navi ASICs don't require to access through PSP to osssys registers.
This on SR-IOV configuration.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-03 17:01:25 -04:00
Bhawanpreet Lakha 8913f7ff05 drm/amd/display: Guard calls to hdcp_ta and dtm_ta
[Why]
The buffer used when calling psp is a shared buffer. If we have multiple calls
at the same time we can overwrite the buffer.

[How]
Add mutex to guard the shared buffer.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-03 17:01:18 -04:00
Linus Torvalds 50a5de895d hmm related patches for 5.7
This series focuses on corner case bug fixes and general clarity
 improvements to hmm_range_fault().
 
 - 9 bug fixes
 
 - Allow pgmap to track the 'owner' of a DEVICE_PRIVATE - in this case the
   owner tells the driver if it can understand the DEVICE_PRIVATE page or
   not. Use this to resolve a bug in nouveau where it could touch
   DEVICE_PRIVATE pages from other drivers.
 
 - Remove a bunch of dead, redundant or unused code and flags
 
 - Clarity improvements to hmm_range_fault()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CUDEACgkQOG33FX4g
 mxqXgQ/+I2o4QIq86xTddRCesINab73sA4bdvIYgixojTBvo5a9Tu9JO3nmzJOEi
 mmUXsdN+MnyTdnrkcfbEvo0b5ktCGlwGYQmcRv9MrB1QvSChg6Zy3F3cWR+dsv5c
 Mo3z0sWRx+//dVSI0p1BMhd5rym0nZCh/rUO94C/XgZ3YoC+MjX4lKeCMvjJpZFc
 DDEN3Z+zjOvjeiT9TMNMmkPnZSY+N4RQwTRKek5l95QwcHyFy5SBI+uzOHyE0lVw
 KG5+yk+TFRMoiHjZh4BFqkibQbVKG0qMKiOymIncTr3kL4DK0Hwf/4Zk4DMKucEb
 Rs2B7C+UShiyIOzdjbnBsqPiHevAhR3nOpozJy3x/Z6fLapVoIlVx3UJJ/djuxZa
 7+H2VzxKz1xGRDH4Js/WD0smZ8jisA04vW5THJVFr0Vd2sqKBo3G/5ay2Kw9NX7T
 MRII1KkA/luZbmM6WLEQkliVuNkMCsVUU3hiY96tLI7uN73M9fQUe6s3NubeoFxi
 UKg0zuMsAlsJxkHGeY+IMHtUR/law+k9/aZoB4idMGwV/i7KiiolbRcB0H5yn4L5
 OV+4uxVBFS/n37sCpMwpx5WJtgbPzww3b6Cd0hfIUqnQzSOjP40Pl5ZX/c/2M7ps
 bUdZjve5j653YeC/7NBPDprvzbfmyyxJdZZZzzXLQl5kyL2o9LE=
 =UEpV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This series focuses on corner case bug fixes and general clarity
  improvements to hmm_range_fault(). It arose from a review of
  hmm_range_fault() by Christoph, Ralph and myself.

  hmm_range_fault() is being used by these 'SVM' style drivers to
  non-destructively read the page tables. It is very similar to
  get_user_pages() except that the output is an array of PFNs and
  per-pfn flags, and it has various modes of reading.

  This is necessary before RDMA ODP can be converted, as we don't want
  to have weird corner case regressions, which is still a looking
  forward item. Ralph has a nice tester for this routine, but it is
  waiting for feedback from the selftests maintainers.

  Summary:

   - 9 bug fixes

   - Allow pgmap to track the 'owner' of a DEVICE_PRIVATE - in this case
     the owner tells the driver if it can understand the DEVICE_PRIVATE
     page or not. Use this to resolve a bug in nouveau where it could
     touch DEVICE_PRIVATE pages from other drivers.

   - Remove a bunch of dead, redundant or unused code and flags

   - Clarity improvements to hmm_range_fault()"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
  mm/hmm: return error for non-vma snapshots
  mm/hmm: do not set pfns when returning an error code
  mm/hmm: do not unconditionally set pfns when returning EBUSY
  mm/hmm: use device_private_entry_to_pfn()
  mm/hmm: remove HMM_FAULT_SNAPSHOT
  mm/hmm: remove unused code and tidy comments
  mm/hmm: return the fault type from hmm_pte_need_fault()
  mm/hmm: remove pgmap checking for devmap pages
  mm/hmm: check the device private page owner in hmm_range_fault()
  mm: simplify device private page handling in hmm_range_fault
  mm: handle multiple owners of device private pages in migrate_vma
  memremap: add an owner field to struct dev_pagemap
  mm: merge hmm_vma_do_fault into into hmm_vma_walk_hole_
  mm/hmm: don't handle the non-fault case in hmm_vma_walk_hole_()
  mm/hmm: simplify hmm_vma_walk_hugetlb_entry()
  mm/hmm: remove the unused HMM_FAULT_ALLOW_RETRY flag
  mm/hmm: don't provide a stub for hmm_range_fault()
  mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
  mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
  ...
2020-04-01 17:57:52 -07:00
Linus Torvalds f365ab31ef drm for 5.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJehCE6AAoJEAx081l5xIa+bKAQAJj49Hj3WvSJV6cSl3LmgwPV
 IcWpR6LLVrCOQiS600NyXnbv6lTmCtnIfwEneQqm5ltQvJk38QcKQvSua6+ETi9f
 0hl7IiytLwv2R/pS0g9jgSsKmbeP2bDwBAR44vK6pcK6WOgCrkpoL1F4YZEDUILT
 ewnRb52afF3Hw8PSG0lwgBYd7G0uto49t3nt86LjGftJgB3wPFlluVOzLHTeEh0w
 FZEyKuqS8hq8EZFfG1Gu1hS2ylO9y1VgYxiv18jDyRb8jUPq+gzqH6roTPRIronA
 whGZgC6SkyZ9NthCLBu4ITbO9wStAHoawzFfax25QwwuoOyrikuvGy3PfEUu+ixL
 bW2UtYK6BHLGnvZChH6E7i9J6qQbNYCn3Ty5bB1KsY06sHVoP1jYUBSicPN2ELWc
 9KaBI+WROBNEkge/rizwUFfD/u0MZaaSRsVSlGDdHkD7IFj2tDPhfNWdXZQl4EwR
 JnndT6cu97htf7tkid7RASpJ/hnwJTb1hg0Cc9kPblOrGqEDF0K5845FLR9VptGl
 5c2/KyKM/GaI793fP1TG76uDegBhV9337mUF1ZW3c6gCE7QXncZXM5jrIFRE9q2L
 IbvuyUYRof1gW0r1R0WVJYAi2CRBeMd7qgkQjMrbTw0o7FiYDJXB7dc5pYj2um2v
 mSVHikC2S1rbUdH0xbWM
 =I0vz
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "This is the main drm pull request for 5.7-rc1.

  Highlights:

   - i915 enables Tigerlake by default

   - i915 and amdgpu have initial OLED backlight support

     [ Jani Nikula pipes up and points out that we've had a bunch of
       "initial support" code for a long time already, but only now
       Lyude made it actually work on real world machines ]

   - vmwgfx add support to enable OpenGL 4 userspace

   - zero length arrays are mostly removed.

  Detailed summary:

  new driver:
   - tidss: TI Keystone platform display subsystem

  core:
   - new drm device warn macros
   - mode config valid for memory constrained devices
   - bridge bus format negotation
   - consolidated fake vblank event handling
   - dma_alloc related cleanups
   - drop get_crtc callback
   - dp: DP1.4 EDID corruption test
   - EDID CEA detailed timings improvements
   - relicense some code to dual GPL2/MIT
   - convert core vblank support to per-crtc support
   - rework drm_global_mutex
   - bridge rework to allow omap_dss custom driver removeal
   - remove drm_fb_helper connector interrfaces
   - zero-length array removal

  scheduler:
   - support for modifying the sched list
   - revert job distribution optimization
   - helper to pick least loaded scheduler
   - race condition fix

  mst:
   - various fixes
   - remove register_connector callback

  i915:
   - uapi to allows userspace specific CS ring buffer sizes
   - Tigerlake enablement patches + Tigerlake enabled by default
   - new sysfs entries for engine properties
   - display/logging refactors
   - eDP/DP fixes for DPCD
   - Gen7 back to aliasing-ppgtt
   - Gen8+ irq refactor
   - Avoid globals
   - GEM locking fixes and simplifications
   - Ice Lake and Elkhart Lake fixes and workarounds
   - Baytrail/Haswell instability fix
   - GVT - VFIO edid better support

  amdgpu:
   - Rework VM update handling in preparation for HMM support
   - drm load/unload removal fixups
   - USB-C PD firmware updates
   - HDCP srm support
   - Navi/renoir PM watermark fixes
   - OLED panel support
   - Optimize debugging vram access
   - Use BACO for runtime pm
   - DC clock programming optimizations and fixes
   - PSP fw loading sequence updates
   - Drop DRIVER_USE_AGP
   - Remove legacy drm load and unload callbacks
   - ACP Kconfig fix
   - Lots of fixes across the driver

  amdkfd:
   - runtime pm support
   - more gfx config details in amdgpu

  radeon:
   - drop DRIVER_USE_AGP

  vmwgfx:
   - Disable DMA when SEV encryption in use
   - Shader Model 5 support - needed for GL4 support

  msm:
   - DPU resource manager refactor
   - dpu using atomic global state

  mediatek:
   - MT8183 DPI support

  etnaviv:
   - out-of-bounds read fix
   - expose feature flags for GC400 STM32MP1 SoC
   - runtime suspend entry fix
   - dma32 zone fix

  hisilicon:
   - mode selection fixes

  meson:
   - YUV420 support

  lima:
   - add support for heap buffers

  tinydrm:
   - removal of owner field
   - explicit DT dependency removal
   - YAML schema conversion

  tegra:
   - misc cleanups

  tidss:
   - new driver

  virtio:
   - better batching of notifications to host
   - memory handling reworked
   - shmem + gpu context fixes

  hibmc:
   - add gamma_set support
   - improve DPMS support

  pl111:
   - Integrator IM-PD1 support

  sun4i:
   - LVDS support for A20 + A33
   - DSI panel handling improvements"

* tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm: (1537 commits)
  drm/i915/display: Fix mode private_flags comparison at atomic_check
  drm/i915/gt: Stage the transfer of the virtual breadcrumb
  drm/i915/gt: Select the deepest available parking mode for rc6
  drm/i915: Avoid live-lock with i915_vma_parked()
  drm/i915/gt: Treat idling as a RPS downclock event
  drm/i915/gt: Cancel a hung context if already closed
  drm/i915: Use explicit flag to mark unreachable intel_context
  drm/amdgpu: don't try to reserve training bo for sriov (v2)
  drm/amdgpu/smu11: add support for SMU AC/DC interrupts
  drm/amdgpu/swSMU: handle manual AC/DC notifications
  drm/amdgpu/swSMU: handle DC controlled by GPIO for navi1x
  drm/amdgpu/swSMU: set AC/DC mode based on the current system state (v2)
  drm/amdgpu/swSMU: correct the bootup power source for Navi1X (v2)
  drm/amdgpu/swSMU: use the smu11 power source helper for navi1x
  drm/amdgpu/smu11: add a helper to set the power source
  drm/amd/swSMU: add callback to set AC/DC power source (v2)
  drm/scheduler: fix rare NULL ptr race
  drm/amdgpu: fix the coverage issue to clear ArcVPGRs
  drm/amd/display: Fix pageflip event race condition for DCN.
  drm/[radeon|amdgpu]: Remove HAINAN board from max_sclk override check
  ...
2020-04-01 15:24:20 -07:00
Linus Torvalds 69c1fd9726 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "My attempt to revitalize trivial queue I've been neglecting for years
  (what a disaster that was for this world, right? :) ) with patches
  collected from backlog that were still relevant and not applied
  elsewhere in the meantime"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  err.h: remove deprecated PTR_RET for good
  blk-mq: Fix typo in comment
  x86/boot: Fix comment spelling
  sh: mach-highlander: Fix comment spelling
  s390/dasd: Fix comment spelling
  mfd: wm8994: Fix comment spelling
  docs: Add reference in binfmt-misc.rst
  genirq: fix kerneldoc comment for irq_desc
  drm/amdgpu: fix two documentation mismatch issues
  HID: fix Kconfig word ordering
  list/hashtable: minor documentation corrections.
2020-04-01 14:52:59 -07:00
Colin Ian King a500194e73 drm/amdgpu/vcn: fix spelling mistake "fimware" -> "firmware"
There is a spelling mistake in a dev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:46 -04:00
Christian König 82c416b13c drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4
The problem is that we can't add the clear fence to the BO
when there is an exclusive fence on it since we can't
guarantee the the clear fence will complete after the
exclusive one.

To fix this refactor the function and also add the exclusive
fence as shared to the resv object.

v2: fix warning
v3: add excl fence as shared instead
v4: squash in fix for fence handling in amdgpu_gem_object_close

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:46 -04:00
James Zhu e520859cde drm/amdgpu: enable VCN2.5 DPG mode for Arcturus
Enable VCN2.5 DPG mode for arcturus after below items are applied.
ASD: 0x21000023
SOS: 0x17003B
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
VBIOS: 23

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu c97e3076eb drm/amdgpu/vcn2.5: Add firmware w/r ptr reset sync
Add firmware write/read point reset sync through shared memory

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu 9352141027 drm/amdgpu/vcn2.0: Add firmware w/r ptr reset sync
Add firmware write/read point reset sync through shared memory

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu 2c68f0e377 drm/amdgpu/vcn: Add firmware share memory support
Added firmware share memory support for VCN. Current multiple
queue mode is enabled only.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu ad9469fb5b drm/amdgpu/vcn2.5: stall DPG when WPTR/RPTR reset
Add vcn dpg harware synchronization to fix race condition
issue between vcn driver and hardware.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu ef563ff403 drm/amdgpu/vcn2.0: stall DPG when WPTR/RPTR reset
Add vcn dpg harware synchronization to fix race condition
issue between vcn driver and hardware.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu e3b41d82da drm/amdgpu/vcn: fix race condition issue for dpg unpause mode switch
Couldn't only rely on enc fence to decide switching to dpg unpaude mode.
Since a enc thread may not schedule a fence in time during multiple
threads running situation.

v3: 1. Rename enc_submission_cnt to dpg_enc_submission_cnt
    2. Add dpg_enc_submission_cnt check in idle_work_handler

v4:  Remove extra counter check, and reduce counter before idle
    work schedule

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
James Zhu bd718638b8 drm/amdgpu/vcn: fix race condition issue for vcn start
Fix race condition issue when multiple vcn starts are called.

v2: Removed checking the return value of cancel_delayed_work_sync()
to prevent possible races here.

v3: Add total_submission_cnt to avoid gate power unexpectedly.

v4: Remove extra counter check, and reduce counter before idle
work schedule

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
Yintian Tao 17e137f27c drm/amdgpu: skip access sdma_v5_0 registers under SRIOV (v2)
Due to the new L1.0b0c011b policy, many SDMA registers are blocked which raise
the violation warning. There are total 6 pair register needed to be skipped
when driver init and de-init.
mmSDMA0/1_CNTL
mmSDMA0/1_F32_CNTL
mmSDMA0/1_UTCL1_PAGE
mmSDMA0/1_UTCL1_CNTL
mmSDMA0/1_CHICKEN_BITS,
mmSDMA0/1_SEM_WAIT_FAIL_TIMER_CNTL

v2: squash in warning fix

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
Christian König 1675c3a24d drm/amdgpu: stop disable the scheduler during HW fini
When we stop the HW for example for GPU reset we should not stop the
front-end scheduler. Otherwise we run into intermediate failures during
command submission.

The scheduler should only be stopped in very few cases:
1. We can't get the hardware working in ring or IB test after a GPU reset.
2. The KIQ scheduler is not used in the front-end and should be disabled during GPU reset.
3. In amdgpu_ring_fini() when the driver unloads.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Test-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:45 -04:00
Alex Sierra 9e94ff3386 drm/amdgpu: reroute VMC and UMD to IH ring 1 for oss v5
[Why]
Due Page faults can easily overwhelm the interrupt handler.
So to make sure that we never lose valuable interrupts on the primary ring
we re-route page faults to IH ring 1.
It also facilitates the recovery page process, since it's already
running from a process context.
This is valid for Arcturus and future Navi generation GPUs.

[How]
Setting IH_CLIENT_CFG_DATA for VMC and UMD IH clients.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Alex Sierra 0ab176e69c drm/amdgpu: call psp to program ih cntl in SR-IOV for Navi
call psp to program ih cntl in SR-IOV if supported on Navi and Arcturus.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Alex Sierra ab51801206 drm/amdgpu: enable IH ring 1 and ring 2 for navi
Support added into IH to enable ring1 and ring2 for navi10_ih.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Alex Sierra b635ae8744 drm/amdgpu: ih doorbell size of range changed for nbio v7.4
[Why]
nbio v7.4 size of ih doorbell range is 64 bit. This requires 2 DWords per register.

[How]
Change ih doorbell size from 2 to 4. This means two Dwords per ring.
Current configuration uses two ih rings.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Alex Sierra 04cdac5c17 drm/amdgpu: infinite retries fix from UTLC1 RB SDMA
[Why]
Previously these registers were set to 0. This was causing an
infinite retry on the UTCL1 RB, preventing higher priority RB such as paging RB.

[How]
Set to one the SDMAx_UTLC1_TIMEOUT registers for all SDMAs on Vega10, Vega12,
Vega20 and Arcturus.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Evan Quan a9d82d2f91 drm/amdgpu: fix non-pointer dereference for non-RAS supported
Backtrace on gpu recover test on Navi10.

[ 1324.516681] RIP: 0010:amdgpu_ras_set_error_query_ready+0x15/0x20 [amdgpu]
[ 1324.523778] Code: 4c 89 f7 e8 cd a2 a0 d8 e9 99 fe ff ff 45 31 ff e9 91 fe ff ff 0f 1f 44 00 00 55 48 85 ff 48 89 e5 74 0e 48 8b 87 d8 2b 01 00 <40> 88 b0 38 01 00 00 5d c3 66 90 0f 1f 44 00 00 55 31 c0 48 85 ff
[ 1324.543452] RSP: 0018:ffffaa1040e4bd28 EFLAGS: 00010286
[ 1324.549025] RAX: 0000000000000000 RBX: ffff911198b20000 RCX: 0000000000000000
[ 1324.556217] RDX: 00000000000c0a01 RSI: 0000000000000000 RDI: ffff911198b20000
[ 1324.563514] RBP: ffffaa1040e4bd28 R08: 0000000000001000 R09: ffff91119d0028c0
[ 1324.570804] R10: ffffffff9a606b40 R11: 0000000000000000 R12: 0000000000000000
[ 1324.578413] R13: ffffaa1040e4bd70 R14: ffff911198b20000 R15: 0000000000000000
[ 1324.586464] FS:  00007f4441cbf540(0000) GS:ffff91119ed80000(0000) knlGS:0000000000000000
[ 1324.595434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1324.601345] CR2: 0000000000000138 CR3: 00000003fcdf8004 CR4: 00000000003606e0
[ 1324.608694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1324.616303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1324.623678] Call Trace:
[ 1324.626270]  amdgpu_device_gpu_recover+0x6e7/0xc50 [amdgpu]
[ 1324.632018]  ? seq_printf+0x4e/0x70
[ 1324.636652]  amdgpu_debugfs_gpu_recover+0x50/0x80 [amdgpu]
[ 1324.643371]  seq_read+0xda/0x420
[ 1324.647601]  full_proxy_read+0x5c/0x90
[ 1324.652426]  __vfs_read+0x1b/0x40
[ 1324.656734]  vfs_read+0x8e/0x130
[ 1324.660981]  ksys_read+0xa7/0xe0
[ 1324.665201]  __x64_sys_read+0x1a/0x20
[ 1324.669907]  do_syscall_64+0x57/0x1c0
[ 1324.674517]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1324.680654] RIP: 0033:0x7f44417cf081

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Tom St Denis c76c1a4297 drm/amd/amdgpu: Include headers for PWR and SMUIO registers
Clean up the smu10, smu12, and gfx9 drivers to use headers for
registers instead of hardcoding in the C source files.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
xinhui pan c8e42d5785 drm/amdgpu: implement more ib pools (v2)
We have three ib pools, they are normal, VM, direct pools.

Any jobs which schedule IBs without dependence on gpu scheduler should
use DIRECT pool.

Any jobs schedule direct VM update IBs should use VM pool.

Any other jobs use NORMAL pool.

v2: squash in coding style fix

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:44 -04:00
Jiawei b7b2a316b9 drm/amdgpu: extend compute job timeout
extend compute lockup timeout to 60000 for SR-IOV.

Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Emily Deng ad31da434e drm/amdgpu: No need support vcn decode
As no need to support vcn decode feature, so disable the
ring for SR-IOV.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu 2f2941324c drm/amdgpu: postpone entering fullaccess mode
if host support new handshake we only need to enter
fullaccess_mode in ip_init() part, otherwise we need
to do it before reading vbios (becuase host prepares vbios
for VF only after received REQ_GPU_INIT event under
legacy handshake)

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu dffa11b4f7 drm/amdgpu: adjust sequence of ip_discovery init and timeout_setting
what:
1)move timtout setting before ip_early_init to reduce exclusive mode
cost for SRIOV

2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
it is a prepare for the later upcoming patches.

why:
in later upcoming patches we would use a new mailbox event --
"req_gpu_init_data", which is a callback hooked in adev->virt.ops and
this callback send a new event "REQ_GPU_INIT_DAT" to host to notify
host to do some preparation like "IP discovery/vbios on the VF FB"
and this callback must be:

A) invoked after set_ip_block() because virt.ops is configured during
set_ip_block()

B) invoked before ip_discovery_init() becausen ip_discovery_init()
need host side prepares everything in VF FB first.

current place of ip_discovery_init() is before we can invoke callback
of adev->virt.ops, thus we must move ip_discovery_init() to a place
after the adev->virt.ops all settle done, and the perfect place is in
amdgpu_discovery_reg_base_init()

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu 122078de16 drm/amdgpu: equip new req_init_data handshake
by this new handshake host side can prepare vbios/ip-discovery
and pf&vf exchange data upon recieving this request without
stopping world switch.

this way the world switch is less impacted by VF's exclusive mode
request

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu ff1f03a7b8 drm/amdgpu: use static mmio offset for NV mailbox
what:
with the new "req_init_data" handshake we need to use mailbox
before do IP discovery, so in mxgpu_nv.c file the original
SOC15_REG method won'twork because that depends on IP discovery
complete first.

how:
so the solution is to always use static MMIO offset for NV+ mailbox
registers.
HW team confirm us all MAILBOX registers will be at the same
offset for all ASICs, no IP discovery needed for those registers

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu aa53bc2edb drm/amdgpu: introduce new request and its function
1) modify xgpu_nv_send_access_requests to support
new idh request

2) introduce new function: req_gpu_init_data() which
is used to notify host to prepare vbios/ip-discovery/pfvf exchange

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu c27cbdd2d0 drm/amdgpu: introduce new idh_request/event enum
new idh_request and ihd_event to prepare for the
new handshake protocol implementation later

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Monk Liu 4d130238a7 drm/amdgpu: cleanup idh event/req for NV headers
1) drop the headers from AI in mxgpu_nv.c, should refer to mxgpu_nv.h

2) the IDH_EVENT_MAX is not used and not aligned with host side
   so drop it
3) the IDH_TEXT_MESSAG was provided in host but not defined in guest

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Chen Zhou 955df04e3b drm/amdgpu/uvd7: remove unnecessary conversion to bool
The conversion to bool is not needed, remove it.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:43 -04:00
Emily Deng d73cd70127 drm/amdgpu: Ignore the not supported error from psp
As the VCN firmware will not use
vf vmr now. And new psp policy won't support set tmr
now.
For driver compatible issue, ignore the not support error.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Emily Deng 6bc8cdde57 drm/amdgpu: Add 4k resolution for virtual display
Add 4k resolution for virtual connector.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Emily Deng 02f6efb478 drm/amdgpu: Virtual display need to support multiple ctrcs
The crtc num is determined by virtual_display parameter.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
John Clements 61380faa4b drm/amdgpu: disable ras query and iject during gpu reset
added flag to ras context to indicate if ras query functionality is ready

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
John Clements 66399248fe drm/amdgpu: added xgmi ras error reset sequence
added mechanism to clear xgmi ras status inbetween error queries

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Monk Liu 3aa0115d23 drm/amdgpu: cleanup all virtualization detection routine
we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
0x1503

2) we need to acknowledged we are SRIOV VF before we do IP discovery because
the IP discovery content will be updated by host everytime after it recieved
a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches
for this new handshake soon).

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Monk Liu b89659b783 drm/amdgpu: amends feature bits for MM bandwidth mgr
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Monk Liu 8884532a6e drm/amdgpu: purge ip_discovery headers
those two headers are not needed for ip discovery

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Kent Russell 714309f0f3 drm/amdgpu: Fix FRU data checking
Ensure that when we memcpy, we don't end up copying more data than
the struct supports. For now, this is 16 characters for product number
and serial number, and 32 chars for product name

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Kent Russell 358e00e0ad drm/amdgpu: Expose TA FW version in fw_version file
Reporting the fw_version just returns 0, the actual version is kept as
ta_*_ucode_version. This is the same as the feature reported in
the amdgpu_firmware_info debugfs file.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
John Clements fabe01d7bb drm/amdgpu: disabled fru eeprom access
added asic support checking function to be filled in by supported asic types

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:42 -04:00
Kent Russell bd607166af drm/amdgpu: Enable reading FRU chip via I2C v3
Allow for reading of information like manufacturer, product number
and serial number from the FRU chip. Report the serial number as
the new sysfs file serial_number. Note that this only works on
server cards, as consumer cards do not feature the FRU chip, which
contains this information.

v2: Add documentation to amdgpu.rst, add helper functions,
    rename functions for consistency, fix bad starting offset
v3: Remove testing definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-01 14:44:41 -04:00
Christian König 8523f8875b drm/amdgpu: improve amdgpu_gem_info debugfs file
Note if a buffer was imported using peer2peer.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/359296
2020-04-01 09:02:45 +02:00
Christian König f44ffd677f drm/amdgpu: add support for exporting VRAM using DMA-buf v3
We should be able to do this now after checking all the prerequisites.

v2: fix entrie count in the sgt
v3: manually construct the sg

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/359295
2020-04-01 09:02:45 +02:00
Christian König 48262cd949 drm/amdgpu: add checks if DMA-buf P2P is supported
Check if we can do peer2peer on the PCIe bus.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/359294
2020-04-01 09:02:45 +02:00
Christian König 57b7b62f5a drm/amdgpu: note that we can handle peer2peer DMA-buf
Importing should work out of the box.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/359293
2020-04-01 09:02:45 +02:00
Kevin Wang 987ed8e938 drm/amdgpu: fix hpd bo size calculation error
the HPD bo size calculation error.
the "mem.size" can't present actual BO size all time.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <Christian.Koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-31 12:26:14 -04:00
Dave Airlie 5fc0df93fc Linux 5.6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6BIG4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlHUH/RCFve2sfHRPjRW+
 xR5SaLVAw6XKvtKBq7yvKmHEwqNJnL79IHyqqtSrtfFr2FfaH/KvYiCbbAezvSrM
 np0udGu7STKGd21CWuyEZJudyhXkOwMRNiFiCXWp7rs35oh8T0TpJxMzo2Nc1nLk
 JFQPqAP6OSvq4IkWEywKQI+Au3Z1IBf83xVjZ1s+MKPQHYD49x2hc4cQntL5/cnm
 a3DoR2iBkYiGZCZ9dDqAqJTnMQIiCbACdZXgGjNRUpdyA/dtAjsMl11NPYHm8TA2
 3AHBupAK50WBZGad6xv2qKQyScsmoJG2mv92QjlOFz0Tpiu6rLnDlLYREDVB6YH6
 qbLDsc8=
 =XEIU
 -----END PGP SIGNATURE-----

Merge v5.6 into drm-next

msm needed rc6, so I just went and merged release
(msm has been in drm-next outside of this tree)

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-03-31 15:15:47 +10:00
Mikel Rychliski 72e0ef0e5f PCI: Use ioremap(), not phys_to_virt() for platform ROM
On some EFI systems, the video BIOS is provided by the EFI firmware.  The
boot stub code stores the physical address of the ROM image in pdev->rom.
Currently we attempt to access this pointer using phys_to_virt(), which
doesn't work with CONFIG_HIGHMEM.

On these systems, attempting to load the radeon module on a x86_32 kernel
can result in the following:

  BUG: unable to handle page fault for address: 3e8ed03c
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  *pde = 00000000
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 317 Comm: systemd-udevd Not tainted 5.6.0-rc3-next-20200228 #2
  Hardware name: Apple Computer, Inc. MacPro1,1/Mac-F4208DC8, BIOS     MP11.88Z.005C.B08.0707021221 07/02/07
  EIP: radeon_get_bios+0x5ed/0xe50 [radeon]
  Code: 00 00 84 c0 0f 85 12 fd ff ff c7 87 64 01 00 00 00 00 00 00 8b 47 08 8b 55 b0 e8 1e 83 e1 d6 85 c0 74 1a 8b 55 c0 85 d2 74 13 <80> 38 55 75 0e 80 78 01 aa 0f 84 a4 03 00 00 8d 74 26 00 68 dc 06
  EAX: 3e8ed03c EBX: 00000000 ECX: 3e8ed03c EDX: 00010000
  ESI: 00040000 EDI: eec04000 EBP: eef3fc60 ESP: eef3fbe0
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206
  CR0: 80050033 CR2: 3e8ed03c CR3: 2ec77000 CR4: 000006d0
  Call Trace:
   r520_init+0x26/0x240 [radeon]
   radeon_device_init+0x533/0xa50 [radeon]
   radeon_driver_load_kms+0x80/0x220 [radeon]
   drm_dev_register+0xa7/0x180 [drm]
   radeon_pci_probe+0x10f/0x1a0 [radeon]
   pci_device_probe+0xd4/0x140

Fix the issue by updating all drivers which can access a platform provided
ROM. Instead of calling the helper function pci_platform_rom() which uses
phys_to_virt(), call ioremap() directly on the pdev->rom.

radeon_read_platform_bios() previously directly accessed an __iomem
pointer. Avoid this by calling memcpy_fromio() instead of kmemdup().

pci_platform_rom() now has no remaining callers, so remove it.

Link: https://lore.kernel.org/r/20200319021623.5426-1-mikel@mikelr.com
Signed-off-by: Mikel Rychliski <mikel@mikelr.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-30 09:52:23 -05:00
Wolfram Sang 0bf6595049 drm/amdgpu: convert to use i2c_new_client_device()
Move away from the deprecated API.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200326211005.13301-2-wsa+renesas@sang-engineering.com
2020-03-28 22:47:22 +01:00
Jason Gunthorpe 6bfef2f919 mm/hmm: remove HMM_FAULT_SNAPSHOT
Now that flags are handled on a fine-grained per-page basis this global
flag is redundant and has a confusing overlap with the pfn_flags_mask and
default_flags.

Normalize the HMM_FAULT_SNAPSHOT behavior into one place. Callers needing
the SNAPSHOT behavior should set a pfn_flags_mask and default_flags that
always results in a cleared HMM_PFN_VALID. Then no pages will be faulted,
and HMM_FAULT_SNAPSHOT is not a special flow that overrides the masking
mechanism.

As this is the last flag, also remove the flags argument. If future flags
are needed they can be part of the struct hmm_range function arguments.

Link: https://lore.kernel.org/r/20200327200021.29372-5-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 20:19:24 -03:00
Dave Airlie 5117c363eb drm-misc-fixes for v5.6:
- SG fixes for prime, radeon and amdgpu.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl58tmgACgkQ/lWMcqZw
 E8NIyBAAoKOKyKwTgzTF+AQvkFsAkRcE5wi7lrJTdbS26susSTstDFOmouHcgfdi
 gxRHwq2BHUfPDV2qUTp9v5dkQDvH9ZrDVaeuT0vK9f9PT+5EEpiO4mlmpq5o2CpH
 5Pn1NAd58vP0BZmLl9MOrJQMKRC7Y7rSAaNXjQu9KfWV1CtqPu/OBm2cbRZC/7Rs
 nRcWRV2H+MSPzzSeGCA8MpcPvQiVEGtGjm3TFtH5Q4EFuE0ILIvf/cqWGLtiIkh2
 QRyE/+bLokuzZc2XerQPf5zxQDCqXc1NPCWXwlAKUUkcIDF3lQ5ewxW6MZ8AExqx
 Sn84+5z/BMlIqjHptODeZaWXLXgUnt7G0iE5aKVlQ14yKgJOejtq2N05XmzhEcLS
 H5WiLW9qIdCKH7C8joZFtb6LAPEq48ubJgYO77G02JSYO/UnB7qBnxTgyEL4Sl2O
 OskTdFTNG4ayVCJkFEgZpU0Xb41H/wIwB1HPcD0QSkHPGmGamIBoy7IoEpfmyWZF
 vN/Ucw0INJMORzr+/sqNSHPnzNhT1MRorVdWMgk/5zcUWn/KD+pQfQrE72UXQtAy
 +Q84lkjhCTGOAVINZGbuC3CkfTdNqqrHTM+IqHBGU6oZ75HUb0N4VfeLQeESBoK6
 kFEQYtB6EL6GMt7d6Pj+qTFXShh1pdWDIqKW2Kswz5nTGqwWgFo=
 =wtId
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2020-03-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v5.6:
- SG fixes for prime, radeon and amdgpu.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ef10e822-76dd-125d-ec1f-9a78c5f76bc3@linux.intel.com
2020-03-27 12:33:23 +10:00
Christoph Hellwig 17ffdc4829 mm: simplify device private page handling in hmm_range_fault
Remove the HMM_PFN_DEVICE_PRIVATE flag, no driver has ever set this flag
on input, and the only place that uses it on output can be trivially
changed to use is_device_private_page().

This removes the ability to request that device_private pages are faulted
back into system memory.

Link: https://lore.kernel.org/r/20200316193216.920734-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 14:33:38 -03:00
Monk Liu e862b08b46 drm/amdgpu: don't try to reserve training bo for sriov (v2)
1) SRIOV guest KMD doesn't care training buffer
2) if we resered training buffer that will overlap with IP discovery
reservation because training buffer is at vram_size - 0x8000 and
IP discovery is at ()vram_size - 0x10000 => vram_size -1)

v2: squash in warning fix from Nirmoy

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 17:04:35 -04:00
Alex Deucher 9644bf5f4a drm/amdgpu/swSMU: handle manual AC/DC notifications
For boards that do not support automatic AC/DC transitions
in firmware, manually tell the firmware when the status
changes.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 17:00:11 -04:00
Dennis Li 10cda519ef drm/amdgpu: fix the coverage issue to clear ArcVPGRs
Set ComputePGMRSRC1.VGPRS as 0x3f to clear all ArcVGPRs.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 17:00:11 -04:00
Yassine Oudjana c7e5587964 drm/[radeon|amdgpu]: Remove HAINAN board from max_sclk override check
Works stable without the overrides.

Signed-off-by: Yassine Oudjana <y.oudjana@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 16:58:40 -04:00
Zhigang Luo 728b3d0533 Revert "drm/amdgpu: add CAP fw loading"
This reverts commit 29e2501f8a.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-25 16:58:40 -04:00
Shane Francis 0199172f93 drm/amdgpu: fix scatter-gather mapping with user pages
Calls to dma_map_sg may return less segments / entries than requested
if they fall on page bounderies. The old implementation did not
support this use case.

Fixes: be62dbf554 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206461
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206895
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1056
Signed-off-by: Shane Francis <bigbeeshane@gmail.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200325090741.21957-3-bigbeeshane@gmail.com
Cc: stable@vger.kernel.org
2020-03-25 12:10:40 -04:00
shaoyunl 02be064823 drm/amdgpu/sriov : Don't resume RLCG for SRIOV guest
RLCG is enabled by host driver, no need to enable it in guest for none-PSP load path

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-20 10:45:00 -04:00
John Clements 43c4d57618 drm/amdgpu: protect RAS sysfs during GPU reset
MMHub EDC becomes dirty after BACO reset

EDC registers should be cleared early on in reset phase

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-20 10:45:00 -04:00
Colin Ian King 8cd296082c drm: amd: fix spelling mistake "shoudn't" -> "shouldn't"
There are spelling mistakes in pr_err messages and a comment. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Nathan Chancellor 931971280c drm/amdgpu: Remove unnecessary variable shadow in gfx_v9_0_rlcg_wreg
clang warns:

drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:754:6: warning: variable 'shadow'
is used uninitialized whenever 'if' condition is
false [-Wsometimes-uninitialized]
        if (offset == grbm_cntl || offset == grbm_idx)
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:757:6: note: uninitialized use
occurs here
        if (shadow) {
            ^~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:754:2: note: remove the 'if' if
its condition is always true
        if (offset == grbm_cntl || offset == grbm_idx)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:738:13: note: initialize the
variable 'shadow' to silence this warning
        bool shadow;
                   ^
                    = 0
1 warning generated.

shadow is only assigned in one condition and used as the condition for
another if statement; combine the two if statements and remove shadow
to make the code cleaner and resolve this warning.

Fixes: 2e0cc4d48b ("drm/amdgpu: revise RLCG access path")
Link: https://github.com/ClangBuiltLinux/linux/issues/936
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
James Zhu 6c1cb08e3a drm/amdgpu: fix typo for vcn2.5/jpeg2.5 idle check
fix typo for vcn2.5/jpeg2.5 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
James Zhu 23edf7f1a8 drm/amdgpu: fix typo for vcn2/jpeg2 idle check
fix typo for vcn2/jpeg2 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
James Zhu 5e31fa6821 drm/amdgpu: fix typo for vcn1 idle check
fix typo for vcn1 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Zhigang Luo 29e2501f8a drm/amdgpu: add CAP fw loading
The CAP fw is for enabling driver compatibility. Currently, it only
enabled for vega10 VF.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:05 -04:00
Yintian Tao 31d0271d45 drm/amdgpu: miss PRT case when bo update
Originally, only the PTE valid is taken in consider.
The PRT case is missied when bo update which raise problem.
We need add condition for PRT case.

v2: add PRT condition for amdgpu_vm_bo_update_mapping, too
v3: fix one typo error

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-19 00:03:04 -04:00
James Zhu a3c33e7a4a drm/amdgpu: fix typo for vcn2.5/jpeg2.5 idle check
fix typo for vcn2.5/jpeg2.5 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-18 18:21:57 -04:00
James Zhu b5689d22aa drm/amdgpu: fix typo for vcn2/jpeg2 idle check
fix typo for vcn2/jpeg2 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-18 18:21:45 -04:00
James Zhu acfc62dc68 drm/amdgpu: fix typo for vcn1 idle check
fix typo for vcn1 idle check

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-18 18:21:18 -04:00
Wang Xiayang aad7012c31 drm/amdgpu: fix two documentation mismatch issues
The function name mentioned in the documentational comments mismatches
the actual one. The mismatch may make trouble for automatic documentation
generation. One of the erronous name has seen to be misused
and fixed in commit bc5ab2d29b ("drm/amdgpu: fix typo in function
sdma_v4_0_page_resume").

There is apparently no functional change in the patch.

Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-03-17 19:02:28 +01:00
Nirmoy Das 4ff7d8ba4c drm/amdgpu: disable gpu_sched load balancer for vcn jobs
VCN HW doesn't support dynamic load balance on multiple instances
for a context. This patch initializes VNC entities with only one
drm_gpu_scheduler picked by drm_sched_pick_best(). Picking a
drm_gpu_scheduler using drm_sched_pick_best() ensures that we
do load balance among multiple contexts but not among multiple
jobs in a context.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
Andrey Grodzovsky 9015d60c9e drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device
Puts the i2c adapter in common place for sharing by RAS
and upcoming data read from FRU EEPROM feature.

v2:
Move i2c adapter to amdgpu_pm and rename it.

v3: Move i2c adapter init to ASIC specific code and get rid
of the switch case in amdgpu_device

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
xinhui pan 57210c19e4 drm_amdgpu: Add job fence to resv conditionally
Job fence on page table should be a shared one, so add it to the root
page talbe bo resv.
last_delayed field is not needed anymore. so remove it.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
Nirmoy Das 79cb2719be drm/amdgpu: fix switch-case indentation
Fix switch-case indentation in amdgpu_ctx_init_entity()

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:18:14 -04:00
Monk Liu 2e0cc4d48b drm/amdgpu: revise RLCG access path
what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help

now even debugfs's reg_op can used to dump wave.

tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:17:55 -04:00
Dennis Li 93cdb48eca drm/amdgpu: add codes to clear AccVGPR for arcturus
AccVGPRs are newly added in arcturus. Before reading these
registers, they should be initialized. Otherwise edc error
happens, when RAS is enabled.

v2: reuse the existing logical to calculate register size

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:36 -04:00
Joe Perches 2541f95c17 AMD KFD: Use fallthrough;
Convert the various uses of fallthrough comments to fallthrough;

Done via script
Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:35 -04:00
Stanley.Yang c1509f3f6f drm/amdgpu: fix warning in ras_debugfs_create_all()
Fix the warning
"warn: variable dereferenced before check 'obj' (see line 1131)"
by removing unnecessary checks as amdgpu_ras_debugfs_create_all()
is only called from amdgpu_debugfs_init() where obj member in
con->head list is not NULL.
Use list_for_each_entry() instead list_for_each_entry_safe() as obj
do not to be freeing or removing from list during this process.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Evan Quan 565d194155 drm/amdgpu: add fbdev suspend/resume on gpu reset
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Guchun Chen 88474ccad5 drm/amdgpu: update ras capability's query based on mem ecc configuration
RAS support capability needs to be updated on top of different
memeory ECC enablement, and remove redundant memory ecc check
in gmc module for vega20 and arcturus.

v2: check HBM ECC enablement and set ras mask accordingly.
v3: avoid to invoke atomfirmware interface to query twice.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Tom St Denis 6397ec580d drm/amd/amdgpu: Fix GPR read from debugfs (v2)
The offset into the array was specified in bytes but should
be in terms of 32-bit words.  Also prevent large reads that
would also cause a buffer overread.

v2:  Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Stanley.Yang 17cb04f2a6 drm/amdgpu: use amdgpu_ras.h in amdgpu_debugfs.c
include amdgpu_ras.h head file instead of use extern
ras_debugfs_create_all function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:34 -04:00
Hawking Zhang 06dcd7eb83 drm/amdgpu: check GFX RAS capability before reset counters
disallow the logical to be enabled on platforms that
don't support gfx ras at this stage, like sriov skus,
dgpu with legacy ras.etc

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
John Clements c2c6f816a8 drm/amdgpu: resolve failed error inject msg
invoking an error injection successfully will cause an at_event intterrupt that

will occur before the invoke sequence can complete causing an invalid error

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
Jack Zhang 5f87611582 drm/amdgpu/sriov refine vcn_v2_5_early_init func
refine the assignment for vcn.num_vcn_inst,
vcn.harvest_config, vcn.num_enc_rings in VF

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 11:52:33 -04:00
Evan Quan 063e768ebd drm/amdgpu: add fbdev suspend/resume on gpu reset
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-13 09:20:31 -04:00
Tom St Denis 5bbc6604a6 drm/amd/amdgpu: Fix GPR read from debugfs (v2)
The offset into the array was specified in bytes but should
be in terms of 32-bit words.  Also prevent large reads that
would also cause a buffer overread.

v2:  Read from correct offset from internal storage buffer.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-03-13 09:20:31 -04:00
Dave Airlie 69ddce0970 Merge tag 'amd-drm-next-5.7-2020-03-10' of git://people.freedesktop.org/~agd5f/linux into drm-next
amd-drm-next-5.7-2020-03-10:

amdgpu:
- SR-IOV fixes
- Fix up fallout from drm load/unload callback removal
- Navi, renoir power management watermark fixes
- Refactor smu parameter handling
- Display FEC fixes
- Display DCC fixes
- HDCP fixes
- Add support for USB-C PD firmware updates
- Pollock detection fix
- Rework compute ring priority handling
- RAS fixes
- Misc cleanups

amdkfd:
- Consolidate more gfx config details in amdgpu
- Consolidate bo alloc flags
- Improve code comments
- SDMA MQD fixes
- Misc cleanups

gpu scheduler:
- Add suport for modifying the sched list

uapi:
- Clarify comments about GEM_CREATE flags that are not used by userspace.
  The kernel driver has always prevented userspace from using these.
  They are only used internally in the kernel driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310212748.4519-1-alexander.deucher@amd.com
2020-03-13 09:09:11 +10:00
Dave Airlie 9e12da086e drm-misc-next for 5.7:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
 
 Driver Changes:
  - fb-helper: Remove drm_fb_helper_{add,add_all,remove}_one_connector
  - fbdev: some cleanups and dead-code removal
  - Conversions to simple-encoder
  - zero-length array removal
  - Panel: panel-dpi support in panel-simple, Novatek NT35510, Elida
    KD35T133,
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXmZKhwAKCRDj7w1vZxhR
 xUgxAQDB1kkf1xQdU7rdw344vaaMf270qBeG+GNX/py3h9pbnwEA7XQvbB1wWBec
 hR629PO+csE0dWcFkGi8d5kpdWQCOQY=
 =PRn3
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2020-03-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.7:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

Driver Changes:
 - fb-helper: Remove drm_fb_helper_{add,add_all,remove}_one_connector
 - fbdev: some cleanups and dead-code removal
 - Conversions to simple-encoder
 - zero-length array removal
 - Panel: panel-dpi support in panel-simple, Novatek NT35510, Elida
   KD35T133,

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309135439.dicfnbo4ikj4tkz7@gilmour
2020-03-12 12:42:56 +10:00
Dave Airlie d3bd37f587 Linux 5.6-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5lkYceHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGpHQH/RJrzcaZHo4lw88m
 Jf7vBZ9DYUlRgqE0pxTHWmodNObKRqpwOUGflUcWbb/7GD2LQUfeqhSECVQyTID9
 N9y7FcPvx321Qhc3EkZ24DBYk0+DQ0K2FVUrSa/PxO0n7czxxXWaLRDmlSULEd3R
 D4pVs3zEWOBXJHUAvUQ5R+lKfkeWKNeeepeh+rezuhpdWFBRNz4Jjr5QUJ8od5xI
 sIwobYmESJqTRVBHqW8g2T2/yIsFJ78GCXs8DZLe1wxh40UbxdYDTA0NDDTHKzK6
 lxzBgcmKzuge+1OVmzxLouNWMnPcjFlVgXWVerpSy3/SIFFkzzUWeMbqm6hKuhOn
 wAlcIgI=
 =VQUc
 -----END PGP SIGNATURE-----

Merge v5.6-rc5 into drm-next

Requested my mripard for some misc patches that need this as a base.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-03-11 07:27:21 +10:00
Feifei Xu 5d11e37c02 drm/amdgpu/runpm: disable runpm on Vega10
Some framework test will fail if enable runpm on Vega10.
Disable it untill issue fixed.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Tested-by: Kyle Chen <Kyle.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:18 -04:00
Tao Zhou 204eaac625 drm/amdgpu: call ras_debugfs_create_all in debugfs_init
and remove each ras IP's own debugfs creation

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:11 -04:00
Tao Zhou f9317014ea drm/amdgpu: add function to creat all ras debugfs node
centralize all debugfs creation in one place for ras

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:55:02 -04:00
xinhui pan 9fe58d0bbd drm/amdgpu: Correct the condition of warning while bo release
Only kernel bo has kfd eviction fence.
This warning is to give a notice that kfd only remove eviction fence on
individual bos.

Tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:42 -04:00
Yong Zhao 1d251d9008 drm/amdkfd: Consolidate duplicated bo alloc flags
ALLOC_MEM_FLAGS_* used are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*,
but they are interweavedly used in kernel driver, resulting in bad
readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not
referenced in kernel, and it functions implicitly in kernel through
ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion.

Replace all occurrences of ALLOC_MEM_FLAGS_* with
KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:34 -04:00
Nirmoy Das ea29221d1d drm/amdgpu: do not set nil entry in compute_prio_sched
If there are no high priority compute queues available then set normal
priority sched array to compute_prio_sched[AMDGPU_GFX_PIPE_PRIO_HIGH]

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10 15:54:07 -04:00
Hawking Zhang f1c2cd3f8f drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
The ROMC_INDEX/DATA offset was changed to e4/e5 since
from smuio_v11 (vega20/arcturus).

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-by: Candice Li <Candice.Li@amd.com>
Reviewed-by: Candice Li <Candice.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 16:42:28 -04:00
Nirmoy Das 552b80d740 drm/amdgpu: remove unused functions
AMDGPU statically sets priority for compute queues
at initialization so remove all the functions
responsible for changing compute queue priority dynamically.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:48 -04:00
Nirmoy Das 2316a86bde drm/amdgpu: change hw sched list on ctx priority override
Switch to appropriate sched list for an entity on priority override.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:42 -04:00
Nirmoy Das 33abcb1f5a drm/amdgpu: set compute queue priority at mqd_init
We were changing compute ring priority while rings were being used
before every job submission which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe as high priority compute queue

High/normal priority compute sched lists are generated from set of high/normal
priority compute queues. At context creation, entity of compute queue
get a sched list from high or normal priority depending on ctx->priority

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:24 -04:00
Andrey Grodzovsky 97f6a21bfa drm/amdgpu: Enter low power state if CRTC active.
CRTC in DPMS state off calls for low power state entry.
Support both atomic mode setting and pre-atomic mode setting.

v2: move comment

Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:50:52 -04:00
Monk Liu cc9f2fba37 drm/amdgpu: disable clock/power gating for SRIOV
and disable MC resum in VCN2.0 as well
those are not concerned by VF driver

Singed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:40:30 -05:00
Monk Liu 68430c6be5 drm/amdgpu: cleanup ring/ib test for SRIOV vcn2.0 (v2)
support IB test on dec/enc ring
disable ring test on dec/enc ring (MMSCH limitation)

v2: squash in unused variable warning fix

Singed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:40:30 -05:00
Monk Liu dd26858a9c drm/amdgpu: implement initialization part on VCN2.0 for SRIOV
something need to do for VCN2.0 enablement on SRIOV:
1)use one dec ring and one enc ring
2)allocate MM table for MMSCH usage
3)implement SRIOV version vcn_start which orgnize vcn programing
with patcket format and implement start mmsch for to run those
packet
4)doorbell is changed for SRIOV

Singed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:56 -05:00
Monk Liu fe44249186 drm/amdgpu: disable jpeg block for SRIOV
MMSCH doesn't support jpeg ring on SRIOV

Signed-off-by: Jinage Zhao <jiange.zhao@amd.com>
Singed-off-by: darlington Opara <darlington.opara@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:49 -05:00
Monk Liu 3569b6d19e drm/amdgpu: introduce mmsch v2.0 header
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:42 -05:00
Yong Zhao fa5bde8056 drm/amdgpu: Use better names to reflect it is CP MQD buffer
Add "CP" to AMDGPU_GEM_CREATE_MQD_GFX9 to indicate it is only for CP MQD
buffer.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:34:18 -05:00
Andrey Grodzovsky 90f88cdd7c drm/amdgpu: Fix GPU reset error.
Problem:
During GU reset PSP's sysfs was being wrongly reinitilized
during call to amdgpu_device_ip_late_init which was failing
with duplicate error.
Fix:
Move psp_sysfs_init to psp_sw_init to avoid this. Add guards
in sysfs file's read and write hook agains premature call
if PSP is not finished initialization.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:24 -05:00
Jacob He 5e208eb62b drm/amdgpu: Update SPM_VMID with the job's vmid when application reserves the vmid
SPM access the video memory according to SPM_VMID. It should be updated
with the job's vmid right before the job is scheduled. SPM_VMID is a
global resource

Signed-off-by: Jacob He <jacob.he@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:16 -05:00
John Clements 1a2172b5ee drm/amdgpu: update page retirement sequence
check UMC status and exit prior to making and erroneus register access

this resolved unexpected behaviour with UMC indexing mode broadcasting writes

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:32:06 -05:00
Guchun Chen d38c3ac716 drm/amdgpu: toggle DF-Cstate when accessing UMC ras error related registers
On arcturus, DF-Cstate needs to be toggled off/on
before and after accessing UMC error counter and
error address registers, otherwise, clearing such
registers may fail.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:59 -05:00
John Clements 1b3460a8b1 drm/amdgpu: increase atombios cmd timeout
mitigates race condition on BACO reset between GPU bootcode and driver reload

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:51 -05:00
Hawking Zhang a61f41b177 drm/amdgpu: enable PCS error report on arcturus
add arcturus xgmi/wafl pcs err status group to support
PCS error detection and report on arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:43 -05:00
Hawking Zhang ec01fe2dbf drm/amdgpu: enable PCS error report on VG20
Now driver will report XGMI/WAFL PCS error through
sysfs xgmi_wafl_err_count node on Vega20

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:35 -05:00
Hawking Zhang 18f36157f2 drm/amdgpu: add helper funcs to detect PCS error
Since from vega20, hardware supports run-time detect
and report XGMI/WAFL PCS ras error. Add helper functions
to walkthrough every type of ras error and report it if
any.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06 14:31:28 -05:00
Pankaj Bharadiya ff1f62d35b drm: Remove drm_fb_helper add, add all and remove connector calls
drm_fb_helper_{add,remove}_one_connector() and
drm_fb_helper_single_add_all_connectors() are dummy functions now
and serve no purpose. Hence remove their calls.

This is the preparatory step for removing the
drm_fb_helper_{add,remove}_one_connector() functions from
drm_fb_helper.h

This removal is done using below sementic patch and unused variable
compilation warnings are fixed manually.

@@
@@

- drm_fb_helper_single_add_all_connectors(...);

@@
expression e1;
statement S;
@@
- e1 = drm_fb_helper_single_add_all_connectors(...);
- S

@@
@@

- drm_fb_helper_add_one_connector(...);

@@
@@

- drm_fb_helper_remove_one_connector(...);

Changes since v1:
* Squashed warning fixes into the patch that introduced the
  warnings (into 5/7) (Laurent, Emil, Lyude)

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305120434.111091-6-pankaj.laxminarayan.bharadiya@intel.com
2020-03-06 14:19:58 +01:00
Pankaj Bharadiya 2dea2d1182 drm: Remove unused arg from drm_fb_helper_init
The max connector argument for drm_fb_helper_init() isn't used anymore
hence remove it.

All the drm_fb_helper_init() calls are modified with below sementic
patch.

@@
expression E1, E2, E3;
@@
-  drm_fb_helper_init(E1,E2, E3)
+  drm_fb_helper_init(E1,E2)

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305120434.111091-2-pankaj.laxminarayan.bharadiya@intel.com
2020-03-06 14:19:57 +01:00
Tianci.Yin 194bcf35bc drm/amdgpu: disable 3D pipe 1 on Navi1x
[why]
CP firmware decide to skip setting the state for 3D pipe 1 for Navi1x as there
is no use case.

[how]
Disable 3D pipe 1 on Navi1x.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2020-03-05 09:41:55 -05:00