For r6xx+ asics. This mirrors the behavior of pre-r6xx
asics. We need to program the MC even if something
else in startup() fails. Failure to do so results in
an unusable GPU.
Based on a fix from: Mark Kettenis <kettenis@openbsd.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Removing the clock/power or resetting the VCPU can cause
hangs if that happens in the middle of a register write.
Stall the memory and register bus before putting the VCPU
into reset. Keep it in reset when unloading the module or
suspending.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Changing the UVD BOs offset on suspend/resume doesn't work because the VCPU
internally keeps pointers to it. Just keep it always pinned and save the
content manually.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66425
v2: fix compiler warning
v3: fix CIK support
Note: a version of this patch needs to go to stable.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Helpful for debugging GPUVM errors as we can see what
hw block and page generated the fault in the log.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoid creating temporary platform device that will lead to issue
when several radeon gpu are in same computer. Instead directly use
the radeon device for requesting firmware.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The compute rings use RELEASE_MEM rather then EOP
packets for writing fences and there is no SYNC_PFP_ME
packet on the compute rings.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Type 2 packets are deprecated on CIK MEC and we should use
type 3 nop packets. Setting the count field to the max value
(0x3fff) indicates that only one dword should be skipped
like a type 2 packet.
v2: add comment to code
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
On CIK, the compute rings work slightly differently than
on previous asics, however the basic concepts are the same.
The main differences:
- New MEC engines for compute queues
- Multiple queues per MEC:
- CI/KB: 1 MEC, 4 pipes per MEC, 8 queues per pipe = 32 queues
- KV: 2 MEC, 4 pipes per MEC, 8 queues per pipe = 64 queues
- Queues can be allocated and scheduled by another queue
- New doorbell aperture allows you to assign space in the aperture
for the wptr which allows for userspace access to queues
v2: add wptr shadow, fix eop setup
v3: fix comment
v4: switch to new callback method
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The doorbell aperture is a PCI BAR whose pages can be
mapped to compute resources for things like wptrs
for userspace queues.
This patch maps the BAR and sets up a simple allocator
to allocate pages from the BAR.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allows us to select instanced registers based on:
- ME (micro engine
- Pipe
- Queue
- VMID
Switch MC setup to use this new function.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: agd5f: fix clock dividers setup for bonaire
v3: agd5f: rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: update to latest driver changes
v3: properly tear down vm on suspend
v4: fix up irq init ordering
v5: remove outdated comment
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Async page table updates using the sDMA engine. sDMA has a
special packet for updating entries for contiguous pages
that reduces overhead.
v2: add support for and use the CP for now.
v3: update for 2 level PTs
v4: rebase, fix DMA packet
v5: switch to using an IB
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the page table base address and flush the
VM TLB using the sDMA.
V2: update for 2 level PTs
V3: update vm flush
V4: update SH_MEM* regs
V5: switch back to old style VM TLB invalidate
V6: fix packet formatting
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CIK has new asynchronous DMA engines called sDMA
(system DMA). Each engine supports 1 ring buffer
for kernel and gfx and 2 userspace queues for compute.
TODO: fill in the compute setup.
v2: update to the latest reset code
v3: remove ib_parse
v4: fix copy_dma()
v5: drop WIP compute sDMA queues
v6: rebase
v7: endian fixes for IB
v8: cleanup for release
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RLC handles the interrupt controller and other tasks
on the GPU.
v2: add documentation
v3: update programming sequence
v4: additional setup
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the page table base address and flush the
VM TLB using the CP.
v2: update for 2 level PTs
v3: use new packet for invalidate
v4: update SH_MEM* regs when flushing the VM
v5: add pfp sync, go back to old style vm TLB invalidate
v6: fix hdp flush packet count
v7: use old style HDP flush
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For gfx ring only. Compute is still todo.
v2: add documentation
v3: update to latest reset changes, integrate emit update patch.
v4: fix count on wait_reg_mem for HDP flush
v5: use old hdp flush method for fence
v6: set valid bit for IB
v7: cleanup for release
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sets up the GFX ring and loads ucode for GFX and Compute.
Todo:
- handle compute queue setup.
v2: add documentation
v3: integrate with latest reset changes
v4: additional init fixes
v5: scratch reg write back no longer supported on CIK
v6: properly set CP_RB0_BASE_HI
v7: rebase
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently the driver required 6 sets of ucode:
1. pfp - pre-fetch parser, part of the GFX CP
2. me - micro engine, part of the GFX CP
3. ce - constant engine, part of the GFX CP
4. rlc - interrupt, etc. controller
5. mc - memory controller (discrete cards only)
6. mec - compute engines, part of Compute CP
V2: add documentation
V3: update MC ucode
V4: rebase
V5: update mc ucode
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Redirect invalid memory accesses to the default page
instead of locking up the memory controller.
v2: rebase on top of 2 level PTs
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The vm callbacks are the same as the SI ones right now
(same regs and bits). We could share the SI variants, and
I may yet do that, but I figured I would add CIK specific
ones for now in case we need to change anything.
V2: add documentation, minor fixes.
V3: integrate vram offset fixes for APUs
V4: enable 2 level VM PTs
V5: index SH_MEM_* regs properly
V6: add ib_parse()
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: split soft reset into compute and gfx. Still need
to make reset more fine grained, but this should be a
start.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: tiling fixes
v3: more tiling fixes
v4: more tiling fixes
v5: additional register init
v6: rebase
v7: fix gb_addr_config for KV/KB
v8: drop wip KV bits for now, add missing config reg
v9: fix cu count on Bonaire
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>