For very large page table updates, we can exceed the
size of the ring. To avoid this, use an IB to perform
the page table update.
v2(ck): cleanup the IB infrastructure and the use it instead
of filling the struct ourself.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
fetch the reset mask and check if the relevant ring flags
are set to determine whether the ring is hung or not.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When we attempt the reset the GPU, look at the status registers
to determine what blocks need to be reset.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used by all asic families from r600+.
Flag for the vbios and later instances of the driver
that the GPU is hung.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes GPU hang during DMA ring IB test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59672
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes a hard lock in the gpu reset code after the
rework for DMA support (0ecebb9e0d
"drm/radeon: switch to a finer grained reset for evergreen")
due to not bailing before the MC shutdown if the relevant engines
are idle.
Discussion:
http://lists.freedesktop.org/archives/dri-devel/2013-January/032985.html
Reported-by: Eldad Zack <eldad@fogrefinery.com>
Tested-by: Eldad Zack <eldad@fogrefinery.com>
Acked-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No change in functionality as we currently set all the reset
flags.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This try to reset the dma engine when performing gpu reset. Hopefully
bringing back the gpu dma engine in sane state.
v2: agd5f: fix dma reset on cayman/TN, add support for SI
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To help debug dma related lockup.
v2: agd5f: update SI as well
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull DRM updates from Dave Airlie:
"This is the one and only next pull for 3.8, we had a regression we
found last week, so I was waiting for that to resolve itself, and I
ended up with some Intel fixes on top as well.
Highlights:
- new driver: nvidia tegra 20/30/hdmi support
- radeon: add support for previously unused DMA engines, more HDMI
regs, eviction speeds ups and fixes
- i915: HSW support enable, agp removal on GEN6, seqno wrapping
- exynos: IPP subsystem support (image post proc), HDMI
- nouveau: display class reworking, nv20->40 z compression
- ttm: start of locking fixes, rcu usage for lookups,
- core: documentation updates, docbook integration, monotonic clock
usage, move from connector to object properties"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (590 commits)
drm/exynos: add gsc ipp driver
drm/exynos: add rotator ipp driver
drm/exynos: add fimc ipp driver
drm/exynos: add iommu support for ipp
drm/exynos: add ipp subsystem
drm/exynos: support device tree for fimd
radeon: fix regression with eviction since evict caching changes
drm/radeon: add more pedantic checks in the CP DMA checker
drm/radeon: bump version for CS ioctl support for async DMA
drm/radeon: enable the async DMA rings in the CS ioctl
drm/radeon: add VM CS parser support for async DMA on cayman/TN/SI
drm/radeon/kms: add evergreen/cayman CS parser for async DMA (v2)
drm/radeon/kms: add 6xx/7xx CS parser for async DMA (v2)
drm/radeon: fix htile buffer size computation for command stream checker
drm/radeon: fix fence locking in the pageflip callback
drm/radeon: make indirect register access concurrency-safe
drm/radeon: add W|RREG32_IDX for MM_INDEX|DATA based mmio accesss
drm/exynos: support extended screen coordinate of fimd
drm/exynos: fix x, y coordinates for right bottom pixel
drm/exynos: fix fb offset calculation for plane
...
Async DMA has a special packet for contiguous pt updates
which saves overhead.
v2: leave the CP method enabled for now as doing the updates
in the DMA rings is not working properly yet.
v3: update for 2 level pts
v4: rebase
v5: drop pte/pde packet. doesn't seem to work on NI.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are 2 async DMA engines on cayman, one at 0xd000 and
one at 0xd800. The programming interface is the same as
evergreen however there are some changes to the commands
for using vmids.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Redirect invalid memory accesses to the default page
instead of locking up the memory controller. Also
enable the invalid memory access interrupts and
start spamming system log with it.
v2 (agd5f): fix up against 2 level PT changes
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Handle requests that won't fit into a single packet.
v2: pe needs to increase as well.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise the next IB might start reading commands
with the page table still invalid.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to emit them at VM flush as we no longer use
variable sized page tables now that we support 2 level
page tables. This matches the behavior of SI (which
does not support variable sized page tables).
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Based on Dmitries work, but splitting the code into page
directory and page table handling makes it far more
readable and (hopefully) more reliable.
Allocations of page tables are made from the SA on demand,
that should still work fine since all page tables are of
the same size.
Also using the fact that allocations from the SA are mostly
continuously (except for end of buffer wraps and under very
high memory pressure) to group updates send to the chipset
specific code into larger chunks.
v3: mostly a rewrite of Dmitries previous patch.
v4: fix some typos and coding style
Signed-off-by: Dmitry Cherkasov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The actual set up and assignment of VM page tables
is done on the fly in radeon_gart.c.
v2: update vm size comments
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Pull drm merge (part 1) from Dave Airlie:
"So first of all my tree and uapi stuff has a conflict mess, its my
fault as the nouveau stuff didn't hit -next as were trying to rebase
regressions out of it before we merged.
Highlights:
- SH mobile modesetting driver and associated helpers
- some DRM core documentation
- i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
combined pte writing, ilk rc6 support,
- nouveau: major driver rework into a hw core driver, makes features
like SLI a lot saner to implement,
- psb: add eDP/DP support for Cedarview
- radeon: 2 layer page tables, async VM pte updates, better PLL
selection for > 2 screens, better ACPI interactions
The rest is general grab bag of fixes.
So why part 1? well I have the exynos pull req which came in a bit
late but was waiting for me to do something they shouldn't have and it
looks fairly safe, and David Howells has some more header cleanups
he'd like me to pull, that seem like a good idea, but I'd like to get
this merge out of the way so -next dosen't get blocked."
Tons of conflicts mostly due to silly include line changes, but mostly
mindless. A few other small semantic conflicts too, noted from Dave's
pre-merged branch.
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits)
drm/nv98/crypt: fix fuc build with latest envyas
drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering
drm/nv41/vm: fix and enable use of "real" pciegart
drm/nv44/vm: fix and enable use of "real" pciegart
drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie
drm/nouveau: store supported dma mask in vmmgr
drm/nvc0/ibus: initial implementation of subdev
drm/nouveau/therm: add support for fan-control modes
drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules
drm/nouveau/therm: calculate the pwm divisor on nv50+
drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster
drm/nouveau/therm: move thermal-related functions to the therm subdev
drm/nouveau/bios: parse the pwm divisor from the perf table
drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices
drm/nouveau/therm: rework thermal table parsing
drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table
drm/nouveau: fix pm initialization order
drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it
drm/nouveau: log channel debug/error messages from client object rather than drm client
drm/nouveau: have drm debugging macros build on top of core macros
...
Pass the vm and ring index rather than an IB. This allows
us to use the vm_flush interface for non-IB cases in the
future.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Convert #include "..." to #include <path/...> in drivers/gpu/.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
PDE/PTE update code uses CP ring for memory writes.
All page table entries are preallocated for now in alloc_pt().
It is made as whole because it's hard to divide it to several patches
that compile and doesn't break anything being applied separately.
Tested on cayman card.
v2: rebased on top of "refactor set_page chipset interface v3",
code cleanups
v3: switched offsets calc macros to inline funcs where possible,
remove pd_addr from radeon_vm, switched RADEON_BLOCK_SIZE define,
to 9 (and PTE_COUNT to 1 << BLOCK_SIZE)
v4 (ck): move "incr" documentation to previous patch, cleanup and
document RADEON_VM_* constants, change commit message to
our usual format, simplify patch allot by removing
everything current not necessary, disable SI workaround.
v5: (agd5f): Fix typo in tables_size calculation in
radeon_vm_alloc_pt(). Second line should have been
'+=' rather than '='.
v6: fix npdes calculation. In scenario when pfns to be mapped overlap
two PDE spans:
+-----------+-------------+
| PDE span | PDE span |
+-----------+----+--------+
| |
+---------+
| pfns |
+---------+
the following npdes calculation gives incorrect result:
npdes = (nptes >> RADEON_VM_BLOCK_SIZE) + 1;
For the case above picture it should give npdes = 2, but gives one.
This patch corrects it by rounding last pfn up to 512 border,
first - down to 512 border and then subtracting and dividing by 512.
v7: Make npde calculation clearer, fix ndw calculation.
v8: (agd5f): reserve enough for 2 full VM PTs, add some
additional comments.
v9: fix typo in npde calculation
Signed-off-by: Dmitry Cherkasov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cleanup the interface in preparation for hierarchical page tables.
v2: add incr parameter to set_page for simple scattered PTs uptates
added PDE-specific flags to r600_flags and radeon_drm.h
removed superfluous value masking with 0xffffffff
v3: removed superfluous bo_va->valid checking
changed R600_PTE_VALID to R600_ENTRY_VALID to handle PDE too
v4 (ck): fix indention style, rework and fix typos in commit message,
add documentation for incr parameter, also use incr
parameter for system pages
v5 (agd5f): use upper_32_bits() and minor white space fixes
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dmitry Cherkassov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Let's allow GCC to optimize better.
This exposed some five unused functions, but this patch doesn't remove them.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently doing the update with the CP.
v2: Rebased on Jeromes bugfix. Make validity comparison
more human readable.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Move binding onto the ring, simplifying handling a bit.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Move flushing the VMs as function into the rings.
First step to make VM operations async.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Store a reference to the VM into the IB structure, that
makes calculating the IBs address a bit less complicated.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Print various CP register that have valuable informations regarding
GPU lockup.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before emitting any indirect buffer, emit the offset of the next
valid ring content if any. This allow code that want to resume
ring to resume ring right after ib that caused GPU lockup.
v2: use scratch registers instead of storing it into memory
v3: skip over the surface sync for ni and si as well
v4: use SET_CONFIG_REG instead of PACKET0
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Making it easier to control when it is executed.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Just restore the page table instead. Addressing three
problem with this change:
1. Calling vm_manager_suspend in the suspend path is
problematic cause it wants to wait for the VM use
to end, which in case of a lockup never happens.
2. In case of a locked up memory controller
unbinding the VM seems to make it even more
unstable, creating an unrecoverable lockup
in the end.
3. If we want to backup/restore the leftover ring
content we must not unbind VMs in between.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Just reinitialize the shader content on resume instead.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The IB pool is in gart memory, so it is completely
superfluous to unpin / repin it on suspend / resume.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It's not critical, but the current code isn't
100% correct.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
For a normal suspend/resume we allready wait for
the rings to be empty, and for a suspend/reasume
in case of a lockup we REALLY don't want to wait
for anything.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It is completely unnecessary to create fences
before they are emitted, so remove it and a bunch
of checks if fences are emitted or not.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
After recent changes HDMI code is ready to be enabled on DCE5. This
patch just changes conditions to execute already present code on DCE5.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Zoltán Böszörményi <zboszor@pr.hu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tiling group size is always 256bits on r6xx/r7xx/r8xx/9xx. Also fix and
simplify render backend map. This now properly sets up the backend map
on r6xx-9xx which should improve 3D performance.
Vadim benchmarked also:
Some benchmarks on juniper (5750), fullscreen 1920x1080,
first result - kernel 3.4.0+ (fb21affa), second - with these patches:
Lightsmark: 91 fps => 123 fps +35%
Doom3: 74 fps => 101 fps +36%
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While there are cards with more than 8 mem banks, the max
number of banks from a tiling perspective is 8, so cap
the tiling config at 8 banks.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=43448
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Using the wrong union.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>