Fixes lockups due to CP read GPUVM faults when running piglit on Cape
Verde.
v2 (chk): apply the fix to R600+ as well, on CIK only the GFX CP has
a PFP, add more comments to R600 code, enable flushing again
v3: (agd5f): only apply to 7xx+. r6xx does not have the packet.
v4: (agd5f): split flush change into a separate patch, fix formatting
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
It isn't necessary for command streams generated by the kernel (at least
not while we aren't storing ring or indirect buffers in VRAM).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Skip the "manual" pageflip completion checks via polling and
guessing in the vblank handler radeon_crtc_handle_vblank() on
asics which are known to reliably support hw pageflip completion
irqs. Those pflip irqs are a more reliable and race-free method
of handling pageflip completion detection, whereas the "classic"
polling method has some small races in combination with dpm on,
and with the reworked pageflip implementation since Linux 3.16.
On old asics without pflip irqs, the classic method is used.
On asics with known good pflip irqs, only pflip irqs are used
by default, but a new module parameter "use_pflipirqs" allows to
override this in case we encounter asics in the wild with
unreliable or faulty pflip irqs. A module parameter of 0 allows
to use the classic method only in such a case. A parameter of 1
allows to use both classic method and pflip irqs as additional
band-aid to avoid some small races which could happen with the
classic method alone. The setting 1 gives Linux 3.16 behaviour.
Hw pflip irqs are available since R600.
Tested on DCE-4, AMD Cedar - FirePro 2270.
v2: agd5f: only enable pflip interrupts on DCE4+ as they are not
reliable on older asics.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Older firmware didn't support the new nop packet.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Older firmware didn't support the new nop packet.
v2 (Andreas Boll):
- Drop usage of packet3 for new firmware
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Cc: stable@vger.kernel.org
This ensures the GPU sees all previous CPU writes to VRAM, which makes it
safe:
* For userspace to stream data from CPU to GPU via VRAM instead of GTT
* For IBs to be stored in VRAM instead of GTT
* For ring buffers to be stored in VRAM instead of GTT, if the HPD flush
is performed via MMIO
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seems to make VM flushes more stable on SI and CIK.
v2: only use the PFP on the GFX ring on CIK
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix rebase onto drm-fixes
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Doesn't seem necessary, the GART table memory should be persistent.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds CIK support for the new ucode format.
v2: add size validation, integrate debug info
v3: add support for MEC2 on KV
v4: fix typos
v4: update to latest format
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is a halfway fix for hawaii acceleration. More fixes to come
but hopefully isolated to userspace.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Replace occurrences of "v & 0xffffffff" with lower_32_bits(v)
when it's next to an upper_32_bits(v). Also remove unnecessary
"upper_32_bits(v) & 0xffffffff" code snippets.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge drm-fixes into drm-next.
Both i915 and radeon need this done for later patches.
Conflicts:
drivers/gpu/drm/drm_crtc_helper.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/i915_gem_gtt.c
This patch makes it possible to decide how many address
bits are spend on the page directory vs the page tables.
v2: remove unintended change
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch implements support for VRAM page table entry compression.
PTE construction is enhanced to identify physically contiguous page
ranges and mark them in the PTE fragment field. L1/L2 TLB support is
enabled for 64KB (SI/CIK) and 256KB (NI) PTE fragments, significantly
improving TLB utilization for VRAM allocations.
Linear store bandwidth is improved from 60GB/s to 125GB/s on Pitcairn.
Unigine Heaven 3.0 sees an average improvement from 24.7 to 27.7 FPS
on default settings at 1920x1200 resolution with vsync disabled.
See main comment in radeon_vm.c for a technical description.
v2 (chk): rebased and simplified.
v3 (chk): add missing hw setup
v4 (chk): rebased on current drm-fixes-3.15
v5 (chk): fix comments and commit text
Signed-off-by: Jay Cornwall <jay@jcornwall.me>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Also add golden registers, update firmware loading functions.
Signed-off-by: Samuel Li <samuel.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
It would appear this bug has been copy/pasted many times without being noticed.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Testing the update pending bit directly after issuing an
update is nonsense cause depending on the pixel clock the
CRTC needs a bit of time to execute the flip even when we
are in the VBLANK period.
This is just a non invasive patch to solve the problem at
hand, a more complete and cleaner solution should follow
in the next merge window.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76564
v2: fix source IDs for CRTC2-6
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Avoid a possible segfault.
Noticed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Need to swap on BE.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
This fixes fast color clear with 1D-tiled single-sample surfaces
and Hyper-Z corruption with 1D-tiled depth surfaces.
Even though it seems it is not needed for 1D tiling, CMASK and HTILE are
always 2D-tiled, thus the hw needs to know the actual pipe configuration
for CMASK and HTILE addressing no matter what the tiling mode of the surface
is.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTJlUvAAoJEHm+PkMAQRiGOhYH/1I+Bc7N7Rjr6QQAtBIy0GPC
XMqSE/gpgxlvRneQbQsvTUlPnWRhgzLGendT9HFKawkaQ0UNuZdRVyBHGFmpuED8
RlbicVVuuEZabrxEnCd7UPvYvEyK5pLIFpCRs5B+ManB1qLki2Ar03ymH1NRxOde
edmPbSUFo2aONITrEBm7tqT3cShTmBaDGP/zU0TNDMNrpVVDbHZolSNu2z4xOTa5
GqAOEbluLQ6jP3yxWur/V3Lk3W7pB6TabfX4o6UZu0F3iFnJxRMIzHXrI3o4yLTj
HEwmB3npfc8DIUk4oik7RkN+aqxDcdg/rBLQD63+xxt6zCkP+0q16brC0R67qWE=
=n9Ml
-----END PGP SIGNATURE-----
Merge tag 'v3.14-rc7' into drm-next
Linux 3.14-rc7
Backmerge to help out Intel guys.
When we disable the rings, set the status properly. If
not other code pathes may try and use the rings which are
not functional at this point.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Moving the pm resume up in the init order to fix
dpm seems to have regressed somes cases with the old
pm code. Move it back to late resume.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The reason for the false positives was fixed quite some time ago and since
most engines can still execute NOPs while being locked up it leads to false
negatives.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
In all cases where it really matters we are using the read functions anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Some of the vce clocks are automatic, others need to
be manually enabled. For ease, just disable cg when
vce is active.
v2: rebased
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only VCE 2.0 support so far.
v2: squashing multiple patches into this one
v3: add IRQ support for CIK, major cleanups,
basic code documentation
v4: remove HAINAN from chipset list
Signed-off-by: Christian König <christian.koenig@amd.com>
If we are not able to properly initialize one of the gpu
engines for buffer paging, we limit vram to the size of
the cpu visible aperture. We generally either use the gfx
or dma engine to do this. Clean up the size limiting code
to only adjust the size based on what ring is selected
for buffer paging rather than making assumptions about which
engine is selected for paging.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This is the preferred flushing method on CIK.
Note, this only works on the PFP so the engine bit must be
set.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-intel-next-2014-01-10:
- final bits for runtime D3 on Haswell from Paul (now enabled fully)
- parse the backlight modulation freq information in the VBT from Jani
(but not yet used)
- more watermark improvements from Ville for ilk-ivb and bdw
- bugfixes for fastboot from Jesse
- watermark fix for i830M (but not yet everything)
- vlv vga hotplug w/a (Imre)
- piles of other small improvements, cleanups and fixes all over
Note that the pull request includes a backmerge of the last drm-fixes
pulled into Linus' tree - things where getting a bit too messy. So the
shortlog also contains a bunch of patches from Linus tree. Please yell if
you want me to frob it for you a bit.
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (609 commits)
drm/i915/bdw: make sure south port interrupts are enabled properly v2
drm/i915: Include more information in disabled hotplug interrupt warning
drm/i915: Only complain about a rogue hotplug IRQ after disabling
drm/i915: Only WARN about a stuck hotplug irq ONCE
drm/i915: s/hotplugt_status_gen4/hotplug_status_g4x/
pci config reset is a low level reset that resets
the entire chip from the bus interface. It can
be more reliable if soft reset fails.
v2: fix rebase
v3: hide behind module parameter
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fill in asic family specific versions rather than
using the generic version. This lets us handle asic
specific differences more easily. In this case, we
disable sw swapping of the rtpr writeback value on
r6xx+ since the hw does it for us. Fixes bogus
rptr readback on BE systems.
v2: remove missed cpu_to_le32(), add comments
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need more control over the ordering of dpm init with
respect to the rest of the asic. Specifically, the SMC
has to be initialized before the rlc and cg/pg. The pm
code currently initializes late in the driver, but we need
it to happen much earlier so move pm handling into the asic
specific callbacks.
This makes dpm more reliable and makes clockgating work
properly on CIK parts and should help on SI parts as well.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to reorder the driver init sequence to better accomodate
dpm which needs to be loaded earlier in the init sequence. Move
fw init up so that it's available for dpm init.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will allow userspace to correctly program the PA_SC_RASTER_CONFIG
register, so it can be considered a fix.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Only the render backends of the first shader engine were enabled. The others
were erroneously disabled. Enabling the other render backends improves
performance a lot.
Unigine Sanctuary on Bonaire:
Before: 15 fps
After: 90 fps
Judging from the fan noise, the GPU was also underclocked when the other
render backends were disabled, resulting in horrible performance. The fan is
a lot noisy under load now.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org