This allows us to specify if we want to sync to
the shared fences of a reservation object or not.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DRM_DEBUG_CODE is currently always set, so distributions enable it. The
only reason to keep support in code is if developers wanted to disable
debug support. Sounds unlikely.
All the DRM_DEBUG() printks are still guarded by a drm_debug read. So if
its cacheline is read once, they're discarded pretty fast.. There should
hardly be any performance penalty, it's even guarded by unlikely().
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Changes since v1:
- Kill the sw interrupt dance, add and use
radeon_irq_kms_sw_irq_get_delayed instead.
- Change custom wait function, lockdep complained about it.
Holding exclusive_lock in the wait function might cause deadlocks.
Instead do all the processing in .enable_signaling, and wait
on the global fence_queue to pick up gpu resets.
- Process all fences in radeon_gpu_reset after reset to close a race
with the trylock in enable_signaling.
Changes since v2:
- Small changes to work with the rewritten lockup recovery patches.
Changes since v3:
- Call radeon_fence_schedule_check when exclusive_lock cannot be
acquired to always cause a wake up.
- Reset irqs from hangup check.
- Drop reading seqno in the callback, use cached value.
- Fix indentation in radeon_fence_default_wait
- Add a radeon_test_signaled function, drop a few test_bit calls.
- Make to_radeon_fence global.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This improves concurrent stream decoding.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
More radeon changes for drm-next. Highlights:
- UVD support for older asics
- Reset rework in preparation for Maarten's fence patches
I have a few more patches which depend on Christian's ttm changes,
I'll send them out separately once you've merged the ttm changes.
* 'drm-next-3.18' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: drop doing resets in a work item
drm/radeon: drop RADEON_FENCE_SIGNALED_SEQ v2
drm/radeon: add timeout argument to radeon_fence_wait_seq v2
drm/radeon: handle lockup in delayed work, v5
drm/radeon: take exclusive_lock in read mode during ring tests, v5
drm/radeon: force fence completion only on problematic rings (v2)
drm/radeon: wake up all fences on manual reset
drm/radeon: add UVD fw names for older asic
drm/radeon: enable RB_ARB before resetting the VCPU
drm/radeon: 760G/780V/880V don't have UVD
drm/radeon: implement UVD hw workarounds for R6xx v3
drm/radeon: add UVD support for older asics v4
drm/radeon: add set_uvd_clocks callback for r6xx v4
drm/radeon: properly init UVD MC bits on R600
drm/radeon: force UVD buffers into VRAM on RS[78]80 v2
drm/radeon: move the IB test after the AGP fallback
Blocking completely innocent processes with a GPU reset is
a pretty bad idea. Just set needs_reset and let the next
command submission or fence wait do the job.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's causing issues with VMID handling and comparing the
fence value two times actually doesn't make handling faster.
v2: rebased on reset changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v5 (chk): complete rework, start when the first fence is emitted,
stop when the last fence is signalled, make it work
correctly with GPU resets, cleanup radeon_fence_wait_seq
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is needed for the next commit, because the lockup detection
will need the read lock to run.
v4 (chk): split out forced fence completion, remove unrelated changes,
add and handle in_reset flag
v5 (agd5f): rebase fix
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of resetting all fence numbers, only reset the
number of the problematic ring. Split out from a patch
from Maarten Lankhorst <maarten.lankhorst@canonical.com>
v2 (agd5f): rebase build fix
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to more fine grained specify where to place the buffer object.
v2: rebased on drm-next, add bochs changes as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a problem with GPU resets and TLB flushes on SI/CIK.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
radeon userptr support.
* 'drm-next-3.18' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: allow userptr write access under certain conditions
drm/radeon: add userptr flag to register MMU notifier v3
drm/radeon: add userptr flag to directly validate the BO to GTT
drm/radeon: add userptr flag to limit it to anonymous memory v2
drm/radeon: add userptr support v8
Conflicts:
drivers/gpu/drm/radeon/radeon_prime.c
It isn't necessary for command streams generated by the kernel (at least
not while we aren't storing ring or indirect buffers in VRAM).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a module paramter to enable bapm on APUs. It's disabled
by default on certain APUs due to stability issues. This
option makes it easier to test and to enable it on systems that
are stable.
bug:
https://bugzilla.kernel.org/show_bug.cgi?id=81021
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Whenever userspace mapping related to our userptr change
we wait for it to become idle and unmap it from GTT.
v2: rebased, fix mutex unlock in error path
v3: improve commit message
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds an IOCTL for turning a pointer supplied by
userspace into a buffer object.
It imposes several restrictions upon the memory being mapped:
1. It must be page aligned (both start/end addresses, i.e ptr and size).
2. It must be normal system memory, not a pointer into another map of IO
space (e.g. it must not be a GTT mmapping of another object).
3. The BO is mapped into GTT, so the maximum amount of memory mapped at
all times is still the GTT limit.
4. The BO is only mapped readonly for now, so no write support.
5. List of backing pages is only acquired once, so they represent a
snapshot of the first use.
Exporting and sharing as well as mapping of buffer objects created by
this function is forbidden and results in an -EPERM.
v2: squash all previous changes into first public version
v3: fix tabs, map readonly, don't use MM callback any more
v4: set TTM_PAGE_FLAG_SG so that TTM never messes with the pages,
pin/unpin pages on bind/unbind instead of populate/unpopulate
v5: rebased on 3.17-wip, IOCTL renamed to userptr, reject any unknown
flags, better handle READONLY flag, improve permission check
v6: fix ptr cast warning, use set_page_dirty/mark_page_accessed on unpin
v7: add warning about it's availability in the API definition
v8: drop access_ok check, fix VM mapping bits
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v4)
Reviewed-by: Jérôme Glisse <jglisse@redhat.com> (v4)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Skip the "manual" pageflip completion checks via polling and
guessing in the vblank handler radeon_crtc_handle_vblank() on
asics which are known to reliably support hw pageflip completion
irqs. Those pflip irqs are a more reliable and race-free method
of handling pageflip completion detection, whereas the "classic"
polling method has some small races in combination with dpm on,
and with the reworked pageflip implementation since Linux 3.16.
On old asics without pflip irqs, the classic method is used.
On asics with known good pflip irqs, only pflip irqs are used
by default, but a new module parameter "use_pflipirqs" allows to
override this in case we encounter asics in the wild with
unreliable or faulty pflip irqs. A module parameter of 0 allows
to use the classic method only in such a case. A parameter of 1
allows to use both classic method and pflip irqs as additional
band-aid to avoid some small races which could happen with the
classic method alone. The setting 1 gives Linux 3.16 behaviour.
Hw pflip irqs are available since R600.
Tested on DCE-4, AMD Cedar - FirePro 2270.
v2: agd5f: only enable pflip interrupts on DCE4+ as they are not
reliable on older asics.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the decision what to use into the common VM code.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Scales much better than scanning the address range linearly.
v2: store pfn instead of address
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't wait for the BO to be used again, just
update the PT on the next VM use.
v2: remove stray semicolon.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some hawaii boards use a different method for fetching the
voltage information from the vbios.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This ensures the GPU sees all previous CPU writes to VRAM, which makes it
safe:
* For userspace to stream data from CPU to GPU via VRAM instead of GTT
* For IBs to be stored in VRAM instead of GTT
* For ring buffers to be stored in VRAM instead of GTT, if the HPD flush
is performed via MMIO
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
And clean up the function comment a little.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
That didn't worked correctly any more and opened up a security problem.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Unused and unimplemented. Also fix specifying the
kernel flag incorrectly at one occasion.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Now that fallback to gtt is fixed for cpu access, we can
remove this limit.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=78717
v2: use new gart_pin_size to accurately track available gtt.
v3: fix comment
v4: clarify comment
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
So we know how large an allocation we can allow.
v2: incorporate Michel's comments
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2: fix rebase onto drm-fixes
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Doesn't seem necessary, the GART table memory should be persistent.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
Indeed, a first attempt at inlining it grew radeon.ko by 7%.
However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps
in Urban Terror.
v2: We know the minimum MMIO size. Adding it to the if allows the compiler to
optimize the branch out, improving both performance and size.
The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense
says perf is now more than 1% better.
v3: Also change _wreg, make the threshold a define.
Inlining _wreg increased the size a bit compared to v2, so now radeon.ko
is only 1% smaller.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds CIK support for the new ucode format.
v2: add size validation, integrate debug info
v3: add support for MEC2 on KV
v4: fix typos
v4: update to latest format
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds SI support for the new ucode format.
v2: add size validation, integrate debug info
v3: update to latest version
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Calling radeon_vm_bo_find on the IB BO during CS
is illegal and can lead to an crash.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v3: completely rewritten. We now just remember which areas
of the PT to clear and do so on the next command submission.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=79980
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As well as enabling the vblank interrupt. These shouldn't take any
significant amount of time, but at least pinning the BO has actually been
seen to fail in practice before, in which case we need to let userspace
know about it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some monitors seem to have problems with deep color enabled, even
though they claim to support it. I'm not sure if the monitor
need a quirk or if the driver is doing something the monitor doesn't
like. At this point lets just disable deep color by default like
we did for hdmi audio and work through the bugs so we can eventually
enable it by default.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=80531
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
And also domain to prefered_domains. That matches better
what those values represent.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We never check the return value anyway and if the
index isn't valid would crash way before calling
the functions.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge drm-fixes into drm-next.
Both i915 and radeon need this done for later patches.
Conflicts:
drivers/gpu/drm/drm_crtc_helper.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/i915_gem_gtt.c
Instead of trying to flip inside the vblank period when
the buffer is idle, offload blocking for idle to a kernel
thread and program the flip directly into the hardware.
v2: add error handling, fix EBUSY handling
v3: add proper exclusive_lock handling
v4: update crtc->primary->fb when the flip actually happens
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of trying to flip inside the vblank period when
the buffer is idle, offload blocking for idle to a kernel
thread and program the flip directly into the hardware.
v2: add error handling, fix EBUSY handling
v3: add proper exclusive_lock handling
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
They are doing the same on all generations anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch implements support for VRAM page table entry compression.
PTE construction is enhanced to identify physically contiguous page
ranges and mark them in the PTE fragment field. L1/L2 TLB support is
enabled for 64KB (SI/CIK) and 256KB (NI) PTE fragments, significantly
improving TLB utilization for VRAM allocations.
Linear store bandwidth is improved from 60GB/s to 125GB/s on Pitcairn.
Unigine Heaven 3.0 sees an average improvement from 24.7 to 27.7 FPS
on default settings at 1920x1200 resolution with vsync disabled.
See main comment in radeon_vm.c for a technical description.
v2 (chk): rebased and simplified.
v3 (chk): add missing hw setup
v4 (chk): rebased on current drm-fixes-3.15
v5 (chk): fix comments and commit text
Signed-off-by: Jay Cornwall <jay@jcornwall.me>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mullins is DCE83 just like Kabini. Set the proper number
of endpoints on mullins.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
v2 (chk): fix image size storage
v3 (chk): fix UV size calculation
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Testing the update pending bit directly after issuing an
update is nonsense cause depending on the pixel clock the
CRTC needs a bit of time to execute the flip even when we
are in the VBLANK period.
This is just a non invasive patch to solve the problem at
hand, a more complete and cleaner solution should follow
in the next merge window.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76564
v2: fix source IDs for CRTC2-6
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Some i2c fixes over DisplayPort.
* 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
drm/radeon: Improve vramlimit module param documentation
drm/radeon: fix audio pin counts for DCE6+ (v2)
drm/radeon/dp: switch to the common i2c over aux code
drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
drm/i915: support address only i2c-over-aux transactions
drm/tegra: dp: Support address-only I2C-over-AUX transactions
There is actually quite a bit of variance based on
the asic.
v2: fix typo noticed by Jerome.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTJlUvAAoJEHm+PkMAQRiGOhYH/1I+Bc7N7Rjr6QQAtBIy0GPC
XMqSE/gpgxlvRneQbQsvTUlPnWRhgzLGendT9HFKawkaQ0UNuZdRVyBHGFmpuED8
RlbicVVuuEZabrxEnCd7UPvYvEyK5pLIFpCRs5B+ManB1qLki2Ar03ymH1NRxOde
edmPbSUFo2aONITrEBm7tqT3cShTmBaDGP/zU0TNDMNrpVVDbHZolSNu2z4xOTa5
GqAOEbluLQ6jP3yxWur/V3Lk3W7pB6TabfX4o6UZu0F3iFnJxRMIzHXrI3o4yLTj
HEwmB3npfc8DIUk4oik7RkN+aqxDcdg/rBLQD63+xxt6zCkP+0q16brC0R67qWE=
=n9Ml
-----END PGP SIGNATURE-----
Merge tag 'v3.14-rc7' into drm-next
Linux 3.14-rc7
Backmerge to help out Intel guys.
Just move all fields into radeon_cs_reloc, removing unused/duplicated fields.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
No need to make it more complicated than necessary,
just allocate the page tables as normal BO and
flush whenever the address change.
v2: update comments and function name
v3: squash bug fixes, page directory and tables patch
v4: rebased on Mareks changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Userspace should set the first 4 bits of drm_radeon_cs_reloc::flags to
a number from 0 to 15. The higher the number, the higher the priority,
which means a buffer with a higher number will be validated sooner.
The old behavior is preserved: Buffers used for write are prioritized over
read-only buffers if the userspace doesn't set the number.
v2: add buffers to buckets directly, then concatenate them
v3: use a stable sort
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The statistics are:
- VRAM usage in bytes
- GTT usage in bytes
- number of bytes moved by TTM
The last one is actually a counter, so you need to sample it before and after
command submission and take the difference.
This is useful for finding performance bottlenecks. Userspace queries are
also added.
v2: use atomic64_t
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When passing buffers between processes, the receiving process needs to know
the original buffer domain, so that it doesn't accidentally move the buffer.
v2: reserve the buffer
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We no longer need to take the ring lock while checking for
a gpu lockup, so just cleanup the code.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Use atomics and jiffies_64, so that we don't need to have the
ring mutex locked any more and avoid wrap arounds.
v2: fix some checkpatch warnings
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Disable audio around audio hw setup. This may avoid
hangs on certain asics.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
So this is the initial pull request for radeon drm-next 3.15. Highlights:
- VCE bringup including DPM support
- Few cleanups for the ring handling code
* 'drm-next-3.15' of git://people.freedesktop.org/~deathsimple/linux:
drm/radeon: cleanup false positive lockup handling
drm/radeon: drop radeon_ring_force_activity
drm/radeon: drop drivers copy of the rptr
drm/radeon/cik: enable/disable vce cg when encoding v2
drm/radeon: add support for vce 2.0 clock gating
drm/radeon/dpm: properly enable/disable vce when vce pg is enabled
drm/radeon/dpm: enable dynamic vce state switching v2
drm/radeon: add vce dpm support for KV/KB
drm/radeon: enable vce dpm on CI
drm/radeon: add vce dpm support for CI
drm/radeon: fill in set_vce_clocks for CIK asics
drm/radeon/dpm: fetch vce states from the vbios
drm/radeon/dpm: fill in some initial vce infrastructure
drm/radeon/dpm: move platform caps fetching to a separate function
drm/radeon: add callback for setting vce clocks
drm/radeon: add VCE version parsing and checking
drm/radeon: add VCE ring query
drm/radeon: initial VCE support v4
drm/radeon: fix CP semaphores on CIK
The CP semaphore queue on CIK has a bug that triggers if uncompleted
waits use the same address while a signal is still pending. Work around
this by using different addresses for each sync.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The reason for the false positives was fixed quite some time ago and since
most engines can still execute NOPs while being locked up it leads to false
negatives.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
In all cases where it really matters we are using the read functions anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
enable vce states when vce is active. When vce is active,
it adjusts the currently selected state (performance, battery,
uvd, etc.)
v2: add code comments
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Only VCE 2.0 support so far.
v2: squashing multiple patches into this one
v3: add IRQ support for CIK, major cleanups,
basic code documentation
v4: remove HAINAN from chipset list
Signed-off-by: Christian König <christian.koenig@amd.com>
The CP semaphore queue on CIK has a bug that triggers if uncompleted
waits use the same address while a signal is still pending. Work around
this by using different addresses for each sync.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Otherwise we allocate a new VMID on nearly every submit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-intel-next-2014-01-10:
- final bits for runtime D3 on Haswell from Paul (now enabled fully)
- parse the backlight modulation freq information in the VBT from Jani
(but not yet used)
- more watermark improvements from Ville for ilk-ivb and bdw
- bugfixes for fastboot from Jesse
- watermark fix for i830M (but not yet everything)
- vlv vga hotplug w/a (Imre)
- piles of other small improvements, cleanups and fixes all over
Note that the pull request includes a backmerge of the last drm-fixes
pulled into Linus' tree - things where getting a bit too messy. So the
shortlog also contains a bunch of patches from Linus tree. Please yell if
you want me to frob it for you a bit.
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (609 commits)
drm/i915/bdw: make sure south port interrupts are enabled properly v2
drm/i915: Include more information in disabled hotplug interrupt warning
drm/i915: Only complain about a rogue hotplug IRQ after disabling
drm/i915: Only WARN about a stuck hotplug irq ONCE
drm/i915: s/hotplugt_status_gen4/hotplug_status_g4x/
This is used to hard reset the asic. If a soft
reset is not able to reset things, a hard reset
can be used.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enabling this parameter enables pci config reset,
aka hard reset, which is a bus level chip reset.
In some cases this works more reliably than a soft
reset. Disabled by default.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fill in asic family specific versions rather than
using the generic version. This lets us handle asic
specific differences more easily. In this case, we
disable sw swapping of the rtpr writeback value on
r6xx+ since the hw does it for us. Fixes bogus
rptr readback on BE systems.
v2: remove missed cpu_to_le32(), add comments
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain features need to be enabled after ring tests
(e.g., powergating, etc.). Add a function pointer
to split out late enable features.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: add default_llseek
v3: set inode size in the open callback
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not very fast, but makes it possible to access even the
normally inaccessible parts of VRAM from userspace.
v2: use MM_INDEX_HI for >2GB mem access, add default_llseek
v3: set inode size in the open callback
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will allow userspace to correctly program the PA_SC_RASTER_CONFIG
register, so it can be considered a fix.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Also rename the function to better reflect what it is doing.
agd5f: fix argument size warning
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is required to properly calculate the tiling parameters
in userspace.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A single doorbell page is plenty for cik kms compute.
Use a single page and manage doorbell allocation by
individual doorbells rather than pages. Identify
doorbells by their index rather than byte offset.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To workaround bugs and/or certain limits it's sometimes
useful to fall back to waiting on fences.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Bit a bit -fixes pull request in the merge window than usual dua to two
feauture-y things:
- Display CRCs are now enabled on all platforms, including the odd DP case
on gm45/vlv. Since this is a testing-only feature it should ever hurt,
but I figured it'll help with regression-testing -fixes. So I left it
in and didn't postpone it to 3.14.
- Display power well refactoring from Imre. Would have caused major pain
conflict with the bdw stage 1 patches if I'd postpone this to -next.
It's only an relatively small interface rework, so shouldn't cause pain.
It's also been in my tree since almost 3 weeks already.
That accounts for about two thirds of the pull, otherwise just bugfixes:
- vlv backlight fix from Jesse/Jani
- vlv vblank timestamp fix from Jesse
- improved edp detection through vbt from Ville (fixes a vlv issue)
- eDP vdd fix from Paulo
- fixes for dvo lvds on i830M
- a few smaller things all over
Note: This contains a backmerge of v3.12. Since the -internal branch
always applied on top of -nightly I need that unified base to merge bdw
patches. So you'll get a conflict with radeon connector props when pulling
this (and nouveau/master will also conflict a bit when Ben doesn't
rebase). The backmerge itself only had conflicts in drm/i915.
There's also a tiny conflict between Jani's backlight fix and your sysfs
lifetime fix in drm-next.
* tag 'drm-intel-fixes-2013-11-07' of git://people.freedesktop.org/~danvet/drm-intel: (940 commits)
drm/i915/vlv: use per-pipe backlight controls v2
drm/i915: make backlight functions take a connector
drm/i915: move opregion asle request handling to a work queue
drm/i915/vlv: use PIPE_START_VBLANK interrupts on VLV
drm/i915: Make intel_dp_is_edp() less specific
drm/i915: Give names to the VBT child device type bits
drm/i915/vlv: enable HDA display audio for Valleyview2
drm/i915/dvo: call ->mode_set callback only when the port is running
drm/i915: avoid unclaimed registers when capturing the error state
drm/i915: Enable DP port CRC for the "auto" source on g4x/vlv
drm/i915: scramble reset support for DP port CRC on vlv
drm/i915: scramble reset support for DP port CRC on g4x
drm/i916: add "auto" pipe CRC source
...
Conflicts:
MAINTAINERS
drivers/gpu/drm/i915/intel_panel.c
drivers/gpu/drm/nouveau/core/subdev/mc/base.c
drivers/gpu/drm/radeon/atombios_encoders.c
drivers/gpu/drm/radeon/radeon_connectors.c
op 08-10-13 18:58, Thomas Hellstrom schreef:
> On 10/08/2013 06:47 PM, Jerome Glisse wrote:
>> On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
>>> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
>>>> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>>>>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>>>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>>>>
>>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>>>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>>>>
>>>>>> Non AGP is directly memcpy to radeon IB.
>>>>>>
>>>>>> Your patch allocate memory memcpy userspace to it and it will then be
>>>>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>>>>> not something we want.
>>>>> Totally agree. Additional to that there is no good reason to provide
>>>>> anything else than anonymous system memory to the CS ioctl, so the
>>>>> dependency between the mmap_sem and reservations are not really
>>>>> clear to me.
>>>>>
>>>>> Christian.
>>>> I think is that in other code path you take mmap_sem first then reserve
>>>> bo. But here we reserve bo and then we take mmap_sem because of copy
>>> >from user.
>>>> Cheers,
>>>> Jerome
>>>>
>>> Actually the log message is a little confusing. I think the mmap_sem
>>> locking inversion problem is orthogonal to what's being fixed here.
>>>
>>> This patch fixes the possible recursive bo::reserve caused by
>>> malicious user-space handing a pointer to ttm memory so that the ttm
>>> fault handler is called when bos are already reserved. That may
>>> cause a (possibly interruptible) livelock.
>>>
>>> Once that is fixed, we are free to choose the mmap_sem ->
>>> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
>>> but the hack required in the ttm fault handler is admittedly a bit
>>> ugly. The plan is to change the locking order to
>>> mmap_sem->bo::reserve
>>>
>>> I'm not sure if it applies to this particular case, but it should be
>>> possible to make sure that copy_from_user_inatomic() will always
>>> succeed, by making sure the pages are present using
>>> get_user_pages(), and release the pages after
>>> copy_from_user_inatomic() is done. That way there's no need for a
>>> double memcpy slowpath, but if the copied data is very fragmented I
>>> guess the resulting code may look ugly. The get_user_pages()
>>> function will return an error if it hits TTM pages.
>>>
>>> /Thomas
>> get_user_pages + copy_from_user_inatomic is overkill. We should just
>> do get_user_pages which fails with ttm memory and then use copy_highpage
>> helper.
>>
>> Cheers,
>> Jerome
> Yeah, it may well be that that's the preferred solution.
>
> /Thomas
>
I still disagree, and shuffled radeon_ib_get around to be called sooner.
How does the patch below look?
8<-------
Allocate and copy all kernel memory before doing reservations. This prevents a locking
inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
in ttm_bo_vm_fault without upsetting lockdep.
Changes since v1:
- Kill extra memcpy for !AGP case.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The DMA ring seems to be stable now.
v2: remove pt_ring_index as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT.
Consolidate the two wait sequence implementations into just one function.
Activate all waiters and remember if the reset was already done instead of
trying to reset from only one thread.
v2: clear reset flag earlier to avoid timeout in IB test
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This hooks radeon up to the runtime PM system to enable
dynamic power management for secondary GPUs in switchable
and powerxpress laptops.
v2: agd5f: clean up, add module parameter
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We use u16 for voltage values throughout the driver so switch
the table values to a u16 as well. Fixes an incompatible
cast error in ci_patch_clock_voltage_limits_with_vddc_leakage()
picked up by coverity.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
bapm is a pm feature for sharing the power budget between
the GPU and the CPU on APUs. It needs to be enabled or
disabled in certain circumstances. For now, disable it
when on battery and enable it when on AC power.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds spinlocks to protect access to other
indirect register apertures. These indirect spaces are
used pretty infrequently and we haven't had an reported
problems, but better safe than sorry.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
smc registers are access indirectly via the main mmio aperture, so
there may be problems with concurrent access. This adds a spinlock
to protect access to this register space.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex writes:
This is the radeon drm-next request. Big changes include:
- support for dpm on CIK parts
- support for ASPM on CIK parts
- support for berlin GPUs
- major ring handling cleanup
- remove the old 3D blit code for bo moves in favor of CP DMA or sDMA
- lots of bug fixes
[airlied: fix up a bunch of conflicts from drm_order removal]
* 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux: (898 commits)
drm/radeon/dpm: make sure dc performance level limits are valid (CI)
drm/radeon/dpm: make sure dc performance level limits are valid (BTC-SI) (v2)
drm/radeon: gcc fixes for extended dpm tables
drm/radeon: gcc fixes for kb/kv dpm
drm/radeon: gcc fixes for ci dpm
drm/radeon: gcc fixes for si dpm
drm/radeon: gcc fixes for ni dpm
drm/radeon: gcc fixes for trinity dpm
drm/radeon: gcc fixes for sumo dpm
drm/radeonn: gcc fixes for rv7xx/eg/btc dpm
drm/radeon: gcc fixes for rv6xx dpm
drm/radeon: gcc fixes for radeon_atombios.c
drm/radeon: enable UVD interrupts on CIK
drm/radeon: fix init ordering for r600+
drm/radeon/dpm: only need to reprogram uvd if uvd pg is enabled
drm/radeon: check the return value of uvd_v1_0_start in uvd_v1_0_init
drm/radeon: split out radeon_uvd_resume from uvd_v4_2_resume
radeon kms: fix uninitialised hotplug work usage in r100_irq_process()
drm/radeon/audio: set up the sads on DCE3.2 asics
drm/radeon: fix handling of variable sized arrays for router objects
...
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_gem_dmabuf.c
drivers/gpu/drm/i915/intel_pm.c
drivers/gpu/drm/radeon/cik.c
drivers/gpu/drm/radeon/ni.c
drivers/gpu/drm/radeon/r600.c
Resturcture clockgating code so that it can be
enabled/disabled from other components such as
dpm.
v2: make function static
v3: add fine grained cg controls
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commits adds flags for supported clockgating and
powergating features. This allows us to more easily
track which features are supported on a particular
asic and to enable/disable features for debugging.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Similar to DCE4/5, but supports multiple audio pins
which can be assigned per afmt block.
v2: rework the driver to handle more than one audio
pin.
v3: try different dto reg
v4: properly program dto
v5 (ck): change dto programming order
v6: program speaker allocation block
v7: rebase
v8: rebase on Rafał's changes
v9: integrated Rafał's comments, update to latest
drm_edid_to_speaker_allocation API
v10: add missing line break in error message
v11: add back audio enabled messages
v12: fix copy paste typo in r600_audio_enable
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Now that we have callbacks for [rw]ptr handling we can
remove the special handling for the DMA rings and use
the callbacks instead.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The hardware just doesn't support this correctly.
Disable it before we accidentally write anywhere we shouldn't.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Give the ring functions a separate structure and let the asic
structure point to the ring specific functions. This simplifies
the code and allows us to make changes at only one point.
No change in functionality.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starting on CIK, multi-media blocks like UVD no longer
have special power state. Rather they have their own
DPM implementation which adjusts their clocks dynamically
when active. When they are not active, the blocks are
powergated to save power.
v2: add missing pm locks
v3: rebase on uvd state selection rework
v4: fix inverted logic typo noticed by Christian
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds dpm support for btc asics. This includes:
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen switching
Set radeon.dpm=1 to enable.
v2: remove unused radeon_atombios.c changes,
make missing smc ucode non-fatal
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Handle the the thermal state directly in the work handler.
Remove the state selection function since nothing else uses it now.
2. On some asics there is no thermal state, so we just use a regular
state and force the low performance state.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the UVD handle information to determine which
which power states to select when using UVD. For
example, decoding a single SD stream requires much
lower clocks than multiple HD streams.
v2: switch to a cleaner dpm/uvd interface
v3: change the uvd power state while streams
are active if need be
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper function for counting the number of open stream handles.
v2: fix copy-pasta in comments and whitespace error
v3: make function static since it's only used in radeon_uvd.c
at the moment
v4: make non-static again for future changes
v5: make static again for new rework of dpm uvd changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise just reinitialize from scratch on resume,
and so make it more likely to succeed.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All the gem based kms drivers really want the same function to
destroy a dumb framebuffer backing storage object.
So give it to them and roll it out in all drivers.
This still leaves the option open for kms drivers which don't use GEM
for backing storage, but it does decently simplify matters for gem
drivers.
Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviwed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Covers requirements of all current asics.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
There are cases where we need more than 4k alignment. No
functional change with this commit.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Changing the UVD BOs offset on suspend/resume doesn't work because the VCPU
internally keeps pointers to it. Just keep it always pinned and save the
content manually.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66425
v2: fix compiler warning
v3: fix CIK support
Note: a version of this patch needs to go to stable.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the vblank time is too short to adjust mclk,
assume multiple displays (no mclk adjustments).
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows you to force specific power levels within a power
state. Due to hardware restrictions between generations, the
interface is limited to the following 3 selections:
auto: all levels enabled
low: forced to the lowest power level
high: forced to the highest power level
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain older rv770 asics have both a performance and
a 3D performance state rather than just multiple performance
levels in the state power state. The current code would
select the performance state rather than the 3D performance
state when the "performance" profile was selected. This change
switches to the "balanced" profile by default which ends up being
the internal performance profile. When the user selects the
"performance" profile, it selects the internal 3D performance
state so the user can select the higher performance modes.
For most asics this changes nothing. For certain rv770 asics
with static performance and 3D performance states, this allows
you to select between then using by selecting the "balanced"
and "performance" dpm profiles.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit converts the source of the val_seq counter to
the ww_mutex api. The reservation objects are converted later,
because there is still a lockdep splat in nouveau that has to
resolved first.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds dpm support for SI asics. This includes:
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen1/gen2/gen3 switching
- power containment
- shader power scaling
Set radeon.dpm=1 to enable.
v2: enable hainan support, rebase
v3: guard acpi stuff
v4: fix 64 bit math
v5: fix 64 bit div harder
v6: fix thermal interrupt check noticed by Jerome
v7: attempt fix state enable
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Validate the voltages against the voltage requirements of the
dispclk. We currently don't adjust the disp clock so it never
changes, but we need to filter out voltage levels that are too
low none the less.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>