Move the vblank evasion up from the low-level, hw-specific
update_plane() handlers to the general plane commit operation.
Everything inside commit should now be non-sleeping, so this brings us
closer to how vblank evasion will behave once we move over to atomic.
v2:
- Restore lost intel_crtc->active check on vblank evasion
v3:
- Replace assert_pipe_enabled() in intel_disable_primary_hw_plane()
with an intel_crtc->active test; it turns out assert_pipe_enabled()
grabs some mutexes and can sleep, which we can't do with interrupts
disabled.
v4:
- Equivalent to v2; v3 change is now squashed into an earlier patch
of the series. (Ander).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Once we integrate our work into the atomic pipeline, plane commit
operations will need to happen with interrupts disabled, due to vblank
evasion. Our commit functions today include sleepable work, so those
operations need to be split out and run either before or after the
atomic register programming.
The solution here calculates which of those operations will need to be
performed during the 'check' phase and sets flags in an intel_crtc
sub-struct. New intel_begin_crtc_commit() and
intel_finish_crtc_commit() functions are added before and after the
actual register programming; these will eventually be called from the
atomic plane helper's .atomic_begin() and .atomic_end() entrypoints.
v2: Fix broken sprite code split
v3: Make the pre/post commit work crtc-based to match how we eventually
want this to be called from the atomic plane helpers.
v4: Some platforms that haven't had their watermark code reworked were
waiting for vblank, then calling update_sprite_watermarks in their
platform-specific disable code. These also need to be flagged out
of the critical section.
v5: Sprite plane test for primary show/hide should just set the flag to
wait for pending flips, not actually perform the wait. (Ander)
v6:
- Rebase onto latest di-nightly; picks up an important runtime PM fix.
- Handle 'wait_for_flips' flag in intel_begin_crtc_commit(). (Ander)
- Use wait_for_flips flag for primary plane update rather than
performing the wait in the check routine.
- Added kerneldoc to pre_disable/post_enable functions that are no
longer static. (Ander)
- Replace assert_pipe_enabled() in intel_disable_primary_hw_plane()
with an intel_crtc->active test; it turns out assert_pipe_enabled()
grabs some mutexes and can sleep, which we can't do with interrupts
disabled.
v7:
- Check for fb != NULL when deciding whether the sprite plane hides the
primary plane during a sprite update. (PRTS)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If CONFIG_BUG=n __WARN_printf won't be defined leading to the below
build failure. The double underscores should have told us to steer clear
of it anyway.
drivers/gpu/drm/i915/intel_display.c: In function ‘assert_pll’:
drivers/gpu/drm/i915/intel_display.c:1027:2: error: implicit declaration
of function ‘__WARN_printf’ [-Werror=implicit-function-declaration]
I915_STATE_WARN(cur_state != state,
Use WARN(1, ...) instead. It handles CONFIG_BUG=n gracefully and, with
the constant condition, a sane compiler should reduce it to
__WARN_printf.
This is a regression introduced by
commit e2c719b75c
Author: Rob Clark <robdclark@gmail.com>
Date: Mon Dec 15 13:56:32 2014 -0500
drm/i915: tame the chattermouth (v2)
Reported-by: Jim Davis <jim.epost@gmail.com>
Reference: http://mid.gmane.org/CA+r1ZhgHTi7bS2irhtuSUs9aO=Br1dumN8=oAOeaMJDZ_ZhwBw@mail.gmail.com
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Conflicts:
drivers/gpu/drm/i915/intel_runtime_pm.c
Separate branch so that Takashi can also pull just this refactoring
into sound-next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
After switching to using the component interface this API isn't needed
any more.
v2-3: unchanged
v4:
- move the removal of i915_powerwell.h to this patch (Takashi)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Register a component to be used to interface with the snd_hda_intel
driver. This is meant to replace the same interface that is currently
based on module symbol lookup.
v2:
- change roles between the hda and i915 components (Daniel)
- add the implementation to a new file (Jani)
- use better namespacing (Jani)
v3:
- move the implementation to intel_audio.c (Daniel)
- rename display_component to audio_component (Daniel)
- add kerneldoc (Daniel)
v4:
- run forgotten git rm i915_component.c (Jani)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This will be needed by later patches, so factor it out.
No functional change.
v2:
- s/dev_to_i915_priv/dev_to_i915/ (Jani)
- don't use the helper in i915_pm_suspend (Chris)
- simplify the helper (Chris)
v3:
- remove redundant upcasting in the helper (Daniel)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If not pinned VMA can become an eviction target just before it needs to be
executed which breaks the internal object lifetime rules.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87399
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- Complete overhaul to the main IOCTL function, kfd_ioctl(), according to
drm_ioctl() example. This includes changing the IOCTL definitions, so it
breaks compatibility with previous versions of the userspace. However,
because the kernel was not officialy released yet, and this the first
kernel that includes amdkfd, I assume I can still do that at this stage.
- A couple of bug fixes for the non-HWS path (used for bring-ups and
debugging purposes only).
* tag 'amdkfd-fixes-2015-01-06' of git://people.freedesktop.org/~gabbayo/linux:
drm/amdkfd: rewrite kfd_ioctl() according to drm_ioctl()
drm/amdkfd: reformat IOCTL definitions to drm-style
drm/amdkfd: Do copy_to/from_user in general kfd_ioctl()
drm/amdkfd: unmap VMID<-->PASID when relesing VMID (non-HWS)
drm/radeon: Assign VMID to PASID for IH in non-HWS mode
drm/radeon: do not leave queue acquired if timeout happens in kgd_hqd_destroy()
drm/amdkfd: Load mqd to hqd in non-HWS mode
drm/amd: Fixing typos in kfd<->kgd interface
some minor radeon fixes.
* 'drm-fixes-3.19' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: integer underflow in radeon_cp_dispatch_texture()
drm/radeon: adjust default bapm settings for KV
drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
drm/radeon: fix sad_count check for dce3
drm/radeon: KV has three PPLLs (v2)
- Fix BUG() on !SMP builds
- Fix for OOPS on pre-NV50 that snuck into -next
- MCP7[789A] hang fix where firmware hasn't already setup NISO pollers
- NV4x IGP MSI disable, it doesn't appear to work correctly
- Add GK208B to recognised boards (no code change aside from adding
chipset recognition)
* 'linux-3.19' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/nouveau: Do not BUG_ON(!spin_is_locked()) on UP
drm/nv4c/mc: disable msi
drm/nouveau/fb/ram/mcp77: enable NISO poller
drm/nouveau/fb/ram/mcp77: use carveout reg to determine size
drm/nouveau/fb/ram/mcp77: subclass nouveau_ram
drm/nouveau: wake up the card if necessary during gem callbacks
drm/nouveau/device: Add support for GK208B, resolves bug 86935
drm/nouveau: fix missing return statement in nouveau_ttm_tt_unpopulate
drm/nouveau/bios: fix oops on pre-nv50 chipsets
Sometimes we wish to tweak how an individual context behaves. Since we
always create a context for every filp, this means that individual
processes can fine tune their behaviour even if they do not explicitly
create a context.
The first example parameter here is to enable multi-process GPU testing,
but the interface should be able to cope with passing arbitrarily complex
parameters.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_reset_stats/ban-period-*
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It is platform/output depenedent when exactly the pipe will start
running. Sometimes we just need the (cpu) pipe enabled, in other cases
the pch transcoder is enough and in yet other cases the (DP) port is
sending the frame start signal.
In a perfect world we'd put the drm_crtc_vblank_on call exactly where
the pipe starts running, but due to cloning and similar things this
will get messy. And the current approach of picking the most
conservative place for all combinations also doesn't work since that
results in legit vblank waits (in encoder->enable hooks, e.g. the 2
vblank waits for sdvo) failing.
Completely going back to the old world before
commit 51e31d49c8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 15 12:36:02 2014 +0200
drm/i915: Use generic vblank wait
isn't great either since screaming when the vblank wait work because
the pipe is off is kinda nice.
Pick a compromise and move the drm_crtc_vblank_on right before the
encoder->enable call. This is a lie on some outputs/platforms, but
after the ->enable callback the pipe is guaranteed to run everywhere.
So not that bad really. Suggested by Ville.
v2: Same treatment for drm_crtc_vblank_off and encoder->disable: I've
missed the ibx pipe B select w/a, which also has a vblank wait in the
disable function (while the pipe is obviously still running).
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This will allow us to set per-file, or even per-context, periods in the
future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 371abae844.
This data seems unreliable and causing many issues and blocking other
teams and feature implementation. Safest way is to revert that for now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88081
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88039
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87671
Cc: Vandana Kannan <vandana.kannan@intel.com>
Cc: Deepak M <m.deepak@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch changes kfd_ioctl() to be very similar to drm_ioctl().
The patch defines an array of amdkfd_ioctls, which maps IOCTL definition to the
ioctl function.
The kfd_ioctl() uses that mapping to call the appropriate ioctl function,
through a function pointer.
This patch also declares a new typedef for the ioctl function pointer.
v2: Renamed KFD_COMMAND_(START|END) to AMDKFD_...
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This patch reformats the ioctl definitions in kfd_ioctl.h to be similar to the
drm ioctls definition style.
v2: Renamed KFD_COMMAND_(START|END) to AMDKFD_...
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This patch moves the copy_to_user() and copy_from_user() calls from the
different ioctl functions in amdkfd to the general kfd_ioctl() function, as
this is a common code for all ioctls.
This was done according to example taken from drm_ioctl.c
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously, this was restricted to only operate on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third usecase is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of requests, we need to operate on the
object regardless of its bind state.
v2: After discussion with Akash, we came to the conclusion that the
get-pages was required in order for accurate domain tracking in the
corner cases (like the shrinker) and also useful for ensuring memory
coherency with earlier cached CPU mmaps in case userspace uses exotic
cache bypass (non-temporal) instructions.
v3: Fix the inactive object check.
v4: Rebase to latest drm-intel-nightly codebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Use WARN_ONs (Daniel)
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I've written these long before we've had a reasonable docbook
structure, and naturally they've gone stale. Fix this up asap.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default. Later platforms
remove this bit, and apparently always enable the optimization.
Improves performance in "Counter Strike: Global Offensive" by 18%
at default settings on Iris Pro.
This may break sampling of paletted formats (P8/A8P8/P8A8). It's
unclear whether it affects sampling of paletted formats in general,
or just the sample_c message (which is never used).
While libva does have support for using paletted formats (primarily
for OSDs), that support appears to have been broken for at least a
year, so I couldn't observe a regression from this:
I tried to get libva-intel to use paletted formats, and observe a
regression...but the only thing I found that used it was mplayer's OSD
(on screen display). Even without my patch, the colors were totally
wrong with that, and it's according to a few distro wikis, that's been
the case for over a year.
If libva's code for paletted formats /is/ broken, they could always
add code to disable this bit using the command validator when fixing
it.
Further investigation from Haihao shows that libva mplayer OSD seems
to work at least on his setup (still unclear what's wron with Ken's),
and that it's not affected by this patch. Quoting the discussion
between Haihao and Ken:
> > > If you use "-vo gl" or "-vo xv", the OSD is solid white text with a black
> > > border around it. I presume that it's supposed to be white with vaapi as
> > > well, but I guess I'm not entirely sure.
> > >
> > > It's possible that the optimization doesn't affect the palette as long as
> > > you never use sample_c with the paletted textures.
> >
> > I verified the palette takes effect in the following way:
> >
> > 1. Only support P8A8 format in the driver
> >
> > 2. ran the above command and I saw white OSD text
> >
> > 3. Only support P4A4 format in the driver and don't use
> > 3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
> > so the palette keeps unchanged.
> >
> > 4. ran the above command and I saw black OSD text.
> >
> > 5. Load the right value to the texture palette and ran the above command
> > again, I saw white OSD text.
> >
> > Hence I think sample_c with the paletted textures is used in the driver.
>
> That sounds like the palette is actually working, then. Great :)
>
> I doubt that libva would use sample_c - sampling with a shadow comparison?
> It looks like it just uses sample and sample+killpix.
You are right, libva driver doesn't use sample_c message.
> I'm pretty sure the sample_c optimization just uses the palette memory as
> storage for some stuff, so it's quite possible it just works if you're
> only using sample and sample+killpix.
Thanks for the explanation, it makes sense to me.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[danvet: Add wa name from Ville's review to the comment and copypaste
the explanation why we don't care about libva (already broken) from
Ken. Also add conclusion from libva devs that&why this is all fine.]
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: "Xiang, Haihao" <haihao.xiang@intel.com>
Cc: libva@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The test:
if (size > RADEON_MAX_TEXTURE_SIZE) {
"size" is an integer and it's controled by the user so it can be
negative and the test can underflow. Later we use "size" in:
dwords = size / 4;
...
RADEON_COPY_MT(buffer, data, (int)(dwords * sizeof(u32)));
It causes memory corruption to copy a negative size buffer.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enabling bapm seems to cause clocking problems on some
KV configurations. Disable it by default for now.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The check was already in place in the dp mode_valid check, but
radeon_dp_get_dp_link_clock() never returned the high clock
mode_valid was checking for because that function clipped the
clock based on the hw capabilities. Add an explicit check
in the mode_valid function.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=87172
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc:stable@vge.kernel.org
Enable all three in the driver. Early documentation
indicated the 3rd one was used for something else, but
that is not the case.
v2: handle disable as well
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch fixes a bug where deallocate_vmid() didn't actually unmap the
VMID<-->PASID mapping (in the registers).
That can cause undefined behavior.
This bug only occurs in non-HWS mode.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Highlights:
- Link order changes in drm/Makefile and drivers/Makefile to fix issue
when amdkfd, radeon and amd_iommu_v2 are compiled inside the kernel
image.
- Consider kernel configuration (using #IFDEFs) when radeon initializes
amdkfd, due to a specific configuration that makes symbol_request()
return a non-NULL value when a symbol doesn't exists. Rusty Russel
is helping me to find the root cause, but it may take a while because
of year-end so I'm sending this as a band-aid solution.
* tag 'amdkfd-fixes-2014-12-30' of git://people.freedesktop.org/~gabbayo/linux:
drm/radeon: Init amdkfd only if it was compiled
amdkfd: actually allocate longs for the pasid bitmask
drm: Put amdkfd before radeon in drm Makefile
drivers: Move iommu/ before gpu/ in Makefile
amdkfd: Remove duplicate include
amdkfd: Fixing topology bug in building sysfs nodes
amdkfd: Fix accounting of device queues
I've had these since before -rc1, but they missed my last pull
request. Real bug fixes and mostly cc: stable material.
* tag 'drm-intel-next-fixes-2014-12-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915: add missing rpm ref to i915_gem_pwrite_ioctl
Revert "drm/i915: Preserve VGACNTR bits from the BIOS"
drm/i915: Don't call intel_prepare_page_flip() multiple times on gen2-4
drm/i915: Kill check_power_well() calls
This patch changes the radeon_kfd_init(), which is used to initialize the
interface between radeon and amdkfd, so the interface will be initialized only
if amdkfd was build, either as module or inside the kernel image.
In the modules case, the symbol_request() will be used (same as old code). In
the in-image compilation case, a direct call to kgd2kfd_init() will be done.
For other cases, radeon_kfd_init() will just return false.
This patch is necessary because in case of the following specific
configuration: kernel 32-bit, no modules support, random kernel base and no
hibernation, the symbol_request() doesn't work as expected - it doesn't return
NULL if the symbol doesn't exists - which makes the kernel panic.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Commit "amdkfd: use sizeof(long) granularity for the pasid bitmask" calculated
the number of longs it will need, but ended up allocating that number of
bytes rather than longs.
Fix that silly error and allocate the amount of data really required.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This reverts commit 355a701838.
This had some bad side effects under normal operation, and should
have been dropped earlier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Display MEC fw version in topology. Without this, the HSA userspace
stack is broken.
- Init apertures information only once per process
* tag 'amdkfd-fixes-2014-12-23' of git://people.freedesktop.org/~gabbayo/linux:
amdkfd: init aperture once per process
amdkfd: Display MEC fw version in topology node
drm/radeon: Add implementation of get_fw_version
drm/amd: Add get_fw_version to kfd-->kgd interface
This is a set of fixes for two regressions and one bug in the IOMMU
mapping code. It turns out that all of these issues turn up primarily
on Tegra30 hardware. The IOMMU mapping bug only manifests on buffers
that aren't multiples of the page size. I happened to be testing HDMI
with 1080p while writing the code and framebuffers for that happen to
fit exactly within 2025 pages of 4 KiB each.
One of the regressions is caused by the IOMMU code allocating pages from
shmem which can have associated cache lines. If the pages aren't flushed
then these cache lines may be flushed later on and cause framebuffer
corruption. I'm not sure why I didn't see this before. Perhaps the board
that I was using had enough RAM so that the pages shmem would hand out
had a better chance of being unused. Or maybe I didn't look too closely.
The fix for this is to fake up an SG table so that it can be passed to
the DMA API. Ideally this would use drm_clflush_*(), but implementing
that for ARM causes DRM to fail to build as a module since some of the
low-level cache maintenance functions aren't exported. Hopefully we can
get a suitable API exported on ARM for the next release.
The second regression is caused by a mismatch between the hardware pipe
number and the CRTC's DRM index. These were used inconsistently, which
could cause one code location to call drm_vblank_get() with a different
pipe than the corresponding drm_vblank_put(), thereby causing the
reference count to become unbalanced. Alexandre also reported a possible
race condition related to this, which this series also fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJUkYzSAAoJEN0jrNd/PrOhFWgP+wYiyXiLot7Wo3+HM779fQ9a
MZkOycfxyNJ+TxJjvIlJh/2y641G4Elw3rod/QhUKg1b0L2uqVrVRKvEsx5sR5Ci
XASwkx9UFLRxN6/Cr/X8SKmE7nFUSwGd3wSoVrT42ldo0DOOlHuVT9NLFoCfDmFa
GN5pxUW/WHS7WgCVpG9GgoFmFZXyrwx9ZRHqL49eJqAvjBngmBbZeyhFeZdu71fl
rm4qMiLkZZsLZEm3uP53pbdkAf7yZGV3WPWKO43LXgykSMfQ56WcN7JJsGygB3I1
uEMP65Tf3TdynW6Wz2dywq81uITJhd8y6Zhr6j2bsNINTHDz67YOxKfS50axZN/P
2PbqDLyJF7MT1ydQ7weeEv4gkRF8Vt6K3aBfL5gm8PM6jm2sdZytsjLM+/RCIkl3
cDtkC+XmPmGxLTEnV3iWMCbCfOrNvqzkp9jvilIEbxIvgX72T6EQPJnfe7Sv95Cr
VBFWoi26XtFhN9wWEGjc7fRTUwNdg4D21/ns8TY3MgOFQcdP01pp2KRTdPDC/6Mt
kknXwDaZRY6EjGQmRKkxKf1c64nmY8V7MJx2CSbPc3HgGdSXaa0AOZE2d60beste
ASpgMQIbERCmAbdmb5JN6fsKcpJrJL15zbrGcDwSnIk96x4HAC8zB9Xln2ubdCwc
IP4cm/Abz6Cfd6I1cQRr
=eVlK
-----END PGP SIGNATURE-----
Merge tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux into drm-fixes
drm/tegra: Fixes for v3.19-rc1
This is a set of fixes for two regressions and one bug in the IOMMU
mapping code. It turns out that all of these issues turn up primarily
on Tegra30 hardware. The IOMMU mapping bug only manifests on buffers
that aren't multiples of the page size. I happened to be testing HDMI
with 1080p while writing the code and framebuffers for that happen to
fit exactly within 2025 pages of 4 KiB each.
One of the regressions is caused by the IOMMU code allocating pages from
shmem which can have associated cache lines. If the pages aren't flushed
then these cache lines may be flushed later on and cause framebuffer
corruption. I'm not sure why I didn't see this before. Perhaps the board
that I was using had enough RAM so that the pages shmem would hand out
had a better chance of being unused. Or maybe I didn't look too closely.
The fix for this is to fake up an SG table so that it can be passed to
the DMA API. Ideally this would use drm_clflush_*(), but implementing
that for ARM causes DRM to fail to build as a module since some of the
low-level cache maintenance functions aren't exported. Hopefully we can
get a suitable API exported on ARM for the next release.
The second regression is caused by a mismatch between the hardware pipe
number and the CRTC's DRM index. These were used inconsistently, which
could cause one code location to call drm_vblank_get() with a different
pipe than the corresponding drm_vblank_put(), thereby causing the
reference count to become unbalanced. Alexandre also reported a possible
race condition related to this, which this series also fixes.
* tag 'drm/tegra/for-3.19-rc1-fixes' of git://people.freedesktop.org/~tagr/linux:
drm/tegra: dc: Select root window for event dispatch
drm/tegra: gem: Use the proper size for GEM objects
drm/tegra: gem: Flush buffer objects upon allocation
drm/tegra: dc: Fix a potential race on page-flip completion
drm/tegra: dc: Consistently use the same pipe
drm/irq: Add drm_crtc_vblank_count()
drm/irq: Add drm_crtc_handle_vblank()
drm/irq: Add drm_crtc_send_vblank_event()
misc i915 fixes.
* tag 'drm-intel-next-fixes-2014-12-17' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Disable PSMI sleep messages on all rings around context switches
drm/i915: Force the CS stall for invalidate flushes
drm/i915: Invalidate media caches on gen7
drm/i915: sanitize RPS resetting during GPU reset
drm/i915: move RPS PM_IER enabling to gen6_enable_rps_interrupts
drm/i915: vlv: fix IRQ masking when uninstalling interrupts
Yeah a pull for one patch is a bit overkill but I started to assemble the
various patches for 3.20 in a branch for atomic props/ioctl and didn't
realize that this bugfix here at the beginnning of the branch should be in
3.19 (because msm is using the helpers arleady). So if you'd merge we'd
have it twice or or I need to shuffle branches again. Can do if you want.
* tag 'topic/atomic-fixes-2014-12-17' of git://anongit.freedesktop.org/drm-intel:
drm/atomic: fix potential null ptr on plane enable
A few msm fixes for 3.19:
* hdmi regulators fix
* hdmi fix for spurious HPD interrupts
* fix for sync atomic update after async update (which could show
up with a setcrtc following a pageflip)
* couple little Coccinelle cleanups
* 'msm-fixes-3.19' of git://people.freedesktop.org/~robclark/linux:
drm/msm/hdmi: rework HDMI IRQ handler
drm/msm/hdmi: enable regulators before clocks to avoid warnings
drm/msm/mdp5: update irqs on crtc<->encoder link change
drm/msm: block incoming update on pending updates
drm/msm: Deletion of unnecessary checks before the function call "release_firmware"
drm/msm: Deletion of unnecessary checks before two function calls
nouveau userspace back at 1.0.1 used to call the X server
DRIOpenDRMMaster interface even for DRI2 (doh!), this attempts
to map the sarea and fails if it can't.
Since 884c6dabb0 from Daniel,
this fails, but only ancient drivers would see it.
Revert the nouveau bits of that fix.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # 3.18
Signed-off-by: Dave Airlie <airlied@redhat.com>
On !SMP systems spinlocks do not exist. Thus checking of they
are active will always fail.
Use
assert_spin_locked(lock);
instead of
BUG_ON(!spin_is_locked(lock));
to not BUG() on all UP systems.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The failure paths if we fail to wake the card are less than desirable,
but there's not really a graceful way to handle this case currently.
I'll keep this situation in mind when I get to fixing other vm-related
issues.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nouveau_ttm_tt_unpopulate() is supposed to return right after calling
ttm_dma_unpopulate() in the case of a coherent buffer. The return
statement was omitted, leading to the pages being unmapped twice. Fix
this.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When amdkfd and radeon are compiled inside the kernel image (not as modules),
radeon will load before amdkfd, which will cause a bug when radeon will probe
the GPUs.
When the two drivers are compiled as modules, amdkfd is loaded after radeon is
loaded but before radeon starts probing the GPUs. This is done because radeon
loads the amdkfd module through symbol_request function.
This patch makes amdkfd load before radeon when they are both compiled inside
the kernel image, which makes the behavior similar to the case when they are
modules, and prevents the kernel bug.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>