It's entirely possibly that the other r375 code is relevant to r370 too,
but I've not confirmed this, so I'll leave it where it is for now.
NVIDIA's copyright headers maintained, as it's still all their code.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Gourav Samaiya <gsamaiya@nvidia.com>
gcc-8 reports
drivers/gpu/drm/nouveau/nvkm/engine/pm/base.c: In function 'nvkm_perfmon_mthd':
include/linux/string.h:265:9: error: '__builtin_strncpy' specified bound 64 equals destination size [-Werror=stringop-truncation]
We need one less byte or call strlcpy() to make it a
nul-terminated string.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The kbuild test bot complained about a new coccinelle warning nearby,
which sparked a discussion about the assignment to 'memory' inside of
the conditional expression. See Link below for the original post.
Fix the assignment to silence the coccinelle warning and also make the
code look a little nicer.
Link: https://lists.freedesktop.org/archives/nouveau/2017-November/029242.html
Signed-off-by: Christoph Böhmwalder <christoph@boehmwalder.at>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Fixes failure to compile with recent envyas as a result of the 'movw'
alias being removed for v5.
A bit of history:
v3 only has a 16-bit sign-extended immediate mov op. In order to set
the high bits, there's a separate 'sethi' op. envyas validates that
the value passed to mov(imm) is between -0x8000 and 0x7fff. In order
to simplify macros that load both the low and high word, a 'movw'
alias was added which takes an unsigned 16-bit immediate. However the
actual hardware op still sign extends.
v5 has a full 32-bit immediate mov op. The v3 16-bit immediate mov op
is gone (loads 0 into the dst reg). However due to a bug in envyas,
the movw alias still existed, and selected the no-longer-present v3
16-bit immediate mov op. As a result usage of movw on v5 is the same
as mov with a 0x0 argument.
The proper fix throughout is to only ever use the 'movw' alias in
combination with 'sethi'. Anything else should get the sign-extended
validation to ensure that the intended value ends up in the
destination register.
Changes in fuc3 binaries is the result of a different encoding being
selected for a mov with an 8-bit value.
v2: added commit message written by Ilia, thanks for that!
v3: messed up rebasing, now it should apply
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For a while we've been having issues with seemingly random interrupts
coming from nvidia cards when resuming them. Originally the fix for this
was thought to be just re-arming the MSI interrupt registers right after
re-allocating our IRQs, however it seems a lot of what we do is both
wrong and not even nessecary.
This was made apparent by what appeared to be a regression in the
mainline kernel that started introducing suspend/resume issues for
nouveau:
a0c9259dc4 (irq/matrix: Spread interrupts on allocation)
After this commit was introduced, we started getting interrupts from the
GPU before we actually re-allocated our own IRQ (see references below)
and assigned the IRQ handler. Investigating this turned out that the
problem was not with the commit, but the fact that nouveau even
free/allocates it's irqs before and after suspend/resume.
For starters: drivers in the linux kernel haven't had to handle
freeing/re-allocating their IRQs during suspend/resume cycles for quite
a while now. Nouveau seems to be one of the few drivers left that still
does this, despite the fact there's no reason we actually need to since
disabling interrupts from the device side should be enough, as the
kernel is already smart enough to know to disable host-side interrupts
for us before going into suspend. Since we were tearing down our IRQs by
hand however, that means there was a short period during resume where
interrupts could be received before we re-allocated our IRQ which would
lead to us getting an unhandled IRQ. Since we never handle said IRQ and
re-arm the interrupt registers, this would cause us to miss all of the
interrupts from the GPU and cause our init process to start timing out
on anything requiring interrupts.
So, since this whole setup/teardown every suspend/resume cycle is
useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
functions instead so they're only called at driver load and driver
unload. This should fix most of the issues with pending interrupts on
resume, along with getting suspend/resume for nouveau to work again.
As well, this probably means we can also just remove the msi rearm call
inside nvkm_pci_init(). But since our main focus here is to fix
suspend/resume before 4.15, we'll save that for a later patch.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Thought I'd try my luck getting one more in:
- Two fixes for Tegra (one is to common code, but our userspace doesn't hit it).
- One for NV5x-class MCPs
* 'linux-4.15' of git://github.com/skeggsb/linux:
drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling
drm/nouveau/bar/gk20a: Avoid bar teardown during init
drm/nouveau/drm/nouveau: Pass the proper arguments to nvif_object_map_handle()
- Fixes addition of stolen memory base address to PTEs.
- Removes support for compression.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Commit bbb163e189 ("drm/nouveau/bar: implement bar1 teardown")
introduced add a teardown helper function for BAR1. During
initialisation of the Nouveau, initially all the teardown helpers are
called once, before calling their init counterparts. For gk20a, after
the BAR1 teardown function is called, the device is hanging during the
initialisation of the FB sub-device. At this point it is unclear why
this is happening and this is still under investigation. However, this
change is preventing Tegra124 devices from booting when Nouveau is
enabled. To allow Tegra124 to boot, remove the teardown helper for
gk20a.
This is based upon a previous patch by Guillaume Tucker but limits
the workaround to only gk20a GPUs.
Fixes: bbb163e189 ("drm/nouveau/bar: implement bar1 teardown")
Reported-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
where a PCI device is present. This restricts the device drivers to be
reused for other domain numbers.
Getting ready to remove pci_get_bus_and_slot() function in favor of
pci_get_domain_bus_and_slot().
Replace pci_get_bus_and_slot() with pci_get_domain_bus_and_slot()
and extract the domain number from
1. struct pci_dev
2. struct pci_dev through drm_device->pdev
3. struct pci_dev through fb->subdev->drm_device->pdev
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
nouveau regression fixes, and some minor fixes.
* 'linux-4.15' of git://github.com/skeggsb/linux:
drm/nouveau: use alternate memory type for system-memory buffers with kind != 0
drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects in host memory
drm/nouveau/mmu/gp10b: use correct implementation
drm/nouveau/pci: do a msi rearm on init
drm/nouveau/imem/nv50: fix refcount_t warning
drm/nouveau/bios/dp: support DP Info Table 2.0
drm/nouveau/fbcon: fix NULL pointer access in nouveau_fbcon_destroy
On my GP107 when I load nouveau after unloading it, for some reason the
GPU stopped sending or the CPU stopped receiving interrupts if MSI was
enabled.
Doing a rearm once before getting any interrupts fixes this.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaCm8RAAoJEAx081l5xIa+zX0QAJSm31kCG3vdw2CNiRx25L3q
3hcsEOgAjVJ9FQVGKFWjzb8TK35tSqtNx5kWIj0VGaIfBE5Bdg5SLLgKKUYas8rY
4LaphqICq2uxu2BNa2tpiar/sHhAnuozwQ4czpVWXzlaISnb9yYzRl7gMuyUVGkx
+Gih5VUhLmQC0HsRTLJ3vaZQoUsLAl2gAjKcWa1bx57j2S+iKOPfsLaq7VYo+y1I
Njc+iSGqMhJzRLXVkxL2lQKaslp7R38Bbh5K4Kvyjkm4Aq7zErOF6irpOXKMcrGl
mwnr89vf1G9thjikrBaXpKnuvdbWYveoN/ORMlTdCfxkFnChHLnm3bd7NJ49RXDN
Hv/Iq9YYjmZ9GTatxnx7lWtmXnZXC5he1yn1JAuz/yt7/0b/Wx+Mu/wEpBXYNFTd
1AZdD586i+AmPo3yDkqH9nBu8JC0W0AnS9VZma4LVvZOP2UfJmj5Im1CLHItbGDN
FnUCkwyD/lJUUk+WgT+w/GOMJgmFHDiFFl4tFtYVVjrUirpCFVguSKG9xuv6tT8P
8iRsoP7RrcmDN9ojN2SEHwcpsAv3HnKkDv+9+GIbWnrGsSbCPq8Qm+JDSvf4h22I
K5lwNpJrcpSKI+q10L7w2xliTBwb98sJkWGA/rssomrdBOWteGZAyqFRYAVgQ+mJ
x/nJurIqQYh2KQN9+uLG
=xVV2
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.15.
Core:
- Atomic object lifetime fixes
- Atomic iterator improvements
- Sparse/smatch fixes
- Legacy kms ioctls to be interruptible
- EDID override improvements
- fb/gem helper cleanups
- Simple outreachy patches
- Documentation improvements
- Fix dma-buf rcu races
- DRM mode object leasing for improving VR use cases.
- vgaarb improvements for non-x86 platforms.
New driver:
- tve200: Faraday Technology TVE200 block.
This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
the StorLink SL3516 (later Cortina Systems CS3516) as well as the
Grain Media GM8180.
New bridges:
- SiI9234 support
New panels:
- S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
LT089AC19000, Innolux AT043TN24
i915:
- Remove Coffeelake from alpha support
- Cannonlake workarounds
- Infoframe refactoring for DisplayPort
- VBT updates
- DisplayPort vswing/emph/buffer translation refactoring
- CCS fixes
- Restore GPU clock boost on missed vblanks
- Scatter list updates for userptr allocations
- Gen9+ transition watermarks
- Display IPC (Isochronous Priority Control)
- Private PAT management
- GVT: improved error handling and pci config sanitizing
- Execlist refactoring
- Transparent Huge Page support
- User defined priorities support
- HuC/GuC firmware refactoring
- DP MST fixes
- eDP power sequencing fixes
- Use RCU instead of stop_machine
- PSR state tracking support
- Eviction fixes
- BDW DP aux channel timeout fixes
- LSPCON fixes
- Cannonlake PLL fixes
amdgpu:
- Per VM BO support
- Powerplay cleanups
- CI powerplay support
- PASID mgr for kfd
- SR-IOV fixes
- initial GPU reset for vega10
- Prime mmap support
- TTM updates
- Clock query interface for Raven
- Fence to handle ioctl
- UVD encode ring support on Polaris
- Transparent huge page DMA support
- Compute LRU pipe tweaks
- BO flag to allow buffers to opt out of implicit sync
- CTX priority setting API
- VRAM lost infrastructure plumbing
qxl:
- fix flicker since atomic rework
amdkfd:
- Further improvements from internal AMD tree
- Usermode events
- Drop radeon support
nouveau:
- Pascal temperature sensor support
- Improved BAR2 handling
- MMU rework to support Pascal MMU
exynos:
- Improved HDMI/mixer support
- HDMI audio interface support
tegra:
- Prep work for tegra186
- Cleanup/fixes
msm:
- Preemption support for a5xx
- Display fixes for 8x96 (snapdragon 820)
- Async cursor plane fixes
- FW loading rework
- GPU debugging improvements
vc4:
- Prep for DSI panels
- fix T-format tiling scanout
- New madvise ioctl
Rockchip:
- LVDS support
omapdrm:
- omap4 HDMI CEC support
etnaviv:
- GPU performance counters groundwork
sun4i:
- refactor driver load + TCON backend
- HDMI improvements
- A31 support
- Misc fixes
udl:
- Probe/EDID read fixes.
tilcdc:
- Misc fixes.
pl111:
- Support more variants
adv7511:
- Improve EDID handling.
- HDMI CEC support
sii8620:
- Add remote control support"
* tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
drm/rockchip: analogix_dp: Use mutex rather than spinlock
drm/mode_object: fix documentation for object lookups.
drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
drm/i915: Move init_clock_gating() back to where it was
drm/i915: Prune the reservation shared fence array
drm/i915: Idle the GPU before shinking everything
drm/i915: Lock llist_del_first() vs llist_del_all()
drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
drm/i915: Disable lazy PPGTT page table optimization for vGPU
drm/i915/execlists: Remove the priority "optimisation"
drm/i915: Filter out spurious execlists context-switch interrupts
drm/amdgpu: use irq-safe lock for kiq->ring_lock
drm/amdgpu: bypass lru touch for KIQ ring submission
drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
drm/amd/powerplay: initialize a variable before using it
drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
drm/rockchip: add CONFIG_OF dependency for lvds
...
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1260018
Addresses-Coverity-ID: 1260019
Addresses-Coverity-ID: 1260022
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 143119
Addresses-Coverity-ID: 143120
Addresses-Coverity-ID: 143121
Addresses-Coverity-ID: 143122
Addresses-Coverity-ID: 143123
Addresses-Coverity-ID: 143124
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These are the new priviledged interfaces to the VMM backends, and expose
some functionality that wasn't previously available.
It's now possible to allocate a chunk of address-space (even all of it),
without causing page tables to be allocated up-front, and then map into
it at arbitrary locations. This is the basic primitive used to support
features such as sparse mapping, or to allow userspace control over its
own address-space, or HMM (where the GPU driver isn't in control of the
address-space layout).
Rather than being tied to a subtle combination of memory object and VMA
properties, arguments that control map flags (ro, kind, etc) are passed
explicitly at map time.
The compatibility hacks to implement the old frontend on top of the new
driver backends have been replaced with something similar to implement
the old frontend's interfaces on top of the new frontend.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB/2MiB big page sizes (128KiB not supported by HW with new PT layout).
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
- Sparse PDEs/PTEs.
- Additional blocklinear kinds.
- 49-bit address-space.
GP100 supports an entirely new 5-level page table layout that provides
an expanded 49-bit address-space. It also supports the layout present
on previous generations, which we've been making do with until now.
This commit implements support for the new layout, and enables it by
default.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB big page size.
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
Adds support for marking LPTEs invalid, resulting in the corresponding
SPTEs being ignored, which is supposed to speed up TLB invalidates.
On The Tegra side, this will switch to using the video memory aperture
for all mappings. The HW will still target non-coherent system memory,
but this aperture needs to be selected in order to support compression.
Tegra's instmem backend somewhat cheated to get this effect previously.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the common code to support a rework of the VMM backends.
It adds support for more than 2 levels of page table nesting, which
is required to be able to support GP100's MMU layout.
Sparse mappings (that don't cause MMU faults when accessed) are now
supported, where the backend provides it.
Dual-PT handling had to become more sophisticated to support sparse,
but this also allows us to support an optimisation the MMU provides
on GK104 and newer.
Certain operations can now be combined into a single page tree walk
to avoid some overhead, but also enables optimsations like skipping
PTE unmap writes when the PT will be destroyed anyway.
The old backend has been hacked up to forward requests onto the new
backend, if present, so that it's possible to bisect between issues
in the backend changes vs the upcoming frontend changes.
Until the new frontend has been merged, new backends will leak BAR2
page tables on module unload. This is expected, and it's not worth
the effort of hacking around this as it doesn't effect runtime.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
To avoid wasting compression tags when using 64KiB pages, we need to
enable this so we can select between upper/lower comptagline in PTEs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If NV_PFB_MMU_CTRL_USE_FULL_COMP_TAG_LINE is TRUE, then the last bit of
NV_MMU_PTE_COMPTAGLINE is re-purposed to select the upper/lower half of
a compression tag when using 64KiB big pages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We previously required each VMM user to allocate their own page directory
and fill in the instance block themselves.
It makes more sense to handle this in a common location.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- Selection of old/new-style page table layout (GP100MmuLayout=0/1).
- System-memory PDs.
New layout disabled by default for the moment, as we don't have a
backend that can handle it yet.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the first chunk of the new VMM code that provides the structures
needed to describe a GPU virtual address-space layout, as well as common
interfaces to handle VMM creation, and connecting instances to a VMM.
The constructor now allocates the PD itself, rather than having the user
handle that manually. This won't/can't be used until after all backends
have been ported to these interfaces, so a little bit of memory will be
wasted on Fermi and newer for a couple of commits in the series.
Compatibility has been hacked into the old code to allow each GPU backend
to be ported individually.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP100 "big" (which is a funny name, when it supports "even bigger") page
tables are small enough that we want to be able to suballocate them from
a larger block of memory.
This builds on the previous page table cache interfaces so that the VMM
code doesn't need to know the difference.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Builds up and maintains a small cache of each page table size in order
to reduce the frequency of expensive allocations, particularly in the
pathological case where an address range ping-pongs between allocated
and free.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Removes the need to expose internals outside of MMU, and GP100 is both
different, and a lot harder to deal with.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Another transition step to allow finer-grained patches transitioning to
new MMU backends.
Old backends will continue operate as before (accessing nvkm_mem::tag),
and new backends will get a reference to the tags allocated here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming MMU changes use nvkm_memory as its basic representation of memory,
so we need to be able to allocate VRAM like this.
The code is basically identical to the current chipset-specific allocators,
minus support for compression tags (which will be handled elsewhere anyway).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We need to be able to prevent memory from being freed while it's still
mapped in a GPU's address-space.
Will be used by upcoming MMU changes.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Map flags (access, kind, etc) are currently defined in either the VMA,
or the memory object, which turns out to not be ideal for things like
suballocated buffers, etc.
These will become per-map flags instead, so we need to support passing
these arguments in nvkm_memory_map().
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nvkm_memory is going to be used by the upcoming mmu rework for the basic
representation of a memory allocation, as such, this commit adds support
for comptag allocation to nvkm_memory.
This is very simple for now, in that it requires comptags for the entire
memory allocation even if only certain ranges are compressed.
Support for tracking ranges will be added at a later date.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We're moving towards having a central place to handle comptag allocation,
and as some GPUs don't have a ram submodule (ie. Tegra), we need to move
the mm somewhere else.
It probably never belonged in ram anyways.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Different sections of VRAM may have different properties (ie. can't be used
for compression/display, can't be mapped, etc).
We currently already support this, but it's a bit magic. This change makes
it more obvious where we're allocating from.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Before: "imem: init completed in 299277us"
After: "imem: init completed in 11574us"
Suspend from Fedora 26 gnome desktop on GP102.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Before: "imem: suspend completed in 5540487us"
After: "imem: suspend completed in 1871526us"
Suspend from Fedora 26 gnome desktop on GP102.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A good deal of the structures we map into here aren't accessed very often
at all, and Fedora 26 has exposed an issue where after creating a heap of
channels, BAR2 space would run out, and we'd need to make use of the slow
path while accessing important structures like page tables.
This implements an LRU on BAR2 space, which allows eviction of mappings
that aren't currently needed, to make space for other objects.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Another piece of solving the "GP100 BAR2 VMM bootstrap" puzzle.
Without doing this, we'd attempt to write PDEs for the lower page table
levels through BAR2 before BAR2 access has been fully initialised.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is not as simple as it was for earlier GPUs, due to the need to swap
accessor functions depending on whether BAR2 is usable or not.
We were previously protected by nvkm_instobj's accessor functions keeping
an object mapped permanently, with some unclear magic that managed to hit
the slow-path where needed even if an object was marked as mapped.
That's been replaced here by reference counting maps (some objects, like
page tables can be accessed concurrently), and swapping the functions as
necessary.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is to simplify upcoming changes. The slow-path is something that
currently occurs during bootstrap of the BAR2 VMM, while backing up an
object during suspend/resume, or when BAR2 address space runs out.
The latter is a real problem that can happen at runtime, and occurs in
Fedora 26 already (due to some change that causes a lot of channels to
be created at login), so ideally we'd prefer not to make it any slower.
We'd also like suspend/resume speed to not suffer.
Upcoming commits will solve those problems in a better way, making the
extra overhead of moving the locking here a non-issue.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The accessor functions can change as a result of acquire()/release() calls,
and are protected by any refcounting done there.
Other functions must remain constant, as they can be called any time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Discovered by accident while working to use BAR2 access to instmem objects
on more paths.
We've apparently been relying on luck up until now!
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP100's page table nests a lot more deeply than the GF100-compatible
layout we're currently using, which means our hackish-but-simple way
of dealing with BAR2 VMM teardown won't work anymore.
In order to sanely handle the chicken-and-egg (BAR2's PTs get mapped
into themselves) problem, we need prevent page tables getting mapped
back into BAR2 during the destruction of its VMM.
To do this, we simply key off the state that's now maintained by the
BAR2 init/fini functions.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming changes will remove the nvkm_vmm pointer from nvkm_vma, instead
requiring it to be explicitly specified on each operation.
It's not currently possible to get this information for BAR1 mappings,
so let's fix that ahead of time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will prevent spurious MMU fault interrupts if something decides to touch
BAR1 after we've unloaded the driver.
Exposed external to BAR so that INSTMEM can use it to better control the
suspend/resume fast-path access.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If we want to be able to hit the instmem fast-path in a few trickier cases,
we need to be more flexible with when we can initialise BAR2 access.
There's probably a decent case to be made for merging BAR/INSTMEM into BUS,
but that's something to ponder another day.
Flushes have been added after the write to bind the instance block,
as later commits will reveal the need for them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will prevent spurious MMU fault interrupts if something decides to touch
BAR1 after we've unloaded the driver.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
BAR2 being done for practical reasons, this is just for consistency.
Flushes have been added after the write to bind the instance block,
as later commits will reveal the need for them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA call it BAR2, Linux APIs treat it as BAR3 due to BAR1 being a
64-bit BAR, which I presume take two slots or something.
No actual code changes here, just to make future commits less messy.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will already be done by MMU as a result of the PT writes that occur
during BAR2 bootstrapping.
This is likely just a left-over from the days when it was hardcoded.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
RM appears to do this really early in its initialisation, before DEVINIT.
We currently do this before BAR2 initialisation for some reason.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
MMU will be needing this to specify kind info on BAR mappings.
We have no userspace currently using these interfaces, so break the ABI
instead of supporting both. NVIF version bump so any future use can be
guarded.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Using the ARRAY_SIZE macro improves the readability of the code. Also,
it is useless to re-invent it.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2:
- add nv138 and drop nv13b chipsets (Ilia Mirkin)
- refactor out status variable and instead mask tsensor (Ilia Mirkin)
- switch SHADOWed state message away from nvkm_error() (Ilia Mirkin)
- rename internal temperature variable (Karol Herbst)
v3:
- use nvkm_trace() for SHADOWed state message (Ben Skeggs)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
G92's seem to require some additional bit of initialization before the
BSP engine can work. It feels like clocks are not set up for the
underlying VLD engine, which means that all commands submitted to the
xtensa chip end up hanging. VP seems to work fine though.
This still allows people to force-enable the bsp engine if they want to
play around with it, but makes it harder for the card to hang by
default.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Even though we've zeroed the PDE, the GPU may have cached the PD, so we
need to flush when deleting them.
Noticed while working on replacement MMU code, but a backport might be a
good idea, so let's fix it in the current code too.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
... and __initconst if applicable.
Based on similar work for an older kernel in the Grsecurity patch.
[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
It appears that MSI does not work on either G5 PPC nor on a E5500-based
platform, where other hardware is reported to work fine with MSI.
Both tests were conducted with NV4x hardware, so perhaps other (or even
this) hardware can be made to work. It's still possible to force-enable
with config=NvMSI=1 on load.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Useful for testing, and for the userspace build where we can't kick
a framebuffer driver off the device.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Forked from GP107 implementation. Secboot/gr left out as we don't have
signed blobs from NVIDIA in linux-firmware.
(Ben): Was unable to mmiotrace the binary driver for unknown reasons,
so not able to 100% confirm that no other changes from GP107
are needed. Quick testing shows it seems to work well enough
for display. Due to NVIDIA dragging their heels on getting
signed firmware to us, this is the best we can do for now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101601
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Most of these errors seem to be WFD related. Official documentation
says dcb type 8 is reserved. It's probably used for WFD. Silence
the warning in either case.
Connector type 70 is stated to be a virtual connector for WiFi
display. Since we know this, don't warn that we don't.
Signed-off by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The NV_PMC_ENABLE bit for PMU did not appear until GF100, and some other
unknown register needs to be poked instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
An upcoming commit will replace direct NV_PMC register bashing from PMU
with a call to the proper function.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We assume that each board has 4 heads for GF119+. However this is not
necessarily true - in the case of a GP108 board, the register indicated
that there were only 2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101601
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Array thresolds should be named thresholds, rename it. Also make it static
static const char * const
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes hitting WARN_ON() during initialisation of pre-NV50 GPUs, caused
by the recent changes to support pad macro routing on GM20x.
We currently don't use them here for older GPUs anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Bit 30 being set causes the upper half of BAR2 to stay in physical mode,
mapped over the end of VRAM, even when the rest of the BAR has been set
to virtual mode.
We inherited our initial value from RM, but I'm not aware of any reason
we need to keep it that way.
This fixes severe GPU hang/lockup issues revealed by Wayland on F26.
Shout-out to NVIDIA for the quick response with the potential cause!
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org # 4.3+
GP102's cursors go from chan 17..20. Increase the array size to hold
their data properly.
Fixes: e50fcff15f ("drm/nouveau/disp/gp102: fix cursor/overlay immediate channel indices")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We don't support them on G80, but we need to add them to the mapping to
avoid triggering a WARN_ON() on GPUs where the ports are present.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Since switching the I2C-over-AUX helpers, there have been regressions on
some display combinations due to us not having support for "address only"
transactions.
This commits enables support for them for GF119 and newer.
Earlier GPUs have been reverted to a custom I2C-over-AUX algorithm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZYseIAAoJEAx081l5xIa+85kP/0zKzKKVzZXSXG2TAGb5jNfk
Ex+TELG8tWk9KBxA7lEE5c0WEsnP79cNoXZLQu8wlUzO8+kwQK5Bz0zgNUkpSuo1
RthwdsxBQX1++UxB+HoSG+dOa7hkKVqlgQR3z9qyhsBXzetkJV0DoYcpMV0A1EWd
6Jzt+AvCShVkcW+21LqHPlc5EIVewrDMoA3oU6aYCLhyAOUTVvvQB2ML8YApH7TM
JrSrzCFHTrQEBbGUrZQhzR0sZzZzk9byntb/I/mdVbHeCyIHiL8sC4PfWSOyyazm
GkPnA8G3aFAY9haBRz9jG/VBr1yVb0mCBjkWQ1lGfIAOCDDSc+d7PDXdG+i4AewK
jZheXlrDIdGgmJLy4W3rdEqJvdf7UQHZOs8594OL19l4+FxCTrol1JSHSMeavCvr
8bUNil9Jb/ONU/wmp+q55U0k4TCTyerUA7gKnuaJAwBvd4n78/PKmQnbrWinDyJc
GQXp6zESk9bKt5DXSnVZuVf4POTzpuAsQkkfX1V2y145EHTQYfS3jLENWqEjyZUy
QtKCHZvRkJfGaFU4Pr+vBo9Iu1GlA5OiOv08QadldTT4OxUI0T6yaLDobHCQfKPE
sc3wCuCM+/dAnqoKDcGC4hAmF8zDdO0kw65P2m7uC6T9Jm1G35CioKbzo+fzUhuL
fg5TBpbp2Wwe2oPA5iBm
=2S5N
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm, I think I've got one later
driver pull for mediatek SoC driver, I'm undecided on if it needs to
go to you yet.
Otherwise summary below:
Core drm:
- Atomic add driver private objects
- Deprecate preclose hook in modern drivers
- MST bandwidth tracking
- Use kvmalloc in more places
- Add mode_valid hook for crtc/encoder/bridge
- Reduce sync_file construction time
- Documentation updates
- New DRM synchronisation object support
New drivers:
- pl111 - pl111 CLCD display controller
Panel:
- Innolux P079ZCA panel driver
- Add NL12880B20-05, NL192108AC18-02D, P320HVN03 panels
- panel-samsung-s6e3ha2: Add s6e3hf2 panel support
i915:
- SKL+ watermark fixes
- G4x/G33 reset improvements
- DP AUX backlight improvements
- Buffer based GuC/host communication
- New getparam for (sub)slice infomation
- Cannonlake and Coffeelake initial patches
- Execbuf optimisations
radeon/amdgpu:
- Lots of Vega10 bug fixes
- Preliminary raven support
- KIQ support for compute rings
- MEC queue management rework
- DCE6 Audio support
- SR-IOV improvements
- Better radeon/amdgpu selection support
nouveau:
- HDMI stereoscopic support
- Display code rework for >= GM20x GPUs
msm:
- GEM rework for fine-grained locking
- Per-process pagetable work
- HDMI fixes for Snapdragon 820.
vc4:
- Remove 256MB CMA limit from vc4
- Add out-fence support
- Add support for cygnus
- Get/set tiling ioctls support
- Add T-format tiling support for scanout
zte:
- add VGA support.
etnaviv:
- Thermal throttle support for newer GPUs
- Restore userspace buffer cache performance
- dma-buf sync fix
stm:
- add stm32f429 display support
exynos:
- Rework vblank handling
- Fixup sw-trigger code
sun4i:
- V3s display engine support
- HDMI support for older SoCs
- Preliminary work on dual-pipeline SoCs.
rcar-du:
- VSP work
imx-drm:
- Remove counter load enable from PRE
- Double read/write reduction flag support
tegra:
- Documentation for the host1x and drm driver.
- Lots of staging ioctl fixes due to grate project work.
omapdrm:
- dma-buf fence support
- TILER rotation fixes"
* tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux: (1270 commits)
drm: Remove unused drm_file parameter to drm_syncobj_replace_fence()
drm/amd/powerplay: fix bug fail to remove sysfs when rmmod amdgpu.
amdgpu: Set cik/si_support to 1 by default if radeon isn't built
drm/amdgpu/gfx9: fix driver reload with KIQ
drm/amdgpu/gfx8: fix driver reload with KIQ
drm/amdgpu: Don't call amd_powerplay_destroy() if we don't have powerplay
drm/ttm: Fix use-after-free in ttm_bo_clean_mm
drm/amd/amdgpu: move get memory type function from early init to sw init
drm/amdgpu/cgs: always set reference clock in mode_info
drm/amdgpu: fix vblank_time when displays are off
drm/amd/powerplay: power value format change for Vega10
drm/amdgpu/gfx9: support the amdgpu.disable_cu option
drm/amd/powerplay: change PPSMC_MSG_GetCurrPkgPwr for Vega10
drm/amdgpu: Make amdgpu_cs_parser_init static (v2)
drm/amdgpu/cs: fix a typo in a comment
drm/amdgpu: Fix the exported always on CU bitmap
drm/amdgpu/gfx9: gfx_v9_0_enable_gfx_static_mg_power_gating() can be static
drm/amdgpu/psp: upper_32_bits/lower_32_bits for address setup
drm/amd/powerplay/cz: print message if smc message fails
drm/amdgpu: fix typo in amdgpu_debugfs_test_ib_init
...
- introduce the new uuid_t/guid_t types that are going to replace
the somewhat confusing uuid_be/uuid_le types and make the terminology
fit the various specs, as well as the userspace libuuid library.
(me, based on a previous version from Amir)
- consolidated generic uuid/guid helper functions lifted from XFS
and libnvdimm (Amir and me)
- conversions to the new types and helpers (Amir, Andy and me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAllZfmILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMvyg/9EvWHOOsSdeDykCK3KdH2uIqnxwpl+m7ljccaGJIc
MmaH0KnsP9p/Cuw5hESh2tYlmCYN7pmYziNXpf/LRS65/HpEYbs4oMqo8UQsN0UM
2IXHfXY0HnCoG5OixH8RNbFTkxuGphsTY8meaiDr6aAmqChDQI2yGgQLo3WM2/Qe
R9N1KoBWH/bqY6dHv+urlFwtsREm2fBH+8ovVma3TO73uZCzJGLJBWy3anmZN+08
uYfdbLSyRN0T8rqemVdzsZ2SrpHYkIsYGUZV43F581vp8e/3OKMoMxpWRRd9fEsa
MXmoaHcLJoBsyVSFR9lcx3axKrhAgBPZljASbbA0h49JneWXrzghnKBQZG2SnEdA
ktHQ2sE4Yb5TZSvvWEKMQa3kXhEfIbTwgvbHpcDr5BUZX8WvEw2Zq8e7+Mi4+KJw
QkvFC1S96tRYO2bxdJX638uSesGUhSidb+hJ/edaOCB/GK+sLhUdDTJgwDpUGmyA
xVXTF51ramRS2vhlbzN79x9g33igIoNnG4/PV0FPvpCTSqxkHmPc5mK6Vals1lqt
cW6XfUjSQECq5nmTBtYDTbA/T+8HhBgSQnrrvmferjJzZUFGr/7MXl+Evz2x4CjX
OBQoAMu241w6Vp3zoXqxzv+muZ/NLar52M/zbi9TUjE0GvvRNkHvgCC4NmpIlWYJ
Sxg=
=J/4P
-----END PGP SIGNATURE-----
Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid
Pull uuid subsystem from Christoph Hellwig:
"This is the new uuid subsystem, in which Amir, Andy and I have started
consolidating our uuid/guid helpers and improving the types used for
them. Note that various other subsystems have pulled in this tree, so
I'd like it to go in early.
UUID/GUID summary:
- introduce the new uuid_t/guid_t types that are going to replace the
somewhat confusing uuid_be/uuid_le types and make the terminology
fit the various specs, as well as the userspace libuuid library.
(me, based on a previous version from Amir)
- consolidated generic uuid/guid helper functions lifted from XFS and
libnvdimm (Amir and me)
- conversions to the new types and helpers (Amir, Andy and me)"
* tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits)
ACPI: hns_dsaf_acpi_dsm_guid can be static
mmc: sdhci-pci: make guid intel_dsm_guid static
uuid: Take const on input of uuid_is_null() and guid_is_null()
thermal: int340x_thermal: fix compile after the UUID API switch
thermal: int340x_thermal: Switch to use new generic UUID API
acpi: always include uuid.h
ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()
ACPI / extlog: Switch to use new generic UUID API
ACPI / bus: Switch to use new generic UUID API
ACPI / APEI: Switch to use new generic UUID API
acpi, nfit: Switch to use new generic UUID API
MAINTAINERS: add uuid entry
tmpfs: generate random sb->s_uuid
scsi_debug: switch to uuid_t
nvme: switch to uuid_t
sysctl: switch to use uuid_t
partitions/ldm: switch to use uuid_t
overlayfs: use uuid_t instead of uuid_be
fs: switch ->s_uuid to uuid_t
ima/policy: switch to use uuid_t
...
On Tegra186 systems with certain firmware revisions, leaving the GPU in
reset can cause a hang. To prevent this, don't leave the GPU in reset.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On Tegra186, powergating is handled by the BPMP power domain provider
and the "legacy" powergating API is not available. Therefore skip
these calls if we are attached to a power domain.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This shouldn't have been needed ever since we started executing the
DisableLT script when shutting down heads.
Testing of the board this was originally written for seems to agree.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Training/Untraining will be hooked up to the routing logic, which
doesn't allow us to pass in a data rate.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These exist to give NVKM information on the set of display paths that
the DD needs to be active at any given time.
Previously, the supervisor attempted to determine this solely from OR
state, but there's a few configurations where this information on its
own isn't enough to determine the specific display paths in question:
- ANX9805, where the PIOR protocol for both DP and TMDS is TMDS.
- On a device using DCB Switched Outputs.
- On GM20x and newer, with a crossbar between the SOR and macro links.
After this commit, the DD tells NVKM *exactly* which display path it's
attempting a modeset on.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
All of the necessary hw-specific logic is now handled at the output
resource level, so all of this can go away.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Also removes the user-facing methods to these controls, as they're not
currently utilised by the DD anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This essentially (unless the link becomes unstable and needs to be
re-trained) gives us a single entry-point to link training, during
supervisor handling, where we can ensure all routing is up to date.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
An upcoming commit will limit link training to only when the sink is
meant to be displaying an image.
We still need IRQs enabled even when the link isn't trained (for MST
messages), but don't want to train the link unnecessarily.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The aim here is to protect the OR against locking up when something
unexpected happens (such as the display disappearing during modeset,
or the DD misbehaving).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This struct doesn't hold link configuration data anymore, so we can
limit its use to internal DP training (anx9805 handles training for
external DP).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This hasn't been used since atomic.
We may want to re-implement "fast" DPMS at some point, but for now,
this just gets in the way.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This essentially replicates our current behaviour in a way that's
compatible with the new model that's emerging, so that we're able
to start porting the hw-specific functions to it.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming commits make supervisor handling share code between the NV50
and GF119 implementations. Because of this, and a few other cleanups,
we need to allow some additional customisation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In order to properly support the SOR -> SOR + pad macro separation
that occurred with GM20x GPUs, we need to separate OR handling out
of the output path code.
This will be used as the base to support ORs (DAC, SOR, PIOR).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Primarily intended as a way to pass per-head state around during
supervisor handling, and share logic between NV50/GF119.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is to allow hw-specific code to instantiate output resources first,
so we can cull unsupported output paths based on them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Not all users of nvkm_output_dp have been changed here. The remaining
ones belong to code that's disappearing in upcoming commits.
This also modifies the debug level of some messages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This isn't technically "output", but, "display/output path".
Not all users of nvkm_output have been changed here. The remaining
ones belong to code that's disappearing in upcoming commits.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming changes to split OR from output path drastically change the
placement of various operations.
In order to make the real changes clearer, do the moving around part
ahead of time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As of DCB 4.1, these are not the same thing.
Compatibility temporarily in place until callers have been updated.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We already have a subdev pointer, from which we can locate the device's
BIOS subdev. No need for a separate pointer.
Structure/callers not updated yet, as I want to batch more changes and
only touch the callers once.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nvkm_timer_alarm() already handles this as part of protecting against
callers passing in no timeout value.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I only saw those values inside the vbios: 0xff, 0xfd, 0xfc, 0xfa for valid
rails.
No idea what the lower value does, but at least we get power readings on
a lot of Fermi GPUs with that.
v2: add missing parentheses
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is according to what we have in nvbios.
Fixes "ERROR: Can't get value of subfeature in0_min: Can't read" errors
in sensors for some GPUs.
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any InfoFrame that is not provided, and disable
the Vendor InfoFrame when disabling the output.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any InfoFrame that is not provided, and disable
the Vendor InfoFrame when disabling the output.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic") InfoFrame.
Also don't enable any AVI or Vendor InfoFrame that is not provided,
and disable the Vendor InfoFrame when disabling the output.
Ignore the Audio InfoFrame: We don't supply it, and altering HDMI
audio semantics (for better or worse) on this hardware is out of
scope for me at this time.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any AVI or Vendor InfoFrame that is not provided,
and disable the Vendor InfoFrame when disabling the output.
Ignore the Audio InfoFrame: We don't supply it, and altering HDMI
audio semantics (for better or worse) on this hardware is out of
scope for me at this time.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
HDMI InfoFrames are passed to NVKM as bags of bytes, but the
hardware needs them to be packed into words. Rather than having
four (or more) copies of the packing logic introduce a single copy
now, in a central place.
We currently need these for AVI and Vendor InfoFrames, but we may
also expect to need them for Audio InfoFrames at some point.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The nouveau driver, in the Linux 3.7 days, used to try and set the
AVI InfoFrame based on the selected display mode. These days, it
uses a fixed set of InfoFrames. Start to correct that, by
providing a mechanism whereby InfoFrame data may be passed to the
NVKM functions that do the actual configuration.
At this point, only establish the new parameters and their parsing,
don't actually use the data anywhere yet (since it's not supplied
anywhere).
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
acpi_evaluate_dsm() and friends take a pointer to a raw buffer of 16
bytes. Instead we convert them to use guid_t type. At the same time we
convert current users.
acpi_str_to_uuid() becomes useless after the conversion and it's safe to
get rid of it.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reusing the list_head for both is a bad idea. Callback execution is done
with the lock dropped so that alarms can be rescheduled from the callback,
which means that with some unfortunate timing, lists can get corrupted.
The execution list should not require its own locking, the single function
that uses it can only be called from a single context.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Presumably we can never actually hit this return, but static checkers
complain that we should unlock before we return.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The last goto looks spurious because it releases less resources than the
previous one.
Also free 'img->sig' if 'ls_ucode_img_build()' fails.
Fixes: 9d896f3e41 ("drm/nouveau/secboot: abstract LS firmware loading functions")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These were ineffective due to touching the list without the alarm lock,
but should no longer be required.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
The idea here was to avoid having to "manually" program the HW if there's
a new earliest alarm. This was lazy and bad, as it leads to loads of fun
races between inter-related callers (ie. therm).
Turns out, it's not so difficult after all. Go figure ;)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
At least therm/fantog "attempts" to work around this issue, which could
lead to corruption of the pending alarm list.
Fix it properly by not updating the timestamp without the lock held, or
trying to add an already pending alarm to the pending alarm list....
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
If the time to the next alarm is short enough, we could race with HW and
end up with an ~4 second delay until it triggers.
Fix this by checking again after we update HW.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Fixes a race where we can miss an alarm that triggers while we're already
processing previous alarms.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
object->engine cannot be NULL, it's either valid, or an error pointer.
This particular condition shouldn't actually be possible, but just in
case, we'll keep it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This reg has moved on Pascal, and causes a bus fault.
We never use the value anyway, so just remove the read.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A missing u64 cast causes a 32-Bit wraparound from
4096 MiB to 0 MiB and therefore total 0 MiB VRAM detected
if card has 4096 Mib per FBP.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The error return code PTR_ERR(mc) is always 0 since mc is
equal to 0 in this error handling case.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B's power is managed by generic PM domains, so it does not require a
VDD regulator. Add this option into the chip function structure.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GR is similar to GP100, with a few unavailable registers.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B requires a specific initialization sequence due to the absence of
devinit.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B's MC is compatible with GP100's, but engines need to be explicitly
put out of ELPG during init.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B's FB is largely compatible with the GP100 implementation.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B's FIFO is similar to GP100's, but only allows 512 channels.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The GP10B firmware is very close to GM20B's. The only difference is that
it supports booting multiple falcons. In order to avoid having too much
functions and structures shared, implement its support in the same
source file as GM20B firmware.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP10B's secboot is largely similar to GM20B's. Only differences are MC
base address and the fact that GPCCS is also securely managed.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allow the MC base address to be specified as an argument for the WPR
region reading function. GP10B uses a different address layout as GM20B,
so this is necessary. Also export the function to be used by GP10B.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The LS firmware post-run hook is the right place to start said LS
firmware. Moving it here also allows to remove special handling in the
ACR code.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A LS post-run hook can meet an error meaning the failure of secure boot.
Make sure this can be reported.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Having access to the secboot instance loading a LS firmware can be
useful to LS firmware handlers. At least more useful than just having an
out-of-context subdev pointer.
GP10B's firmware will also need to know the WPR address, which can be
obtained from the secboot instance.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Change the secboot and msgqueue interfaces to take a mask of falcons to
reset instead of a single falcon. The GP10B firmware interface requires
FECS and GPCCS to be booted in a single firmware command.
For firmwares that only support single falcon boot, it is trivial to
loop over the mask and boot each falcons individually.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The gk20a implementation of instance memory uses vmap()/vunmap() to map
memory regions into the kernel's virtual address space. These functions
may sleep, so protecting them by a spin lock is not safe. This triggers
a warning if the DEBUG_ATOMIC_SLEEP Kconfig option is enabled. Fix this
by using a mutex instead.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Forked from GP106 implementation.
Split out from commit enabling secboot/gr support so that it can be
added to earlier kernels.
Cc: stable@vger.kernel.org [4.10+]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The NV4A (aka NV44A) is an oddity in the family. It only comes in AGP
and PCI varieties, rather than a core PCIE chip with a bridge for
AGP/PCI as necessary. As a result, it appears that the MMU is also
non-functional. For AGP cards, the vast majority of the NV4A lineup,
this worked out since we force AGP cards to use the nv04 mmu. However
for PCI variants, this did not work.
Switching to the NV04 MMU makes it work like a charm. Thanks to mwk for
the suggestion. This should be a no-op for NV4A AGP boards, as they were
using it already.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70388
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The msgqueue pointer validity should be checked by its owner, not by the
msgqueue code itself to avoid this situation.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were returning PTR_ERR() on a NULL pointer, which obviously won't
work. nvkm_engine_ref() will return an error in case something went
wrong.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Re-architecture of the code to handle proprietary fw, more abstracted
to support the multitude of differences that NVIDIA introduce
- Support in the said code for GP10x ACR and GR fw, giving acceleration
support \o/
- Fix for GTX 970 GPUs that are in an odd MMU configuration
* 'linux-4.12' of git://github.com/skeggsb/linux: (60 commits)
drm/nouveau/fb/gf100-: rework ram detection
drm/nouveau/fb/gm200: split ram implementation from gm107
drm/nouveau/fb/gf108: split implementation from gf100
drm/nouveau/fb/gf100-: modify constructors to allow more customisation
drm/nouveau/kms/nv50: use drm core i2c-over-aux algorithm
drm/nouveau/i2c/g94-: return REPLY_M value on reads
drm/nouveau/i2c: modify aux interface to return length actually transferred
drm/nouveau/gp10x: enable secboot and GR
drm/nouveau/gr/gp102: initial support
drm/nouveau/falcon: support for gp10x msgqueue
drm/nouveau/secboot: add gp102/gp104/gp106/gp107 support
drm/nouveau/secboot: put HS code loading code into own file
drm/nouveau/secboot: support for r375 ACR
drm/nouveau/secboot: support for r367 ACR
drm/nouveau/secboot: support for r364 ACR
drm/nouveau/secboot: workaround bug when starting SEC2 firmware
drm/nouveau/secboot: support standard NVIDIA HS binaries
drm/nouveau/secboot: support for unload blob bootloader
drm/nouveau/secboot: let callers interpret return value of blobs
drm/nouveau/secboot: support for different load and unload falcons
...
This commit reworks the RAM detection algorithm, using RAM-per-LTC to
determine whether a board has a mixed-memory configuration instead of
using RAM-per-FBPA. I'm not certain the algorithm is perfect, but it
should handle all currently known configurations in the very least.
This should fix GTX 970 boards with 4GiB of RAM where the last 512MiB
isn't fully accessible, as well as only detecting half the VRAM on
GF108 boards.
As a nice side-effect, GP10x memory detection now reuses the majority
of the code from earlier chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GF108/GM107 implementations will want slightly different functions for
the upcoming RAM detection improvements.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This value represents the actual number of bytes recieved on the AUX
channel as the result of a read transaction.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Apparently sinks are allows to respond with ACK even if they didn't
fully complete a transaction... It seems like a missed opportunity
for DEFER to me, but what do I know :)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
All the bricks are in place for secure boot to be enabled. This in turn
makes GR usable so enable them all.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Differences from GP100:
- 3 PPCs/GPC.
- Another random reg to calculate/write.
- Attrib CB setup a little different.
- PascalB
- PascalComputeB
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add support for the msgqueue firmware used to process SEC2 commands
for gp10x chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These gp10x chips are supporting using (roughly) the same firmware.
Compared to previous secure chips, ACR runs on SEC2 and so does the
low-secure msgqueue.
ACR for these chips is based on r367.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We will also need to load HS blobs outside of acr_r352 (for instance, to
run the NVDEC VPR scrubber), so make this code reusable.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
r375 ACR uses a unified bootloader descriptor for the GR and PMU
firmwares.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
r367 uses a different hsflcn_desc layout and LS firmware signature
format, requiring a rewrite of some functions.
It also makes use of the shadow region, and uses SEC as the boot falcon.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
r364 is similar to r361, but uses a different hsflcn_desc structure to
introduce the shadow region address (even though it is not yet used by
this version).
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For some unknown reason the LS SEC2 firmware needs to be started twice
to operate. Detect and address that condition.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I had the brilliant idea to "improve" the binary format by removing
a useless indirection in the HS binary files. In the end it just
makes things more complicated than they ought to be as NVIDIA-provided
files need to be adapted. Since the format used can be identified by the
header, support both.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If the load and unload falcons are different, then a different
bootloader must also be used. Support this case.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Since the HS blobs are provided and signed by NVIDIA, we cannot expect
always-consistent behavior. In this case, on GP10x the unload blob may
return 0x1d even though things have run perfectly well. This behavior
has been confirmed by NVIDIA.
So let the callers of the run_blob() hook receive the blob return's
value (a positive integer) and decide what it means. This allows us to
workaround the 0x1d code instead of issuing an error.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On some secure boot instances (e.g. gp10x) the load and unload blobs do
not run on the same falcon. Support this case by introducing a new
member to the ACR structure and making related functions take the falcon
to use as an argument instead of assuming the boot falcon is to be used.
The rule is that the load blob can be run on either the SEC or PMU
falcons, but the unload blob must be always run on PMU.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Share elements of r361 that will be reused in other ACRs.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add support for running the ACR binary on the SEC falcon.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The start address used for secure blobs is not unique to the ACR, but
rather blob-dependent. Remove the unique member stored in the ACR
structure and make the load function return the start address for the
current blob instead.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR firmware from r364 on need a shadow region for the ACR to copy the
WPR region into. Add a flag to indicate that a shadow region is required
and manage memory allocations accordingly.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add support for running a msgqueue on the SEC2 falcon.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On SEC, DMEM is unaccessible by the CPU when the falcon is running in LS
mode. This makes communication with the firmware using DMEM impossible.
For this purpose, a new kind of memory (EMEM) has been added. It works
similarly to DMEM, with the difference that its address space starts at
0x1000000. For this reason, it makes sense to treat it like a special
case of DMEM.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
All falcons have their FBIF registers starting at offset 0x600, with the
exception of the PMU and NVENC engines.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Not all falcons have a debug register, and it is not always found at the
same offset.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
SEC2 is the name given by NVIDIA to the SEC engine post-Fermi (reasons
unknown). Even though it shares the same address range as SEC, its usage
is quite different and this justifies a new engine. Add this engine and
make TOP use it all post-TOP devices should use this implementation and
not the older SEC.
Also quickly add the short gp102 implementation which will be used for
falcon booting purposes.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gp10x' secure boot requires a blob to be run on NVDEC. Expose the falcon
through a dummy device.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reading registers at device construction time can be harmful, as there
is no guarantee the underlying engine will be up, or in its runtime
configuration. Defer register reading to the oneinit() hook and update
users accordingly.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Both registers allow to bind a new context, but NXTCTX will work on all
falcons, while legacy NEW_INSTBLK is reserved to PMU.
After setting NXTCTX we trigger a context switch by writing 0x090 and
0x0a4.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Enable the PMU firmware in gm20b, managed by secure boot.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gm20b PMU firmware is driven by a msgqueue, so connect relevant PMU
hooks to their msgqueue counterparts.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The ACR firmware may return no error but fail nonetheless. Such cases
can be detected by verifying that the WPR region has been properly set
in FB. If this is not the case, this is an error, but the unload
firmware should still not be run.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
PMU support has been enabled for r352 ACR, but it must remain optional
if we want to preserve existing user-space that do not include it. Allow
ACR to be instanciated with a list of optional LS falcons, that will not
produce a fatal error if their firmware is not loaded. Also change the
secure boot bootstrap logic to be able to fall back to legacy behavior
if it turns out the boot falcon's LS firmware cannot be loaded.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add the PMU bootloader generator and PMU LS ops that will enable proper
PMU operation if the PMU falcon is designated as managed.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adapt secboot's behavior if a PMU firmware is present, in particular
the way LS falcons are reset. Without PMU firmware, secboot needs to be
performed again from scratch so all LS falcons are reset. With PMU
firmware, we can ask the PMU's ACR unit to reset a specific falcon
through a PMU message.
As we must preserve the old behavior to avoid breaking user-space, add a
few conditionals to the way falcons are reset.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allow secboot to load a LS PMU firmware. LS PMU is one instance of
firmwares based on the message queue mechanism, which is also used for
other firmwares like SEC, so name its source file accordingly.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA-provided PMU firmware is controlled by a msgqueue. Add a member
to the PMU structure as well as the required cleanup code if this
feature is used.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add support for the msgqueue firmware used to process PMU commands for
gm20b.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A message queue firmware implements a specific protocol allowing the
host to send "commands" to a falcon, and the falcon to reply using
"messages". This patch implements the common part of this protocol and
defines the interface that the host can use.
Due to the way the firmware is developped internally at NVIDIA (where
kernel driver and firmware evolve in lockstep), firmwares taken at
different points in time can have frustratingly subtle differences that
must be taken into account. This code is architectured to make
implementing such differences as easy as possible.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add the ability for LS firmwares to declare a post-run hook that is
invoked right after the HS firmware is executed. This allows them to
e.g. write some initialization data into the falcon's DMEM.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As different firmare versions use different HS descriptor formats, we
need to abstract this part as well.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This structure does not need to be shared anymore.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows the bootloader descriptor generation code to not rely on
specialized ls_ucode_img structures, making it reusable in other
instances.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Offsets were not properly computed. This went unnoticed because we are
only using one app for now.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Using 32-bit integers would trim the WPR address if it is allocated above 4GB.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A WPR region smaller than 256K will result in secure boot failure.
Adjust the minimal size.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The WPR address parameter of the ls_write_wpr hook was defined as a u32,
which will very likely overflow on boards with more than 4GB VRAM.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Check at contruction time that we have support for all the LS firmwares
asked by the caller.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Remove a leftover that became obsolete with the falcon interface.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
DMEM registers are replicated with a stride of 8 bytes.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The falcon library may be used concurrently, especially after the
introduction of the msgqueue interface. Make it safe to use it that way.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is not used currently, but is added for the sake of completeness.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some PMU implementations (in particular the ones managed by secure
boot) may not have a reset() hook. Make sure we don't crash in that
case.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Make nvkm_secboot_falcon_name publicly visible as other subdevs will
need to use it for debug messages.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ideally we'd be able to keep these at a more obvious error level, as
they're a good indication of us doing something wrong.
However, NVIDIA's FECS/GPCCS firmware touches registers that trigger
priv ring faults, and we can't do anything to fix that ourselves due
to the need for them to be signed by NVIDIA.
This issue was reported a while back, but hasn't been fixed, so, for
now we will hide the messages to prevent spamming Optimus users with
messages whenever the NVIDIA GPU is powered off and on again.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Use a more common logging style.
Miscellanea:
o Coalesce formats and realign arguments
o Neaten a few macros now using pr_<level>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/76355db47b31668bb64d996865ceee53bd66b11f.1488285953.git.joe@perches.com
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
=VMy8
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8
Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
704a6c008b7942bb7f30bb43d2a6bcad7f543662 broke pci msi rearm for g92 GPUs.
g92 needs the nv46_pci_msi_rearm, where g94+ gpus used nv40_pci_msi_rearm.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
This seems to be absolutely necessary for a lot of NV40.
Reported-by: gsgf on IRC/freenode
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2: Set entry to 0xff if not found
Add cap entry for ver 0x30 tables
Rework to fix memory leak
v3: More error checks
Simplify check for invalid entries
v4: disable for ver 0x10 for now
move assignments after the second last return
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Greatly improves the chances of recovering the GPU from a CTXSW_TIMEOUT.
Tested with piglit's arb_shader_image_load_store-atomicity, which causes
GR to hang in such a way that recovery failed (CTXSW_TIMEOUT continually
re-triggers).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will serve as a basis for implementing some improvements to how
we recover the GPU from channel errors.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The previous commit simply changes the interface, but should result in
the same behaviour as previously. This commit has been split out from
it as it can result in a different channel being selected.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
FIFO gives us load/save/switch status, and we need to be able to determine
which direction a "switch" is failing during channel recovery.
In order to do this, we apparently need to query the engine itself.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are instances (such as non-recoverable GPU page faults) where
NVKM decides that a channel's context is no longer viable, and will
be removed from the runlist.
This commit notifies the owner of the channel when this happens, so
it has the opportunity to take some kind of recovery action instead
of hanging.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested on a G92, seems to work. Confirmed by 8 mmiotraces.
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We never have any need for a double-linked list here, and as there's
generally a large number of these objects, replace it with a single-
linked list in order to save some memory.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We want a supervisor client of NVKM (such as the DRM) to be able to
allow sharing of resources (such as memory objects) between clients.
To allow this, the supervisor creates all its clients as children of
itself, and will use an upcoming ioctl to permit sharing.
Currently it's not possible for indirect clients to use subclients.
Supporting this will require an additional field in the main ioctl.
This isn't important currently, but will need to be fixed for virt.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>