-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
w/EV1JWs36WXWaxFk8wd
=ttq9
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- skip AER driver error recovery callbacks for correctable errors
reported via ACPI APEI, as we already do for errors reported via the
native path (Tyler Baicar)
- fix DPC shared interrupt handling (Alex Williamson)
- print full DPC interrupt number (Keith Busch)
- enable DPC only if AER is available (Keith Busch)
- simplify DPC code (Bjorn Helgaas)
- calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
Helgaas)
- enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
Helgaas)
- move ASPM internal interfaces out of public header (Bjorn Helgaas)
- allow hot-removal of VGA devices (Mika Westerberg)
- speed up unplug and shutdown by assuming Thunderbolt controllers
don't support Command Completed events (Lukas Wunner)
- add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
Jay Cornwall)
- expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
- clean up PCI DMA interface usage (Christoph Hellwig)
- remove PCI pool API (replaced with DMA pool) (Romain Perier)
- deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
Kaya)
- move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
- add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
- remove warnings on sysfs mmap failure (Bjorn Helgaas)
- quiet ROM validation messages (Alex Deucher)
- remove redundant memory alloc failure messages (Markus Elfring)
- fill in types for compile-time VGA and other I/O port resources
(Bjorn Helgaas)
- make "pci=pcie_scan_all" work for Root Ports as well as Downstream
Ports to help AmigaOne X1000 (Bjorn Helgaas)
- add SPDX tags to all PCI files (Bjorn Helgaas)
- quirk Marvell 9128 DMA aliases (Alex Williamson)
- quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
- fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
Cassel)
- use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
- fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
- fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
- add support for ARTPEC-7 SoC (Niklas Cassel)
- add endpoint-mode support for ARTPEC (Niklas Cassel)
- add Cadence PCIe host and endpoint controller driver (Cyrille
Pitchen)
- handle multiple INTx status bits being set in dra7xx (Vignesh R)
- translate dra7xx hwirq range to fix INTD handling (Vignesh R)
- remove deprecated Exynos PHY initialization code (Jaehoon Chung)
- fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
- fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
- fix Keystone interrupt-controller-node lookup (Johan Hovold)
- constify qcom driver structures (Julia Lawall)
- rework Tegra config space mapping to increase space available for
endpoints (Vidya Sagar)
- simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
- remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
- add support for Global Fabric Manager Server (GFMS) event to
Microsemi Switchtec switch driver (Logan Gunthorpe)
- add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
PCI: endpoint: Fix EPF device name to support multi-function devices
PCI: endpoint: Add the function number as argument to EPC ops
PCI: cadence: Add host driver for Cadence PCIe controller
dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
PCI: Add vendor ID for Cadence
PCI: Add generic function to probe PCI host controllers
PCI: generic: fix missing call of pci_free_resource_list()
PCI: OF: Add generic function to parse and allocate PCI resources
PCI: Regroup all PCI related entries into drivers/pci/Makefile
PCI/DPC: Reformat DPC register definitions
PCI/DPC: Add and use DPC Status register field definitions
PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
PCI/DPC: Remove unnecessary RP PIO register structs
PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
PCI/DPC: Make RP PIO log size check more generic
PCI/DPC: Rename local "status" to "dpc_status"
PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
...
Add SPDX GPL-2.0 to all PCI files that specified the GPL version 2 license.
Remove the boilerplate GPL version 2 language, relying on the assertion in
b24413180f ("License cleanup: add SPDX GPL-2.0 license identifier to
files with no license") that the SPDX identifier may be used instead of the
full boilerplate text.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 fixes from Thomas Gleixner:
"A couple of fixlets for x86:
- Fix the ESPFIX double fault handling for 5-level pagetables
- Fix the commandline parsing for 'apic=' on 32bit systems and update
documentation
- Make zombie stack traces reliable
- Fix kexec with stack canary
- Fix the delivery mode for APICs which was missed when the x86
vector management was converted to single target delivery. Caused a
regression due to the broken hardware which ignores affinity
settings in lowest prio delivery mode.
- Unbreak modules when AMD memory encryption is enabled
- Remove an unused parameter of prepare_switch_to"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Switch all APICs to Fixed delivery mode
x86/apic: Update the 'apic=' description of setting APIC driver
x86/apic: Avoid wrong warning when parsing 'apic=' in X86-32 case
x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
x86: Remove unused parameter of prepare_switch_to
x86/stacktrace: Make zombie stack traces reliable
x86/mm: Unbreak modules that use the DMA API
x86/build: Make isoimage work on Debian
x86/espfix/64: Fix espfix double-fault handling on 5-level systems
Some of the APIC incarnations are operating in lowest priority delivery
mode. This worked as long as the vector management code allocated the same
vector on all possible CPUs for each interrupt.
Lowest priority delivery mode does not necessarily respect the affinity
setting and may redirect to some other online CPU. This was documented
somewhere in the old code and the conversion to single target delivery
missed to update the delivery mode of the affected APIC drivers which
results in spurious interrupts on some of the affected CPU/Chipset
combinations.
Switch the APIC drivers over to Fixed delivery mode and remove all
leftovers of lowest priority delivery mode.
Switching to Fixed delivery mode is not a problem on these CPUs because the
kernel already uses Fixed delivery mode for IPIs. The reason for this is
that th SDM explicitely forbids lowest prio mode for IPIs. The reason is
obvious: If the irq routing does not honor destination targets in lowest
prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
a fatal problem in many cases.
As a consequence of this change, the apic::irq_delivery_mode field is now
pointless, but this needs to be cleaned up in a separate patch.
Fixes: fdba46ffb4 ("x86/apic: Get rid of multi CPU affinity")
Reported-by: vcaputo@pengaru.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: vcaputo@pengaru.com
Cc: Pavel Machek <pavel@ucw.cz>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
The effective_affinity_mask is always set when an interrupt is assigned in
__assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic
apic_physflat: -> default_cpu_mask_to_apicid() ->
irq_data_update_effective_affinity(), but it looks d->common->affinity
remains all-1's before the user space or the kernel changes it later.
In the early allocation/initialization phase of an IRQ, we should use the
effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to
the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3
VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts.
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
To support implementing remote TLB flushing on Hyper-V with a hypercall
we need to make vp_index available outside of vmbus module. Rename and
globalize.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-7-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The setup of MSI with Hyper-V host was sleeping with locks held. This
error is reported when doing SR-IOV hotplug with kernel built with lockdep:
BUG: sleeping function called from invalid context at kernel/sched/completion.c:93
in_atomic(): 1, irqs_disabled(): 1, pid: 1405, name: ip
3 locks held by ip/1405:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff976b10bb>] rtnetlink_rcv+0x1b/0x40
#1: (&desc->request_mutex){+.+...}, at: [<ffffffff970ddd33>] __setup_irq+0xb3/0x720
#2: (&irq_desc_lock_class){-.-...}, at: [<ffffffff970ddd65>] __setup_irq+0xe5/0x720
irq event stamp: 3476
hardirqs last enabled at (3475): [<ffffffff971b3005>] get_page_from_freelist+0x225/0xc90
hardirqs last disabled at (3476): [<ffffffff978024e7>] _raw_spin_lock_irqsave+0x27/0x90
softirqs last enabled at (2446): [<ffffffffc05ef0b0>] ixgbevf_configure+0x380/0x7c0 [ixgbevf]
softirqs last disabled at (2444): [<ffffffffc05ef08d>] ixgbevf_configure+0x35d/0x7c0 [ixgbevf]
The workaround is to poll for host response instead of blocking on
completion.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Update the Hyper-V vPCI driver to use the Server-2016 version of the vPCI
protocol, fixing MSI creation and retargeting issues.
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Hyper-V vPCI offers different protocol versions. Add the infra for
negotiating the one to use.
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
To ease parallel effort to centralize CPU-number-to-vCPU-number conversion,
temporarily stand up own version, file-local hv_tmp_cpu_nr_to_vp_nr().
Once the changes have merged, this work-around can be removed, and the
calls replaced with hv_cpu_number_to_vp_number().
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
The hv_pcibus_device structure contains an in-memory hypercall argument
that must not cross a page boundary. Allocate the structure as a page to
ensure that.
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Fix comment formatting and use proper integer fields.
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter. This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
The memory allocation here needs to be non-blocking. Fix the issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Cc: <stable@vger.kernel.org>
When we have 32 or more CPUs in the affinity mask, we should use a special
constant to specify that to the host. Fix this issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Cc: <stable@vger.kernel.org>
A PCI_EJECT message can arrive at the same time we are calling
pci_scan_child_bus() in the workqueue for the previous PCI_BUS_RELATIONS
message or in create_root_hv_pci_bus(). In this case we could potentially
modify the bus from multiple places.
Properly lock the bus access.
Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition
in create_root_hv_pci_bus().
Reported-by: Xiaofeng Wang <xiaofwan@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
hv_pci_devices_present() is called in hv_pci_remove() when we remove a PCI
device from the host, e.g., by disabling SR-IOV on a device. In
hv_pci_remove(), the bus is already removed before the call, so we don't
need to rescan the bus in the workqueue scheduled from
hv_pci_devices_present().
By introducing bus state hv_pcibus_removed, we can avoid this situation.
Reported-by: Xiaofeng Wang <xiaofwan@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Use the device serial number as the PCI domain. The serial numbers start
with 1 and are unique within a VM. So names, such as VF NIC names, that
include domain number as part of the name, can be shorter than that based
on part of bus UUID previously. The new names will also stay same for VMs
created with copied VHD and same number of devices.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
The devfn of 00:02.0 is 0x10. devfn_to_wslot(0x10) == 0x2, and
wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code.
Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot()
returns NULL and pci_stop_and_remove_bus_device() is not called.
Later when the real device driver's .remove() is invoked by
hv_pci_remove() -> pci_stop_root_bus(), some warnings can be noticed
because the VM has lost the access to the underlying device at that
time.
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
CC: stable@vger.kernel.org
CC: K. Y. Srinivasan <kys@microsoft.com>
CC: Stephen Hemminger <sthemmin@microsoft.com>
hv_do_hypercall() assumes that we pass a segment from a physically
contiguous buffer. A buffer allocated on the stack may not work if
CONFIG_VMAP_STACK=y is set.
Use kmalloc() to allocate this buffer.
Reported-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
After we send a PCI_EJECTION_COMPLETE message to the host, the host will
immediately send us a PCI_BUS_RELATIONS message with
relations->device_count == 0, so pci_devices_present_work(), running on
another thread, can find the being-ejected device, mark the
hpdev->reported_missing to true, and run list_move_tail()/list_del() for
the device -- this races hv_eject_device_work() -> list_del().
Move the list_del() in hv_eject_device_work() to an earlier place, i.e.,
before we send PCI_EJECTION_COMPLETE, so later the
pci_devices_present_work() can't see the device.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
1. We don't really need such a big on-stack buffer when sending the
teardown_packet: vmbus_sendpacket() here only uses sizeof(struct
pci_message).
2. In the hot-remove case (PCI_EJECT), after we send PCI_EJECTION_COMPLETE
to the host, the host will send a RESCIND_CHANNEL message to us and the
host won't access the per-channel ringbuffer any longer, so we needn't send
PCI_RESOURCES_RELEASED/PCI_BUS_D0EXIT to the host, and we shouldn't expect
the host's completion message of PCI_BUS_D0EXIT, which will never come.
3. We should send PCI_BUS_D0EXIT after hv_send_resources_released().
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
We don't really need such a big on-stack buffer. vmbus_sendpacket() here
only uses sizeof(struct pci_child_message).
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Make hv_irq_mask() and hv_irq_unmask() static as they are only used in
pci-hyperv.c
This fixes a sparse warning.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
'completion_status' is used in some places, e.g.,
hv_pci_protocol_negotiation(), so we should make sure it's initialized in
error case too, though the error is unlikely here.
[bhelgaas: fix changelog typo and nearby whitespace]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Handle vmbus_sendpacket() failure in hv_compose_msi_msg().
I happened to find this when reading the code. I didn't get a real issue
however.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
The 2 structs can use a zero-length array here, because dynamic memory of
the correct size is allocated in hv_pci_devices_present() and we don't need
this extra element.
No functional change.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Use zero-length array in struct pci_packet and rename struct pci_message's
field "message_type" to "type". This makes the code more readable.
No functionality change.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
SR-IOV disabled from the host causes a memory leak. pci-hyperv usually
first receives a PCI_EJECT notification and then proceeds to delete the
hpdev list entry in hv_eject_device_work(). Later in hv_msi_free() since
the device is no longer on the device list hpdev is NULL and hv_msi_free
returns without freeing int_desc as part of hv_int_desc_free().
Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
When we have an interrupt from the host we have a bit set in event page
indicating there are messages for the particular channel. We need to read
them all as we won't get signaled for what was on the queue before we
cleared the bit in vmbus_on_event(). This applies to all Hyper-V drivers
and the pass-through driver should do the same.
I did not meet any bugs; the issue was found by code inspection. We don't
have many events going through hv_pci_onchannelcallback(), which explains
why nobody reported the issue before.
While on it, fix handling non-zero vmbus_recvpacket_raw() return values by
dropping out. If the return value is not zero, it is wrong to inspect
buffer or bytes_recvd as these may contain invalid data.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
We don't free buffer on several code paths in hv_pci_onchannelcallback(),
put kfree() to the end of the function to fix the issue. Direct { kfree();
return; } can now be replaced with a simple 'break';
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
Here's the big char and misc driver update for 4.7-rc1.
Lots of different tiny driver subsystems have updates here with new
drivers and functionality. Details in the shortlog.
All have been in linux-next with no reported issues for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlc/0YYACgkQMUfUDdst+ynmtACeLpLLKZsy1v7WfkW92cLSOPBD
2C8AoLFPKoh55rlOJrNz3bW9ANAaOloX
=/nsL
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here's the big char and misc driver update for 4.7-rc1.
Lots of different tiny driver subsystems have updates here with new
drivers and functionality. Details in the shortlog.
All have been in linux-next with no reported issues for a while"
* tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (125 commits)
mcb: Delete num_cells variable which is not required
mcb: Fixed bar number assignment for the gdd
mcb: Replace ioremap and request_region with the devm version
mcb: Implement bus->dev.release callback
mcb: export bus information via sysfs
mcb: Correctly initialize the bus's device
mei: bus: call mei_cl_read_start under device lock
coresight: etb10: adjust read pointer only when needed
coresight: configuring ETF in FIFO mode when acting as link
coresight: tmc: implementing TMC-ETF AUX space API
coresight: moving struct cs_buffers to header file
coresight: tmc: keep track of memory width
coresight: tmc: make sysFS and Perf mode mutually exclusive
coresight: tmc: dump system memory content only when needed
coresight: tmc: adding mode of operation for link/sinks
coresight: tmc: getting rid of multiple read access
coresight: tmc: allocating memory when needed
coresight: tmc: making prepare/unprepare functions generic
coresight: tmc: splitting driver in ETB/ETF and ETR components
coresight: tmc: cleaning up header file
...
I'm trying to pass-through Broadcom BCM5720 NIC (Dell device 1f5b) on a
Dell R720 server. Everything works fine when the target VM has only one
CPU, but SMP guests reboot when the NIC driver accesses PCI config space
with hv_pcifront_read_config()/hv_pcifront_write_config(). The reboot
appears to be induced by the hypervisor and no crash is observed. Windows
event logs are not helpful at all ('Virtual machine ... has quit
unexpectedly'). The particular access point is always different and
putting debug between them (printk/mdelay/...) moves the issue further
away. The server model affects the issue as well: on Dell R420 I'm able to
pass-through BCM5720 NIC to SMP guests without issues.
While I'm obviously failing to reveal the essence of the issue I was able
to come up with a (possible) solution: if explicit barriers are added to
hv_pcifront_read_config()/hv_pcifront_write_config() the issue goes away.
The essential minimum is rmb() at the end on _hv_pcifront_read_config() and
wmb() at the end of _hv_pcifront_write_config() but I'm not confident it
will be sufficient for all hardware. I suggest the following barriers:
1) wmb()/mb() between choosing the function and writing to its space.
2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/
_hv_pcifront_write_config() to ensure that consecutive reads/writes to
the space won't get re-ordered as drivers may count on that.
Config space access is not supposed to be performance-critical so these
explicit barriers should not cause any slowdown.
[bhelgaas: use Linux "barriers" terminology]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
Kernel hang is observed when pci-hyperv module is release with device
drivers still attached. E.g., when I do 'rmmod pci_hyperv' with BCM5720
device pass-through-ed (tg3 module) I see the following:
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104]
...
Call Trace:
[<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3]
[<ffffffffa063f000>] ? 0xffffffffa063f000
[<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3]
[<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3]
[<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3]
[<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3]
...
[<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3]
[<ffffffff813bee59>] pci_device_remove+0x39/0xc0
[<ffffffff814a3201>] __device_release_driver+0xa1/0x160
[<ffffffff814a32e3>] device_release_driver+0x23/0x30
[<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0
[<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60
[<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv]
The problem seems to be that we report local resources release before
stopping the bus and removing devices from it and device drivers may try to
perform some operations with these resources on shutdown. Move resources
release report after we do pci_stop_root_bus().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
This patch modifies all the callers of vmbus_mmio_allocate()
to call vmbus_mmio_free() instead of release_mem_region().
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a new driver which exposes a root PCI bus whenever a PCI Express device
is passed through to a guest VM under Hyper-V. The device can be single-
or multi-function. The interrupts for the devices are managed by an IRQ
domain, implemented within the driver.
[bhelgaas: fold in race condition fix (http://lkml.kernel.org/r/1456340196-13717-1-git-send-email-jakeo@microsoft.com)]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>