Commit Graph

624 Commits

Author SHA1 Message Date
Michal Hocko 4e15a073a1 Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"
Revert 5ff7091f5a ("mm, mmu_notifier: annotate mmu notifiers with
blockable invalidate callbacks").

MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no
longer needed since 93065ac753 ("mm, oom: distinguish blockable mode for
mmu notifiers").  We now have a full support for per range !blocking
behavior so we can drop the stop gap workaround which the per notifier
flag was used for.

Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26 16:25:19 -07:00
Linus Torvalds da19a102ce First merge window pull request
This has been a smaller cycle with many of the commits being smallish code
 fixes and improvements across the drivers.
 
 - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe
 
 - Memory window support in hns
 
 - mlx5 user API 'flow mutate/steering' allows accessing the full packet
   mangling and matching machinery from user space
 
 - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and
   provide options to use devx with less privilege
 
 - Modernize the use of syfs and the device interface to use attribute groups
   and cdev properly for uverbs, and clean up some of the core code's device list
   management
 
 - More progress on net namespaces for RDMA devices
 
 - Consolidate driver BAR mmapping support into core code helpers and rework
   how RDMA holds poitners to mm_struct for get_user_pages cases
 
 - First pass to use 'dev_name' instead of ib_device->name
 
 - Device renaming for RDMA devices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlvR7dUACgkQOG33FX4g
 mxojiw//a9GU5kq4IZ3LNAEio/3Ql/NHRF0uie5tSzJgipRJA1Ln9zW0Cm1S/ms1
 VCmaSJ3l3q3GC4i3tIlsZSIIkN5qtjv/FsT/i+TZwSJYx9BDpPbzWtG6Mp4PSDj0
 v3xzklFCN5HMOmEcjkNmyZw3VjHOt2Iw2mKjqvGbI9imCPLOYnw+WQaZLmMWMH6p
 GL0HDbAopN5Lv8ireWd8pOhPLVbSb12cWM1crx+yHOS3q8YNWjIXGiZr/QkOPtPr
 cymSXB8yuITJ7gnjbs/GxZHg6rxU0knC/Ck8hE7FqqYYHgytTklOXDE2ef1J2lFe
 1VmotD+nTsCir0mZWSdcRrszEk7tzaZT7n1oWggKvWySDB6qaH0II8vWumJchQnN
 pElIQn/WDgpekIqplamNqXJnKnDXZJpEVA01OHHDN4MNSc+Ad08hQy4FyFzpB6/G
 jv9TnDMfGC6ma9pr1ipOXyCgCa2pHYEUCaYxUqRA0O/4ATVl7/PplqT0rqtJ6hKg
 o/hmaVCawIFOUKD87/bo7Em2HBs3xNwE/c5ggbsQElLYeydrgPrZfrPfjkshv5K3
 eIKDb+HPyis0is1aiF7m/bz1hSIYZp0bQhuKCdzLRjZobwCm5WDPhtuuAWb7vYVw
 GSLCJWyet+bLyZxynNOt67gKm9je9lt8YTr5nilz49KeDytspK0=
 =pacJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a smaller cycle with many of the commits being smallish
  code fixes and improvements across the drivers.

   - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and
     rxe

   - Memory window support in hns

   - mlx5 user API 'flow mutate/steering' allows accessing the full
     packet mangling and matching machinery from user space

   - Support inter-working with verbs API calls in the 'devx' mlx5 user
     API, and provide options to use devx with less privilege

   - Modernize the use of syfs and the device interface to use attribute
     groups and cdev properly for uverbs, and clean up some of the core
     code's device list management

   - More progress on net namespaces for RDMA devices

   - Consolidate driver BAR mmapping support into core code helpers and
     rework how RDMA holds poitners to mm_struct for get_user_pages
     cases

   - First pass to use 'dev_name' instead of ib_device->name

   - Device renaming for RDMA devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits)
  IB/mlx5: Add support for extended atomic operations
  RDMA/core: Fix comment for hw stats init for port == 0
  RDMA/core: Refactor ib_register_device() function
  RDMA/core: Fix unwinding flow in case of error to register device
  ib_srp: Remove WARN_ON in srp_terminate_io()
  IB/mlx5: Allow scatter to CQE without global signaled WRs
  IB/mlx5: Verify that driver supports user flags
  IB/mlx5: Support scatter to CQE for DC transport type
  RDMA/drivers: Use core provided API for registering device attributes
  RDMA/core: Allow existing drivers to set one sysfs group per device
  IB/rxe: Remove unnecessary enum values
  RDMA/umad: Use kernel API to allocate umad indexes
  RDMA/uverbs: Use kernel API to allocate uverbs indexes
  RDMA/core: Increase total number of RDMA ports across all devices
  IB/mlx4: Add port and TID to MAD debug print
  IB/mlx4: Enable debug print of SMPs
  RDMA/core: Rename ports_parent to ports_kobj
  RDMA/core: Do not expose unsupported counters
  IB/mlx4: Refer to the device kobject instead of ports_parent
  RDMA/nldev: Allow IB device rename through RDMA netlink
  ...
2018-10-26 07:38:19 -07:00
Linus Torvalds bd6bf7c104 pci-v4.20-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
 oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
 qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
 XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
 mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
 BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
 npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
 Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
 bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
 d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
 rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
 /Nkhov32/n6GjxQ=
 =58I4
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - Fix ASPM link_state teardown on removal (Lukas Wunner)

 - Fix misleading _OSC ASPM message (Sinan Kaya)

 - Make _OSC optional for PCI (Sinan Kaya)

 - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
   (Patrick Talbert)

 - Remove x86 and arm64 node-local allocation for host bridge structures
   (Punit Agrawal)

 - Pay attention to device-specific _PXM node values (Jonathan Cameron)

 - Support new Immediate Readiness bit (Felipe Balbi)

 - Differentiate between pciehp surprise and safe removal (Lukas Wunner)

 - Remove unnecessary pciehp includes (Lukas Wunner)

 - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)

 - Tolerate PCIe Slot Presence Detect being hardwired to zero to
   workaround broken hardware, e.g., the Wilocity switch/wireless device
   (Lukas Wunner)

 - Unify pciehp controller & slot structs (Lukas Wunner)

 - Constify hotplug_slot_ops (Lukas Wunner)

 - Drop hotplug_slot_info (Lukas Wunner)

 - Embed hotplug_slot struct into users instead of allocating it
   separately (Lukas Wunner)

 - Initialize PCIe port service drivers directly instead of relying on
   initcall ordering (Keith Busch)

 - Restore PCI config state after a slot reset (Keith Busch)

 - Save/restore DPC config state along with other PCI config state
   (Keith Busch)

 - Reference count devices during AER handling to avoid race issue with
   concurrent hot removal (Keith Busch)

 - If an Upstream Port reports ERR_FATAL, don't try to read the Port's
   config space because it is probably unreachable (Keith Busch)

 - During error handling, use slot-specific reset instead of secondary
   bus reset to avoid link up/down issues on hotplug ports (Keith Busch)

 - Restore previous AER/DPC handling that does not remove and
   re-enumerate devices on ERR_FATAL (Keith Busch)

 - Notify all drivers that may be affected by error recovery resets
   (Keith Busch)

 - Always generate error recovery uevents, even if a driver doesn't have
   error callbacks (Keith Busch)

 - Make PCIe link active reporting detection generic (Keith Busch)

 - Support D3cold in PCIe hierarchies during system sleep and runtime,
   including hotplug and Thunderbolt ports (Mika Westerberg)

 - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
   are empty or occupied (Jon Derrick)

 - Remove duplicated include from pci/pcie/err.c and unused variable
   from cpqphp (YueHaibing)

 - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
   Pawandeep)

 - Uninline PCI bus accessors for better ftracing (Keith Busch)

 - Remove unused AER Root Port .error_resume method (Keith Busch)

 - Use kfifo in AER instead of a local version (Keith Busch)

 - Use threaded IRQ in AER bottom half (Keith Busch)

 - Use managed resources in AER core (Keith Busch)

 - Reuse pcie_port_find_device() for AER injection (Keith Busch)

 - Abstract AER interrupt handling to disconnect error injection (Keith
   Busch)

 - Refactor AER injection callbacks to simplify future improvments
   (Keith Busch)

 - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)

 - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)

 - Add switch fall-through annotations (Gustavo A. R. Silva)

 - Remove unused Switchtec quirk variable (Joshua Abraham)

 - Fix pci.c kernel-doc warning (Randy Dunlap)

 - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)

 - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)

 - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
   useless dmesg errors (Logan Gunthorpe)

 - Update Switchtec NTB documentation (Wesley Yung)

 - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)

 - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)

 - Add PCI support for peer-to-peer DMA (Logan Gunthorpe)

 - Add sysfs group for PCI peer-to-peer memory statistics (Logan
   Gunthorpe)

 - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
   Gunthorpe)

 - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
   Gunthorpe)

 - Add PCI peer-to-peer DMA driver writer's documentation (Logan
   Gunthorpe)

 - Add block layer flag to indicate driver support for PCI peer-to-peer
   DMA (Logan Gunthorpe)

 - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
   memory (Logan Gunthorpe)

 - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
   Gunthorpe)

 - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
   Gunthorpe)

 - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
   Christoph Hellwig, Logan Gunthorpe)

 - Cache VF config space size to optimize enumeration of many VFs
   (KarimAllah Ahmed)

 - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)

 - Fix VMD AERSID quirk Device ID matching (Jon Derrick)

 - Fix Cadence PHY handling during probe (Alan Douglas)

 - Signal Cadence Endpoint interrupts via AXI region 0 instead of last
   region (Alan Douglas)

 - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
   Douglas)

 - Remove redundant controller tests for "device_type == pci" (Rob
   Herring)

 - Document R-Car E3 (R8A77990) bindings (Tho Vu)

 - Add device tree support for R-Car r8a7744 (Biju Das)

 - Drop unused mvebu PCIe capability code (Thomas Petazzoni)

 - Add shared PCI bridge emulation code (Thomas Petazzoni)

 - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)

 - Add aardvark Root Port emulation (Thomas Petazzoni)

 - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)

 - Add initial power management for i.MX7 (Leonard Crestez)

 - Add PME_Turn_Off support for i.MX7 (Leonard Crestez)

 - Fix qcom runtime power management error handling (Bjorn Andersson)

 - Update TI dra7xx unaligned access errata workaround for host mode as
   well as endpoint mode (Vignesh R)

 - Fix kirin section mismatch warning (Nathan Chancellor)

 - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)

 - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)

 - Update Keystone to use MRRS quirk for host bridge instead of open
   coding (Kishon Vijay Abraham I)

 - Refactor Keystone link establishment (Kishon Vijay Abraham I)

 - Simplify and speed up Keystone link training (Kishon Vijay Abraham I)

 - Remove unused Keystone host_init argument (Kishon Vijay Abraham I)

 - Merge Keystone driver files into one (Kishon Vijay Abraham I)

 - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
   Abraham I)

 - Rename Keystone functions for uniformity (Kishon Vijay Abraham I)

 - Add Keystone device control module DT binding (Kishon Vijay Abraham
   I)

 - Use SYSCON API to get Keystone control module device IDs (Kishon
   Vijay Abraham I)

 - Clean up Keystone PHY handling (Kishon Vijay Abraham I)

 - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)

 - Clean up Keystone config space access checks (Kishon Vijay Abraham I)

 - Get Keystone outbound window count from DT (Kishon Vijay Abraham I)

 - Clean up Keystone outbound window configuration (Kishon Vijay Abraham
   I)

 - Clean up Keystone DBI setup (Kishon Vijay Abraham I)

 - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)

 - Fix Keystone IRQ status checking (Kishon Vijay Abraham I)

 - Add debug messages for all Keystone errors (Kishon Vijay Abraham I)

 - Clean up Keystone includes and macros (Kishon Vijay Abraham I)

 - Fix Mediatek unchecked return value from devm_pci_remap_iospace()
   (Gustavo A. R. Silva)

 - Fix Mediatek endpoint/port matching logic (Honghui Zhang)

 - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
   Zhang)

 - Remove redundant Mediatek PM domain check (Honghui Zhang)

 - Convert Mediatek to pci_host_probe() (Honghui Zhang)

 - Fix Mediatek MSI enablement (Honghui Zhang)

 - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)

 - Add Mediatek loadable module support (Honghui Zhang)

 - Detach VMD resources after stopping root bus to prevent orphan
   resources (Jon Derrick)

 - Convert pcitest build process to that used by other tools (iio, perf,
   etc) (Gustavo Pimentel)

* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI/AER: Refactor error injection fallbacks
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Use managed resource allocations
  PCI: pcie: Remove redundant 'default n' from Kconfig
  PCI: aardvark: Implement emulated root PCI bridge config space
  PCI: mvebu: Convert to PCI emulated bridge config space
  PCI: mvebu: Drop unused PCI express capability code
  PCI: Introduce PCI bridge emulated config space common logic
  PCI: vmd: Detach resources after stopping root bus
  nvmet: Optionally use PCI P2P memory
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
  block: Add PCI P2P flag for request queue
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  ...
2018-10-25 06:50:48 -07:00
Parav Pandit 508a523f6b RDMA/drivers: Use core provided API for registering device attributes
Use rdma_set_device_sysfs_group() to register device attributes and
simplify the driver.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-17 03:45:01 -06:00
Jason Gunthorpe 59bfc59a68 Merge branch 'for-rc' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

This is required to resolve dependencies of the next series of RDMA
patches.

The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:01:02 -06:00
Venkata Sandeep Dhanalakota 1570346153 IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt
This patch moves ruc_loopback() from hfi1 into rdmavt for code sharing
with the qib driver.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Venkata Sandeep Dhanalakota 116aa0330e IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavt
Moving send completion code into rdmavt in order to have shared logic
between qib and hfi1 drivers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Brian Welty 019f118b94 IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavt
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
This patch also moves all the wss_*() functions into rdmavt as
several wss_*() functions are called from hfi1_copy_sge()

When SGE copy mode is adaptive, cacheless copy may be done in some cases
for performance reasons. In those cases, X86 cacheless copy function
is called since the drivers that use rdmavt and may set SGE copy mode
to adaptive are X86 only. For this reason, this patch adds
"depends on X86_64" to rdmavt/Kconfig.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Oza Pawandeep 62b36c3ea6 PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls
After bfcb79fca1 ("PCI/ERR: Run error recovery callbacks for all affected
devices"), AER errors are always cleared by the PCI core and drivers don't
need to do it themselves.

Remove calls to pci_cleanup_aer_uncorrect_error_status() from device
driver error recovery functions.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove PCI core changes, remove unused variables]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-02 16:04:40 -05:00
Kaike Wan bfe397c387 IB/hfi1: Use VL15 for SM packets
Subnet Management Packets (SMP) should exclusively use VL15 and their SL
is ignored (IBTA v1.3, Section 3.5.8.2). Therefore, when an SMP is posted,
the SL in the address handle can be set to 0 by a user
application. Consequently, when an address handle is created by the IB
core, some fields in struct rvt_ah may not be set correctly by using the
SL2SC and SC2VL tables at the time. Subsequently, when the request is post
sent, the incoming swqe may fail the validation check, resulting in the
rejection of the send request.

This patch fixes the problem by using VL15 for any validation, ignoring
the SL in the address handle.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:12 -06:00
Alex Estrin eb50130964 IB/hfi1: Add mtu check for operational data VLs
Since Virtual Lanes BCT credits and MTU are set through separate MADs, we
have to ensure both are valid, and data VLs are ready for transmission
before we allow port transition to Armed state.

Fixes: 5e2d6764a7 ("IB/hfi1: Verify port data VLs credits on transition to Armed")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:12 -06:00
Kaike Wan 15b796bc3d IB/hfi1: Add static trace for iowait
This patch adds the static trace for resource wait.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:12 -06:00
Dennis Dalessandro 5da0fc9dbf IB/hfi1: Prepare resource waits for dual leg
Current implementation allows each qp to have only one send engine.  As
such, each qp has only one list to queue prebuilt packets when send engine
resources are not available. To improve performance, it is desired to
support multiple send engines for each qp.

This patch creates the framework to support two send engines
(two legs) for each qp for the TID RDMA protocol, which can be easily
extended to support more send engines. It achieves the goal by creating a
leg specific struct, iowait_work in the iowait struct, to hold the
work_struct and the tx_list as well as a pointer to the parent iowait
struct.

The hfi1_pkt_state now has an additional field to record the current legs
work structure and that is now passed to all egress waiters to determine
the leg that needs to wait via a new iowait helper.  The APIs are adjusted
to use the new leg specific struct as required.

Many new and modified helpers are added to support this change.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:12 -06:00
Kaike Wan d205a06a14 IB/rdmavt: Rename check_send_wqe as setup_wqe
The driver-provided function check_send_wqe allows the hardware driver to
check and set up the incoming send wqe before it is inserted into the swqe
ring. This patch will rename it as setup_wqe to better reflect its
usage. In addition, this function is only called when all setup is
complete in rdmavt.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:11 -06:00
Michael J. Ruhl 935c84ac64 IB/hfi1: Error path MAD response size is incorrect
If a MAD packet has incorrect header information, the logic uses the reply
path to report the error.  The reply path expects *resp_len to be set
prior to return.  Unfortunately, *resp_len is set to 0 for this path.
This causes an incorrect response packet.

Fix by ensuring that the *resp_len is defaulted to the incoming packet
size (wc->bytes_len - sizeof(GRH)).

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30 19:21:11 -06:00
Greg Kroah-Hartman ad0371482b Second rc pull request
- Fix a long standing race bug when destroying comp_event file descriptors
 
 - srp, hfi1, bnxt_re: Various driver crashes from missing validation and
   other cases
 
 - Fixes for regressions in patches merged this window in the gid cache,
   devx, ucma and uapi.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlutG7QACgkQOG33FX4g
 mxps3Q//fPL1o4Na/nx/aSmtfcEuPxjXwczUbB0MzlN7rKu0HjrES/DOXbGJZgS7
 5XGu7TQyOBWzpZHZM4Xs+EjTdVY9pOK5+Q+t76Ns1vZsblCATJPOCIYMVkr+zM5T
 zQxEfarlz8kEPWVz/TBAM7djoSkGXNLQQ1ttSVJBSAindhb25NzoiV1fsFGkXKmT
 xbZHW03K1o4dZMf0JJO77eu1m9tkRfX0iFXBg4efgNOZtl9MP4l5lRwPqyRdGhMO
 MvBNCJte/4dKlg5TeoVtp8B0sJvrO6QOj878VqYH4nL8uW2lZWFuzB7ZWxWGxMb/
 VgXwpmXyDuTUjX9gn3kQToG8L9FJqsMzPZGBePMjilJqpgpqoVZpQ8i674JBZznF
 as/bRSLdkA4PL5upFepBD/j9ep9LGIb6ojDcUSmwB+aJ99ZtlFUOhdf4ehQ2x9MU
 WETIcdSqlmFjhY2IoHFKEr5PP3NGIDtPlvZbAmJrrne3uTCwXChXeZgzX1S6AHkY
 djvDEaV8DrMmdJc0XTIOIQvxzMnHuoCo3oHdVaJMLudGkw1NKMN84SkIPb6U2Qjk
 QiPf3pRRUazO6tqJfEZr8Es5lQjYkZUK4wKagOJKdcLBAubEnMhrsvECXQuoFWT8
 sYhzK3AwyEaXp4Y3VbA+aqjZA/AhtPk/l3QKesU6u7hoLtiawoc=
 =cjxG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Jason writes:
  "Second RDMA rc pull request

   - Fix a long standing race bug when destroying comp_event file descriptors

   - srp, hfi1, bnxt_re: Various driver crashes from missing validation
     and other cases

   - Fixes for regressions in patches merged this window in the gid
     cache, devx, ucma and uapi."

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/core: Set right entry state before releasing reference
  IB/mlx5: Destroy the DEVX object upon error flow
  IB/uverbs: Free uapi on destroy
  RDMA/bnxt_re: Fix system crash during RDMA resource initialization
  IB/hfi1: Fix destroy_qp hang after a link down
  IB/hfi1: Fix context recovery when PBC has an UnsupportedVL
  IB/hfi1: Invalid user input can result in crash
  IB/hfi1: Fix SL array bounds check
  RDMA/uverbs: Fix validity check for modify QP
  IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop
  ucma: fix a use-after-free in ucma_resolve_ip()
  RDMA/uverbs: Atomically flush and mark closed the comp event queue
  cxgb4: fix abort_req_rss6 struct
2018-09-27 21:53:55 +02:00
Michael J. Ruhl e04951ebee IB/hfi1: Move UnsupportedVL bits definitions to the correct header
The UnsupportedVL SendCtrl register bit information is defined in
the module rather than the chip register header file.

Move the defines to the appropriate header file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-26 16:35:48 -06:00
Michael J. Ruhl b4a4957d3d IB/hfi1: Fix destroy_qp hang after a link down
rvt_destroy_qp() cannot complete until all in process packets have
been released from the underlying hardware.  If a link down event
occurs, an application can hang with a kernel stack similar to:

cat /proc/<app PID>/stack
 quiesce_qp+0x178/0x250 [hfi1]
 rvt_reset_qp+0x23d/0x400 [rdmavt]
 rvt_destroy_qp+0x69/0x210 [rdmavt]
 ib_destroy_qp+0xba/0x1c0 [ib_core]
 nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
 nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
 nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
 nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
 process_one_work+0x17a/0x440
 worker_thread+0x126/0x3c0
 kthread+0xcf/0xe0
 ret_from_fork+0x58/0x90
 0xffffffffffffffff

quiesce_qp() waits until all outstanding packets have been freed.
This wait should be momentary.  During a link down event, the cleanup
handling does not ensure that all packets caught by the link down are
flushed properly.

This is caused by the fact that the freeze path and the link down
event is handled the same.  This is not correct.  The freeze path
waits until the HFI is unfrozen and then restarts PIO.  A link down
is not a freeze event.  The link down path cannot restart the PIO
until link is restored.  If the PIO path is restarted before the link
comes up, the application (QP) using the PIO path will hang (until
link is restored).

Fix by separating the linkdown path from the freeze path and use the
link down path for link down events.

Close a race condition sc_disable() by acquiring both the progress
and release locks.

Close a race condition in sc_stop() by moving the setting of the flag
bits under the alloc lock.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
Michael J. Ruhl d623500b3c IB/hfi1: Fix context recovery when PBC has an UnsupportedVL
If a packet stream uses an UnsupportedVL (virtual lane), the send
engine will not send the packet, and it will not indicate that an
error has occurred.  This will cause the packet stream to block.

HFI has 8 virtual lanes available for packet streams.  Each lane can
be enabled or disabled using the UnsupportedVL mask.  If a lane is
disabled, adding a packet to the send context must be disallowed.

The current mask for determining unsupported VLs defaults to 0 (allow
all).  This is incorrect.  Only the VLs that are defined should be
allowed.

Determine which VLs are disabled (mtu == 0), and set the appropriate
unsupported bit in the mask.  The correct mask will allow the send
engine to error on the invalid VL, and error recovery will work
correctly.

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
Michael J. Ruhl 94694d18cf IB/hfi1: Invalid user input can result in crash
If the number of packets in a user sdma request does not match
the actual iovectors being sent, sdma_cleanup can be called on
an uninitialized request structure, resulting in a crash similar
to this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
PGD 8000001044f61067 PUD 1052706067 PMD 0
Oops: 0000 [#1] SMP
CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G           OE
------------   3.10.0-862.el7.x86_64 #1
Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
SE5C610.86B.01.01.0019.101220160604 10/12/2016
task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
RIP: 0010:[<ffffffffc0ae8bb7>]  [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0
[hfi1]
RSP: 0018:ffff8b2ed1f9bab0  EFLAGS: 00010286
RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
FS:  00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
Call Trace:
 [<ffffffffc0b0570d>] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
 [<ffffffff9fe75fb0>] ? gup_pud_range+0x140/0x290
 [<ffffffffc0ad3105>] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
 [<ffffffffc0b0777b>] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
 [<ffffffffc0ac193a>] hfi1_aio_write+0xba/0x110 [hfi1]
 [<ffffffffa001a2bb>] do_sync_readv_writev+0x7b/0xd0
 [<ffffffffa001bede>] do_readv_writev+0xce/0x260
 [<ffffffffa022b089>] ? tty_ldisc_deref+0x19/0x20
 [<ffffffffa02268c0>] ? n_tty_ioctl+0xe0/0xe0
 [<ffffffffa001c105>] vfs_writev+0x35/0x60
 [<ffffffffa001c2bf>] SyS_writev+0x7f/0x110
 [<ffffffffa051f7d5>] system_call_fastpath+0x1c/0x21
Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb <48> 8b 51 08 49 89 d4
83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
RIP  [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
 RSP <ffff8b2ed1f9bab0>
CR2: 0000000000000008

There are two exit points from user_sdma_send_pkts().  One (free_tx)
merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
prior to freeing the slab entry.   The free_txreq variation can only be
called after one of the sdma_init*() variations has been called.

In the panic case, the slab entry had been allocated but not inited.

Fix the issue by exiting through free_tx thus avoiding sdma_clean().

Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
Ira Weiny 0dbfaa9f28 IB/hfi1: Fix SL array bounds check
The SL specified by a user needs to be a valid SL.

Add a range check to the user specified SL value which protects from
running off the end of the SL to SC table.

CC: stable@vger.kernel.org
Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20 19:24:51 -06:00
Dennis Dalessandro bfc456060d IB/hfi1,PCI: Allow bus reset while probing
Calling into the new API to reset the secondary bus results in a deadlock.
This occurs because the device/bus is already locked at probe time.
Reverting back to the old behavior while the API is improved.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Fixes: c6a44ba950 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()")
Fixes: 409888e096 ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
2018-09-11 21:44:52 -05:00
Michael J. Ruhl b53ae6bc7e IB/hfi1: set_intr_bits uses incorrect source for register modification
HFI IRQ enable bits are not being set correctly.  Send context error and
DC IRQs are not being enabled correctly.  In addition, send context error
IRQs are not being delivered.

Because of this, send context errors are not being handled correctly when
they occur.

When setting the IRQ bits, if an IRQ range is used, and the last bit is on
a register boundary (bit 63), the calculated index for the final register
modification is incorrect (index + 1 vs. index).

The incorrect index calculation causes incorrect IRQ bits to be set.  In
this case the send context error IRQ is NOT enabled.

Fix by using the 'last' value rather than the counted 'src' value to
determine the final index to use.  This satisfies all cases.

Fixes: a2f7bbdc2d ("IB/hfi1: Rework the IRQ API to be more flexible")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 11:33:13 -06:00
Michael J. Ruhl 2bf4b33f83 IB/hfi1: Missing return value in error path for user sdma
If the set_txreq_header_agh() function returns an error, the exit path
is chosen.

In this path, the code fails to set the return value.  This will cause
the caller to not realize an error has occurred.

Set the return value correctly in the error path.

Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 10:05:17 -06:00
Michael J. Ruhl 3ca633f1ff IB/hfi1: Right size user_sdma sequence numbers and related variables
Hardware limits the maximum number of packets to u16 packets.

Match that size for all relevant sequence numbers in the user_sdma
engine.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 10:05:17 -06:00
Michael J. Ruhl 28a9a9e83c IB/hfi1: Remove race conditions in user_sdma send path
Packet queue state is over used to determine SDMA descriptor
availablitity and packet queue request state.

cpu 0  ret = user_sdma_send_pkts(req, pcount);
cpu 0  if (atomic_read(&pq->n_reqs))
cpu 1  IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
cpu 0        xchg(&pq->state, SDMA_PKT_Q_ACTIVE);

At this point pq->n_reqs == 0 and pq->state is incorrectly
SDMA_PKT_Q_ACTIVE.  The close path will hang waiting for the state
to return to _INACTIVE.

This can also change the state from _DEFERRED to _ACTIVE.  However,
this is a mostly benign race.

Remove the racy code path.

Use n_reqs to determine if a packet queue is active or not.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 10:05:17 -06:00
Michael J. Ruhl a0e0cb8280 IB/hfi1: Eliminate races in the SDMA send error path
pq_update() can only be called in two places: from the completion
function when the complete (npkts) sequence of packets has been
submitted and processed, or from setup function if a subset of the
packets were submitted (i.e. the error path).

Currently both paths can call pq_update() if an error occurrs.  This
race will cause the n_req value to go negative, hanging file_close(),
or cause a crash by freeing the txlist more than once.

Several variables are used to determine SDMA send state.  Most of
these are unnecessary, and have code inspectible races between the
setup function and the completion function, in both the send path and
the error path.

The request 'status' value can be set by the setup or by the
completion function.  This is code inspectibly racy.  Since the status
is not needed in the completion code or by the caller it has been
removed.

The request 'done' value races between usage by the setup and the
completion function.  The completion function does not need this.
When the number of processed packets matches npkts, it is done.

The 'has_error' value races between usage of the setup and the
completion function.  This can cause incorrect error handling and leave
the n_req in an incorrect value (i.e. negative).

Simplify the code by removing all of the unneeded state checks and
variables.

Clean up iovs node when it is freed.

Eliminate race conditions in the error path:

If all packets are submitted, the completion handler will set the
completion status correctly (ok or aborted).

If all packets are not submitted, the caller must wait until the
submitted packets have completed, and then set the completion status.

These two change eliminate the race condition in the error path.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 10:05:16 -06:00
Michael J. Ruhl 0b79b27748 IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting
The post_send() path determines if it should post directly or, schedule
the post for later.  The current logic is:

  if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold
    post the send
  else
    schedule

This can allow large requests to call the send engine directly.  Large
requests can potentially produce a large number of packets prior to
returning to the caller, blocking the caller from posting more requests,
and allowing better parallel processing.

Allow the driver(s) more say in this logic (pass call_send to the driver,
rather than examining a return value).

Update hfi1/qib logic to schedule the send engine if an RC or UC message
is larger than the QP MTU size.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11 09:55:02 -06:00
Michael J. Ruhl dc9f5d0f84 IB/hfi1: Move URGENT IRQ enable to hfi1_rcvctrl()
User contexts use the receive URGENT interrupt.  However, enabling
the IRQ SRC in the file_ops module is not as clean as it could be.

Augment the _rcvctl() function to be able to enable/disable the IRQ
source.

Use the new interface from file_ops to enable/disable the IRQ.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl a2f7bbdc2d IB/hfi1: Rework the IRQ API to be more flexible
The current IRQ API is an all or nothing interface.  This has two
problems:

  1. All IRQs are enabled regardless of use
  2. Moving from general interrupt to MSIx handling is difficult

Introduce a new API to enable/disable specific IRQs or a range of IRQs.

Do not enable and disable all IRQs in one step.

Rework various modules to enable/disable IRQs when needed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Kamenee Arumugam e63bb50d19 IB/hfi1: PCIe bus width retry
Retry the PCIe link training up to 'pcie_retry' times
if the PCIe link width is narrower than the previous width.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl 6eb4eb10fb IB/hfi1: Make the MSIx resource allocation a bit more flexible
The current method of allocating MSIx resources is a bit cumbersome,
and not very easily added to.

Refactor and re-order the code paths into a more consistent interface.

Update the interface so that allocations are not order dependent.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:13:38 -04:00
Michael J. Ruhl 09e71899b9 IB/hfi1: Prepare for new HFI1 MSIx API
The current HFI1 MSIx API is difficult to follow, change, or add to.

In anticipation of moving to an more flexible API, move the current
MSIx functionality to the new msix.c module.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:35 -04:00
Michael J. Ruhl 57f97e9662 IB/hfi1: Get the hfi1_devdata structure as early as possible
Currently several things occur before the hfi1_devdata structure is
allocated.  This leads to an inconsistent logging ability and makes
it more difficult to restructure some code paths.

Allocate (and do a minimal init) the structure as soon as possible.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Michael J. Ruhl 6a516bc9d7 IB/hfi1: tune_pcie_caps is arbitrarily placed, poorly
The tune_pcie_caps needs to occur sometime after PCI is enabled, but
before the HFI is enabled.  Currently it is placed in the MSIx
allocation code which doesn't really fit. Moving it to just after
the gen3 bump.

Clean up the associated code (modules, etc.).

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Michael J. Ruhl 22c21438aa IB/hfi1: Remove duplicated defines
TXREQ defines are duplicated, incompletely, in the sdma header file.

Remove duplicate defines.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Dennis Dalessandro c54a73d820 IB/hfi1: Rework file list in Makefile
We want to keep files in alphabetical order in our makefile, however this
just makes for messy diffs when adding (or removing) files. Let's just clean
this up and make it line by line.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01 08:11:34 -04:00
Linus Torvalds 1290290c92 Second merge window update
- Switch SMC over to rdma_get_gid_attr and remove the compat
 
 - Fix a crash in HFI1 with some BIOS's
 
 - Fix a randconfig failure
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlt9z6kACgkQOG33FX4g
 mxqLag//Zr3RrNSmswNjl+6PrWeg88GQstAm4gQps8mR03eI8Ha5fAS6MWuq1U8+
 LGP8bbBtE6fHSCkPSnN/owReZSTcTIbgWB021vBIUKUpytmxie+ugMVXIHwm6cIM
 E3pgPsRngTrqtuRxyOhyoaMKSjRCbSdMe43SWf42/S+9SxDkmXtBzymYT119bu/A
 eqU7z/HuRqcVxrgOinhPjcQkEFdRYUaRk+g6jzaQmZl8IjKHp/d3BSzmzbE6txMt
 txP5hu/msz4xkyusX/I6ZS3do+4WMIAMNhq9bVMo5pNu2RG1eo0LLmHgjsDsFI3u
 ZTaZQj6eYWlbwWRiOU+2Frf4kH4CGAyxlFVCr4yEvfe2VvEbZsLZbYuF1iasnqrP
 3Mrbh7Esj4MfDT/QbunnoX+49X/Rzma/a+tu6hqAhnGjvYsaDyy8WSNwHMs65f7m
 sii58arik9exJHVKMXZ5FJJydBYBcPt5IlkUdoTTZ1eLkoc2B9lG6lKZbWxUYNPx
 uS6XOyR4fcsmLfKfEvQuLZznWi3Jo4R17QKlL5ulqHrZUJV4PZ4Eiu5izScZ4ci/
 xcPGz1a4MT7Zg3V611qGSGt7zYDdD5+R+k/sDIOleCFIpueFzzh/I1oLkSIoXFIx
 DmwqCdRGeB5F7ZhfArFTezgiaMe55SPJKY5PAiThxzoostUGOfw=
 =6z1N
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Jason Gunthorpe:
 "This is the SMC cleanup promised, a randconfig regression fix, and
  kernel oops fix.

  Summary:

   - Switch SMC over to rdma_get_gid_attr and remove the compat

   - Fix a crash in HFI1 with some BIOS's

   - Fix a randconfig failure"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/ucm: fix UCM link error
  IB/hfi1: Invalid NUMA node information can cause a divide by zero
  RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr
2018-08-23 15:34:48 -07:00
Michal Hocko 93065ac753 mm, oom: distinguish blockable mode for mmu notifiers
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.

Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep.  That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.

We can do much better though.  Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held.  Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range.  Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.

This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false.  This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.

I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that.  The first
part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode.  A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.

The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom.  This can be done e.g.  after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small.  Then we are looking for a proper process tear down.

[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:44 -07:00
Michael J. Ruhl c513de490f IB/hfi1: Invalid NUMA node information can cause a divide by zero
If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.

This can lead to the following crash:

divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G          IOE
------------   3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8  EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS:  0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 hfi1_init_dd+0x14b3/0x27a0 [hfi1]
 ? pcie_capability_write_word+0x46/0x70
 ? hfi1_pcie_init+0xc0/0x200 [hfi1]
 do_init_one+0x153/0x4c0 [hfi1]
 ? sched_clock_cpu+0x85/0xc0
 init_one+0x1b5/0x260 [hfi1]
 local_pci_probe+0x4a/0xb0
 work_for_cpu_fn+0x1a/0x30
 process_one_work+0x17f/0x440
 worker_thread+0x278/0x3c0
 ? manage_workers.isra.24+0x2a0/0x2a0
 kthread+0xd1/0xe0
 ? insert_kthread_work+0x40/0x40
 ret_from_fork+0x77/0xb0
 ? insert_kthread_work+0x40/0x40

If the BIOS is not supplying NUMA information:
  - set the default table count to 1 for all possible nodes
  - select node 0 (instead of current NUMA) node to get consistent
    performance
  - generate an error indicating that the BIOS should be upgraded

Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-20 10:45:48 -06:00
Jason Gunthorpe 0a3173a5f0 Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window

Conflicts:
 drivers/infiniband/core/rdma_core.c
   - Use the rdma code and revise with the new spelling for
     atomic_fetch_add_unless
 drivers/nvme/host/rdma.c
   - Replace max_sge with max_send_sge in new blk code
 drivers/nvme/target/rdma.c
   - Use the blk code and revise to use NULL for ib_post_recv when
     appropriate
   - Replace max_sge with max_recv_sge in new blk code
 net/rds/ib_send.c
   - Use the net code and revise to use NULL for ib_post_recv when
     appropriate

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:21:29 -06:00
Jason Gunthorpe 89982f7cce Linux 4.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0
 mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0
 qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn
 zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u
 vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm
 wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6
 L47IdXo=
 =sJkt
 -----END PGP SIGNATURE-----

Merge tag 'v4.18' into rdma.git for-next

Resolve merge conflicts from the -rc cycle against the rdma.git tree:

Conflicts:
 drivers/infiniband/core/uverbs_cmd.c
  - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
  - Merge removal of file->ucontext in for-next with new code in -rc
 drivers/infiniband/core/uverbs_main.c
  - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 13:12:00 -06:00
Linus Torvalds 4e31843f68 pci-v4.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
 TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
 VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
 M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
 ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
 kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
 BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
 3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
 ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
 FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
 0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
 XkEme7cYJc8MGsA=
 =2Dmn
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:

 - Decode AER errors with names similar to "lspci" (Tyler Baicar)

 - Expose AER statistics in sysfs (Rajat Jain)

 - Clear AER status bits selectively based on the type of recovery (Oza
   Pawandeep)

 - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
   Gagniuc)

 - Don't clear AER status bits if we're using the "Firmware-First"
   strategy where firmware owns the registers (Alexandru Gagniuc)

 - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
   Shevchenko)

 - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)

 - Defer DPC event handling to work queue (Keith Busch)

 - Use threaded IRQ for DPC bottom half (Keith Busch)

 - Print AER status while handling DPC events (Keith Busch)

 - Work around IDT switch ACS Source Validation erratum (James
   Puthukattukaran)

 - Emit diagnostics for all cases of PCIe Link downtraining (Links
   operating slower than they're capable of) (Alexandru Gagniuc)

 - Skip VFs when configuring Max Payload Size (Myron Stowe)

 - Reduce Root Port Max Payload Size if necessary when hot-adding a
   device below it (Myron Stowe)

 - Simplify SHPC existence/permission checks (Bjorn Helgaas)

 - Remove hotplug sample skeleton driver (Lukas Wunner)

 - Convert pciehp to threaded IRQ handling (Lukas Wunner)

 - Improve pciehp tolerance of missed events and initially unstable
   links (Lukas Wunner)

 - Clear spurious pciehp events on resume (Lukas Wunner)

 - Add pciehp runtime PM support, including for Thunderbolt controllers
   (Lukas Wunner)

 - Support interrupts from pciehp bridges in D3hot (Lukas Wunner)

 - Mark fall-through switch cases before enabling -Wimplicit-fallthrough
   (Gustavo A. R. Silva)

 - Move DMA-debug PCI init from arch code to PCI core (Christoph
   Hellwig)

 - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
   supplied (Heiner Kallweit)

 - Unify PCI and DMA direction #defines (Shunyong Yang)

 - Add PCI_DEVICE_DATA() macro (Andy Shevchenko)

 - Check for VPD completion before checking for timeout (Bert Kenward)

 - Limit Netronome NFP5000 config space size to work around erratum
   (Jakub Kicinski)

 - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)

 - Document ACPI description of PCI host bridges (Bjorn Helgaas)

 - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
   peer-to-peer DMA support (we don't have the peer-to-peer support yet;
   this is just one piece) (Logan Gunthorpe)

 - Clean up devm_of_pci_get_host_bridge_resources() resource allocation
   (Jan Kiszka)

 - Fixup resizable BARs after suspend/resume (Christian König)

 - Make "pci=earlydump" generic (Sinan Kaya)

 - Fix ROM BAR access routines to stay in bounds and check for signature
   correctly (Rex Zhu)

 - Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)

 - Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)

 - To avoid bus errors, enable PASID only if entire path supports
   End-End TLP prefixes (Sinan Kaya)

 - Unify slot and bus reset functions and remove hotplug knowledge from
   callers (Sinan Kaya)

 - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
   fix guest reboot issues (Alex Williamson)

 - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
   Controller (Bjorn Helgaas)

 - Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)

 - Remove Aardvark outbound window configuration (Evan Wang)

 - Fix Aardvark bridge window sizing issue (Zachary Zhang)

 - Convert Aardvark to use pci_host_probe() to reduce code duplication
   (Thomas Petazzoni)

 - Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)

 - Add Cadence support for optional generic PHYs (Alan Douglas)

 - Add Cadence power management ops (Alan Douglas)

 - Remove redundant variable from Cadence driver (Colin Ian King)

 - Add Kirin MSI support (Xiaowei Song)

 - Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
   armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
   Guo)

 - Move link notification settings from DesignWare core to individual
   drivers (Gustavo Pimentel)

 - Add endpoint library MSI-X interfaces (Gustavo Pimentel)

 - Correct signature of endpoint library IRQ interfaces (Gustavo
   Pimentel)

 - Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)

 - Add endpoint library MSI-X test support (Gustavo Pimentel)

 - Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
   (Jia-Ju Bai)

 - Add more devices to Broadcom PAXC quirk (Ray Jui)

 - Work around corrupted Broadcom PAXC config space to enable SMMU and
   GICv3 ITS (Ray Jui)

 - Disable MSI parsing to work around broken Broadcom PAXC logic in some
   devices (Ray Jui)

 - Hide unconfigured functions to work around a Broadcom PAXC defect
   (Ray Jui)

 - Lower iproc log level to reduce console output during boot (Ray Jui)

 - Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)

 - Fix mobiveil missing include file (Lorenzo Pieralisi)

 - Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)

 - Fix mvebu I/O space remapping issues (Thomas Petazzoni)

 - Use generic pci_host_bridge in mvebu instead of ARM-specific API
   (Thomas Petazzoni)

 - Whitelist VMD devices with fast interrupt handlers to avoid sharing
   vectors with slow handlers (Keith Busch)

* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
  PCI/AER: Don't clear AER bits if error handling is Firmware-First
  PCI: Limit config space size for Netronome NFP5000
  PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
  PCI/VPD: Check for VPD access completion before checking for timeout
  PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
  PCI: Match Root Port's MPS to endpoint's MPSS as necessary
  PCI: Skip MPS logic for Virtual Functions (VFs)
  PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
  PCI: Check for PCIe Link downtraining
  PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
  PCI: Add device-specific ACS Redirect disable infrastructure
  PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
  PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
  PCI: Allow specifying devices using a base bus and path of devfns
  PCI: Make specifying PCI devices in kernel parameters reusable
  PCI: Hide ACS quirk declarations inside PCI core
  PCI: Delay after FLR of Intel DC P3700 NVMe
  PCI: Disable Samsung SM961/PM961 NVMe before FLR
  PCI: Export pcie_has_flr()
  PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
  ...
2018-08-16 09:21:54 -07:00
David S. Miller c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Sinan Kaya c6a44ba950 PCI: Rename pci_try_reset_bus() to pci_reset_bus()
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().

Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Sinan Kaya 811c5cb37d PCI: Unify try slot and bus reset API
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not.  A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.

Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Sinan Kaya 409888e096 IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
Getting ready to hide pci_reset_bridge_secondary_bus() from the drivers.
pci_reset_bridge_secondary_bus() should only be used internally by the
PCI code itself.

Other drivers should rely on higher level pci_try_reset_bus() API.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Bart Van Assche 522628ed1a IB/hfi1: Suppress a compiler warning
Avoid that the following compiler warning is reported when building
with gcc 8:

drivers/infiniband/hw/hfi1/verbs.c:1896:2: warning: 'strncpy' output may be truncated copying 64 bytes from a string of length 64 [-Wstringop-truncation]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:20:45 -06:00
Jason Gunthorpe 958200ad8e RDMA/hfi1: Move grh_required into update_sm_ah
grh_required is intended to be a global setting where all AV's will
require a GRH, not just the sm_lid. Move the special logic to the creation
of the SM AH.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-10 11:13:04 -06:00
Alexander Duyck 4f49dec907 net: allow ndo_select_queue to pass netdev
This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09 13:41:34 -07:00