OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Matthew R. Ochs	c4a11827b7	scsi: cxlflash: Fix context reference tracking on detach Commit `888baf069f` ("scsi: cxlflash: Add kref to context") introduced a kref to the context. In particular, the detach routine was updated to use the kref services for managing the removal and destruction of a context. As part of this change, the tracking mechanism internal to the detach handler was refactored. This introduced a bug that can cause the tracking state to be lost. This can lead to a situation where exclusive access to a context is prematurely [and unknowingly] relinquished for the executing thread. To remedy, only update the tracking state when the kref operation indicates the context was removed. Fixes: `888baf069f` ("scsi: cxlflash: Add kref to context") Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-14 12:47:42 -04:00
Matthew R. Ochs	f80132613d	scsi: cxlflash: Refactor WWPN setup Commit `964497b3bf` ("cxlflash: Remove dual port online dependency") logically removed the ability for the WWPN setup routine afu_set_wwpn() to return a non-success value. This routine can safely be made a void to simplify the code as there is no longer a need to report a failure. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-14 12:47:03 -04:00
Matthew R. Ochs	05dab43230	scsi: cxlflash: Improve EEH recovery time When an EEH occurs during device initialization, the port timeout logic can cause excessive delays as MMIO reads will fail. Depending on where they are experienced, these delays can lead to a prolonged reset, causing an unnecessary triggering of other timeout logic in the SCSI stack or user applications. To expedite recovery, the port timeout logic is updated to decay the timeout at a much faster rate when in the presence of a likely EEH frozen event. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-14 12:46:20 -04:00
Matthew R. Ochs	1d3324c382	scsi: cxlflash: Fix to avoid EEH and host reset collisions The EEH reset handler is ignorant to the current state of the driver when processing a frozen event and initiating a device reset. This can be an issue if an EEH event occurs while a user or stack initiated reset is executing. More specifically, if an EEH occurs while the SCSI host reset handler is active, the reset initiated by the EEH thread will likely collide with the host reset thread. This can leave the device in an inconsistent state, or worse, cause a system crash. As a remedy, the EEH handler is updated to evaluate the device state and take appropriate action (proceed, wait, or disconnect host). The host reset handler is also updated to handle situations where an EEH occurred during a host reset. In such situations, the host reset handler will delay reporting back a success to give the EEH reset an opportunity to complete. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-14 12:45:06 -04:00
Uma Krishnan	babf985d1e	scsi: cxlflash: Remove the device cleanly in the system shutdown path Commit `704c4b0ddc` ("cxlflash: Shutdown notify support for CXL Flash cards") was recently introduced to notify the AFU when a system is going down. Due to the position of the cxlflash driver in the device stack, cxlflash devices are _always_ removed during a reboot/shutdown. This can lead to a crash if the cxlflash shutdown hook is invoked _after_ the shutdown hook for the owning virtual PHB. Furthermore, the current implementation of shutdown/remove hooks for cxlflash are not tolerant to being invoked when the device is not enabled. This can also lead to a crash in situations where the remove hook is invoked after the device has been removed via the vPHBs shutdown hook. An example of this scenario would be an EEH reset failure while a reboot/shutdown is in progress. To solve both problems, the shutdown hook for cxlflash is updated to simply remove the device. This path already includes the AFU notification and thus this solution will continue to perform the original intent. At the same time, the remove hook is updated to protect against being called when the device is not enabled. Fixes: `704c4b0ddc` ("cxlflash: Shutdown notify support for CXL Flash cards") Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-09 07:34:58 -04:00
Uma Krishnan	bbbfae962b	scsi: cxlflash: Scan host only after the port is ready for I/O When a port link is established, the AFU sends a 'link up' interrupt. After the link is up, corresponding initialization steps are performed on the card. Following that, when the card is ready for I/O, the AFU sends 'login succeeded' interrupt. Today, cxlflash invokes scsi_scan_host() upon receipt of both interrupts. SCSI commands sent to the port prior to the 'login succeeded' interrupt will fail with 'port not available' error. This is not desirable. Moreover, when async_scan is active for the host, subsequent scan calls are terminated with error. Due to this, the scsi_scan_host() call performed after 'login succeeded' interrupt could portentially return error and the devices may not be scanned properly. To avoid this problem, scsi_scan_host() should be called only after the 'login succeeded' interrupt. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-09-09 07:34:18 -04:00
Matthew R. Ochs	de9f0b0cbb	scsi: cxlflash: Remove adapter file descriptor cache The adapter file descriptor was previously cached within the kernel for a given context in order to support performing a close on behalf of an application. This is no longer needed as applications are now required to perform a close on the adapter file descriptor. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-23 22:23:52 -04:00
Matthew R. Ochs	cd34af40a0	scsi: cxlflash: Transition to application close model Caching the adapter file descriptor and performing a close on behalf of an application is a poor design. This is due to the fact that once a file descriptor in installed, it is free to be altered without the knowledge of the cxlflash driver. This can lead to inconsistencies between the application and kernel. Furthermore, the nature of the former design is more exploitable and thus should be abandoned. To support applications performing a close on the adapter file that is associated with a context, a new flag is introduced to the user API to indicate to applications that they are responsible for the close following the cleanup (detach) of a context. The documentation is also updated to reflect this change in behavior. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-23 22:23:41 -04:00
Matthew R. Ochs	888baf069f	scsi: cxlflash: Add kref to context Currently, context user references are tracked via the list of LUNs that have attached to the context. While convenient, this is not intuitive without a deep study of the code and is inconsistent with the existing reference tracking patterns within the kernel. This design choice can lead to future bug injection. To improve code comprehension and better protect against future bugs, add explicit reference counting to contexts and migrate the context removal code to the kref release handler. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-18 22:41:36 -04:00
Matthew R. Ochs	44ef38f9a2	scsi: cxlflash: Cache owning adapter within context The context removal routine requires access to the owning adapter structure to reset the context within the AFU as part of the tear down sequence. In order to support kref adoption, the owning adapter must be accessible from the release handler. As the kref framework only provides the kref reference as the sole parameter, another means is needed to derive the owning adapter. As a remedy, the owning adapter reference is saved off within the context during initialization. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-18 22:40:27 -04:00
Matthew R. Ochs	41b99e1a30	scsi: cxlflash: Avoid mutex when destroying context Context information structures are protected by a mutex that is held when accessing/manipulating the context. When the code that manages these structures was authored, a decision was made to include taking the mutex as part of the allocation/initialization sequence and also handle the scenario where the mutex was already held when freeing the context. While not a problem outright, this design decision has been deemed as too flexible and the code should be made more rigid to avoid future bugs. In addition, further review of the code yields that the existing mutex manipulations in both of these context management paths are superfluous. This commit removes the obtaining of the context mutex in the context initialization routine and assumes the mutex is not held in the context free path. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-18 22:39:45 -04:00
Linus Torvalds	0603006b45	SCSI misc on 20160805 This is seven basic fixes (plus one MAINTAINER update) which came in close to the merge window. Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJXpR8/AAoJEAVr7HOZEZN4QcgP/1YfBXT49Vr/CJyqysyUKbSQ adzjxu0n1L+5wN9/5Wp523cnpIuLKF8h2iWK3KrS4AB4zBzy78PhFXZ1bLzNFb+x e76vOClKHJ5cG52LCvF7JPTbCSV4VQHsGEOGiY6TXsNMOmLhwM0jQ+/YE7KjrOrj 3ICfwFgb32wsby5Dm/VhyCU7YTZUjd4gOny02Uft87cm2sto9CC5zZQvp0CXqnb9 8AHRwicrvar7E+Xvnq/AduFoxjalLGBYR1vbzBOmoolrWnNdz2Y9gPolzU9N8e8u 6Ybubsz3wdlx4wb9ckgxOYQy5C/zEwt9dqWdwzMUc394guB7lBOiZFYPm6t8Smpb +j8HYiVG717K7sb1N2GyCc1iuUC65Fq4yFjCVuos89ez/kbXZDCOLeGJ/Z24u7+T JDPL8GVJmMBhnHbtmiNndAnwGzGNN1tTJShm2MZhR8O3iW+jn/nODCSzBNtzD1x5 3z6q4aq+a87A7swc4W2eBKzGOjhRt4S46JqjHqZDUwl3Jt6eQZH0CZdHK2wn8bwG FaaMWGux6+tr0CRweLGhKRGeJGhGZLJVIVRg0p0j+WkDsljb3Vfr14EX4+gauTRB jBYTFrQlOgQZMRt8GaWhU+iqf3z0KcMcg5O6zHPOHCOEhATmRsj7iUAOHC+32xKj pl5STasmL7aNBzwfCRkq =826r -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is seven basic fixes (plus one MAINTAINER update) which came in close to the merge window" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: ipr: Fix error return code in ipr_probe_ioa() fcoe: add missing destroy_workqueue() on error in fcoe_init() lpfc: Fix possible NULL pointer dereference fcoe: Use default VLAN for FIP VLAN discovery ipr: Wait to do async scan until scsi host is initialized MAINTAINERS: Update cxlflash maintainers cxlflash: Verify problem state area is mapped before notifying shutdown lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()	2016-08-05 23:47:27 -04:00
Linus Torvalds	bad60e6f25	powerpc updates for 4.8 # 1 Highlights: - PowerNV PCI hotplug support. - Lots more Power9 support. - eBPF JIT support on ppc64le. - Lots of cxl updates. - Boot code consolidation. Bug fixes: - Fix spin_unlock_wait() from Boqun Feng - Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao - mm: Ensure "special" zones are empty from Oliver O'Halloran - ftrace: Separate the heuristics for checking call sites from Michael Ellerman - modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman - Fix endianness when reading TCEs from Alexey Kardashevskiy - start rtasd before PCI probing from Greg Kurz - PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu Cleanups & fixes: - Drop support for MPIC in pseries from Rashmica Gupta - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman - Remove unused symbols in asm-offsets.c from Rashmica Gupta - Fix SRIOV not building without EEH enabled from Russell Currey - Remove kretprobe_trampoline_holder. from Thiago Jung Bauermann - Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt - Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh - Avoid -maltivec when using clang integrated assembler from Anton Blanchard - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan - Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes - export cpu_to_core_id() from Mauricio Faria de Oliveira - Remove old symbols from defconfigs from Andrew Donnellan - Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt - Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao - Remove RELOCATABLE_PPC32 from Kevin Hao - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh Minor cleanups & fixes: - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman, Stephen Rothwell, Stewart Smith. Freescale updates from Scott: - "Highlights include more 8xx optimizations, device tree updates, and MVME7100 support." PowerNV PCI hotplug from Gavin Shan: - PCI: Add pcibios_setup_bridge() - Override pcibios_setup_bridge() - Remove PCI_RESET_DELAY_US - Move pnv_pci_ioda_setup_opal_tce_kill() around - Increase PE# capacity - Allocate PE# in reverse order - Create PEs in pcibios_setup_bridge() - Setup PE for root bus - Extend PCI bridge resources - Make pnv_ioda_deconfigure_pe() visible - Dynamically release PE - Update bridge windows on PCI plug - Delay populating pdn - Support PCI slot ID - Use PCI slot reset infrastructure - Introduce pnv_pci_get_slot_id() - Functions to get/set PCI slot state - PCI/hotplug: PowerPC PowerNV PCI hotplug driver - Print correct PHB type names Power9 idle support from Shreyas B. Prabhu: - set power_save func after the idle states are initialized - Use PNV_THREAD_WINKLE macro while requesting for winkle - make hypervisor state restore a function - Rename idle_power7.S to idle_book3s.S - Rename reusable idle functions to hardware agnostic names - Make pnv_powersave_common more generic - abstraction for saving SPRs before entering deep idle states - Add platform support for stop instruction - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES - cpuidle/powernv: cleanup cpuidle-powernv.c - cpuidle/powernv: Add support for POWER ISA v3 idle states - Use deepest stop state when cpu is offlined Power9 PMU from Madhavan Srinivasan: - factor out power8 pmu macros and defines - factor out power8 pmu functions - factor out power8 __init_pmu code - Add power9 event list macros for generic and cache events - Power9 PMU support - Export Power9 generic and cache events to sysfs Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt: - Add XICS emulation APIs - Move a few exception common handlers to make room - Add support for HV virtualization interrupts - Add mechanism to force a replay of interrupts - Add ICP OPAL backend - Discover IODA3 PHBs - pci: Remove obsolete SW invalidate - opal: Add real mode call wrappers - Rename TCE invalidation calls - Remove SWINV constants and obsolete TCE code - Rework accessing the TCE invalidate register - Fallback to OPAL for TCE invalidations - Use the device-tree to get available range of M64's - Check status of a PHB before using it - pci: Don't try to allocate resources that will be reassigned Other Power9: - Send SIGBUS on unaligned copy and paste from Chris Smart - Large Decrementer support from Oliver O'Halloran - Load Monitor Register Support from Jack Miller Performance improvements from Anton Blanchard: - Avoid load hit store in __giveup_fpu() and __giveup_altivec() - Avoid load hit store in setup_sigcontext() - Remove assembly versions of strcpy, strcat, strlen and strcmp - Align hot loops of some string functions eBPF JIT from Naveen N. Rao: - Fix/enhance 32-bit Load Immediate implementation - Optimize 64-bit Immediate loads - Introduce rotate immediate instructions - A few cleanups - Isolate classic BPF JIT specifics into a separate header - Implement JIT compiler for extended BPF Operator Panel driver from Suraj Jitindar Singh: - devicetree/bindings: Add binding for operator panel on FSP machines - Add inline function to get rc from an ASYNC_COMP opal_msg - Add driver for operator panel on FSP machines Sparse fixes from Daniel Axtens: - make some things static - Introduce asm-prototypes.h - Include headers containing prototypes - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE - kvm: Clarify __user annotations - Pass endianness to sparse - Make ppc_md.{halt, restart} __noreturn MM fixes & cleanups from Aneesh Kumar K.V: - radix: Update LPCR HR bit as per ISA - use _raw variant of page table accessors - Compile out radix related functions if RADIX_MMU is disabled - Clear top 16 bits of va only on older cpus - Print formation regarding the the MMU mode - hash: Update SDR1 size encoding as documented in ISA 3.0 - radix: Update PID switch sequence - radix: Update machine call back to support new HCALL. - radix: Add LPID based tlb flush helpers - radix: Add a kernel command line to disable radix - Cleanup LPCR defines Boot code consolidation from Benjamin Herrenschmidt: - Move epapr_paravirt_early_init() to early_init_devtree() - cell: Don't use flat device-tree after boot - ge_imp3a: Don't use the flat device-tree after boot - mpc85xx_ds: Don't use the flat device-tree after boot - mpc85xx_rdb: Don't use the flat device-tree after boot - Don't test for machine type in rtas_initialize() - Don't test for machine type in smp_setup_cpu_maps() - dt: Add of_device_compatible_match() - Factor do_feature_fixup calls - Move 64-bit feature fixup earlier - Move 64-bit memory reserves to setup_arch() - Use a cachable DART - Move FW feature probing out of pseries probe() - Put exception configuration in a common place - Remove early allocation of the SMU command buffer - Move MMU backend selection out of platform code - pasemi: Remove IOBMAP allocation from platform probe() - mm/hash: Don't use machine_is() early during boot - Don't test for machine type to detect HEA special case - pmac: Remove spurrious machine type test - Move hash table ops to a separate structure - Ensure that ppc_md is empty before probing for machine type - Move 64-bit probe_machine() to later in the boot process - Move 32-bit probe() machine to later in the boot process - Get rid of ppc_md.init_early() - Move the boot time info banner to a separate function - Move setting of {i,d}cache_bsize to initialize_cache_info() - Move the content of setup_system() to setup_arch() - Move cache info inits to a separate function - Re-order the call to smp_setup_cpu_maps() - Re-order setup_panic() - Make a few boot functions __init - Merge 32-bit and 64-bit setup_arch() Other new features: - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas - powerpc: Add module autoloading based on CPU features from Alastair D'Silva - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt - xmon: Dump ISA 2.06 SPRs from Michael Ellerman - xmon: Dump ISA 2.07 SPRs from Michael Ellerman - Add a parameter to disable 1TB segs from Oliver O'Halloran - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli - pseries: Add pseries hotplug workqueue from John Allen - pseries: Add support for hotplug interrupt source from John Allen - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen - pseries: Move property cloning into its own routine from Nathan Fontenot - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot - pseries: Auto-online hotplugged memory from Nathan Fontenot - pseries: Remove call to memblock_add() from Nathan Fontenot cxl: - Add set and get private data to context struct from Michael Neuling - make base more explicitly non-modular from Paul Gortmaker - Use for_each_compatible_node() macro from Wei Yongjun - Frederic Barrat - Abstract the differences between the PSL and XSL - Make vPHB device node match adapter's - Philippe Bergheaud - Add mechanism for delivering AFU driver specific events - Ignore CAPI adapters misplaced in switched slots - Refine slice error debug messages - Andrew Donnellan - static-ify variables to fix sparse warnings - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state - Add cxl_check_and_switch_mode() API to switch bi-modal cards - remove dead Kconfig options - fix potential NULL dereference in free_adapter() - Ian Munsie - Update process element after allocating interrupts - Add support for CAPP DMA mode - Fix allowing bogus AFU descriptors with 0 maximum processes - Fix allocating a minimum of 2 pages for the SPA - Fix bug where AFU disable operation had no effect - Workaround XSL bug that does not clear the RA bit after a reset - Fix NULL pointer dereference on kernel contexts with no AFU interrupts - powerpc/powernv: Split cxl code out into a separate file - Add cxl_slot_is_supported API - Enable bus mastering for devices using CAPP DMA mode - Move cxl_afu_get / cxl_afu_put to base - Allow a default context to be associated with an external pci_dev - Do not create vPHB if there are no AFU configuration records - powerpc/powernv: Add support for the cxl kernel api on the real phb - Add support for using the kernel API with a real PHB - Add kernel APIs to get & set the max irqs per context - Add preliminary workaround for CX4 interrupt limitation - Add support for interrupts on the Mellanox CX4 - Workaround PE=0 hardware limitation in Mellanox CX4 - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n selftests: - Test unaligned copy and paste from Chris Smart - Load Monitor Register Tests from Jack Miller - Cyril Bur - exec() with suspended transaction - Use signed long to read perf_event_paranoid - Fix usage message in context_switch - Fix generation of vector instructions/types in context_switch - Michael Ellerman - Use "Delta" rather than "Error" in normal output - Import Anton's mmap & futex micro benchmarks - Add a test for PROT_SAO -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXnWchAAoJEFHr6jzI4aWAe64P/36Vd9yJLptjkoyZp8/IQtu1 Cv8buQwGdKuSMzdkcUAOXcC3fe2u70ZWXMKKLfY3koIV1IAiqdWk5/XWRKMP2XmE dG0LhSf0uu7uh+mE0WvQnRu46ImeKtQ+mPp4Hbs/s9SxMSeYjruv3vdWWmgUq0cl Gac2qJSRtAMmgLuHWMjf7N5mxOTOnKejU4o2i9cJ+YHmWKOdCigv2Ge1UadOQFlC E7tRPiUR3asfDfj+e+LVTTdToH6p8pk+mOUzIoZ8jIkQ+IXzi62UDl5+Rw9mqiuX 1CtqEMUXxo2qwX+d4TcV/QUOp0YKPuIcUZ9NMMS+S3lOyJ4NFt+j2Izk7QJp5kNP gKVqB68TjDQsBuDr3P9ynlHbduxTIhZAqopbTrLe0FIg48nUe4n1yHJBVzqaVajX rFBJSsSUffBLAARNPSXJJhIgc2C1/qOC8dgMeDMcR2kPirDHaQZ/lY1yEpq1yiqR q6e3v5hvIAm4IjbYk0mF7TUxBrPGVE/ExyBINyASRoYxAJ1PyeD/iljZ9vI3asRA s+hhxT8H3f7lnqTrmJqMjHgAdGkmag07EdmvFNX4xK4aADSy7Y6g4dw25ffRopo9 p9Jf9HX+dZv65Y3UjbV/6HuXcaSEBJJLSVWvii65PebqSN0LuHEFvNeIJ6Iblx0B AWh/hd0Iin2gdkcG39Mr =Z5kM -----END PGP SIGNATURE----- Merge tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - PowerNV PCI hotplug support. - Lots more Power9 support. - eBPF JIT support on ppc64le. - Lots of cxl updates. - Boot code consolidation. Bug fixes: - Fix spin_unlock_wait() from Boqun Feng - Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao - mm: Ensure "special" zones are empty from Oliver O'Halloran - ftrace: Separate the heuristics for checking call sites from Michael Ellerman - modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman - Fix endianness when reading TCEs from Alexey Kardashevskiy - start rtasd before PCI probing from Greg Kurz - PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu Cleanups & fixes: - Drop support for MPIC in pseries from Rashmica Gupta - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman - Remove unused symbols in asm-offsets.c from Rashmica Gupta - Fix SRIOV not building without EEH enabled from Russell Currey - Remove kretprobe_trampoline_holder from Thiago Jung Bauermann - Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt - Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh - Avoid -maltivec when using clang integrated assembler from Anton Blanchard - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan - Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes - export cpu_to_core_id() from Mauricio Faria de Oliveira - Remove old symbols from defconfigs from Andrew Donnellan - Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt - Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao - Remove RELOCATABLE_PPC32 from Kevin Hao - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh Minor cleanups & fixes: - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman, Stephen Rothwell, Stewart Smith. Freescale updates from Scott: - "Highlights include more 8xx optimizations, device tree updates, and MVME7100 support." PowerNV PCI hotplug from Gavin Shan: - PCI: Add pcibios_setup_bridge() - Override pcibios_setup_bridge() - Remove PCI_RESET_DELAY_US - Move pnv_pci_ioda_setup_opal_tce_kill() around - Increase PE# capacity - Allocate PE# in reverse order - Create PEs in pcibios_setup_bridge() - Setup PE for root bus - Extend PCI bridge resources - Make pnv_ioda_deconfigure_pe() visible - Dynamically release PE - Update bridge windows on PCI plug - Delay populating pdn - Support PCI slot ID - Use PCI slot reset infrastructure - Introduce pnv_pci_get_slot_id() - Functions to get/set PCI slot state - PCI/hotplug: PowerPC PowerNV PCI hotplug driver - Print correct PHB type names Power9 idle support from Shreyas B. Prabhu: - set power_save func after the idle states are initialized - Use PNV_THREAD_WINKLE macro while requesting for winkle - make hypervisor state restore a function - Rename idle_power7.S to idle_book3s.S - Rename reusable idle functions to hardware agnostic names - Make pnv_powersave_common more generic - abstraction for saving SPRs before entering deep idle states - Add platform support for stop instruction - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES - cpuidle/powernv: cleanup cpuidle-powernv.c - cpuidle/powernv: Add support for POWER ISA v3 idle states - Use deepest stop state when cpu is offlined Power9 PMU from Madhavan Srinivasan: - factor out power8 pmu macros and defines - factor out power8 pmu functions - factor out power8 __init_pmu code - Add power9 event list macros for generic and cache events - Power9 PMU support - Export Power9 generic and cache events to sysfs Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt: - Add XICS emulation APIs - Move a few exception common handlers to make room - Add support for HV virtualization interrupts - Add mechanism to force a replay of interrupts - Add ICP OPAL backend - Discover IODA3 PHBs - pci: Remove obsolete SW invalidate - opal: Add real mode call wrappers - Rename TCE invalidation calls - Remove SWINV constants and obsolete TCE code - Rework accessing the TCE invalidate register - Fallback to OPAL for TCE invalidations - Use the device-tree to get available range of M64's - Check status of a PHB before using it - pci: Don't try to allocate resources that will be reassigned Other Power9: - Send SIGBUS on unaligned copy and paste from Chris Smart - Large Decrementer support from Oliver O'Halloran - Load Monitor Register Support from Jack Miller Performance improvements from Anton Blanchard: - Avoid load hit store in __giveup_fpu() and __giveup_altivec() - Avoid load hit store in setup_sigcontext() - Remove assembly versions of strcpy, strcat, strlen and strcmp - Align hot loops of some string functions eBPF JIT from Naveen N. Rao: - Fix/enhance 32-bit Load Immediate implementation - Optimize 64-bit Immediate loads - Introduce rotate immediate instructions - A few cleanups - Isolate classic BPF JIT specifics into a separate header - Implement JIT compiler for extended BPF Operator Panel driver from Suraj Jitindar Singh: - devicetree/bindings: Add binding for operator panel on FSP machines - Add inline function to get rc from an ASYNC_COMP opal_msg - Add driver for operator panel on FSP machines Sparse fixes from Daniel Axtens: - make some things static - Introduce asm-prototypes.h - Include headers containing prototypes - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE - kvm: Clarify __user annotations - Pass endianness to sparse - Make ppc_md.{halt, restart} __noreturn MM fixes & cleanups from Aneesh Kumar K.V: - radix: Update LPCR HR bit as per ISA - use _raw variant of page table accessors - Compile out radix related functions if RADIX_MMU is disabled - Clear top 16 bits of va only on older cpus - Print formation regarding the the MMU mode - hash: Update SDR1 size encoding as documented in ISA 3.0 - radix: Update PID switch sequence - radix: Update machine call back to support new HCALL. - radix: Add LPID based tlb flush helpers - radix: Add a kernel command line to disable radix - Cleanup LPCR defines Boot code consolidation from Benjamin Herrenschmidt: - Move epapr_paravirt_early_init() to early_init_devtree() - cell: Don't use flat device-tree after boot - ge_imp3a: Don't use the flat device-tree after boot - mpc85xx_ds: Don't use the flat device-tree after boot - mpc85xx_rdb: Don't use the flat device-tree after boot - Don't test for machine type in rtas_initialize() - Don't test for machine type in smp_setup_cpu_maps() - dt: Add of_device_compatible_match() - Factor do_feature_fixup calls - Move 64-bit feature fixup earlier - Move 64-bit memory reserves to setup_arch() - Use a cachable DART - Move FW feature probing out of pseries probe() - Put exception configuration in a common place - Remove early allocation of the SMU command buffer - Move MMU backend selection out of platform code - pasemi: Remove IOBMAP allocation from platform probe() - mm/hash: Don't use machine_is() early during boot - Don't test for machine type to detect HEA special case - pmac: Remove spurrious machine type test - Move hash table ops to a separate structure - Ensure that ppc_md is empty before probing for machine type - Move 64-bit probe_machine() to later in the boot process - Move 32-bit probe() machine to later in the boot process - Get rid of ppc_md.init_early() - Move the boot time info banner to a separate function - Move setting of {i,d}cache_bsize to initialize_cache_info() - Move the content of setup_system() to setup_arch() - Move cache info inits to a separate function - Re-order the call to smp_setup_cpu_maps() - Re-order setup_panic() - Make a few boot functions __init - Merge 32-bit and 64-bit setup_arch() Other new features: - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas - powerpc: Add module autoloading based on CPU features from Alastair D'Silva - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt - xmon: Dump ISA 2.06 SPRs from Michael Ellerman - xmon: Dump ISA 2.07 SPRs from Michael Ellerman - Add a parameter to disable 1TB segs from Oliver O'Halloran - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli - pseries: Add pseries hotplug workqueue from John Allen - pseries: Add support for hotplug interrupt source from John Allen - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen - pseries: Move property cloning into its own routine from Nathan Fontenot - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot - pseries: Auto-online hotplugged memory from Nathan Fontenot - pseries: Remove call to memblock_add() from Nathan Fontenot cxl: - Add set and get private data to context struct from Michael Neuling - make base more explicitly non-modular from Paul Gortmaker - Use for_each_compatible_node() macro from Wei Yongjun - Frederic Barrat - Abstract the differences between the PSL and XSL - Make vPHB device node match adapter's - Philippe Bergheaud - Add mechanism for delivering AFU driver specific events - Ignore CAPI adapters misplaced in switched slots - Refine slice error debug messages - Andrew Donnellan - static-ify variables to fix sparse warnings - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state - Add cxl_check_and_switch_mode() API to switch bi-modal cards - remove dead Kconfig options - fix potential NULL dereference in free_adapter() - Ian Munsie - Update process element after allocating interrupts - Add support for CAPP DMA mode - Fix allowing bogus AFU descriptors with 0 maximum processes - Fix allocating a minimum of 2 pages for the SPA - Fix bug where AFU disable operation had no effect - Workaround XSL bug that does not clear the RA bit after a reset - Fix NULL pointer dereference on kernel contexts with no AFU interrupts - powerpc/powernv: Split cxl code out into a separate file - Add cxl_slot_is_supported API - Enable bus mastering for devices using CAPP DMA mode - Move cxl_afu_get / cxl_afu_put to base - Allow a default context to be associated with an external pci_dev - Do not create vPHB if there are no AFU configuration records - powerpc/powernv: Add support for the cxl kernel api on the real phb - Add support for using the kernel API with a real PHB - Add kernel APIs to get & set the max irqs per context - Add preliminary workaround for CX4 interrupt limitation - Add support for interrupts on the Mellanox CX4 - Workaround PE=0 hardware limitation in Mellanox CX4 - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n selftests: - Test unaligned copy and paste from Chris Smart - Load Monitor Register Tests from Jack Miller - Cyril Bur - exec() with suspended transaction - Use signed long to read perf_event_paranoid - Fix usage message in context_switch - Fix generation of vector instructions/types in context_switch - Michael Ellerman - Use "Delta" rather than "Error" in normal output - Import Anton's mmap & futex micro benchmarks - Add a test for PROT_SAO" * tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits) powerpc/mm: Parenthesise IS_ENABLED() in if condition tty/hvc: Use opal irqchip interface if available tty/hvc: Use IRQF_SHARED for OPAL hvc consoles selftests/powerpc: exec() with suspended transaction powerpc: Improve comment explaining why we modify VRSAVE powerpc/mm: Drop unused externs for hpte_init_beat[_v3]() powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header powerpc/mm: Fix build break when PPC_NATIVE=n crypto: vmx - Convert to CPU feature based module autoloading powerpc: Add module autoloading based on CPU features powerpc/powernv/ioda: Fix endianness when reading TCEs powerpc/mm: Add memory barrier in __hugepte_alloc() powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call powerpc/ftrace: Separate the heuristics for checking call sites powerpc: Merge 32-bit and 64-bit setup_arch() powerpc/64: Make a few boot functions __init powerpc: Re-order setup_panic() powerpc: Re-order the call to smp_setup_cpu_maps() powerpc/32: Move cache info inits to a separate function powerpc/64: Move the content of setup_system() to setup_arch() ...	2016-07-30 21:01:36 -07:00
Uma Krishnan	1bd2b2823e	cxlflash: Verify problem state area is mapped before notifying shutdown If an EEH or some other hard error occurs while the adapter instance was being initialized, on the subsequent shutdown of the device, the system could crash with: [c000000f1da03b60] c0000000005eccfc pci_device_shutdown+0x6c/0x100 [c000000f1da03ba0] c0000000006d67d4 device_shutdown+0x1b4/0x2c0 [c000000f1da03c40] c0000000000ea30c kernel_restart_prepare+0x5c/0x80 [c000000f1da03c70] c0000000000ea48c kernel_restart+0x2c/0xc0 [c000000f1da03ce0] c0000000000ea970 SyS_reboot+0x1c0/0x2d0 [c000000f1da03e30] c000000000009204 system_call+0x38/0xb4 This crash is due to the AFU not being mapped when the shutdown notification routine is called and is a regression that was inserted recently with Commit `704c4b0ddc` ("cxlflash: Shutdown notify support for CXL Flash cards"). As a fix, shutdown notification should only occur when the AFU is mapped. Fixes: `704c4b0ddc` ("cxlflash: Shutdown notify support for CXL Flash cards") Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-07-27 00:32:06 -04:00
Andrew Donnellan	1e44727a0b	cxl: remove dead Kconfig options Remove the CXL_KERNEL_API and CXL_EEH Kconfig options, as they were only needed to coordinate the merging of the cxlflash driver. Also remove the stub implementation of cxl_perst_reloads_same_image() in cxlflash which is only used if CXL_EEH isn't defined (i.e. never). Suggested-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-19 20:12:29 +10:00
Uma Krishnan	704c4b0ddc	cxlflash: Shutdown notify support for CXL Flash cards Some CXL Flash cards need notification of device shutdown in order to flush pending I/Os. A PCI notification hook for shutdown has been added where the driver notifies the card and returns. When the device is removed in the PCI remove path, notification code will wait for shutdown processing to complete. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-07-12 23:16:31 -04:00
Uma Krishnan	96e1b660fa	cxlflash: Add device dependent flags Device dependent flags are needed to support functions that are specific to a particular device. One such case is - some CXL Flash cards need to be notified of device shutdown. For other CXL devices, this feature does not prove to be useful yet. Such distinct features need to be identified in the driver to bypass or invoke specific functionality. In this patch, a member 'flags' has been added to device dependent values. These flags will be used and expanded in the future to support various device specific functions. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-07-12 23:16:31 -04:00
Manoj N. Kumar	f411396d94	cxlflash: Fix to drain operations from previous reset While running 'sg_reset -H' in a loop with a user-space application active, hit the following exception: cpu 0x2: Vector: 300 (Data Access) pc: : afu_attach+0x50/0x240 [cxlflash] lr: : cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash] pid = 20365, comm = run_block_fvt Linux version 4.5.0-491-26f710d+ cxlflash_afu_recover+0x3dc/0x7d0 [cxlflash] cxlflash_ioctl+0x5a8/0x6f0 [cxlflash] scsi_ioctl+0x3b0/0x4c0 sd_ioctl+0x110/0x190 blkdev_ioctl+0x28c/0xc20 block_ioctl+0xa4/0xd0 do_vfs_ioctl+0xd8/0x8c0 SyS_ioctl+0xd4/0xf0 system_call+0x38/0xb4 The problem here is that the problem space area is unmapped while the application issues the DK_CXLFLASH_RECOVER_AFU ioctl. This is the order I observe: proc1 proc2 1) sg_reset 2) ioctl(DK_CXLFLASH_RECOVER_AFU) 3) sg_reset again causing a PSA unmap 4) continues RECOVER_AFU processing The resolution to this problem is to have the reset handler drain all outstanding user space initiated ioctls before proceeding. It is safe to drain after the state has been changed to STATE_RESET. Also since drain_ioctls() was static, it had to be moved up a bit to be before cxlflash_eh_host_reset_handler(). Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-07-12 23:16:31 -04:00
Manoj N. Kumar	635f6b0893	cxlflash: Fix to resolve dead-lock during EEH recovery When a cxlflash adapter goes into EEH recovery and multiple processes (each having established its own context) are active, the EEH recovery can hang if the processes attempt to recover in parallel. The symptom logged after a couple of minutes is: INFO: task eehd:48 blocked for more than 120 seconds. Not tainted 4.5.0-491-26f710d+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. eehd 0 48 2 Call Trace: __switch_to+0x2f0/0x410 __schedule+0x300/0x980 schedule+0x48/0xc0 rwsem_down_write_failed+0x294/0x410 down_write+0x88/0xb0 cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash] cxl_vphb_error_detected+0x88/0x110 [cxl] cxl_pci_error_detected+0xb0/0x1d0 [cxl] eeh_report_error+0xbc/0x130 eeh_pe_dev_traverse+0x94/0x160 eeh_handle_normal_event+0x17c/0x450 eeh_handle_event+0x184/0x370 eeh_event_handler+0x1c8/0x1d0 kthread+0x110/0x130 ret_from_kernel_thread+0x5c/0xa4 INFO: task blockio:33215 blocked for more than 120 seconds. Not tainted 4.5.0-491-26f710d+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. blockio 0 33215 33213 Call Trace: 0x1 (unreliable) __switch_to+0x2f0/0x410 __schedule+0x300/0x980 schedule+0x48/0xc0 rwsem_down_read_failed+0x124/0x1d0 down_read+0x68/0x80 cxlflash_ioctl+0x70/0x6f0 [cxlflash] scsi_ioctl+0x3b0/0x4c0 sg_ioctl+0x960/0x1010 do_vfs_ioctl+0xd8/0x8c0 SyS_ioctl+0xd4/0xf0 system_call+0x38/0xb4 INFO: task eehd:48 blocked for more than 120 seconds. The hang is because of a 3 way dead-lock: Process A holds the recovery mutex, and waits for eehd to complete. Process B holds the semaphore and waits for the recovery mutex. eehd waits for semaphore. The fix is to have Process B above release the semaphore before attempting to acquire the recovery mutex. This will allow eehd to proceed to completion. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-05-05 21:02:17 -04:00
James Bottomley	6ea7e3873e	Merge branch 'fixes-base' into fixes	2016-04-05 06:56:47 -04:00
Manoj N. Kumar	ea76543127	cxlflash: Move to exponential back-off when cmd_room is not available While profiling the cxlflash_queuecommand() path under a heavy load it was found that number of retries to find cmd_room was fairly high. There are two problems with the current back-off: a) It starts with a udelay of 0 b) It backs-off linearly Tried several approaches (a higher multiple 10n, 100n, as well as n^2, 2^n) and found that the exponential back-off(2^n) approach had the least overall cost. Cost as being defined as overall time spent waiting. The fix is to change the linear back-off to an exponential back-off. This solution also takes care of the problem with the initial delay (starts with 1 usec). Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-28 20:43:34 -04:00
Manoj N. Kumar	9526f36026	cxlflash: Fix regression issue with re-ordering patch While running 'sg_reset -H' back to back the following exception was seen: [ 735.115695] Faulting instruction address: 0xd0000000098c0864 cpu 0x0: Vector: 300 (Data Access) at [c000000ffffafa80] pc: d0000000098c0864: cxlflash_async_err_irq+0x84/0x5c0 [cxlflash] lr: c00000000013aed0: handle_irq_event_percpu+0xa0/0x310 sp: c000000ffffafd00 msr: 9000000000009033 dar: 2010000 dsisr: 40000000 current = 0xc000000001510880 paca = 0xc00000000fb80000 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/0 Linux version 4.5.0-491-26f710d+ enter ? for help [c000000ffffafe10] c00000000013aed0 handle_irq_event_percpu+0xa0/0x310 [c000000ffffafed0] c00000000013b1a8 handle_irq_event+0x68/0xc0 [c000000ffffaff00] c0000000001404ec handle_fasteoi_irq+0xec/0x2a0 [c000000ffffaff30] c00000000013a084 generic_handle_irq+0x54/0x80 [c000000ffffaff60] c000000000011130 __do_irq+0x80/0x1d0 [c000000ffffaff90] c000000000024d40 call_do_irq+0x14/0x24 [c000000001573a20] c000000000011318 do_IRQ+0x98/0x140 [c000000001573a70] c000000000002594 hardware_interrupt_common+0x114/0x180 This exception is being hit because the async_err interrupt path performs an MMIO to read the interrupt status register. The MMIO region in this case is not available. Commit `6ded8b3cbd` ("cxlflash: Unmap problem state area before detaching master context") re-ordered the sequence in which term_mc() and stop_afu() are called. This introduces a window for interrupts to come in with the problem space area unmapped, that did not exist previously. The fix is to separate the disabling of all AFU interrupts to a distinct function, term_intr() so that it is the first thing that is done in the tear down process. To keep the initialization process symmetric, separate the AFU interrupt setup also to a distinct function: init_intr(). Fixes: `6ded8b3cbd` ("cxlflash: Unmap problem state area before detaching master context") Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-28 20:43:17 -04:00
Linus Torvalds	d5e2d00898	powerpc updates for 4.6 Highlights: - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V - Add POWER9 cputable entry from Michael Neuling - FPU/Altivec/VSX save/restore optimisations from Cyril Bur - Add support for new ftrace ABI on ppc64le from Torsten Duwe Various cleanups & minor fixes from: - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh. General: - atomics: Allow architectures to define their own __atomic_op_* helpers from Boqun Feng - Implement atomic{, 64}__return_ variants and acquire/release/relaxed variants for (cmp)xchg from Boqun Feng - Add powernv_defconfig from Jeremy Kerr - Fix BUG_ON() reporting in real mode from Balbir Singh - Add xmon command to dump OPAL msglog from Andrew Donnellan - Add xmon command to dump process/task similar to ps(1) from Douglas Miller - Clean up memory hotplug failure paths from David Gibson pci/eeh: - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei Yang. - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan. - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang - PCI: Add pcibios_bus_add_device() weak function from Wei Yang - MAINTAINERS: Update EEH details and maintainership from Russell Currey cxl: - Support added to the CXL driver for running on both bare-metal and hypervisor systems, from Christophe Lombard and Frederic Barrat. - Ignore probes for virtual afu pci devices from Vaibhav Jain perf: - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu - hv-24x7: Fix usage with chip events, display change in counter values, display domain indices in sysfs, eliminate domain suffix in event names, from Sukadev Bhattiprolu Freescale: - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup." -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW69OrAAoJEFHr6jzI4aWAe5EQAJw/hE6WBQc6a7Tj70AnXOqR qk/m5pZjuTwQxfBteIvHR1pE5eXdlvtAjcD254LVkFkAbIn19W/h2k0VX/nlee7P n/VRHRifjtGmukqHrPYJJ7ua9mNlY7pxh3leGSixBFASnSWqMxNNNziNQtSTcuCs TjHiw6NkZ/kzeunA4bAfE4yHVUZjmL74oiS9JbLyaVHqoW4fqWLlh26AKo2yYMZI qPicBBG4HBi3FGvoexnKxlJNdcV4HO7LzDjJmCSfUKYCJi+Pw19T5qmhso0q0qVz vHg/A8HNeG4Hn83pNVmLeQSAIQRZ3DvTtcLgbjPo+TVwm/hzrRRBWipTeOVbkLW8 2bcOXT4t7LWUq15EAJ1LYgYZGzcLrfRfUeOcuQ1TWd3+PcfY9pE7FmizsxAAfaVe E9j9mpz4XnIqBtWkFHneTIHkQ5OWptyKuZJEaYH0nut4VsP0k8NarkseafGqBPu7 5eG83gbiQbCVixfOgblV9eocJ29JcwpjPAY4CZSGJimShg909FV7WRgZgJkKWrbK dBRco8Jcp4VglGfo2qymv7Uj4KwQoypBREOhiKUvrAsVlDxPfx+bcskhjGu9xGDC xs/+nme0/lKa/wg5K4C3mQ1GAlkMWHI0ojhJjsyODbetup5UbkEu03wjAaTdO9dT Y6ptGm0rYAJluPNlziFj =qkAt -----END PGP SIGNATURE----- Merge tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "This was delayed a day or two by some build-breakage on old toolchains which we've now fixed. There's two PCI commits both acked by Bjorn. There's one commit to mm/hugepage.c which is (co)authored by Kirill. Highlights: - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V - Add POWER9 cputable entry from Michael Neuling - FPU/Altivec/VSX save/restore optimisations from Cyril Bur - Add support for new ftrace ABI on ppc64le from Torsten Duwe Various cleanups & minor fixes from: - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh. General: - atomics: Allow architectures to define their own __atomic_op_* helpers from Boqun Feng - Implement atomic{, 64}__return_ variants and acquire/release/ relaxed variants for (cmp)xchg from Boqun Feng - Add powernv_defconfig from Jeremy Kerr - Fix BUG_ON() reporting in real mode from Balbir Singh - Add xmon command to dump OPAL msglog from Andrew Donnellan - Add xmon command to dump process/task similar to ps(1) from Douglas Miller - Clean up memory hotplug failure paths from David Gibson pci/eeh: - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei Yang. - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan. - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang - PCI: Add pcibios_bus_add_device() weak function from Wei Yang - MAINTAINERS: Update EEH details and maintainership from Russell Currey cxl: - Support added to the CXL driver for running on both bare-metal and hypervisor systems, from Christophe Lombard and Frederic Barrat. - Ignore probes for virtual afu pci devices from Vaibhav Jain perf: - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu - hv-24x7: Fix usage with chip events, display change in counter values, display domain indices in sysfs, eliminate domain suffix in event names, from Sukadev Bhattiprolu Freescale: - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup" * tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits) powerpc: Fix unrecoverable SLB miss during restore_math() powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers powerpc/rcpm: Fix build break when SMP=n powerpc/book3e-64: Use hardcoded mttmr opcode powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible powerpc/T104xRDB: add tdm riser card node to device tree powerpc32: PAGE_EXEC required for inittext powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s) powerpc/86xx: Introduce and use common dtsi powerpc/86xx: Update device tree powerpc/86xx: Move dts files to fsl directory powerpc/86xx: Switch to kconfig fragments approach powerpc/86xx: Update defconfigs powerpc/86xx: Consolidate common platform code powerpc32: Remove one insn in mulhdu powerpc32: small optimisation in flush_icache_range() powerpc: Simplify test in __dma_sync() powerpc32: move xxxxx_dcache_range() functions inline powerpc32: Remove clear_pages() and define clear_page() inline ...	2016-03-19 15:38:41 -07:00
Frederic Barrat	ca946d4e4a	cxlflash: Use new cxl_pci_read_adapter_vpd() API To read the adapter VPD, drivers can't rely on pci config APIs, as it wouldn't work on powerVM. cxl introduced a new kernel API especially for this, so start using it. Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-03-09 23:40:01 +11:00
Manoj N. Kumar	83430833b4	cxlflash: Increase cmd_per_lun for better throughput With the current value of cmd_per_lun at 16, the throughput over a single adapter is limited to around 150kIOPS. Increase the value of cmd_per_lun to 256 to improve throughput. With this change a single adapter is able to attain close to the maximum throughput (380kIOPS). Also change the number of RRQ entries that can be queued. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Manoj N. Kumar	603ecce95f	cxlflash: Fix to avoid unnecessary scan with internal LUNs When switching to the internal LUN defined on the IBM CXL flash adapter, there is an unnecessary scan occurring on the second port. This scan leads to the following extra lines in the log: Dec 17 10:09:00 tul83p1 kernel: [ 3708.561134] cxlflash 0008:00:00.0: cxlflash_queuecommand: (scp=c0000000fc1f0f00) 11/1/0/0 cdb=(A0000000-00000000-10000000-00000000) Dec 17 10:09:00 tul83p1 kernel: [ 3708.561147] process_cmd_err: cmd failed afu_rc=32 scsi_rc=0 fc_rc=0 afu_extra=0xE, scsi_extra=0x0, fc_extra=0x0 By definition, both of the internal LUNs are on the first port/channel. When the lun_mode is switched to internal LUN the same value for host->max_channel is retained. This causes an unnecessary scan over the second port/channel. This fix alters the host->max_channel to 0 (1 port), if internal LUNs are configured and switches it back to 1 (2 ports) while going back to external LUNs. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Uma Krishnan	5d1952acd0	cxlflash: Reorder user context initialization In order to support cxlflash in the PowerVM environment, underlying hypervisor APIs have imposed a kernel API ordering change. For the superpipe access to LUN, user applications need a context. The cxlflash module creates this context by making a sequence of cxl calls. In the current code, a context is initialized via cxl_dev_context_init() followed by cxl_process_element(), a function that obtains the process element id. Finally, cxl_start_work() is called to attach the process element. In the PowerVM environment, a process element id cannot be obtained from the hypervisor until the process element is attached. The cxlflash module is unable to create contexts without a valid process element id. To fix this problem, cxl_start_work() is called before obtaining the process element id. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Matthew R. Ochs	8a96b52af5	cxlflash: Simplify attach path error cleanup The cxlflash_disk_attach() routine currently uses a cascading error gate strategy for its error cleanup path. While this strategy is commonly used to handle cleanup scenarios, it is too restrictive when function callouts need to be restructured. Problems range from inserting error path bugs in previously 'good' code to the cleanup path imposing design changes to how the normal path is structured. A less restrictive approach is needed to support ordering changes that come about when operating in different environments. To overcome this restriction, the error cleanup path is modified to have a single entrypoint and use conditional logic to cleanup where necessary. Entities that require multiple cleanup steps must be carefully vetted to ensure their APIs support state. In cases where they do not (none as of this commit) additional local variables can be used to maintain state on their behalf. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Matthew R. Ochs	5e6632d19e	cxlflash: Split out context initialization Presently, context information structures are allocated and initialized in the same routine, create_context(). This imposes an ordering restriction such that all pieces of information needed to initialize a context must be known before the context is even allocated. This design point is not flexible when the order of context creation needs to be modified. Specifically, this can lead to problems when members of the context information structure are a part of an ordering dependency (i.e. - the 'work' structure embedded within the context). To remedy, the allocation is left as-is, inside of the existing create_context() routine and the initialization is transitioned to a new void routine, init_context(). At the same time, in anticipation of these routines not being called in sequence, a state boolean is added to the context information structure to track when the context has been initilized. The context teardown routine, destroy_context(), is modified to support being called with a non-initialized context. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Uma Krishnan	6ded8b3cbd	cxlflash: Unmap problem state area before detaching master context When operating in the PowerVM environment, the cxlflash module can receive an error from the hypervisor indicating that there are existing mappings in the page table for the process MMIO space. This issue exists because term_afu() currently invokes term_mc() before stop_afu(), allowing for the master context to be detached first and the problem state area to be unmapped second. To resolve this issue, stop_afu() should be called before term_mc(). Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Manoj N. Kumar	961487e46a	cxlflash: Simplify PCI registration The calls to pci_request_regions(), pci_resource_start(), pci_set_dma_mask(), pci_set_master() and pci_save_state() are all unnecessary for the IBM CXL flash adapter since data buffers are not required to be mapped to the device's memory. The use of services such as pci_set_dma_mask() are problematic on hypervisor managed systems as the IBM CXL flash adapter is operating under a virtual PCI Host Bridge (virtual PHB) which does not support these services. cxlflash 0001:00:00.0: init_pci: Failed to set PCI DMA mask rc=-5 The resolution is to simplify init_pci(), to a point where it does the bare minimum (pci_enable_device). Similarly, remove the call the pci_release_regions() from cxlflash_remove(). Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-08 21:17:33 -05:00
Manoj Kumar	a2746fb16e	cxlflash: Enable device id for future IBM CXL adapter This drop enables a future card with a device id of 0x0600 to be recognized by the cxlflash driver. As per the design, the Accelerator Function Unit (AFU) for this new IBM CXL Flash Adapter retains the same host interface as the previous generation. For the early prototypes of the new card, the driver with this change behaves exactly as the driver prior to this behaved with the earlier generation card. Therefore, no card specific programming has been added. These card specific changes can be staged in later if needed. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:58:29 -05:00
Manoj Kumar	b45cdbaf9f	cxlflash: Resolve oops in wait_port_offline If an async error interrupt is generated, and the error requires the FC link to be reset, it cannot be performed in the interrupt context. So a work element is scheduled to complete the link reset in a process context. If either an EEH event or an escalation occurs in between when the interrupt is generated and the scheduled work is started, the MMIO space may no longer be available. This will cause an oops in the worker thread. [ 606.806583] NIP kthread_data+0x28/0x40 [ 606.806633] LR wq_worker_sleeping+0x30/0x100 [ 606.806694] Call Trace: [ 606.806721] 0x50 (unreliable) [ 606.806796] wq_worker_sleeping+0x30/0x100 [ 606.806884] __schedule+0x69c/0x8a0 [ 606.806959] schedule+0x44/0xc0 [ 606.807034] do_exit+0x770/0xb90 [ 606.807109] die+0x300/0x460 [ 606.807185] bad_page_fault+0xd8/0x150 [ 606.807259] handle_page_fault+0x2c/0x30 [ 606.807338] wait_port_offline.constprop.12+0x60/0x130 [cxlflash] To prevent the problem space area from being unmapped, when there is pending work, a mapcount (using the kref mechanism) is held. The mapcount is released only when the work is completed. The last reference release is tied to the unmapping service. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:55:01 -05:00
Manoj Kumar	ee91e332a6	cxlflash: Fix to resolve cmd leak after host reset After a few iterations of resetting the card, either during EEH recovery, or a host_reset the following is seen in the logs. cxlflash 0008:00: cxlflash_queuecommand: could not get a free command At every reset of the card, the commands that are outstanding are being leaked. No effort is being made to reap these commands. A few more resets later, the above error message floods the logs and the card is rendered totally unusable as no free commands are available. Iterated through the 'cmd' queue and printed out the 'free' counter and found that on each reset certain commands were in-use and stayed in-use through subsequent resets. To resolve this issue, when the card is reset, reap all the commands that are active/outstanding. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:52:31 -05:00
Uma Krishnan	8559921891	cxlflash: Removed driver date print Having a date for the driver requires it to be updated quite often. Removing the date which is not necessary. Also made use of the existing symbol to print the driver name. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:51:14 -05:00
Matthew R. Ochs	d5e26bb1d8	cxlflash: Fix to avoid virtual LUN failover failure Applications which use virtual LUN's that are backed by a physical LUN over both adapter ports may experience an I/O failure in the event of a link loss (e.g. cable pull). Virtual LUNs may be accessed through one or both ports of the adapter. This access is encoded in the translation entries that comprise the virtual LUN and used by the AFU for load-balancing I/O and handling failover scenarios. In a link loss scenario, even though the AFU is able to maintain connectivity to the LUN, it is up to the application to retry the failed I/O. When applications are unaware of the virtual LUN's underlying topology, they are unable to make a sound decision of when to retry an I/O and therefore are forced to make their reaction to a failed I/O absolute. The result is either a failure to retry I/O or increased latency for scenarios where a retry is pointless. To remedy this scenario, provide feedback back to the application on virtual LUN creation as to which ports the LUN may be accessed. LUN's spanning both ports are candidates for a retry in a presence of an I/O failure. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:50:08 -05:00
Manoj Kumar	a9be294ecb	cxlflash: Fix to escalate LINK_RESET also on port 1 The original fix to escalate a 'login timed out' error to a LINK_RESET was only made for one of the two ports on the card. This fix resolves the same issue for the second port (port 1). Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-06 20:48:02 -05:00
Geliang Tang	21891a452a	cxlflash: drop unlikely before IS_ERR_OR_NULL IS_ERR_OR_NULL already contain an unlikely compiler flag. Drop it. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2015-12-10 12:55:37 -05:00
Dan Carpenter	e37390bee6	cxlflash: a couple off by one bugs The "> MAX_CONTEXT" should be ">= MAX_CONTEXT". Otherwise we go one step beyond the end of the cfg->ctx_tbl[] array. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2015-12-10 12:53:52 -05:00
Matthew R. Ochs	a82544c7ba	cxlflash: Fix to avoid bypassing context cleanup Contexts may be skipped over for cleanup in situations where contention for the adapter's table-list mutex is experienced in the presence of a signal during the execution of the release handler. This can lead to two known issues: - A hang condition on remove as that path tries to wait for users to cleanup - something that will never complete should this scenario play out as the user has already cleaned up from their perspective. - An Oops in the unmap_mapping_range() call that is made as part of the user waiting mechanism that is invoked on remove when contexts are found to still exist. The root cause of this issue can be found in get_context() and how the table-list mutex is acquired. As this code path is shared by several different access points within the driver, a decision was made during the development cycle to acquire this mutex in this location using the interruptible version of the mutex locking service. In almost all of the use-cases and environmental scenarios this holds up, even when the mutex is contended. However, for critical system threads (such as the release handler), failing to acquire the mutex and bailing with the intention of the user being able to try again later is unacceptable. In such a scenario, the context _must_ be derived as it is on an irreversible path to being freed. Without being able to derive the context, the code mistakenly assumes that it has already been freed and proceeds to free up the underlying CXL context resources. From this point on, any usage of [the now stale] CXL context resources will result in undefined behavior. This is root cause of the Oops mentioned as the second known issue as the mapping passed to the unmap_mapping_range() service is owned by the CXL context. To fix this problem, acquisition of the table-list mutex within get_context() is simply changed to use the uninterruptible version of the mutex locking service. This is safe as the timing windows for holding this mutex are short and also protected against blocking. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:25:34 +09:00
Matthew R. Ochs	0d73122c32	cxlflash: Fix to avoid lock instrumentation rejection When running with lock instrumentation (e.g. lockdep), some of the instrumentation can become disabled at probe time for a cxlflash adapter. This is due to a missing lock registration for the tmf_slock. The fix is to call spin_lock_init() for the tmf_slock during probe. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:24:38 +09:00
Matthew R. Ochs	1a47401bb3	cxlflash: Fix to avoid corrupting port selection mask The port selection mask of a LUN can be corrupted when the manage LUN ioctl (DK_CXLFLASH_MANAGE_LUN) is issued more than once for any device. This mask indicates to the AFU which port[s] can be used for a data transfer to/from a particular LUN. The mask is critical to ensuring the correct behavior when using the virtual LUN function of this adapter. When the mask is configured for both ports, an I/O may be sent to either port as the AFU assumes that each port has access to the same physical device (specified by LUN ID in the port LUN table). In a situation where the mask becomes incorrectly configured to reflect access to both ports when in fact there is only access through a single port, an I/O can be targeted to the wrong physical device. This can lead to data corruption among other ill effects (e.g. security leaks). The cause for this corruption is the assumption that the ioctl will only be called a second time for a LUN when it is being configured for access via a second port. A boolean 'newly_created' variable is used to differentiate between a LUN that was created (and subsequently configured for single port access) and one that is destined for access across both ports. While initially set to 'true', this sticky boolean is toggled to the 'false' state during a lookup on any next ioctl performed on a device with a matching WWN/WWID. The code fails to realize that the match could in fact be the same device calling in again. From here, an assumption is made that any LUN with 'newly_created' set to 'false' is configured for access over both ports and the port selection mask is set to reflect this. Any future attempts to use this LUN for hosting a virtual LUN will result in the port LUN table being incorrectly programmed. As a remedy, the 'newly_created' concept was removed entirely and replaced with code that always constructs the port selection mask based upon the SCSI channel of the LUN being accessed. The bits remain sticky, therefore allowing for a device to be accessed over both ports when that is in fact the correct physical configuration. Also included in this commit are a few minor related changes to enhance the fix and provide better debug information for port selection mask and port LUN table bugs in the future. These include renaming refresh_local() to lookup_local(), tracing the WWN/WWID as a big-endian entity, and tracing the port selection mask, SCSI channel, and LUN ID each time the port LUN table is programmed. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:23:55 +09:00
Manoj Kumar	e6e6df3f71	cxlflash: Fix to escalate to LINK_RESET on login timeout A 'login timed out' asynchronous error interrupt is generated if no response is seen to a FLOGI within 2 seconds. If the time out error is not escalated to a LINK_RESET the port will not be available for use. This fix provides the required escalation. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:23:12 +09:00
Matthew R. Ochs	ee3491ba8f	cxlflash: Fix to avoid leaving dangling interrupt resources When running with an unsupported AFU, the cxlflash driver fails the probe. When the driver is removed, the following Oops is encountered on a show_interrupts() thread: Call Trace: [c000001fba5a7a10] [0000000000000003] 0x3 (unreliable) [c000001fba5a7a60] [c00000000053dcf4] vsnprintf+0x204/0x4c0 [c000001fba5a7ae0] [c00000000030045c] seq_vprintf+0x5c/0xd0 [c000001fba5a7b20] [c00000000030051c] seq_printf+0x4c/0x60 [c000001fba5a7b50] [c00000000013e140] show_interrupts+0x370/0x4f0 [c000001fba5a7c10] [c0000000002ff898] seq_read+0xe8/0x530 [c000001fba5a7ca0] [c00000000035d5c0] proc_reg_read+0xb0/0x110 [c000001fba5a7cf0] [c0000000002ca74c] __vfs_read+0x6c/0x180 [c000001fba5a7d90] [c0000000002cb464] vfs_read+0xa4/0x1c0 [c000001fba5a7de0] [c0000000002cc51c] SyS_read+0x6c/0x110 [c000001fba5a7e30] [c000000000009204] system_call+0x38/0xb4 The Oops is due to not cleaning up correctly on the unsupported AFU error path, leaving various allocated and registered resources. In this case, interrupts are in a semi-allocated/registered state, which the show_interrupts() thread attempts to use. To fix, the cleanup logic in init_afu() is consolidated to error gates at the bottom of the function and the appropriate goto is added to each error path. As a mini side fix while refactoring in this routine, the else statement following the AFU version evaluation is eliminated as it is not needed. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:22:44 +09:00
Matthew R. Ochs	aacb4ff69e	cxlflash: Fix to avoid potential deadlock on EEH Ioctl threads that use scsi_execute() can run for an excessive amount of time due to the fact that they have lengthy timeouts and retry logic built in. Under normal operation this is not an issue. However, once EEH enters the picture, a long execution time coupled with the possibility that a timeout can trigger entry to the driver via registered reset callbacks becomes a liability. In particular, a deadlock can occur when an EEH event is encountered while in running in scsi_execute(). As part of the recovery, the EEH handler drains all currently running ioctls, waiting until they have completed before proceeding with a reset. As the scsi_execute()'s are situated on the ioctl path, the EEH handler will wait until they (and the remainder of the ioctl handler they're associated with) have completed. Normally this would not be much of an issue aside from the longer recovery period. Unfortunately, the scsi_execute() triggers a reset when it times out. The reset handler will see that the device is already being reset and wait until that reset completed. This creates a condition where the EEH handler becomes stuck, infinitely waiting for the ioctl thread to complete. To avoid this behavior, temporarily unmark the scsi_execute() threads as an ioctl thread by releasing the ioctl read semaphore. This allows the EEH handler to proceed with a recovery while the thread is still running. Once the scsi_execute() returns, the ioctl read semaphore is reacquired and the adapter state is rechecked in case it changed while inside of scsi_execute(). The state check will wait if the adapter is still being recovered or returns a failure if the recovery failed. In the event that the adapter reset failed, the failure is simply returned as the ioctl would be unable to continue. Reported-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:22:11 +09:00
Matthew R. Ochs	fa3f2c6eb1	cxlflash: Correct trace string The trace following the failure of alloc_mem() incorrectly identifies which function failed. This can lead to misdiagnosing a failure. Fix the string to correctly indicate that alloc_mem() failed. Reported-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:21:38 +09:00
Matthew R. Ochs	17ead26f23	cxlflash: Fix to avoid corrupting adapter fops The fops owned by the adapter can be corrupted in certain scenarios, opening a window where certain fops are temporarily NULLed before being reset to their proper value. This can potentially lead software to make incorrect decisions, leaving the user with the inability to function as intended. An example of this behavior can be observed when there are a number of users with a high rate of turn around (attach to LUN, perform an I/O, detach from LUN, repeat). Every so often a user is given a valid context and adapter file descriptor, but the file associated with the descriptor lacks the correct read permission bit (FMODE_CAN_READ) and thus the read system call bails before calling the valid read fop. Background: The fops is stored in the adapter structure to provide the ability to lookup the adapter structure from within the fop handler. CXL services use the file's private_data and at present, the CXL context does not have a private section. In an effort to limit areas of the cxlflash driver with code specific the superpipe function, a design choice was made to keep the details of the fops situated away from the legacy portions of the driver. This drove the behavior that the adapter fops is set at the beginning of the disk attach ioctl handler when there are no users present. The corruption that this fix remedies is due to the fact that the fops is initially defaulted to values found within a static structure. When the fops is handed down to the CXL services later in the attach path, certain services are patched. The fops structure remains correct until the user count drops to 0 and the fops is reset, triggering the process to repeat again. The user counts are tightly coupled with the creation and deletion of the user context. If multiple users perform a disk attach at the same time, when the user count is currently 0, some users can be in the middle of obtaining a file descriptor and have not yet reached the context creation code that [in addition to creating the context] increments the user count. Subsequent users coming in to perform the attach see that the user count is still 0, and reinitialize the fops, temporarily removing the patched fops. The users that are in the middle obtaining their file descriptor may then receive an invalid descriptor. The fix simply removes the user count altogether and moves the fops initialization to probe time such that it is only performed one time for the life of the adapter. In the future, if the CXL services adopt a private member for their context, that could be used to store the adapter structure reference and cxlflash could revert to a model that does not require an embedded fops. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:20:00 +09:00
Manoj Kumar	b22b4037a0	cxlflash: Fix to double the delay each time The operator used to double the master context response delay is incorrect and does not result in delay doubling. To fix, use a left shift instead of the XOR operator. Reported-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:19:35 +09:00
Matthew R. Ochs	af10483e5e	cxlflash: Fix to prevent stale AFU RRQ Following an adapter reset, the AFU RRQ that resides in host memory holds stale data. This can lead to a condition where the RRQ interrupt handler tries to process stale entries and/or endlessly loops due to an out of sync generation bit. To fix, the AFU RRQ in host memory needs to be cleared after each reset. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:18:53 +09:00
Matthew R. Ochs	f15fbf8d4e	cxlflash: Correct spelling, grammar, and alignment mistakes There are several spelling and grammar mistakes throughout the driver. Additionally there are a handful of places where there are extra lines and unnecessary variables/statements. These are a nuisance and pollute the driver. Fix spelling and grammar issues. Update some comments for clarity and consistency. Remove extra lines and a few unneeded variables/statements. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>	2015-10-30 17:18:28 +09:00

1 2

82 Commits