linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Ian Munsie	49e9c99f47	cxl: Fix allowing bogus AFU descriptors with 0 maximum processes If the AFU descriptor of an AFU directed AFU indicates that it supports 0 maximum processes, we will accept that value and attempt to use it. The SPA will still be allocated (with 2 pages due to another minor bug and room for 958 processes), and when a context is allocated we will pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will treat that as meaning no maximum and will allocate a context number and we return a valid context. Conceivably, this could lead to a buffer overflow of the SPA if more than 958 contexts were allocated, however this is mitigated by the fact that there are no known AFUs in the wild with a bogus AFU descriptor like this, and that only the root user is allowed to flash an AFU image to a card. Add a check when validating the AFU descriptor to reject any with 0 maximum processes. We do still allow a dedicated process only AFU to indicate that it supports 0 contexts even though that is forbidden in the architecture, as in that case we ignore the value and use 1 instead. This is just on the off-chance that such a dedicated process AFU may exist (not that I am aware of any), since their developers are less likely to have cared about this value at all. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 22:10:42 +10:00
Andrew Donnellan	bdd910017c	powerpc/configs: Remove old symbols from defconfigs Update defconfigs to remove old symbols and comments referencing old symbols. Dropped: * AVERAGE * INET_LRO * EXT3_DEFAULTS_TO_ORDERED * EXT3_FS_XATTR * I2O * INFINIBAND_AMSO1100 * INFINIBAND_EHCA * IP1000 Replaced: * BLK_DEV_XIP -> BLK_DEV_RAM_DAX * CLK_PPC_CORENET -> CLK_QORIQ * EXT2_FS_XIP -> FS_DAX * EXT3_FS* -> EXT4_FS* Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 22:10:07 +10:00
Andrew Donnellan	fa2cff3f54	powerpc: Fix typo in comment reference to CONFIG_TRACE_IRQFLAGS Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 22:10:03 +10:00
Andrew Donnellan	53775c43fe	powerpc/ps3: Fix typo in comment reference to CONFIG_PS3_REPOSITORY_WRITE Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 22:09:57 +10:00
Andrew Donnellan	91dc068202	powerpc/eeh: Fix pr_debug()s in eeh_cache.c eeh_cache.c doesn't build cleanly with -DDEBUG when CONFIG_PHYS_ADDR_T_64BIT is set, as a couple of pr_debug()s use "%lx" for resource_size_t parameters. Use "%pap" instead, as it's the correct format specifier for types deriving from phys_addr_t. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 22:09:50 +10:00
Benjamin Herrenschmidt	a203658b5e	powerpc/opal: Wake up kopald polling thread before waiting for events On some environments (prototype machines, some simulators, etc...) there is no functional interrupt source to signal completion, so we rely on the fairly slow OPAL heartbeat. In a number of cases, the calls complete very quickly or even immediately. We've observed that it helps a lot to wakeup the OPAL heartbeat thread before waiting for event in those cases, it will call OPAL immediately to collect completions for anything that finished fast enough. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 19:53:26 +10:00
Michael Neuling	68a2d80c80	powerpc: Add MTD_BLOCK to powernv_defconfig This is so we can use the powernv_flash mtd driver as an block device. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 19:24:41 +10:00
Greg Kurz	8c6a0a1f40	powerpc/pseries: start rtasd before PCI probing A strange behaviour is observed when comparing PCI hotplug in QEMU, between x86 and pseries. If you consider the following steps: - start a VM - add a PCI device via the QEMU monitor before the rtasd has started (for example starting the VM in paused state, or hotplug during FW or boot loader) - resume the VM execution The x86 kernel detects the PCI device, but the pseries one does not. This happens because the rtasd kernel worker is currently started under device_initcall, while PCI probing happens earlier under subsys_initcall. As a consequence, if we have a pending RTAS event at boot time, a message is printed and the event is dropped. This patch moves all the initialization of rtasd to arch_initcall, which is run before subsys_call: this way, logging_enabled is true when the RTAS event pops up and it is not lost anymore. The proc fs bits stay at device_initcall because they cannot be run before fs_initcall. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-08 19:22:15 +10:00
Guilherme G. Piccoli	63a72284b1	powerpc/pci: Assign fixed PHB number based on device-tree properties The domain/PHB field of PCI addresses has its value obtained from a global variable, incremented each time a new domain (represented by struct pci_controller) is added on the system. The domain addition process happens during boot or due to PHB hotplug add. As recent kernels are using predictable naming for network interfaces, the network stack is more tied to PCI naming. This can be a problem in hotplug scenarios, because PCI addresses will change if devices are removed and then re-added. This situation seems unusual, but it can happen if a user wants to replace a NIC without rebooting the machine, for example. This patch changes the way PCI domain values are generated: now, we use device-tree properties to assign fixed PHB numbers to PCI addresses when available (meaning pSeries and PowerNV cases). We also use a bitmap to allow dynamic PHB numbering when device-tree properties are not used. This bitmap keeps track of used PHB numbers and if a PHB is released (by hotplug operations for example), it allows the reuse of this PHB number, avoiding PCI address to change in case of device remove and re-add soon after. No functional changes were introduced. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> [mpe: Drop unnecessary machine_is(pseries) test] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 22:06:55 +10:00
Michael Ellerman	fc022fdf41	powerpc/kernel: Drop unused extern for current_set Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 22:03:10 +10:00
Michael Ellerman	d468fcafb7	powerpc/pci: Fix build with PCI_IOV=y and EEH=n Despite attempting to fix this in commit `fb36e90736` ("powerpc/pci: Fix SRIOV not building without EEH enabled"), the build is still broken when PCI_IOV=y and EEH=n (eg. g5_defconfig with PCI_IOV=y): arch/powerpc/kernel/pci_dn.c: In function ‘remove_dev_pci_data’: arch/powerpc/kernel/pci_dn.c:230:18: error: unused variable ‘edev’ Incorporate Ben's idea of using __maybe_unused to avoid so many #ifdefs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 16:33:27 +10:00
Benjamin Herrenschmidt	fecbfabe1d	powerpc: Fix build with CONFIG_MEMORY_HOTPLUG on some configs For memory hotplug to work, the MMU code needs to provide the functions create_section_mapping() and remove_section_mapping() to respectively map and unmap portions of the linear mapping. At the moment only hash64 provides these, so we provide weak stubs that just error out. This fixes the build with configurations such as 64-bit BookE with CONFIG_MEMORY_HOTPLUG enabled. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 16:33:27 +10:00
Benjamin Herrenschmidt	e93d8e6773	powerpc/mm: Fix build of Book3E/64 with 64K pages Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 16:33:26 +10:00
Michael Ellerman	bc5c0a0d7f	selftests/powerpc: Use "Delta" rather than "Error" in normal output Use "Delta" to refer to the difference between measurements, rather than "Error", so scripts that look for "Error" aren't confused into thinking there was a failure. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-07 16:33:26 +10:00
Oliver O'Halloran	656ad58ef1	powerpc/boot: Add OPAL console to epapr wrappers This patch adds an OPAL console backend to the powerpc boot wrapper so that decompression failures inside the wrapper can be reported to the user. This is important since it typically indicates data corruption in the firmware and other nasty things. Currently this only works when building a little endian kernel. When compiling a 64 bit BE kernel the wrapper is always build 32 bit to be compatible with some 32 bit firmwares. BE support will be added at a later date. Another limitation of this is that only the "raw" type of OPAL console is supported, however machines that provide a hvsi console also provide a raw console so this is not an issue in practice. Actually-written-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Move #ifdef __powerpc64__ to avoid warnings on 32-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:58:54 +10:00
Oliver O'Halloran	faf7882962	powerpc/mm: Add a parameter to disable 1TB segs This patch adds the kernel command line parameter "no_tb_segs" which forces the kernel to use 256MB rather than 1TB segments. Forcing the use of 256MB segments makes it considerably easier to test code that depends on an SLB miss occurring. Suggested-by: Michael Neuling <mikey@neuling.org> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:58:54 +10:00
Oliver O'Halloran	7990102446	powerpc/timer: Large Decrementer support Power ISAv3 adds a large decrementer (LD) mode which increases the size of the decrementer register. The size of the enlarged decrementer register is between 32 and 64 bits with the exact size being dependent on the implementation. When in LD mode, reads are sign extended to 64 bits and a decrementer exception is raised when the high bit is set (i.e the value goes below zero). Writes however are truncated to the physical register width so some care needs to be taken to ensure that the high bit is not set when reloading the decrementer. This patch adds support for using the LD inside the host kernel on processors that support it. When LD mode is supported firmware will supply the ibm,dec-bits property for CPU nodes to allow the kernel to determine the maximum decrementer value. Enabling LD mode is a hypervisor privileged operation so the kernel can only enable it manually when running in hypervisor mode. Guests that support LD mode can request it using the "ibm,client-architecture-support" firmware call (not implemented in this patch) or some other platform specific method. If this property is not supplied then the traditional decrementer width of 32 bit is assumed and LD mode will not be enabled. This patch was based on initial work by Jack Miller. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:58:53 +10:00
Anton Blanchard	9ddf0075f9	powerpc: Avoid -maltivec when using clang integrated assembler Check the assembler supports -maltivec by wrapping it with call as-option. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:58:53 +10:00
Rasmus Villemoes	e2be23712a	powerpc/pseries: Fix error return value in cmm_mem_going_offline() cmm_mem_going_offline() is (only) called from cmm_memory_cb(), which sends the return value through notifier_from_errno(). The latter expects 0 or -errno (notifier_to_errno(notifier_from_errno(x)) is 0 for any x >= 0, so passing a positive value cannot make sense). Hence negate ENOMEM. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:58:52 +10:00
Andrew Donnellan	a9862c7440	powerpc/rtas: Fix array overrun in ppc_rtas() syscall If ppc_rtas() is called with args.nargs == 16 and args.nret == 0, args.rets is set to point to &args.args[16], which is beyond the end of the args.args array. This results in a minor read overrun of the array when we check the first return code (which, per PAPR, is a required output of all RTAS calls) to see if there's been a hardware error. Change the nargs/nret check to ensure nargs is <= 15, allowing room for the status code. Users shouldn't be calling with nret == 0, but there's no real harm if they do, so we don't stop them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:52 +10:00
Chris Smart	4375088072	selftests/powerpc: Test unaligned copy and paste Test that an ISA 3.0 compliant machine performing an unaligned copy, copy_first, paste or paste_last is sent a SIGBUS. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:51 +10:00
Chris Smart	ae26b36f80	powerpc: Send SIGBUS on unaligned copy and paste Calling ISA 3.0 instructions copy, copy_first, paste and paste_last generates an alignment fault when copying or pasting unaligned data (128 byte). We catch this and send SIGBUS to the userspace process that caused it. We do not emulate these because paste may contain additional metadata when pasting to a co-processor and paste_last is the synchronisation point for preceding copy/paste sequences. Thanks to Michael Neuling <mikey@neuling.org> for his help. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:51 +10:00
Michael Ellerman	0c63e8b7b9	selftests/powerpc: Import Anton's mmap & futex micro benchmarks These are useful little loops for smoke testing performance. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:50 +10:00
Cyril Bur	f2418ae8a8	selftests/powerpc: Fix generation of vector instructions/types in context_switch Currently it doesn't appear the resulting binary actually uses any Altivec or VSX instructions the solution is to explicitly tell GCC to use vector instructions and use vector types in the code. Part of this this issue can be GCC version specific: GCC 4.9.x is happy to use Altivec and VSX instructions if altivec.h is includedi (and possibly if vector types are used), this also means that 4.9.x will use VSX instructions even if only -maltivec is passed. It is also possible that Altivec instructions will be used even without -maltivec or -mabi=altivec. GCC 5.2.x complains about the lack of -maltivec parameter if altivec.h is included and will not use VSX unless -mvsx is present on commandline. GCC 5.3.0 has a regression that means __attribute__((__target__("no-vsx")) fails to build. A fix is targeted for 5.4. Furthermore LTO (Link Time Optimisation) doesn't play well with __attribute__((__target__("no-vsx")), LTO can cause GCC to forget about the attribute and compile with VSX instructions regardless. Be wary when enabling -flfo for this test. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:50 +10:00
Cyril Bur	94fa56a96a	selftests/powerpc: Fix usage message in context_switch When we inverted the behaviour of the flags we forgot to update the usage message. Fixes: `51c21e72eb` ("selftests/powerpc: Make context_switch touch FP/altivec/vector by default") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:49 +10:00
Cyril Bur	d4ecdff2ec	selftests/powerpc/pmu: Use signed long to read perf_event_paranoid Excerpt from man 2 perf_event_open: /proc/sys/kernel/perf_event_paranoid The perf_event_paranoid file can be set to restrict access to the performance counters. 2 allow only user-space measurements. 1 allow both kernel and user measurements (default). 0 allow access to CPU-specific data but not raw tracepoint samples. -1 no restrictions. require_paranoia_below() should return 0 if perf_event_paranoid is below a specified level, the value from perf_event_paranoid is read into an unsigned long so the incorrect value is returned when perf_event_paranoid is set to -1. Without this patch applied there is the same number of selftests/powerpc which skip when /proc/sys/kernel/perf_event_paranoid is set to 1 or -1 but no skips when set to zero. With this patch applied there are no skipped selftests/powerpc test when /proc/sys/kernel/perf_event_paranoid is set to 0 or -1. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:49 +10:00
Madhavan Srinivasan	f1fb60bfde	powerpc/perf: Export Power9 generic and cache events to sysfs Export the generic hardware and cache perf events for Power9 to sysfs, so users can determine the PMU event monitored. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:48 +10:00
Madhavan Srinivasan	8c002dbd05	powerpc/perf: Power9 PMU support This patch adds base enablement for the power9 PMU. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:48 +10:00
Madhavan Srinivasan	34922527a2	powerpc/perf: Add power9 event list macros for generic and cache events Add macros for the generic and cache events on Power9 Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:48 +10:00
Madhavan Srinivasan	393eb79ad3	powerpc/perf: factor out power8 __init_pmu code Factor out the power8 pmu init functions to share with power9. Monitor Mode Control Register S(MMCRS) and Monitor Mode Control Register H(MMCRH) registers are dropped in Power9. These registers are added to new function which are included for power8 init. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:47 +10:00
Madhavan Srinivasan	7ffd948fae	powerpc/perf: factor out power8 pmu functions Factor out some of the power8 pmu functions to new file "isa207-common.c" to share with power9 pmu code. Only code movement and no logic change Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:47 +10:00
Madhavan Srinivasan	4d3576b207	powerpc/perf: factor out power8 pmu macros and defines Factor out some of the power8 pmu macros to new a header file to share with power9 pmu code. Just code movement and no logic change. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:46 +10:00
Michael Ellerman	b5b1cfc5d4	powerpc/fadump: Fix build error introduced by recent cleanup We spent so much time bike-shedding the printk() we missed that the next line was missing a semi-colon. And it seems none of our defconfigs turn on CONFIG_FA_DUMP. Fixes: `4a03749f14` ("powerpc/fadump: Trivial fix of spelling mistake, clean up message") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-07-05 23:49:46 +10:00
Darren Stevens	bfa37087aa	powerpc: Initialise pci_io_base as early as possible Commit `d6a9996e84` ("powerpc/mm: vmalloc abstraction in preparation for radix") turned kernel memory and IO addresses from #defined constants to variables initialised at runtime. On PA6T (pasemi) systems the setup_arch() machine call initialises the onboard PCI-e root-ports, and uses pci_io_base to do this, which is now before its value has been set, resulting in a panic early in boot before console IO is initialised. Move the pci_io_base initialisation to the same place as vmalloc ranges are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the earliest possible place we can initialise it. Fixes: `d6a9996e84` ("powerpc/mm: vmalloc abstraction in preparation for radix") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Darren Stevens <darren@stevens-zone.net> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Add #ifdef CONFIG_PCI, massage change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-30 16:52:29 +10:00
Suraj Jitindar Singh	43a1dd9b5f	powerpc/powernv: Add driver for operator panel on FSP machines Implement new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machines with FSPs. This will allow status information to be presented on the display which is visible to a user. The driver implements a character buffer which a user can read/write by accessing the device (/dev/op_panel). This buffer is then displayed on the operator panel display. Any attempt to write past the last character position will have no effect and attempts to write more characters than the size of the display will be truncated. The device may only be accessed by a single process at a time. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-29 17:33:46 +10:00
Suraj Jitindar Singh	d0226d315d	powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the params[1] struct member. However this isn't intuitive or obvious when reading the code and requires that a user look at the skiboot documentation or opal-api.h to verify this. Add an inline function to get the return code from an opal_msg and update call sites accordingly. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-29 17:33:18 +10:00
Suraj Jitindar Singh	1ae88fd54c	devicetree/bindings: Add binding for operator panel on FSP machines Add a binding to Documentation/devicetree/bindings/powerpc/opal (oppanel-opal.txt) for the operator panel which is present on IBM Power Systems machines with FSPs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-29 17:33:17 +10:00
Michael Neuling	190ce8693c	powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 Currently we have 2 segments that are bolted for the kernel linear mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel stacks. Anything accessed outside of these regions may need to be faulted in. (In practice machines with TM always have 1T segments) If a machine has < 2TB of memory we never fault on the kernel linear mapping as these two segments cover all physical memory. If a machine has > 2TB of memory, there may be structures outside of these two segments that need to be faulted in. This faulting can occur when running as a guest as the hypervisor may remove any SLB that's not bolted. When we treclaim and trecheckpoint we have a window where we need to run with the userspace GPRs. This means that we no longer have a valid stack pointer in r1. For this window we therefore clear MSR RI to indicate that any exceptions taken at this point won't be able to be handled. This means that we can't take segment misses in this RI=0 window. In this RI=0 region, we currently access the thread_struct for the process being context switched to or from. This thread_struct access may cause a segment fault since it's not guaranteed to be covered by the two bolted segment entries described above. We've seen this with a crash when running as a guest with > 2TB of memory on PowerVM: Unrecoverable exception 4100 at c00000000004f138 Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1 task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000 NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20 REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default) MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000 CFAR: c00000000004ed18 SOFTE: 0 GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288 GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620 GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0 GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211 GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110 GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050 GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30 GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680 NIP [c00000000004f138] restore_gprs+0xd0/0x16c LR [0000000010003a24] 0x10003a24 Call Trace: [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable) [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0 [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350 [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0 [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0 [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0 [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130 [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4 Instruction dump: 7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6 38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098 ---[ end trace 602126d0a1dedd54 ]--- This fixes this by copying the required data from the thread_struct to the stack before we clear MSR RI. Then once we clear RI, we only access the stack, guaranteeing there's no segment miss. We also tighten the region over which we set RI=0 on the treclaim() path. This may have a slight performance impact since we're adding an mtmsr instruction. Fixes: `090b9284d7` ("powerpc/tm: Clear MSR RI in non-recoverable TM code") Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-29 16:19:25 +10:00
Gavin Shan	cca0e542e0	powerpc/eeh: Fix wrong argument passed to eeh_rmv_device() When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug case, @rmv_data instead of its address is the proper argument. Otherwise, the stack frame is corrupted when writing to @rmv_data (actually its address) in eeh_rmv_device(). It results in kernel crash as observed. This fixes the issue by passing @rmv_data, not its address to eeh_rmv_device() in eeh_reset_device(). Fixes: `67086e32b5` ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-28 20:47:49 +10:00
Michael Neuling	ad42de859f	cxl: Add set and get private data to context struct This provides AFU drivers a means to associate private data with a cxl context. This is particularly intended for make the new callbacks for driver specific events easier for AFU drivers to use, as they can easily get back to any private data structures they may use. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-28 18:35:08 +10:00
Philippe Bergheaud	b810253bd9	cxl: Add mechanism for delivering AFU driver specific events This adds an afu_driver_ops structure with fetch_event() and event_delivered() callbacks. An AFU driver such as cxlflash can fill this out and associate it with a context to enable passing custom AFU specific events to userspace. This also adds a new kernel API function cxl_context_pending_events(), that the AFU driver can use to notify the cxl driver that new specific events are ready to be delivered, and wake up anyone waiting on the context wait queue. The current count of AFU driver specific events is stored in the field afu_driver_events of the context structure. The cxl driver checks the afu_driver_events count during poll, select, read, etc. calls to check if an AFU driver specific event is pending, and calls fetch_event() to obtain and deliver that event. This way, the cxl driver takes care of all the usual locking semantics around these calls and handles all the generic cxl events, so that the AFU driver only needs to worry about it's own events. fetch_event() return a struct cxl_event_afu_driver_reserved, allocated by the AFU driver, and filled in with the specific event information and size. Total event size (header + data) should not be greater than CXL_READ_MIN_SIZE (4K). Th cxl driver prepends an appropriate cxl event header, copies the event to userspace, and finally calls event_delivered() to return the status of the operation to the AFU driver. The event is identified by the context and cxl_event_afu_driver_reserved pointers. Since AFU drivers provide their own means for userspace to obtain the AFU file descriptor (i.e. cxlflash uses an ioctl on their scsi file descriptor to obtain the AFU file descriptor) and the generic cxl driver will never use this event, the ABI of the event is up to each individual AFU driver. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-28 18:34:56 +10:00
Colin Ian King	6e8a9279a8	powerpc/powernv: Fix spelling mistake "Retrived" -> "Retrieved" Trivial fix to spelling mistake in pr_debug() message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-28 13:52:18 +10:00
Colin Ian King	4a03749f14	powerpc/fadump: Trivial fix of spelling mistake, clean up message Fix trivial spelling mistake "rgistration". Also use pr_err() instead of printk() and unsplit the string to keep it all on one line. Signed-off-by: Colin Ian King <colin.king@canonical.com> [mpe: Keep rc on the same line, splitting it doesn't help] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-28 13:50:47 +10:00
Cyril Bur	8e96a87c54	powerpc/tm: Always reclaim in start_thread() for exec() class syscalls Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process, rather it load a new one and starts that, the expectation therefore is that the new process starts not in a transaction. Currently exec() is not treated any differently to any other syscall which creates problems. Firstly it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely as exec()ing will likely doom the transaction) the new process will jump to invalid state. Secondly the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to() as the processor is still transactionally suspended but __switch_to() wants to zero the TM sprs for the new process. This is an example of the outcome of calling exec() with a suspended transaction. Note the first 700 is likely the first TM bad thing decsribed earlier only the kernel can't report it as we've loaded userspace registers. c000000000009980 is the rfid in fast_exception_return() Bad kernel stack pointer 3fffcfa1a370 at c000000000009980 Oops: Bad kernel stack pointer, sig: 6 [#1] CPU: 0 PID: 2006 Comm: tm-execed Not tainted NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000 REGS: c00000003ffefd40 TRAP: 0700 Not tainted MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000 CFAR: c0000000000098b4 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000009980] fast_exception_return+0xb0/0xb8 LR [0000000000000000] (null) Call Trace: Instruction dump: f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b Kernel BUG at c000000000043e80 [verbose debug info unavailable] Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033) Oops: Unrecoverable exception, sig: 6 [#2] CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000 NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000 REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000 CFAR: c000000000015a20 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000 GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000 GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004 GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000 GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000 GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80 NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c LR [c000000000015a24] __switch_to+0x1f4/0x420 Call Trace: Instruction dump: 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 This fixes CVE-2016-5828. Fixes: `bc2a9408fa` ("powerpc: Hook in new transactional memory code") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-27 20:35:17 +10:00
Benjamin Herrenschmidt	cdb1b3424d	powerpc/pci: Reduce log level of PCI I/O space warning If a PHB has no I/O space, there's no need to make it look like something bad happened, a pr_debug() is plenty enough since this is the case of all our modern POWER chips. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:26:31 +10:00
Naveen N. Rao	156d0e290e	powerpc/ebpf/jit: Implement JIT compiler for extended BPF PPC64 eBPF JIT compiler. Enable with: echo 1 > /proc/sys/net/core/bpf_jit_enable or echo 2 > /proc/sys/net/core/bpf_jit_enable ... to see the generated JIT code. This can further be processed with tools/net/bpf_jit_disasm. With CONFIG_TEST_BPF=m and 'modprobe test_bpf': test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed] ... on both ppc64 BE and LE. The details of the approach are documented through various comments in the code. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:17:57 +10:00
Naveen N. Rao	6ac0ba5a4f	powerpc/bpf/jit: Isolate classic BPF JIT specifics into a separate header Break out classic BPF JIT specifics into a separate header in preparation for eBPF JIT implementation. Note that ppc32 will still need the classic BPF JIT. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:15:51 +10:00
Naveen N. Rao	cef1e8cdcd	powerpc/bpf/jit: A few cleanups 1. Per the ISA, ADDIS actually uses RT, rather than RS. Though the result is the same, make the usage clear. 2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL() to PPC_MULW() to make the same clear. 3. PPC_STW[U] take the entire 16-bit immediate value and do not require word-alignment, per the ISA. Change the macros to use IMM_L(). 4. A few white-space cleanups to satisfy checkpatch.pl. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:15:37 +10:00
Naveen N. Rao	277285b854	powerpc/bpf/jit: Introduce rotate immediate instructions Since we will be using the rotate immediate instructions for extended BPF JIT, let's introduce macros for the same. And since the shift immediate operations use the rotate immediate instructions, let's redo those macros to use the newly introduced instructions. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:15:14 +10:00
Naveen N. Rao	b1a057879a	powerpc/bpf/jit: Optimize 64-bit Immediate loads Similar to the LI32() optimization, if the value can be represented in 32-bits, use LI32(). Also handle loading a few specific forms of immediate values in an optimum manner. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-06-24 15:15:04 +10:00

1 2 3 4 5 ...

602451 Commits All Branches Search

602451 Commits

All Branches