Commit Graph

12643 Commits

Author SHA1 Message Date
Anton Blanchard c49f63530b powernv: Add OPAL tracepoints
Knowing how long we spend in firmware calls is an important part of
minimising OS jitter.

This patch adds tracepoints to each OPAL call. If tracepoints are
enabled we branch out to a common routine that calls an entry and exit
tracepoint.

This allows us to write tools that monitor the frequency and duration
of OPAL calls, eg:

name                  count  total(ms)  min(ms)  max(ms)  avg(ms)  period(ms)
OPAL_HANDLE_INTERRUPT     5      0.199    0.037    0.042    0.040   12547.545
OPAL_POLL_EVENTS        204      2.590    0.012    0.036    0.013    2264.899
OPAL_PCI_MSI_EOI       2830      3.066    0.001    0.005    0.001      81.166

We use jump labels if configured, which means we only add a single
nop instruction to every OPAL call when the tracepoints are disabled.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:06:08 +10:00
Anton Blanchard aaad422482 powerpc/pseries: Optimise hcall tracepoints
Now that we execute the hcall tracepoint entry and exit code out of
line, we can use the same stack across both functions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:06:03 +10:00
Anton Blanchard cc1adb5f32 powerpc/pseries: Use jump labels for hcall tracepoints
hcall tracepoints add quite a few instructions to our hcall path:

plpar_hcall:
	mr      r2,r2
	mfcr    r0
	stw     r0,8(r1)
	b       164		<---- start
	ld      r12,0(r2)
	std     r12,32(r1)
	cmpdi   r12,0
	beq     164		<---- end
...

We have an unconditional branch that gets noped out during boot and
a load/compare/branch. We also store the tracepoint value to the
stack for the hcall_exit path to use.

By using jump labels we can simplify this to just a single nop that
gets replaced with a branch when the tracepoint is enabled:

plpar_hcall:
	mr      r2,r2
	mfcr    r0
	stw     r0,8(r1)
	nop			<----
...

If jump labels are not enabled, we fall back to the old method.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:58 +10:00
Alexey Kardashevskiy 8fa5d4547e powerpc/powernv: Add a page size parameter to pnv_pci_setup_iommu_table()
Since a TCE page size can be other than 4K, make it configurable for
P5IOC2 and IODA PHBs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:53 +10:00
Alexey Kardashevskiy bc32057ed5 powerpc/powernv: Use it_page_shift in TCE build
This makes use of iommu_table::it_page_shift instead of TCE_SHIFT and
TCE_RPN_SHIFT hardcoded values.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:48 +10:00
Alexey Kardashevskiy b0376c9b08 powerpc/powernv: Use it_page_shift for TCE invalidation
This fixes IODA1/2 to use it_page_shift as it may be bigger than 4K.

This changes involved constant values to use "ull" modifier.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 16:05:43 +10:00
Benjamin Herrenschmidt 5b97259220 Merge branch 'merge' into next 2014-07-11 15:38:12 +10:00
Anton Blanchard f56029410a powerpc/perf: Never program book3s PMCs with values >= 0x80000000
We are seeing a lot of PMU warnings on POWER8:

    Can't find PMC that caused IRQ

Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.

A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.

This patch takes the second option.

Cc: <stable@vger.kernel.org>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 13:50:47 +10:00
Guenter Roeck fb43e8477e powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64
powerpc:allmodconfig has been failing for some time with the following
error.

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

A number of attempts to fix the problem by moving around code have been
unsuccessful and resulted in failed builds for some configurations and
the discovery of toolchain bugs.

Fix the problem by disabling RELOCATABLE for COMPILE_TEST builds instead.
While this is less than perfect, it avoids substantial code changes
which would otherwise be necessary just to make COMPILE_TEST builds
happy and might have undesired side effects.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:09 +10:00
Joel Stanley b50a6c584b powerpc/perf: Clear MMCR2 when enabling PMU
On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
the PMU counters. Aside from on boot they are then never reset,
resulting in stuck perf counters for any user in the guest or host.

We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
state for perf to use the PMU counters under either the guest or the
host.

This was manifesting as a bug with ppc64_cpu --frequency:

    $ sudo ppc64_cpu --frequency
    WARNING: couldn't run on cpu 0
    WARNING: couldn't run on cpu 8
      ...
    WARNING: couldn't run on cpu 144
    WARNING: couldn't run on cpu 152
    min:    18446744073.710 GHz (cpu -1)
    max:    0.000 GHz (cpu -1)
    avg:    0.000 GHz

The command uses a perf counter to measure CPU cycles over a fixed
amount of time, in order to approximate the frequency of the machine.
The counters were returning zero once a guest was started, regardless of
weather it was still running or had been shut down.

By dumping the value of MMCR2, it was observed that once a guest is
running MMCR2 is set to 1s - which stops counters from running:

    $ sudo sh -c 'echo p > /proc/sysrq-trigger'
    CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
    PMC1:  5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
    PMC5:  1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
    MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
    MMCR2: fffffffffffffc00 EBBHR: 0000000000000000
    EBBRR: 0000000000000000 BESCR: 0000000000000000
    SIAR:  00000000000a51cc SDAR:  c00000000fc40000 SIER:  0000000001000000

This is done unconditionally in book3s_hv_interrupts.S upon entering the
guest, and the original value is only save/restored if the host has
indicated it was using the PMU. This is okay, however the user of the
PMU needs to ensure that it is in a defined state when it starts using
it.

Fixes: e05b9b9e5c ("powerpc/perf: Power8 PMU support")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:08 +10:00
Joel Stanley 4d9690dd56 powerpc/perf: Add PPMU_ARCH_207S define
Instead of separate bits for every POWER8 PMU feature, have a single one
for v2.07 of the architecture.

This saves us adding a MMCR2 define for a future patch.

Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:07 +10:00
Joel Stanley f73128f4f6 powerpc/kvm: Remove redundant save of SIER AND MMCR2
These two registers are already saved in the block above. Aside from
being unnecessary, by the time we get down to the second save location
r8 no longer contains MMCR2, so we are clobbering the saved value with
PMC5.

MMCR2 primarily consists of counter freeze bits. So restoring the value
of PMC5 into MMCR2 will most likely have the effect of freezing
counters.

Fixes: 72cde5a88d ("KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:07 +10:00
Preeti U Murthy c733cf83bb powerpc/powernv: Check for IRQHAPPENED before sleeping
Commit 8d6f7c5a: "powerpc/powernv: Make it possible to skip the IRQHAPPENED
check in power7_nap()" added code that prevents cpus from checking for
pending interrupts just before entering sleep state, which is wrong. These
interrupts are delivered during the soft irq disabled state of the cpu.

A cpu cannot enter any idle state with pending interrupts because they will
never be serviced until the next time the cpu is woken up by some other
interrupt. Its only then that the pending interrupts are replayed. This can result
in device timeouts or warnings about this cpu being stuck.

This patch fixes ths issue by ensuring that cpus check for pending interrupts
just before entering any idle state as long as they are not in the path of split
core operations.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:06 +10:00
Michael Ellerman cd68098bce powerpc: Clean up MMU_FTRS_A2 and MMU_FTR_TYPE_3E
In fb5a515704 "powerpc: Remove platforms/wsp and associated pieces",
we removed the last user of MMU_FTRS_A2. So remove it.

MMU_FTRS_A2 was the last user of MMU_FTR_TYPE_3E, so remove it also.
This leaves some unreachable code in mmu_context_nohash.c, so remove
that also.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:05 +10:00
Michael Ellerman e623fbf1c4 powerpc/cell: Fix compilation with CONFIG_COREDUMP=n
Commit 046d662f48 "coredump: make core dump functionality optional"
made the coredump optional, but didn't update the spufs code that
depends on it. That leads to build errors such as:

  arch/powerpc/platforms/built-in.o: In function `.spufs_arch_write_note':
  coredump.c:(.text+0x22cd4): undefined reference to `.dump_emit'
  coredump.c:(.text+0x22cf4): undefined reference to `.dump_emit'
  coredump.c:(.text+0x22d0c): undefined reference to `.dump_align'
  coredump.c:(.text+0x22d48): undefined reference to `.dump_emit'
  coredump.c:(.text+0x22e7c): undefined reference to `.dump_skip'

Fix it by adding some ifdefs in the cell code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11 12:55:05 +10:00
Laurentiu TUDOR cd1154770b powerpc/85xx: drop hypervisor specific board compatibles
They're almost a duplicate of the boards array
and we can build them at run-time.

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-07-02 17:33:10 -05:00
Shengzhou Liu 4c18be2bf5 powerpc/fsl-booke: Add initial T208x QDS board support
Add support for Freescale T2080/T2081 QDS Development System Board.

The T2080QDS Development System is a high-performance computing,
evaluation, and development platform that supports T2080 QorIQ
Power Architecture processor, with following major features:

T2080QDS feature overview:
Processor:
 - T2080 SoC integrating four 64-bit dual-threads e6500 cores up to 1.8GHz
Memory:
 - Single memory controller capable of supporting DDR3 and DDR3-LP
 - Dual DIMM slots up 2133MT/s with ECC
Ethernet interfaces:
 - Two 1Gbps RGMII on-board ports
 - Four 10Gbps XFI on-board cages
 - 1Gbps/2.5Gbps SGMII Riser card
 - 10Gbps XAUI Riser card
Accelerator:
 - DPAA components consist of FMan, BMan, QMan, PME, DCE and SEC
SerDes:
 - 16 lanes up to 10.3125GHz
 - Supports Aurora debug, PEX, SATA, SGMII, sRIO, HiGig, XFI and XAUI
IFC:
 - 128MB NOR Flash, 512MB NAND Flash, PromJet debug port and FPGA
eSPI:
 - Three SPI flash (16MB N25Q128A + 8MB EN25S64 + 512KB SST25WF040)
USB:
 - Two USB2.0 ports with internal PHY (one Type-A + one micro Type-AB)
PCIE:
 - Four PCI Express controllers (two PCIe 2.0 and two PCIe 3.0, SR-IOV)
SATA:
 - Two SATA 2.0 ports on-board
SRIO:
 - Two Serial RapidIO 2.0 ports up to 5 GHz
eSDHC:
 - Supports SD/MMC/eMMC Card
DMA:
 - Three 8-channels DMA controllers
I2C:
 - Four I2C controllers.
UART:
 - Dual 4-pins UART serial ports
System Logic:
 - QIXIS-II FPGA system controll

T2081QDS board shares the same PCB with T1040QDS with some differences.

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-07-02 17:33:09 -05:00
Shengzhou Liu 1d8de8fced powerpc/fsl-booke: Add support for T2080/T2081 SoC
The T2080 QorIQ multicore processor combines four dual-threaded e6500 Power
Architecture processor cores with high-performance datapath acceleration
logic and network and peripheral bus interfaces required for networking,
telecom/datacom, wireless infrastructure, and mil/aerospace applications.

The T2080 SoC includes the following function and features:
- Four dual-threaded 64-bit Power architecture e6500 cores, up to 1.8GHz
- 2MB L2 cache and 512KB CoreNet platform cache (CPC)
- Hierarchical interconnect fabric
- One 32-/64-bit DDR3/3L SDRAM memory controllers with ECC and interleaving
- Data Path Acceleration Architecture (DPAA) incorporating acceleration
- 16 SerDes lanes up to 10.3125 GHz
- 8 Ethernet interfaces (multiple 1G/2.5G/10G MACs)
- High-speed peripheral interfaces
  - Four PCI Express controllers (two PCIe 2.0 and two PCIe 3.0)
  - Two Serial RapidIO 2.0 controllers/ports running at up to 5 GHz
- Additional peripheral interfaces
  - Two serial ATA (SATA 2.0) controllers
  - Two high-speed USB 2.0 controllers with integrated PHY
  - Enhanced secure digital host controller (SD/SDXC/eMMC)
  - Enhanced serial peripheral interface (eSPI)
  - Four I2C controllers
  - Four 2-pin UARTs or two 4-pin UARTs
  - Integrated Flash Controller supporting NAND and NOR flash
- Three eight-channel DMA engines
- Support for hardware virtualization and partitioning enforcement
- QorIQ Platform's Trust Architecture 2.0

T2081 is a reduced personality of T2080 with following difference:
Feature               T2080 T2081
1G Ethernet numbers:  8     6
10G Ethernet numbers: 4     2
SerDes lanes:         16    8
Serial RapidIO,RMan:  2     no
SATA Controller:      2     no
Aurora:               yes   no
SoC Package:          896-pins 780-pins

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
[scottwood@freescale.com: added fsl,qoriq-pci-v3.0 for U-Boot compat]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-07-02 17:32:41 -05:00
Scott Wood 087dfae3fe powerpc/8xx: Remove empty asm/mpc8xx.h
m8xx_pcmcia_ops was the only thing in this file (other than a comment
that describes a usage that doesn't match the file's contents); now
that m8xx_pcmcia_ops is gone, remove the empty file.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Pantelis Antoniou <pantelis.antoniou@gmail.com>
Cc: Vitaly Bordug <vitb@kernel.crashing.org>
Cc: netdev@vger.kernel.org
2014-06-25 18:49:40 -05:00
Scott Wood 39eb56da2b pcmcia: Remove m8xx_pcmcia driver
This driver doesn't build, and apparently has not built since
arch/ppc was removed in 2008 (when mk_int_int_mask was removed
from asm/irq.h, among other build errors).

A few weeks ago I asked whether anyone was actively maintaining
this code, and got no positive response:
http://patchwork.ozlabs.org/patch/352082/

So, let's remove it.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Vitaly Bordug <vitb@kernel.crashing.org>
Cc: linux-pcmcia@lists.infradead.org
Cc: Paul Bolle <pebolle@tiscali.nl>
2014-06-25 18:49:39 -05:00
Bharat Bhushan 2759a7f13d booke/powerpc: define wimge shift mask to fix compilation error
This fixes below compilation error on SOCs where CONFIG_PHYS_64BIT
is not defined:

 arch/powerpc/kvm/e500_mmu_host.c: In function 'kvmppc_e500_shadow_map':
| arch/powerpc/kvm/e500_mmu_host.c:631:20: error: 'PTE_WIMGE_SHIFT' undeclared (first use in this function)
|    wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
|                     ^
| arch/powerpc/kvm/e500_mmu_host.c:631:20: note: each undeclared identifier is reported only once for each function it appears in
| make[1]: *** [arch/powerpc/kvm/e500_mmu_host.o] Error 1

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-25 18:49:39 -05:00
Wladislav Wiebe c152833949 powerpc/traps/e500: fix misleading error output
In machine_check_e500 exception handler is a wrong indication
in case of MCSR_BUS_WBERR - so print "Write" instead of "Read".

Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-25 18:49:38 -05:00
Chunhe Lan 36a2a09d57 powerpc/85xx: Add T4240RDB board support
T4240RDB board Specification
----------------------------
Memory subsystem:
     6GB DDR3
     128MB NOR flash
     2GB NAND flash
Ethernet:
     Eight 1G SGMII ports
     Four 10Gbps SFP+ ports
PCIe:
     Two PCIe slots
USB:
     Two USB2.0 Type A ports
SDHC:
     One SD-card port
SATA:
     One SATA port
UART:
     Dual RJ45 ports

Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-25 18:49:35 -05:00
Scott Wood 6663a4fa67 powerpc: Don't skip ePAPR spin-table CPUs
Commit 59a53afe70 "powerpc: Don't setup
CPUs with bad status" broke ePAPR SMP booting.  ePAPR says that CPUs
that aren't presently running shall have status of disabled, with
enable-method being used to determine whether the CPU can be enabled.

Fix by checking for spin-table, which is currently the only supported
enable-method.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Emil Medve <Emilian.Medve@Freescale.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-25 13:10:49 +10:00
Laurent Dufour c2cbcf533a powerpc/module: Fix TOC symbol CRC
The commit 71ec7c55ed introduced the magic symbol ".TOC." for ELFv2 ABI.
This symbol is built manually and has no CRC value computed. A zero value
is put in the CRC section to avoid modpost complaining about a missing CRC.
Unfortunately, this breaks the kernel module loading when the kernel is
relocated (kdump case for instance) because of the relocation applied to
the kcrctab values.

This patch compute a CRC value for the TOC symbol which will match the one
compute by the kernel when it is relocated - aka '0 - relocate_start' done in
maybe_relocated called by check_version (module.c).

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-25 13:10:48 +10:00
Michael Ellerman e2500be2b8 powerpc/powernv: Remove OPAL v1 takeover
In commit 27f4488872 "Add OPAL takeover from PowerVM" we added support
for "takeover" on OPAL v1 machines.

This was a mode of operation where we would boot under pHyp, and query
for the presence of OPAL. If detected we would then do a special
sequence to take over the machine, and the kernel would end up running
in hypervisor mode.

OPAL v1 was never a supported product, and was never shipped outside
IBM. As far as we know no one is still using it.

Newer versions of OPAL do not use the takeover mechanism. Although the
query for OPAL should be harmless on machines with newer OPAL, we have
seen a machine where it causes a crash in Open Firmware.

The code in early_init_devtree() to copy boot_command_line into cmd_line
was added in commit 817c21ad9a "Get kernel command line accross OPAL
takeover", and AFAIK is only used by takeover, so should also be
removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-25 13:10:47 +10:00
Catalin Marinas a1d23d5c94 powerpc/kmemleak: Do not scan the DART table
The DART table allocation is registered to kmemleak via the
memblock_alloc_base() call. However, the DART table is later unmapped
and dart_tablebase VA no longer accessible. This patch tells kmemleak
not to scan this block and avoid an unhandled paging request.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:29:46 +10:00
Rickard Strandqvist f5fc82290c powerpc/cell: cbe_thermal.c: Cleaning up a variable is of the wrong type
This variable is of the wrong type, everywhere it is used it
should be an unsigned int rather than a int.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:59 +10:00
Michael Ellerman 2f0143c91d powerpc/kprobes: Fix jprobes on ABI v2 (LE)
In commit 721aeaa9 "Build little endian ppc64 kernel with ABIv2", we
missed some updates required in the kprobes code to make jprobes work
when the kernel is built with ABI v2.

Firstly update arch_deref_entry_point() to do the right thing. Now that
we have added ppc_global_function_entry() we can just always use that, it
will do the right thing for 32 & 64 bit and ABI v1 & v2.

Secondly we need to update the code that sets up the register state before
calling the jprobe handler. On ABI v1 we setup r2 to hold the TOC, on ABI
v2 we need to populate r12 with the function entry point address.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:55 +10:00
Michael Ellerman 072c4c018e powerpc/ftrace: Use pr_fmt() to namespace error messages
The printks() in our ftrace code have no prefix, so they appear on the
console with very little context, eg:

  Branch out of range

Use pr_fmt() & pr_err() to add a prefix. While we're at it, collapse a
few split lines that don't need to be, and add a missing newline to one
message.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:50 +10:00
Michael Ellerman d84e0d69c2 powerpc/ftrace: Fix nop of modules on 64bit LE (ABIv2)
There is a bug in the handling of the function entry when we are nopping
out a branch from a module in ftrace.

We compare the result of module_trampoline_target() with the value of
ppc_function_entry(), and expect them to be true. But they never will
be.

module_trampoline_target() will always return the global entry point of
the function, whereas ppc_function_entry() will always return the local.

Fix it by using the newly added ppc_global_function_entry().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:46 +10:00
Michael Ellerman b7b348c682 powerpc/ftrace: Fix inverted check of create_branch()
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that creates and patches the branch, and added a
thinko in the check of create_branch(). create_branch() returns the
instruction that was generated, so if we get zero then it succeeded.

The result is we can't ftrace modules:

  Branch out of range
  WARNING: at ../kernel/trace/ftrace.c:1638
  ftrace failed to modify [<d000000004ba001c>] fuse_req_init_context+0x1c/0x90 [fuse]

We should probably fix patch_instruction() to do that check and make the
API saner, but that's a separate patch. For now just invert the test.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:41 +10:00
Michael Ellerman dfc382a19a powerpc/ftrace: Fix typo in mask of opcode
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that checks for the expected code sequence when
patching a module.

We missed the typo in the mask, 0xffff00000 should be 0xffff0000, which
has the effect of making the test always true.

That makes it impossible to ftrace against modules, eg:

  Unexpected call sequence: 48000008 e8410018
  WARNING: at ../kernel/trace/ftrace.c:1638
  ftrace failed to modify [<d000000007cf001c>] rng_dev_open+0x1c/0x70 [rng_core]

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:37 +10:00
Michael Ellerman d997c00c5a powerpc: Add ppc_global_function_entry()
ABIv2 has the concept of a global and local entry point to a function.
In most cases we are interested in the local entry point, and so that is
what ppc_function_entry() returns.

However we have a case in the ftrace code where we want the global entry
point, and there may be other places we need it too. Rather than special
casing each, add an accessor.

For ABIv1 and 32-bit there is only a single entry point, so we return
that. That means it's safe for the caller to use this without also
checking the ABI version.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 14:05:32 +10:00
Benjamin Herrenschmidt 716821c943 powerpc: Remove __arch_swab*
The generic code uses gcc built-ins which work fine so there's no benefit
in implementing our own anymore.

We can't completely remove the ld/st_le* functions as some historical
cruft still uses them, but that's next on the radar

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 12:43:15 +10:00
Michael Ellerman bf77ee2a7a powerpc: Remove ancient DEBUG_SIG code
We have some compile-time disabled debug code in signal_xx.c. It's from
some ancient time BG, almost certainly part of the original port, given
the very similar code on other arches.

The show_unhandled_signal logic, added in d0c3d534a4 (2.6.24) is
cleaner and prints more useful information, so drop the debug code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 12:43:14 +10:00
Gavin Shan 0eb5736828 powerpc/kerenl: Enable EEH for IO accessors
In arch/powerpc/kernel/iomap.c, lots of IO reading accessors missed
to check EEH error as Ben pointed. The patch fixes it.

For the writing accessors, we change the called functions only for
making them look similar to the reading counterparts.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 12:43:13 +10:00
Valentin Longchamp db75c22f1a powerpc/mpc85xx: fix fsl/p2041-post.dtsi clockgen mux2
The mux2 node is missing the clock-output-names field that is required
by the clk-ppc-corenet driver.

Signed-off-by: Valentin Longchamp <valentin.longchamp@keymile.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-20 18:48:36 -05:00
Laurentiu Tudor 7c48005036 powerpc/booke64: wrap tlb lock and search in htw miss with FTR_SMT
Virtualized environments may expose a e6500 dual-threaded core
as two single-threaded e6500 cores. Take advantage of this
and get rid of the tlb lock and the trap-causing tlbsx in
the htw miss handler by guarding with CPU_FTR_SMT, as it's
already being done in the bolted tlb1 miss handler.

As seen in the results below, measurements done with lmbench
random memory access latency test running under Freescale's
Embedded Hypervisor, there is a ~34% improvement.

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
----------------------------------------------------
Host       Mhz   L1 $   L2 $    Main mem    Rand mem
---------  ---   ----   ----    --------    --------
smt       1665 1.8020   13.2    83.0         1149.7
nosmt     1665 1.8020   13.2    83.0          758.1

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
[scottwood@freescale.com: commit message tweak]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-20 18:48:34 -05:00
Chunhe Lan db65025d8e t4240/dts: Enable third elo3 DMA engine support
T4240 has a third DMA engine controller, so add the corresponding DMA
node into the dts file.

Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
[scottwood@freescale.com: reword commit message]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-20 18:48:32 -05:00
Shengzhou Liu a95e8c28b3 powerpc/defconfig: update RTC support
- remove CONFIG_RTC_DRV_CMOS in corenet32_smp_defconfig(it's unused),
  reserve CONFIG_RTC_DRV_CMOS in mpc85xx_defconfig(needed on some CDS boards)

- enable CONFIG_RTC_DRV_DS1307, CONFIG_RTC_DRV_DS1374,
  CONFIG_RTC_DRV_DS3232 in mpc85xx_defconfig, mpc85xx_smp_defconfig

- enable RTC support in  corenet64_smp_defconfig

Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-06-20 18:48:32 -05:00
Scott Wood 5d1bf1e2c0 powerpc/e500mc: Fix wrong value of MCSR_L2MMU_MHIT
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Ed Swarthout <ed.swarthout@freescale.com>
2014-06-20 18:48:31 -05:00
Scott Wood 1cb4ed92f6 powerpc/e6500: hw tablewalk: fix recursive tlb lock on cpu 0
Commit 82d86de25b "TLB lock recursive"
introduced a bug whereby cpu 0 uses the same value for "lock held" as
is used to indicate that the lock is free.  This means that cpu 1 can
acquire the lock whenever it wants, regardless of whether cpu 0 has it
locked, which in turn means we can get duplicate TLB entries.

Add one to the CPU value to ensure we do not use zero as a "lock held"
value.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Ed Swarthout <ed.swarthout@freescale.com>
2014-06-20 18:48:30 -05:00
Scott Wood bbd08c72c6 powerpc/e6500: hw tablewalk: clear TID in kernel indirect entries
Previously TID was being cleared before the tlbsx, but not after.  This
can lead to a multiway hit between a TLB entry with TID=0 (previously
inserted when PID=0) and a TLB entry with TID!=0 that matches PID.
This can theoretically result in undefined behavior, though we probably
get lucky due to the details of the overlap.  It also results in the
inability to use multihit detection to detect other conflicting TLB
entries, as well as poorer TLB utilization due to duplicating kernel
TLB entries.

Rather than try to patch up MAS1 after tlbsx, the entire value is
saved/restored as with MAS2.

I observed a slight improvement in TLB miss performance with this patch
applied.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Ed Swarthout <ed.swarthout@freescale.com>
2014-06-20 18:48:29 -05:00
Linus Torvalds 1700ff823b Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:
 "Kbuild changes for v3.16-rc1:

   - cross-compilation fix so that cc-option is testing the right
     compiler
   - Fix for make defconfig all
   - Using relative paths to the object and source directory where
     possible, plus fixes for the fallout of the change
   - several cleanups in the Makefiles and scripts

  The powerpc fix is from today, because it was only discovered
  recently.  The rest has been in linux-next for some time"

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  powerpc: Avoid circular dependency with zImage.%
  kbuild: create include/config directory in scripts/kconfig/Makefile
  kbuild: do not create include/linux directory
  Makefile: Fix unrecognized cross-compiler command line options
  kbuild: do not add "selinux" to subdir- twice
  um: Fix for relative objtree when generating x86 headers
  kbuild: Use relative path when building in a subdir of the source tree
  kbuild: Use relative path when building in the source tree
  kbuild: Use relative path for $(objtree)
  firmware: Use $(quote) in the Makefile
  firmware: Simplify directory creation
  kbuild: trivial - fix comment block indent
  kbuild: trivial - remove trailing spaces
  kbuild: support simultaneous "make %config" and "make all"
  kbuild: move extra gcc checks to scripts/Makefile.extrawarn
2014-06-12 21:23:38 -07:00
Linus Torvalds b7c8c1945c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull more powerpc updates from Ben Herrenschmidt:
 "Here are the remaining bits I was mentioning earlier.  Mostly bug
  fixes and new selftests from Michael (yay !).  He also removed the WSP
  platform and A2 core support which were dead before release, so less
  clutter.

  One little "feature" I snuck in is the doorbell IPI support for
  non-virtualized P8 which speeds up IPIs significantly between threads
  of a core"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
  powerpc/book3s: Fix some ABIv2 issues in machine check code
  powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest.
  powerpc/book3s: Increment the mce counter during machine_check_early call.
  powerpc/book3s: Add stack overflow check in machine check handler.
  powerpc/book3s: Fix machine check handling for unhandled errors
  powerpc/eeh: Dump PE location code
  powerpc/powernv: Enable POWER8 doorbell IPIs
  powerpc/cpuidle: Only clear LPCR decrementer wakeup bit on fast sleep entry
  powerpc/powernv: Fix killed EEH event
  powerpc: fix typo 'CONFIG_PMAC'
  powerpc: fix typo 'CONFIG_PPC_CPU'
  powerpc/powernv: Don't escalate non-existing frozen PE
  powerpc/eeh: Report frozen parent PE prior to child PE
  powerpc/eeh: Clear frozen state for child PE
  powerpc/powernv: Reduce panic timeout from 180s to 10s
  powerpc/xmon: avoid format string leaking to printk
  selftests/powerpc: Add tests of PMU EBBs
  selftests/powerpc: Add support for skipping tests
  selftests/powerpc: Put the test in a separate process group
  selftests/powerpc: Fix instruction loop for ABIv2 (LE)
  ...
2014-06-12 20:11:38 -07:00
Linus Torvalds b2e09f633a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more scheduler updates from Ingo Molnar:
 "Second round of scheduler changes:
   - try-to-wakeup and IPI reduction speedups, from Andy Lutomirski
   - continued power scheduling cleanups and refactorings, from Nicolas
     Pitre
   - misc fixes and enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Delete extraneous extern for to_ratio()
  sched/idle: Optimize try-to-wake-up IPI
  sched/idle: Simplify wake_up_idle_cpu()
  sched/idle: Clear polling before descheduling the idle thread
  sched, trace: Add a tracepoint for IPI-less remote wakeups
  cpuidle: Set polling in poll_idle
  sched: Remove redundant assignment to "rt_rq" in update_curr_rt(...)
  sched: Rename capacity related flags
  sched: Final power vs. capacity cleanups
  sched: Remove remaining dubious usage of "power"
  sched: Let 'struct sched_group_power' care about CPU capacity
  sched/fair: Disambiguate existing/remaining "capacity" usage
  sched/fair: Change "has_capacity" to "has_free_capacity"
  sched/fair: Remove "power" from 'struct numa_stats'
  sched: Fix signedness bug in yield_to()
  sched/fair: Use time_after() in record_wakee()
  sched/balancing: Reduce the rate of needless idle load balancing
  sched/fair: Fix unlocked reads of some cfs_b->quota/period
2014-06-12 19:42:15 -07:00
Linus Torvalds f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Ingo Molnar 535560d841 Merge commit '3cf2f34' into sched/core, to fix build error
Fix this dependency on the locking tree's smp_mb*() API changes:

  kernel/sched/idle.c:247:3: error: implicit declaration of function ‘smp_mb__after_atomic’ [-Werror=implicit-function-declaration]

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-12 13:46:37 +02:00
Michal Marek 699c659b49 powerpc: Avoid circular dependency with zImage.%
The rule to create the final images uses a zImage.% pattern.
Unfortunately, this also matches the names of the zImage.*.lds linker
scripts, which appear as a dependency of the final images. This somehow
worked when $(srctree) used to be an absolute path, but now the pattern
matches too much. List only the images from $(image-y) as the target of
the rule, to avoid the circular dependency.

Reported-and-tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2014-06-12 10:07:35 +02:00
Anton Blanchard ad718622ab powerpc/book3s: Fix some ABIv2 issues in machine check code
Commit 2749a2f26a (powerpc/book3s: Fix machine check handling for
unhandled errors) introduced a few ABIv2 issues.

We can maintain ABIv1 and ABIv2 compatibility by branching to the
function rather than the dot symbol.

Fixes: 2749a2f26a ("powerpc/book3s: Fix machine check handling for unhandled errors")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-12 09:41:33 +10:00
Mahesh Salgaonkar 74845bc2fa powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest.
Currently we forward MCEs to guest which have been recovered by guest.
And for unhandled errors we do not deliver the MCE to guest. It looks like
with no support of FWNMI in qemu, guest just panics whenever we deliver the
recovered MCEs to guest. Also, the existig code used to return to host for
unhandled errors which was casuing guest to hang with soft lockups inside
guest and makes it difficult to recover guest instance.

This patch now forwards all fatal MCEs to guest causing guest to crash/panic.
And, for recovered errors we just go back to normal functioning of guest
instead of returning to host. This fixes soft lockup issues in guest.
This patch also fixes an issue where guest MCE events were not logged to
host console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 19:15:15 +10:00
Mahesh Salgaonkar e6654d5b42 powerpc/book3s: Increment the mce counter during machine_check_early call.
We don't see MCE counter getting increased in /proc/interrupts which gives
false impression of no MCE occurred even when there were MCE events.
The machine check early handling was added for PowerKVM and we missed to
increment the MCE count in the early handler.

We also increment mce counters in the machine_check_exception call, but
in most cases where we handle the error hypervisor never reaches there
unless its fatal and we want to crash. Only during fatal situation we may
see double increment of mce count. We need to fix that. But for
now it always good to have some count increased instead of zero.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 19:15:14 +10:00
Mahesh Salgaonkar e75ad93afc powerpc/book3s: Add stack overflow check in machine check handler.
Currently machine check handler does not check for stack overflow for
nested machine check. If we hit another MCE while inside the machine check
handler repeatedly from same address then we get into risk of stack
overflow which can cause huge memory corruption. This patch limits the
nested MCE level to 4 and panic when we cross level 4.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 19:15:13 +10:00
Mahesh Salgaonkar 2749a2f26a powerpc/book3s: Fix machine check handling for unhandled errors
Current code does not check for unhandled/unrecovered errors and return from
interrupt if it is recoverable exception which in-turn triggers same machine
check exception in a loop causing hypervisor to be unresponsive.

This patch fixes this situation and forces hypervisor to panic for
unhandled/unrecovered errors.

This patch also fixes another issue where unrecoverable_exception routine
was called in real mode in case of unrecoverable exception (MSR_RI = 0).
This causes another exception vector 0x300 (data access) during system crash
leading to confusion while debugging cause of the system crash.

Also turn ME bit off while going down, so that when another MCE is hit during
panic path, system will checkstop and hypervisor will get restarted cleanly
by SP.

With the above fixes we now throw correct console messages (see below) while
crashing the system in case of unhandled/unrecoverable machine checks.

--------------
Severe Machine check interrupt [[Not recovered]
  Initiator: CPU
  Error type: UE [Instruction fetch]
    Effective address: 0000000030002864
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: bork(O) bridge stp llc kvm [last unloaded: bork]
CPU: 36 PID: 55162 Comm: bash Tainted: G           O 3.14.0mce #1
task: c000002d72d022d0 ti: c000000007ec0000 task.ti: c000002d72de4000
NIP: 0000000030002864 LR: 00000000300151a4 CTR: 000000003001518c
REGS: c000000007ec3d80 TRAP: 0200   Tainted: G           O  (3.14.0mce)
MSR: 9000000000041002 <SF,HV,ME,RI>  CR: 28222848  XER: 20000000
CFAR: 0000000030002838 DAR: d0000000004d0000 DSISR: 00000000 SOFTE: 1
GPR00: 000000003001512c 0000000031f92cb0 0000000030078af0 0000000030002864
GPR04: d0000000004d0000 0000000000000000 0000000030002864 ffffffffffffffc9
GPR08: 0000000000000024 0000000030008af0 000000000000002c c00000000150e728
GPR12: 9000000000041002 0000000031f90000 0000000010142550 0000000040000000
GPR16: 0000000010143cdc 0000000000000000 00000000101306fc 00000000101424dc
GPR20: 00000000101424e0 000000001013c6f0 0000000000000000 0000000000000000
GPR24: 0000000010143ce0 00000000100f6440 c000002d72de7e00 c000002d72860250
GPR28: c000002d72860240 c000002d72ac0038 0000000000000008 0000000000040000
NIP [0000000030002864] 0x30002864
LR [00000000300151a4] 0x300151a4
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 7285f0beac1e29d3 ]---

Sending IPI to other CPUs
IPI complete
OPAL V3 detected !
--------------

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 19:15:13 +10:00
Gavin Shan 357b2f3dd9 powerpc/eeh: Dump PE location code
As Ben suggested, it's meaningful to dump PE's location code
for site engineers when hitting EEH errors. The patch introduces
function eeh_pe_loc_get() to retireve the location code from
dev-tree so that we can output it when hitting EEH errors.

If primary PE bus is root bus, the PHB's dev-node would be tried
prior to root port's dev-node. Otherwise, the upstream bridge's
dev-node of the primary PE bus will be check for the location code
directly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 19:12:23 +10:00
Michael Neuling d4e58e5928 powerpc/powernv: Enable POWER8 doorbell IPIs
This patch enables POWER8 doorbell IPIs on powernv.

Since doorbells can only IPI within a core, we test to see when we can use
doorbells and if not we fall back to XICS.  This also enables hypervisor
doorbells to wakeup us up from nap/sleep via the LPCR PECEDH bit.

Based on tests by Anton, the best case IPI latency between two threads dropped
from 894ns to 512ns.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:05:12 +10:00
Gavin Shan 5c7a35e3e2 powerpc/powernv: Fix killed EEH event
On PowerNV platform, EEH errors are reported by IO accessors or poller
driven by interrupt. After the PE is isolated, we won't produce EEH
event for the PE. The current implementation has possibility of EEH
event lost in this way:

The interrupt handler queues one "special" event, which drives the poller.
EEH thread doesn't pick the special event yet. IO accessors kicks in, the
frozen PE is marked as "isolated" and EEH event is queued to the list.
EEH thread runs because of special event and purge all existing EEH events.
However, we never produce an other EEH event for the frozen PE. Eventually,
the PE is marked as "isolated" and we don't have EEH event to recover it.

The patch fixes the issue to keep EEH events for PEs that have been
marked as "isolated" with the help of additional "force" help to
eeh_remove_event().

Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:33 +10:00
Paul Bolle 6e0fdf9af2 powerpc: fix typo 'CONFIG_PMAC'
Commit b0d278b7d3 ("powerpc/perf_event: Reduce latency of calling
perf_event_do_pending") added a check for CONFIG_PMAC were a check for
CONFIG_PPC_PMAC was clearly intended.

Fixes: b0d278b7d3 ("powerpc/perf_event: Reduce latency of calling perf_event_do_pending")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:29 +10:00
Paul Bolle b69a1da94f powerpc: fix typo 'CONFIG_PPC_CPU'
Commit cd64d1697c ("powerpc: mtmsrd not defined") added a check for
CONFIG_PPC_CPU were a check for CONFIG_PPC_FPU was clearly intended.

Fixes: cd64d1697c ("powerpc: mtmsrd not defined")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:25 +10:00
Gavin Shan 71b540adff powerpc/powernv: Don't escalate non-existing frozen PE
Commit cb5b242c ("powerpc/eeh: Escalate error on non-existing PE")
escalates the frozen state on non-existing PE to fenced PHB. It
was to improve kdump reliability. After that, commit 361f2a2a
("powrpc/powernv: Reset PHB in kdump kernel") was introduced to
issue complete reset on all PHBs to increase the reliability of
kdump kernel.

Commit cb5b242c becomes unuseful and it would be reverted.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:20 +10:00
Gavin Shan 1ad7a72c5e powerpc/eeh: Report frozen parent PE prior to child PE
When we have the corner case of frozen parent and child PE at the
same time, we have to handle the frozen parent PE prior to the
child. Without clearning the frozen state on parent PE, the child
PE can't be recovered successfully.

The patch searches the EEH PE hierarchy tree and returns the toppest
frozen PE to be handled. It ensures the frozen parent PE will be
handled prior to child PE.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:16 +10:00
Gavin Shan 2c66599206 powerpc/eeh: Clear frozen state for child PE
Since commit cb523e09 ("powerpc/eeh: Avoid I/O access during PE
reset"), the PE is kept as frozen state on hardware level until
the PE reset is done completely. After that, we explicitly clear
the frozen state of the affected PE. However, there might have
frozen child PEs of the affected PE and we also need clear their
frozen state as well. Otherwise, the recovery is going to fail.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:12 +10:00
Anton Blanchard 4817fc323d powerpc/powernv: Reduce panic timeout from 180s to 10s
We've already dropped the default pseries timeout to 10s, do
the same for powernv.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:07 +10:00
Kees Cook 50b66dbf87 powerpc/xmon: avoid format string leaking to printk
This makes sure format strings cannot leak into printk (the string has
already been correctly processed for format arguments).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:04:03 +10:00
Michael Ellerman 3df48c981d powerpc/perf: Ensure all EBB register state is cleared on fork()
In commit 330a1eb "Core EBB support for 64-bit book3s" I messed up
clear_task_ebb(). It clears some but not all of the task's Event Based
Branch (EBB) registers when we duplicate a task struct.

That allows a child task to observe the EBBHR & EBBRR of its parent,
which it should not be able to do.

Fix it by clearing EBBHR & EBBRR.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org [v3.11+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:41 +10:00
Joel Stanley caf69ba627 powerpc/powernv: Fix reading of OPAL msglog
memory_return_from_buffer returns a signed value, so ret should be
ssize_t.

Fixes the following issue reported by David Binderman:

  [linux-3.15/arch/powerpc/platforms/powernv/opal-msglog.c:65]: (style)
  Checking if unsigned variable 'ret' is less than zero.
  [linux-3.15/arch/powerpc/platforms/powernv/opal-msglog.c:82]: (style)
  Checking if unsigned variable 'ret' is less than zero.

  Local variable "ret" is of type size_t. This is always unsigned,
  so it is pointless to check if it is less than zero.

  https://bugzilla.kernel.org/show_bug.cgi?id=77551

Fixing this exposes a real bug for the case where the entire count
bytes is successfully read from the POS_WRAP case. The second
memory_read_from_buffer will return EINVAL, causing the entire read to
return EINVAL to userspace, despite the data being copied correctly. The
fix is to test for the case where the data has been read and return
early.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:36 +10:00
Dan Carpenter be8f9642a0 powerpc/spufs: Remove duplicate SPUFS_CNTL_MAP_SIZE define
The SPUFS_CNTL_MAP_SIZE define is cut and pasted twice so we can delete
the second instance.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:32 +10:00
Dan Carpenter d9d8212319 powerpc/cpm: Remove duplicate FCC_GFMR_TTX define
The FCC_GFMR_TTX define is cut and pasted twice so we can remove the
second instance.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:28 +10:00
Guo Chao ddf0322a3f powerpc/powernv: Fix endianness problems in EEH
EEH information fetched from OPAL need fix before using in LE environment.
To be included in sparse's endian check, declare them as __beXX and
access them by accessors.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:23 +10:00
Anton Blanchard 1bd098903f powernv: Fix permissions on sysparam sysfs entries
Everyone can write to these files, which is not what we want.

Cc: stable@vger.kernel.org # 3.15
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:15 +10:00
Shreyas B. Prabhu ad41733062 powerpc/powernv : Disable subcore for UP configs
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’:
arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ undeclared (first use in this function)
arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as left operand of assignment

'setup_max_cpus' variable is relevant only on SMP, so there is no point
working around it for UP. Furthermore, subcore itself is relevant only
on SMP and hence the better solution is to exclude subcore.o and
subcore-asm.o for UP builds.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:10 +10:00
Shreyas B. Prabhu b2a8087869 powerpc/powernv: Include asm/smp.h to fix UP build failure
Build throws following errors when CONFIG_SMP=n
arch/powerpc/platforms/powernv/setup.c: In function ‘pnv_kexec_wait_secondaries_down’:
arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of function ‘get_hard_smp_processor_id’
    rc = opal_query_cpu_status(get_hard_smp_processor_id(i),

The usage of get_hard_smp_processor_id() needs the declaration from
<asm/smp.h>. The file setup.c includes <linux/sched.h>, which in-turn
includes <linux/smp.h>. However, <linux/smp.h> includes <asm/smp.h>
only on SMP configs and hence UP builds fail.

Fix this by directly including <asm/smp.h> in setup.c unconditionally.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:06 +10:00
Michael Neuling 59a53afe70 powerpc: Don't setup CPUs with bad status
OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
node.

Unfortunatley Linux doesn't check this property and will put the bad CPU in the
present map.  This has caused hangs on booting when we try to unsplit the core.

This patch checks the CPU is avaliable via this status property before putting
it in the present map.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anton Blanchard <anton@samba.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:03:01 +10:00
Sam bobroff 96d0161086 powerpc: Correct DSCR during TM context switch
Correct the DSCR SPR becoming temporarily corrupted if a task is
context switched during a transaction.

The problem occurs while suspending the task and is caused by saving
the DSCR to thread.dscr after it has already been set to the CPU's
default value:

__switch_to() calls __switch_to_tm()
	which calls tm_reclaim_task()
	which calls tm_reclaim_thread()
	which calls tm_reclaim()
		where the DSCR is set to the CPU's default
__switch_to() calls _switch()
		where thread.dscr is set to the DSCR

When the task is resumed, it's transaction will be doomed (as usual)
and the DSCR SPR will be corrupted, although the checkpointed value
will be correct. Therefore the DSCR will be immediately corrected by
the transaction aborting, unless it has been suspended. In that case
the incorrect value can be seen by the task until it resumes the
transaction.

The fix is to treat the DSCR similarly to the TAR and save it early
in __switch_to().

A program exposing the problem is added to the kernel self tests as:
tools/testing/selftests/powerpc/tm/tm-resched-dscr.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
CC: <stable@vger.kernel.org> [v3.10+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 17:02:56 +10:00
Michael Ellerman fb5a515704 powerpc: Remove platforms/wsp and associated pieces
__attribute__ ((unused))

WSP is the last user of CONFIG_PPC_A2, so we remove that as well.

Although CONFIG_PPC_ICSWX still exists, it's no longer selectable for
any Book3E platform, so we can remove the code in mmu-book3e.h that
depended on it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 16:35:38 +10:00
Paul Bolle 94314290ed powerpc: Remove check for CONFIG_SERIAL_TEXT_DEBUG
The Kconfig symbol SERIAL_TEXT_DEBUG was removed from
arch/powerpc/Kconfig.debug in v2.6.22. (In v2.6.27 it was also removed
from arch/ppc/Kconfig.debug.) So the check for its macro has evaluated
to false for over five years now. Remove that check and the few lines
of code hidden behind it.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11 16:31:21 +10:00
Benjamin Herrenschmidt dd58a092c4 powerpc: Add AT_HWCAP2 to indicate V.CRYPTO category support
The Vector Crypto category instructions are supported by current POWER8
chips, advertise them to userspace using a specific bit to properly
differentiate with chips of the same architecture level that might not
have them.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.10+]
2014-06-11 16:31:20 +10:00
Linus Torvalds dfb945473a Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
 "This contains:
   - addition of the Intel MID watchdog
   - removal of W83697HF and W83697UG drivers (code was merged into
     w83627hf_wdt driver)
   - addition of Armada 375/380 SoC support
   - conversion of imx2_wdt to regmap API and to watchdog core API
   - lots of other small improvements and fixes"

[ Wim was also tagged by gmail as a spammer, but not delayed by days
  unlike Ben ]

* git://www.linux-watchdog.org/linux-watchdog: (25 commits)
  x86: intel-mid: add watchdog platform code for Merrifield
  watchdog: add Intel MID watchdog driver support
  watchdog: sp805: Set watchdog_device->timeout from ->set_timeout()
  booke/watchdog: refine and clean up the codes
  watchdog: iop_wdt only builds for mach-iop13xx
  watchdog: Remove drivers for W83697HF and W83697UG
  watchdog: w83627hf_wdt: Add early_disable module parameter
  ARM: mvebu: Add A375/A380 watchdog binding documentation
  watchdog: orion: Add Armada 375/380 SoC support
  watchdog: orion: Introduce per-SoC enabled() function
  watchdog: orion: Introduce per-SoC stop() function
  watchdog: orion: Remove unneeded atomic access
  watchdog: orion: Introduce a SoC-specific RSTOUT mapping
  watchdog: orion: Move the register ioremap'ing to its own function
  watchdog: xilinx: Make of_device_id array const
  watchdog: imx2_wdt: convert to watchdog core api
  watchdog: imx2_wdt: convert to use regmap API.
  watchdog: imx2_wdt: Sort the header files alphabetically
  watchdog: ath79_wdt: switch to clk_prepare/clk_disable
  watchdog: ath79_wdt: avoid spurious restarts on AR934x
  ...
2014-06-10 19:16:36 -07:00
Linus Torvalds c5aec4c76a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "Here is the bulk of the powerpc changes for this merge window.  It got
  a bit delayed in part because I wasn't paying attention, and in part
  because I discovered I had a core PCI change without a PCI maintainer
  ack in it.  Bjorn eventually agreed it was ok to merge it though we'll
  probably improve it later and I didn't want to rebase to add his ack.

  There is going to be a bit more next week, essentially fixes that I
  still want to sort through and test.

  The biggest item this time is the support to build the ppc64 LE kernel
  with our new v2 ABI.  We previously supported v2 userspace but the
  kernel itself was a tougher nut to crack.  This is now sorted mostly
  thanks to Anton and Rusty.

  We also have a fairly big series from Cedric that add support for
  64-bit LE zImage boot wrapper.  This was made harder by the fact that
  traditionally our zImage wrapper was always 32-bit, but our new LE
  toolchains don't really support 32-bit anymore (it's somewhat there
  but not really "supported") so we didn't want to rely on it.  This
  meant more churn that just endian fixes.

  This brings some more LE bits as well, such as the ability to run in
  LE mode without a hypervisor (ie. under OPAL firmware) by doing the
  right OPAL call to reinitialize the CPU to take HV interrupts in the
  right mode and the usual pile of endian fixes.

  There's another series from Gavin adding EEH improvements (one day we
  *will* have a release with less than 20 EEH patches, I promise!).

  Another highlight is the support for the "Split core" functionality on
  P8 by Michael.  This allows a P8 core to be split into "sub cores" of
  4 threads which allows the subcores to run different guests under KVM
  (the HW still doesn't support a partition per thread).

  And then the usual misc bits and fixes ..."

[ Further delayed by gmail deciding that BenH is a dirty spammer.
  Google knows.  ]

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
  powerpc/powernv: Add missing include to LPC code
  selftests/powerpc: Test the THP bug we fixed in the previous commit
  powerpc/mm: Check paca psize is up to date for huge mappings
  powerpc/powernv: Pass buffer size to OPAL validate flash call
  powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
  powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
  powerpc/powernv: Set memory_block_size_bytes to 256MB
  powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
  powerpc/powernv: Fix endian issues in memory error handling code
  powerpc/eeh: Skip eeh sysfs when eeh is disabled
  powerpc: 64bit sendfile is capped at 2GB
  powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
  powerpc/serial: Use saner flags when creating legacy ports
  powerpc: Add cpu family documentation
  powerpc/xmon: Fix up xmon format strings
  powerpc/powernv: Add calls to support little endian host
  powerpc: Document sysfs DSCR interface
  powerpc: Fix regression of per-CPU DSCR setting
  powerpc: Split __SYSFS_SPRSETUP macro
  arch: powerpc/fadump: Cleaning up inconsistent NULL checks
  ...
2014-06-10 18:54:22 -07:00
Tang Yuantian d2deebabae booke/watchdog: refine and clean up the codes
Basically, this patch does the following:
1. Move the codes of parsing boot parameters from setup-common.c
   to driver. In this way, code reader can know directly that
   there are boot parameters that can change the timeout.
2. Make boot parameter 'booke_wdt_period' effective.
   currently, when driver is loaded, default timeout is always
   being used in stead of booke_wdt_period.
3. Wrap up the watchdog timeout in device struct and clean up
   unnecessary codes.

Signed-off-by: Tang Yuantian <yuantian.tang@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Li Yang <leoli@freescale.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2014-06-10 21:47:26 +02:00
Linus Torvalds 77c32bbbe0 Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine updates from Vinod Koul:
 - new Xilixn VDMA driver from Srikanth
 - bunch of updates for edma driver by Thomas, Joel and Peter
 - fixes and updates on dw, ste_dma, freescale, mpc512x, sudmac etc

* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (45 commits)
  dmaengine: sh: don't use dynamic static allocation
  dmaengine: sh: fix print specifier warnings
  dmaengine: sh: make shdma_prep_dma_cyclic static
  dmaengine: Kconfig: Update MXS_DMA help text to include MX6Q/MX6DL
  of: dma: Grammar s/requests/request/, s/used required/required/
  dmaengine: shdma: Enable driver compilation with COMPILE_TEST
  dmaengine: rcar-hpbdma: Include linux/err.h
  dmaengine: sudmac: Include linux/err.h
  dmaengine: sudmac: Keep #include sorted alphabetically
  dmaengine: shdmac: Include linux/err.h
  dmaengine: shdmac: Keep #include sorted alphabetically
  dmaengine: s3c24xx-dma: Add cyclic transfer support
  dmaengine: s3c24xx-dma: Process whole SG chain
  dmaengine: imx: correct sdmac->status for cyclic dma tx
  dmaengine: pch: fix compilation for alpha target
  dmaengine: dw: check return code of dma_async_device_register()
  dmaengine: dw: fix regression in dw_probe() function
  dmaengine: dw: enable clock before access
  dma: pch_dma: Fix Kconfig dependencies
  dmaengine: mpc512x: add support for peripheral transfers
  ...
2014-06-10 10:28:45 -07:00
Geert Uytterhoeven 0d2b7ea928 powerpc: update comments for generic idle conversion
As of commit 799fef0612 ("powerpc: Use generic idle loop"), this
applies to arch_cpu_idle() instead of cpu_idle().

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:18 -07:00
Benjamin Herrenschmidt 0c0a3e5a10 powerpc/powernv: Add missing include to LPC code
kbuild bot spotted that one:

  arch/powerpc/platforms/powernv/opal-lpc.c: In function 'opal_lpc_init_debugfs':
>> arch/powerpc/platforms/powernv/opal-lpc.c:319:35: error: 'powerpc_debugfs_root' undeclared (first use in this function)
     root = debugfs_create_dir("lpc", powerpc_debugfs_root);
                                      ^
We neet to include the definition explicitely.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-07 08:57:21 +10:00
Michael Ellerman 09567e7fd4 powerpc/mm: Check paca psize is up to date for huge mappings
We have a bug in our hugepage handling which exhibits as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector, or the RCU stall detector.

The bug is as follows:

 1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
 2. Slice code converts the slice psize to 16M.
 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
    synchronises the mm->context with the paca->context. So the paca slice
    mask is updated to include the 16M slice.
 3. Either:
    * mmap() fails because there are no huge pages available.
    * mmap() succeeds and the mapping is then munmapped.
    In both cases the slice psize remains at 16M in both the paca & mm.
 4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
 5. The slice psize is converted back to 64K. Because of the check on line 539
    of slice.c we DO NOT update the paca->context. The paca slice mask is now
    out of sync with the mm slice mask.
 6. User/kernel accesses 0xa0000000.
 7. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
    to create an SLB entry and inserts it in the SLB.
18. With the 16M SLB entry in place the hardware does a hash lookup, no entry
    is found so a data access exception is generated.
19. The data access handler calls do_page_fault() -> handle_mm_fault().
10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
11. The hardware retries the access, there is still nothing in the hash table
    so once again a data access exception is generated.
12. hash_page() calls into __hash_page_thp() and inserts a mapping in the
    hash. Although the THP mapping maps 16M the hashing is done using 64K
    as the segment page size.
13. hash_page() returns immediately after calling __hash_page_thp(), skipping
    over the code at line 1125. Resulting in the mismatch between the
    paca->context and mm->context not being detected.
14. The hardware retries the access, the hash it generates using the 16M
    SLB entry does NOT match the hash we inserted.
15. We take another data access and go into __hash_page_thp().
16. We see a valid entry in the hpte_slot_array and so we call updatepp()
    which succeeds.
17. Goto 14.

We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.

The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.

We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.

Without further rearranging the code, the simplest fix is to pull out
the code that checks paca psize and call it in two places. Firstly for
THP/hugetlb, and secondly for other mappings as before.

Thanks to Dave Jones for trinity, which originally found this bug.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.11+]
2014-06-06 13:54:26 +10:00
Nicolas Pitre 5d4dfddd4f sched: Rename capacity related flags
It is better not to think about compute capacity as being equivalent
to "CPU power".  The upcoming "power aware" scheduler work may create
confusion with the notion of energy consumption if "power" is used too
liberally.

Let's rename the following feature flags since they do relate to capacity:

	SD_SHARE_CPUPOWER  -> SD_SHARE_CPUCAPACITY
	ARCH_POWER         -> ARCH_CAPACITY
	NONTASK_POWER      -> NONTASK_CAPACITY

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linaro-kernel@lists.linaro.org
Cc: Andy Fleming <afleming@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/n/tip-e93lpnxb87owfievqatey6b5@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:52:32 +02:00
Vasant Hegde 8b8f7bf4c2 powerpc/powernv: Pass buffer size to OPAL validate flash call
We pass actual buffer size to opal_validate_flash() OPAL API call
and in return it contains output buffer size.

Commit cc146d1d (Fix little endian issues) missed to set the size
param before making OPAL call. So firmware image validation fails.

This patch sets size variable before making OPAL call.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 14:54:04 +10:00
Anton Blanchard c1931e2181 powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
The hcall macros may call out to c code for tracing, so we need
to set up a valid r2. This fixes an oops found when testing
ibmvscsi as a module.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:21:28 +10:00
Anton Blanchard 2ac7b0166a powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:41 +10:00
Anton Blanchard 6d97d7a28f powerpc/powernv: Set memory_block_size_bytes to 256MB
powerpc sets a low SECTION_SIZE_BITS to accomodate small pseries
boxes. We default to 16MB memory blocks, and boxes with a lot
of memory end up with enormous numbers of sysfs memory nodes.

Set a more reasonable default for powernv of 256MB.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:40 +10:00
Anton Blanchard a5d862576a powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
The pseries platform code unconditionally overrides
memory_block_size_bytes regardless of the running platform.

Create a ppc_md hook that so each platform can choose to
do what it wants.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:39 +10:00
Anton Blanchard 223ca9d855 powerpc/powernv: Fix endian issues in memory error handling code
struct OpalMemoryErrorData is passed to us from firmware, so we
have to byteswap it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:39 +10:00
Wei Yang 2213fb142f powerpc/eeh: Skip eeh sysfs when eeh is disabled
When eeh is not enabled, and hotplug two pci devices on the same bus, eeh
related sysfs would be added twice for the first added pci device. Since the
eeh_dev is not created when eeh is not enabled.

This patch adds the check, if eeh is not enabled, eeh sysfs will not be
created.

After applying this patch, following warnings are reduced:

sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_mode'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_config_addr'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_pe_config_addr'

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:38 +10:00
Anton Blanchard 5d73320a96 powerpc: 64bit sendfile is capped at 2GB
commit 8f9c0119d7 (compat: fs: Generic compat_sys_sendfile
implementation) changed the PowerPC 64bit sendfile call from
sys_sendile64 to sys_sendfile.

Unfortunately this broke sendfile of lengths greater than 2G because
sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
fixes the bug.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:38 +10:00
Benjamin Herrenschmidt fa2dbe2e0f powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
This provides debugfs files to access the LPC bus on Power8
non-virtualized using the appropriate OPAL firmware calls.

The usage is simple: one file per space (IO, MEM and FW),
lseek to the address and read/write the data. IO and MEM always
generate series of byte accesses. FW can generate word and dword
accesses if aligned properly.

Based on an original patch from Rob Lippert and reworked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:37 +10:00
Benjamin Herrenschmidt c4cad90f9e powerpc/serial: Use saner flags when creating legacy ports
We had a mix & match of flags used when creating legacy ports
depending on where we found them in the device-tree. Among others
we were missing UPF_SKIP_TEST for some kind of ISA ports which is
a problem as quite a few UARTs out there don't support the loopback
test (such as a lot of BMCs).

Let's pick the set of flags used by the SoC code and generalize it
which means autoconf, no loopback test, irq maybe shared and fixed
port.

Sending to stable as the lack of UPF_SKIP_TEST is breaking
serial on some machines so I want this back into distros

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
2014-06-05 13:20:36 +10:00
Michael Ellerman 736256e4f1 powerpc/xmon: Fix up xmon format strings
There are a couple of places where xmon is using %x to print values that
are unsigned long.

I found this out the hard way recently:

 0:mon> p c000000000d0e7c8 c00000033dc90000 00000000a0000089 c000000000000000
 return value is 0x96300500

Which is calling find_linux_pte_or_hugepte(), the result should be a
kernel pointer. After decoding the page tables by hand I discovered the
correct value was c000000396300500.

So fix up that case and a few others.

We also use a mix of 0x%x, %x and %u to print cpu numbers. So
standardise on 0x%x.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:20:00 +10:00
Benjamin Herrenschmidt 4926616c77 powerpc/powernv: Add calls to support little endian host
When running as a powernv "host" system on P8, we need to switch
the endianness of interrupt handlers. This does it via the appropriate
call to the OPAL firmware which may result in just switching HID0:HILE
but depending on the processor version might need to do a few more
things. This call must be done early before any other processor has
been brought out of firmware.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-05 13:19:59 +10:00
Fabian Frederick f6187769da sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL
sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.

This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Mel Gorman 4f9b16a647 mm: disable zone_reclaim_mode by default
When it was introduced, zone_reclaim_mode made sense as NUMA distances
punished and workloads were generally partitioned to fit into a NUMA
node.  NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.

Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so lets have a
sensible default for the bulk of users.

This patch (of 2):

zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes.  This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes.  The NUMA penalties were
sufficiently high to justify reclaiming the memory.  On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this.  Favour the
common case and disable it by default.  Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:59 -07:00