Commit Graph

134627 Commits

Author SHA1 Message Date
Michael Ellerman 218ea31039 Merge branch 'fixes' into next
Merge our fixes branch, a few of them are tripping people up while
working on top of next, and we also have a dependency between the CXL
fixes and new CXL code we want to merge into next.
2017-07-03 23:05:43 +10:00
Masahiro Yamada 5405c92bc2 powerpc/dts: Use #include "..." to include local DT
Most of DT files in PowerPC use #include "..." to make pre-processor
include DT in the same directory, but we have 3 exceptional files
that use #include <...> for that.

Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from
dtc_cpp_flags.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:00:33 +10:00
Thiago Jung Bauermann bfaa7834b6 powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
On POWER9 SMT8 the 24x7 API returns two result elements for physical core
and virtual CPU events and we need to add their counts to get the final
result.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:33 +10:00
Thiago Jung Bauermann 2e6553aae3 powerpc/perf/hv-24x7: Support v2 of the hypervisor API
POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:33 +10:00
Thiago Jung Bauermann ebd4a5a3eb powerpc/perf/hv-24x7: Minor improvements
There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request.

There's also an HV_PERF_DOMAIN_MAX constant, so use it in
h_24x7_event_init. This makes the comment above the check redundant,
so remove it.

In add_event_to_24x7_request, a statement is terminated with a comma
instead of a semicolon. Fix it.

In hv-24x7.h, improve comments in struct hv_24x7_result.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:32 +10:00
Thiago Jung Bauermann 38d8184610 powerpc/perf/hv-24x7: Fix return value of hcalls
The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix
this in the code.

The H_GET_24X7_DATA hcall can return a signed error code, so fix this in
the code. Also, don't truncate it to 32 bit to use as return value for
make_24x7_request. In case of error h_24x7_event_commit_txn passes that
return value to generic code, so it should be a proper errno. The other
caller of make_24x7_request is single_24x7_request, whose callers don't
actually care which error code is returned so they are not affected by this
change.

Finally, h_24x7_get_value doesn't use the error code from
single_24x7_request, so there's no need to store it.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:32 +10:00
Thiago Jung Bauermann 62714a1492 powerpc-perf/hx-24x7: Don't log failed hcall twice
make_24x7_request already calls log_24x7_hcall if it fails, so callers
don't have to do it again.

In fact, since the latter is now only called from the former, there's no
need for a separate log_24x7_hcall anymore so remove it.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:31 +10:00
Thiago Jung Bauermann 41f577eb01 powerpc/perf/hv-24x7: Properly iterate through results
hv-24x7.h has a comment mentioning that result_buffer->results can't be
indexed as a normal array because it may contain results of variable sizes,
so fix the loop in h_24x7_event_commit_txn to take the variation into
account when iterating through results.

Another problem in that loop is that it sets h24x7hw->events[i] to NULL.
This assumes that only the i'th result maps to the i'th request, but that
is not guaranteed to be true. We need to leave the event in the array so
that we don't dereference a NULL pointer in case more than one result maps
to one request.

We still assume that each result has only one result element, so warn if
that assumption is violated.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:30 +10:00
Thiago Jung Bauermann 36c8fb2c61 powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check
request_buffer can hold 254 requests, so if it already has that number of
entries we can't add a new one.

Also, define constant to show where the number comes from.

Fixes: e3ee15dc5d ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()")
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:30 +10:00
Thiago Jung Bauermann 12bf85a710 powerpc/perf/hv-24x7: Fix passing of catalog version number
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from
the first catalog page obtained previously. This is a 64 bit number, but
create_events_from_catalog truncates it to 32-bit.

This worked on POWER8, but POWER9 actually uses the upper bits so the call
fails with H_P3 because the hypervisor doesn't recognize the version.

This patch also adds the hcall return code to the error message, which is
helpful when debugging the problem.

Fixes: 5c5cd7b502 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events")
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:29 +10:00
Oliver O'Halloran c07424414c powerpc/mm: Enable ZONE_DEVICE on powerpc
Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
but recommended.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:29 +10:00
Anton Blanchard 1b644f57b3 powerpc/mm: Wire up hpte_removebolted for powernv
Adds support for removing bolted (i.e kernel linear mapping) mappings on
powernv. This is needed to support memory hot unplug operations which
are required for the teardown of DAX/PMEM devices.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:28 +10:00
Oliver O'Halloran ebd3119793 powerpc/mm: Add devmap support for ppc64
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S.  This
is used to differentiate device backed memory from transparent huge
pages since they are handled in more or less the same manner by the core
mm code.

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:28 +10:00
Oliver O'Halloran b584c25440 powerpc/vmemmap: Add altmap support
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where
large amount of ZONE_DEVICE memory is being added to the system the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather in main memory.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:27 +10:00
Oliver O'Halloran d7d9b612f1 powerpc/vmemmap: Reshuffle vmemmap_free()
Removes an indentation level and shuffles some code around to make the
following patch cleaner. No functional changes.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:26 +10:00
Oliver O'Halloran 65f7d04978 mm, x86: Add ARCH_HAS_ZONE_DEVICE to Kconfig
Currently ZONE_DEVICE depends on X86_64 and this will get unwieldly as
new architectures (and platforms) get ZONE_DEVICE support. Move to an
arch selected Kconfig option to save us the trouble.

Cc: linux-mm@kvack.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:26 +10:00
Oliver O'Halloran 7a849a6cf3 powerpc/hugetlbfs: Export HPAGE_SHIFT
Export it so it can be referenced inside a module.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:25 +10:00
Nicholas Piggin 4e287e655e powerpc: use spin loop primitives in some functions
Use the different spin loop primitives in some simple powerpc
spin loops, including those which will spin as a common case.

This will help to test the spin loop primitives before more
conversions are done.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add some includes of <linux/processor.h>]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:24 +10:00
Nicholas Piggin ede8e2bbb0 powerpc/64: implement spin loop primitives
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:17 +10:00
Akshay Adiga 4d0d7c02df powerpc/powernv/idle: Clear r12 on wakeup from stop lite
pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup
reason is an HMI in CHECK_HMI_INTERRUPT.

When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is
no point setting r12 to SRR1.

However, we don't set r12 at all so r12 contains garbage (likely a kernel
pointer), and is still used to check HMI assuming that it contained SRR1. This
causes the OPAL msglog to be filled with the following print:

  HMI: Received HMI interrupt: HMER = 0x0040000000000000

This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't
accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of
r12[42:45] corresponds to HMI as wakeup reason.

Prior to commit 9d29250136 ("powerpc/64s/idle: Avoid SRR usage in idle
sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1
to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1
value would just happen to never have bits 42:45 set.

Fixes: 9d29250136 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths")
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log and comment massaging]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 22:46:06 +10:00
Anshuman Khandual 39e4675183 powerpc/mm: Add comments on vmemmap physical mapping
Adds some explaination on how the vmemmap based struct page layout's
physical mapping is allocated and tracked through linked list. It
also keeps note of a possible race condition.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:17 +10:00
Anshuman Khandual b0f36c10de powerpc/mm: Add comments to the vmemmap layout
Add some explaination to the layout of vmemmap virtual address
space and how physical page mapping is only used for valid PFNs
present at any point on the system.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:17 +10:00
Santosh Sivaraj c642af9c41 powerpc/smp: Convert NR_CPUS to nr_cpu_ids
nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a
compile time constant, which shouldn't be compared against during cpu kick.

Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:16 +10:00
Santosh Sivaraj f8d0d5dc64 powerpc/smp: Do not BUG_ON if invalid CPU during kick
During secondary start, we do not need to BUG_ON if an invalid CPU number
is passed. We already print an error if secondary cannot be started, so
just return an error instead.

Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:16 +10:00
Javier Martinez Canillas adeb8667ea powerpc/44x: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:15 +10:00
Javier Martinez Canillas 8d0590cefd powerpc/83xx: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:14 +10:00
Javier Martinez Canillas 9b40916827 powerpc/512x: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:14 +10:00
Javier Martinez Canillas 226b9391d1 powerpc/fsl: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:13 +10:00
Javier Martinez Canillas fd39318806 powerpc/5200: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:13 +10:00
Hari Bathini 68fa6478e3 powerpc/fadump: add reschedule point while releasing memory
Around 95% of memory is reserved by fadump/capture kernel. All this
memory is freed, one page at a time, on writing '1' to the node
/sys/kernel/fadump_release_mem. On systems with large memory, this
can take a long time to complete, leading to soft lockup warning
messages. To avoid this, add reschedule points at regular intervals.

Also, while memblock_reserve() implicitly takes care of holes in the
given memory range while reserving memory, those holes need to be
taken care of while releasing memory as memory is freed one page at
a time. Add support to skip holes while releasing memory.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:11 +10:00
Hari Bathini a5a05b91c7 powerpc/fadump: provide a helpful error message
fadump fails to register when there are holes in boot memory area.
Provide a helpful error message to the user in such case.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:10 +10:00
Hari Bathini eae0dfcc44 powerpc/fadump: avoid holes in boot memory area when fadump is registered
To register fadump, boot memory area - the size of low memory chunk that
is required for a kernel to boot successfully when booted with restricted
memory, is assumed to have no holes. But this memory area is currently
not protected from hot-remove operations. So, fadump could fail to
re-register after a memory hot-remove operation, if memory is removed
from boot memory area. To avoid this, ensure that memory from boot
memory area is not hot-removed when fadump is registered.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:09 +10:00
Hari Bathini a77af552cc powerpc/fadump: avoid duplicates in crash memory ranges
fadump sets up crash memory ranges to be used for creating PT_LOAD
program headers in elfcore header. Memory chunk RMA_START through
boot memory area size is added as the first memory range because
firmware, at the time of crash, moves this memory chunk to different
location specified during fadump registration making it necessary to
create a separate program header for it with the correct offset.
This memory chunk is skipped while setting up the remaining memory
ranges. But currently, there is possibility that some of this memory
may have duplicate entries like when it is hot-removed and added
again. Ensure that no two memory ranges represent the same memory.

When 5 lmbs are hot-removed and then hot-plugged before registering
fadump, here is how the program headers in /proc/vmcore exported by
fadump look like

without this change:

  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    NOTE           0x0000000000010000 0x0000000000000000 0x0000000000000000
                   0x0000000000001894 0x0000000000001894         0
    LOAD           0x0000000000021020 0xc000000000000000 0x0000000000000000
                   0x0000000040000000 0x0000000040000000  RWE    0
    LOAD           0x0000000040031020 0xc000000000000000 0x0000000000000000
                   0x0000000010000000 0x0000000010000000  RWE    0
    LOAD           0x0000000050040000 0xc000000010000000 0x0000000010000000
                   0x0000000050000000 0x0000000050000000  RWE    0
    LOAD           0x00000000a0040000 0xc000000060000000 0x0000000060000000
                   0x000000019ffe0000 0x000000019ffe0000  RWE    0

and with this change:

  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    NOTE           0x0000000000010000 0x0000000000000000 0x0000000000000000
                   0x0000000000001894 0x0000000000001894         0
    LOAD           0x0000000000021020 0xc000000000000000 0x0000000000000000
                   0x0000000040000000 0x0000000040000000  RWE    0
    LOAD           0x0000000040030000 0xc000000040000000 0x0000000040000000
                   0x0000000020000000 0x0000000020000000  RWE    0
    LOAD           0x0000000060030000 0xc000000060000000 0x0000000060000000
                   0x000000019ffe0000 0x000000019ffe0000  RWE    0

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:09 +10:00
Madhavan Srinivasan 24bedcb7c8 powerpc/perf: Fix branch event code for power9
Correct "branch" event code of Power9 is "r4d05e". Replace the current
"branch" event code with "r4d05e" and add a hack to use "r10012" as
event code for Power9 DD1.

Fixes: d89f473ff6 ("powerpc/perf: Fix PM_BRU_CMPL event code for power9")
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:08 +10:00
Benjamin Herrenschmidt 89d8bb1638 powerpc/xive: Silence message about VP block allocation
There is no reason for that message to be pr_info(), it will be printed
every time we start a KVM guest.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:08 +10:00
Benjamin Herrenschmidt ba6d334ac2 powerpc/64s: Invalidate ERAT on powersave wakeup for POWER9
On POWER9 the ERAT may be incorrect on wakeup from some stop states
that lose state. This causes random segvs and illegal instructions
when these stop states are enabled.

This patch invalidates the ERAT on wakeup on POWER9 to prevent this
from causing a problem.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Merge comment change with upstream changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 14:18:30 +10:00
Benjamin Herrenschmidt 74e27c6af5 powerpc: Only do ERAT invalidate on radix context switch on P9 DD1
From: Michael Neuling <mikey@neuling.org>

On P9 (Nimbus) DD2 and later, in radix mode, the move to the PID
register will implicitly invalidate the user space ERAT entries
and leave the kernel ones alone. Thus the only thing needed is
an isync() to synchronize this with subsequent uaccess's

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 14:15:54 +10:00
Russell Currey 8e3f1b1d82 powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space
On PHB3/POWER8 systems, devices can select between two different sections
of address space, TVE#0 and TVE#1.  TVE#0 is intended for 32bit devices
that aren't capable of addressing more than 4GB.  Selecting TVE#1 instead,
with the capability of addressing over 4GB, is performed by setting bit 59
of a PCI address.

However, some devices aren't capable of addressing at least 59 bits, but
still want more than 4GB of DMA space.  In order to enable this, reconfigure
TVE#0 to be suitable for 64-bit devices by allocating memory past the
initial 4GB that is inaccessible by 64-bit DMAs.

This bypass mode is only enabled if a device requests 4GB or more of DMA
address space, if the system has PHB3 (POWER8 systems), and if the device
does not share a PE with any devices from different vendors.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey a0f98629f1 powerpc/powernv/pci: Add helper to check if a PE has a single vendor
Add a helper that determines if all the devices contained in a given PE
are all from the same vendor or not.  This can be useful in determining
if it's okay to make PE-wide changes that may be suitable for some
devices but not for others.

This is used later in the series.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:28 +10:00
Russell Currey a4b48ba904 powerpc/powernv/pci: Add support for PHB4 diagnostics
As with P7IOC and PHB3, add kernel-side support for decoding and printing
diagnostic data for PHB4.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Russell Currey 5cb1f8fddd powerpc/powernv/pci: Dynamically allocate PHB diag data
Diagnostic data for PHBs currently works by allocated a fixed-sized buffer.
This is simple, but either wastes memory (though only a few kilobytes) or
in the case of PHB4 isn't enough to fit the whole data blob.

For machines that don't describe the diagnostic data size in the device
tree, use the hardcoded buffer size as before.  For those that do, only
allocate exactly what's needed.

In the special case of P7IOC (which has two types of diag data), the larger
should be specified in the device tree.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Russell Currey 31bbd45af3 powerpc/powernv/pci: Reduce spam when dumping PEST
Dumping the PE State Tables (PEST) can be highly verbose if a number of PEs
are affected, especially in the case where the whole PHB is frozen and 512
lines get printed.  Check for duplicates when dumping the PEST to reduce
useless output.

For example:

    PE[0f8] A/B: 9700002600000000 80000080d00000f8
    PE[0f9] A/B: 8000000000000000 0000000000000000
    PE[..0fe] A/B: as above
    PE[0ff] A/B: 8440002b00000000 0000000000000000

instead of:

    PE[0f8] A/B: 9700002600000000 80000080d00000f8
    PE[0f9] A/B: 8000000000000000 0000000000000000
    PE[0fa] A/B: 8000000000000000 0000000000000000
    PE[0fb] A/B: 8000000000000000 0000000000000000
    PE[0fc] A/B: 8000000000000000 0000000000000000
    PE[0fd] A/B: 8000000000000000 0000000000000000
    PE[0fe] A/B: 8000000000000000 0000000000000000
    PE[0ff] A/B: 8440002b00000000 0000000000000000

and you can imagine how much worse it can get for 512 PEs.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:26 +10:00
Michael Neuling 2bafb7ffa3 powerpc/tm: Fix comment
Update to real function name.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:09 +10:00
Michael Neuling aa9a951636 powerpc: Fix asm offsets to point to actual FP and VMX regs
The asm code assumes the FP regs are at the start of fp_state. While
this is true now, it may not always be the case and there is nothing
enforcing it.

This fixes the asm-offsets to point to the actual FP registers inside
the fp_state.  Similarly for VMX.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:08 +10:00
Michael Neuling 64ebb9a208 powerpc: Fix /proc/cpuinfo revision for POWER9 DD2
The P9 PVR bits 12-15 don't indicate a revision but instead different
chip configurations.  From BookIV we have:
   Bits      Configuration
    0 :    Scale out 12 cores
    1 :    Scale out 24 cores
    2 :    Scale up  12 cores
    3 :    Scale up  24 cores

DD1 doesn't use this but DD2 does. Linux will mostly use the "Scale
out 24 core" configuration (ie. SMT4 not SMT8) which results in a PVR
of 0x004e1200. The reported revision in /proc/cpuinfo is hence
reported incorrectly as "18.0".

This patch fixes this to mask off only the relevant bits for the major
revision (ie. bits 8-11) for POWER9.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:09:07 +10:00
Michael Ellerman d6bd8194e2 powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
Larry Finger reported that his Powerbook G4 was no longer booting with v4.12-rc,
userspace was up but giving weird errors such as:

  udevd[64]: starting version 175
  udevd[64]: Unable to receive ctrl message: Bad address.
  modprobe: chdir(4.12-rc1): No such file or directory

He bisected the problem to commit 3448890c32 ("powerpc: get rid of zeroing,
switch to RAW_COPY_USER").

Al identified that the problem is actually a miscompilation by GCC 4.6.3, which
is exposed by the above commit.

Al also pointed out that inlining copy_to/from_user() is probably of little or
no benefit, which is correct. Using Anton's copy_to_user benchmark, with a
pathological single byte copy, we see a small increase in performance
by *removing* inlining:

  Before (inlined):
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m22.063s
  real	0m22.059s
  real	0m22.076s

  After:
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m21.325s
  real	0m21.299s
  real	0m21.364s

So as a small performance improvement and to avoid the miscompilation, drop
inlining copy_to/from_user() on 32-bit.

Fixes: 3448890c32 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER")
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-26 23:25:08 +10:00
Balbir Singh 0428491cba powerpc/mm: Trace tlbie(l) instructions
Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
Entry (Local)) instructions.

The tlbie instruction has changed over the years, so not all versions
accept the same operands. Use the ISA v3 field operands because they are
the most verbose, we may change them in future.

Example output:

  qemu-system-ppc-5371  [016]  1412.369519: tlbie:
  	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Add some missing trace_tlbie()s, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 21:14:49 +10:00
Nicholas Piggin 34f19ff1b5 powerpc/64: Initialise thread_info for emergency stacks
Emergency stacks have their thread_info mostly uninitialised, which in
particular means garbage preempt_count values.

Emergency stack code runs with interrupts disabled entirely, and is
used very rarely, so this has been unnoticed so far. It was found by a
proposed new powerpc watchdog that takes a soft-NMI directly from the
masked_interrupt handler and using the emergency stack. That crashed
at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
garbage.

To fix this, zero the entire THREAD_SIZE allocation, and initialize
the thread_info.

Cc: stable@vger.kernel.org
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move it all into setup_64.c, use a function not a macro. Fix
      crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 13:25:38 +10:00
Alistair Popple bbd5ff50af powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD
NPU2 requires an extra explicit flush to an active GPU PID when
sending address translation shoot downs (ATSDs) to reliably flush the
GPU TLB. This patch adds just such a flush at the end of each sequence
of ATSDs.

We can safely use PID 0 which is always reserved and active on the
GPU. PID 0 is only used for init_mm which will never be a user mm on
the GPU. To enforce this we add a check in pnv_npu2_init_context()
just in case someone tries to use PID 0 on the GPU.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Use true/false for bool literals]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22 21:21:08 +10:00
Paul Mackerras d4cfb11387 powerpc: Convert VDSO update function to use new update_vsyscall interface
This converts the powerpc VDSO time update function to use the new
interface introduced in commit 576094b7f0 ("time: Introduce new
GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
us the time as of the last update in seconds and whole nanoseconds,
with the new interface we get the nanoseconds part effectively in
a binary fixed-point format with tk->tkr_mono.shift bits to the
right of the binary point.

With the old interface, the fractional nanoseconds got truncated,
meaning that the value returned by the VDSO clock_gettime function
would have about 1ns of jitter in it compared to the value computed
by the generic timekeeping code in the kernel.

The powerpc VDSO time functions (clock_gettime and gettimeofday)
already work in units of 2^-32 seconds, or 0.23283 ns, because that
makes it simple to split the result into seconds and fractional
seconds, and represent the fractional seconds in either microseconds
or nanoseconds.  This is good enough accuracy for now, so this patch
avoids changing how the VDSO works or the interface in the VDSO data
page.

This patch converts the powerpc update_vsyscall_old to be called
update_vsyscall and use the new interface.  We convert the fractional
second to units of 2^-32 seconds without truncating to whole nanoseconds.
(There is still a conversion to whole nanoseconds for any legacy users
of the vdso_data/systemcfg stamp_xtime field.)

In addition, this improves the accuracy of the computation of tb_to_xs
for those systems with high-frequency timebase clocks (>= 268.5 MHz)
by doing the right shift in two parts, one before the multiplication and
one after, rather than doing the right shift before the multiplication.
(We can't do all of the right shift after the multiplication unless we
use 128-bit arithmetic.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22 16:26:23 +10:00