Commit Graph

1095 Commits

Author SHA1 Message Date
Linus Torvalds 195303136f Kconfig file consolidation for v4.21
Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries
 by Christoph Hellwig.
 
 Currently, every architecture that wants to provide common peripheral
 busses needs to add some boilerplate code and include the right Kconfig
 files. This series instead just selects the presence (when needed) and
 then handles everything in the bus-specific Kconfig file under drivers/.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcJilwAAoJED2LAQed4NsGt1YP/RMTEUqbCSwS/CnTLrE+aVTC
 O2aWwB80ZlVwpeBbHLW5/M88OvOev0UaCr+gyzgpFRl5ITzS7Jevb8VbpGzblbH7
 bFxIEyZFGQiy9oEWw3Lfu9JRSsLm3jNo7hkmdBSn2Rw3KkEd/YF7K3q9GuA7BpCS
 ZxAirebvEpr4KYEzkuc57NqCYx2Tc8G+JWr5D7pZCFaq9vxYt3TddGqw/c7iQVSQ
 1Og1809IdhGyCSlA/ExfaqaBMaJHMRAOHX5GgkqZw1EbFcizUFhAAsKCrGL5nBtX
 NiWF9jhgHR1M+L69jfctOstrmGQD2KicNgWQf1aS5RQkPfjuqIKGT/i9g6J1pVyX
 TaW1J36Hcl8PpsKoPBnnrixd1T41O3/PuqtEJRm7LCBYOQiwS9sEmLO09RDRjER8
 SPAAyvkhE8oq+0RHiTYN4tm8dyJc1djZ5wzgLnwFPAnU6SR+mbN02RzBMsYZXD+x
 RNbBSGBRJFQDBw6Rn+ktcIQvcKYmUqe1k1YNHMy6kG3QqvhBaDy+8PA/YjIKPQYQ
 B/NNUAMEJMys1OQrRL2UDXb2ysaCpzwMmlrBW2IwYsQrX5OwbPkNuQ5Mbe1Lr+mc
 4NXR+HubvojsHaAby+OhFbrUX2Jcz3wqYj7aannb9sMRmw0VJXV5dPYUqje3ZhPS
 P2AovKT8O9nWsEttqER5
 =WxId
 -----END PGP SIGNATURE-----

Merge tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig file consolidation from Masahiro Yamada:
 "Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries by
  Christoph Hellwig.

  Currently, every architecture that wants to provide common peripheral
  busses needs to add some boilerplate code and include the right
  Kconfig files. This series instead just selects the presence (when
  needed) and then handles everything in the bus-specific Kconfig file
  under drivers/"

* tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  pcmcia: remove per-arch PCMCIA config entry
  eisa: consolidate EISA Kconfig entry in drivers/eisa
  rapidio: consolidate RAPIDIO config entry in drivers/rapidio
  pcmcia: allow PCMCIA support independent of the architecture
  PCI: consolidate the PCI_SYSCALL symbol
  PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options
  PCI: consolidate PCI config entry in drivers/pci
  MIPS: remove the HT_PCI config option
2018-12-29 13:40:29 -08:00
Rob Herring 2c8e65b595 powerpc: Use of_node_name_eq for node name comparisons
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

A couple of open coded iterating thru the child node names are converted
to use for_each_child_of_node() instead.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22 21:29:50 +11:00
Alexey Kardashevskiy 58629c0dc3 powerpc/powernv/npu: Fault user page into the hypervisor's pagetable
When a page fault happens in a GPU, the GPU signals the OS and the GPU
driver calls the fault handler which populated a page table; this allows
the GPU to complete an ATS request.

On the bare metal get_user_pages() is enough as it adds a pte to
the kernel page table but under KVM the partition scope tree does not get
updated so ATS will still fail.

This reads a byte from an effective address which causes HV storage
interrupt and KVM updates the partition scope tree.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 135ef95405 powerpc/powernv/npu: Check mmio_atsd array bounds when populating
A broken device tree might contain more than 8 values and introduce hard
to debug memory corruption bug. This adds the boundary check.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 1b785611e1 powerpc/powernv/npu: Add release_ownership hook
In order to make ATS work and translate addresses for arbitrary
LPID and PID, we need to program an NPU with LPID and allow PID wildcard
matching with a specific MSR mask.

This implements a helper to assign a GPU to LPAR and program the NPU
with a wildcard for PID and a helper to do clean-up. The helper takes
MSR (only DR/HV/PR/SF bits are allowed) to program them into NPU2 for
ATS checkout requests support.

This exports pnv_npu2_unmap_lpar_dev() as following patches will use it
from the VFIO driver.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 0bd971676e powerpc/powernv/npu: Add compound IOMMU groups
At the moment the powernv platform registers an IOMMU group for each
PE. There is an exception though: an NVLink bridge which is attached
to the corresponding GPU's IOMMU group making it a master.

Now we have POWER9 systems with GPUs connected to each other directly
bypassing PCI. At the moment we do not control state of these links so
we have to put such interconnected GPUs to one IOMMU group which means
that the old scheme with one GPU as a master won't work - there will
be up to 3 GPUs in such group.

This introduces a npu_comp struct which represents a compound IOMMU
group made of multiple PEs - PCI PEs (for GPUs) and NPU PEs (for
NVLink bridges). This converts the existing NVLink1 code to use the
new scheme. >From now on, each PE must have a valid
iommu_table_group_ops which will either be called directly (for a
single PE group) or indirectly from a compound group handlers.

This moves IOMMU group registration for NVLink-connected GPUs to
npu-dma.c. For POWER8, this stores a new compound group pointer in the
PE (so a GPU is still a master); for POWER9 the new group pointer is
stored in an NPU (which is allocated per a PCI host controller).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Initialise npdev to NULL in pnv_try_setup_npu_table_group()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 83fb8ccf97 powerpc/powernv/npu: Convert NPU IOMMU helpers to iommu_table_group_ops
At the moment NPU IOMMU is manipulated directly from the IODA2 PCI
PE code; PCI PE acts as a master to NPU PE. Soon we will have compound
IOMMU groups with several PEs from several different PHB (such as
interconnected GPUs and NPUs) so there will be no single master but
a one big IOMMU group.

This makes a first step and converts an NPU PE with a set of extern
function to a table group.

This should cause no behavioral change. Note that
pnv_npu_release_ownership() has never been implemented.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy b04149c2dd powerpc/powernv/npu: Move single TVE handling to NPU PE
Normal PCI PEs have 2 TVEs, one per a DMA window; however NPU PE has only
one which points to one of two tables of the corresponding PCI PE.

So whenever a new DMA window is programmed to PEs, the NPU PE needs to
release old table in order to use the new one.

Commit d41ce7b1bc ("powerpc/powernv/npu: Do not try invalidating 32bit
table when 64bit table is enabled") did just that but in pci-ioda.c
while it actually belongs to npu-dma.c.

This moves the single TVE handling to npu-dma.c. This does not implement
restoring though as it is highly unlikely that we can set the table to
PCI PE and cannot to NPU PE and if that fails, we could only set 32bit
table to NPU PE and this configuration is not really supported or wanted.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 847e6563aa powerpc/powernv: Reference iommu_table while it is linked to a group
The iommu_table pointer stored in iommu_table_group may get stale
by accident, this adds referencing and removes a redundant comment
about this.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 5eada8a3f0 powerpc/iommu_api: Move IOMMU groups setup to a single place
Registering new IOMMU groups and adding devices to them are separated in
code and the latter is dug in the DMA setup code which it does not
really belong to.

This moved IOMMU groups setup to a separate helper which registers a group
and adds devices as before. This does not make a difference as IOMMU
groups are not used anyway; the only dependency here is that
iommu_add_device() requires a valid pointer to an iommu_table
(set by set_iommu_table_base()).

To keep the old behaviour, this does not add new IOMMU groups for PEs
with no DMA weight and also skips NVLink bridges which do not have
pci_controller_ops::setup_bridge (the normal way of adding PEs).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy c4e9d3c1e6 powerpc/powernv/pseries: Rework device adding to IOMMU groups
The powernv platform registers IOMMU groups and adds devices to them
from the pci_controller_ops::setup_bridge() hook except one case when
virtual functions (SRIOV VFs) are added from a bus notifier.

The pseries platform registers IOMMU groups from
the pci_controller_ops::dma_bus_setup() hook and adds devices from
the pci_controller_ops::dma_dev_setup() hook. The very same bus notifier
used for powernv does not add devices for pseries though as
__of_scan_bus() adds devices first, then it does the bus/dev DMA setup.

Both platforms use iommu_add_device() which takes a device and expects
it to have a valid IOMMU table struct with an iommu_table_group pointer
which in turn points the iommu_group struct (which represents
an IOMMU group). Although the helper seems easy to use, it relies on
some pre-existing device configuration and associated data structures
which it does not really need.

This simplifies iommu_add_device() to take the table_group pointer
directly. Pseries already has a table_group pointer handy and the bus
notified is not used anyway. For powernv, this copies the existing bus
notifier, makes it work for powernv only which means an easy way of
getting to the table_group pointer. This was tested on VFs but should
also support physical PCI hotplug.

Since iommu_add_device() receives the table_group pointer directly,
pseries does not do TCE cache invalidation (the hypervisor does) nor
allow multiple groups per a VFIO container (in other words sharing
an IOMMU table between partitionable endpoints), this removes
iommu_table_group_link from pseries.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 0e759bd752 powerpc/powernv/npu: Move OPAL calls away from context manipulation
When introduced, the NPU context init/destroy helpers called OPAL which
enabled/disabled PID (a userspace memory context ID) filtering in an NPU
per a GPU; this was a requirement for P9 DD1.0. However newer chip
revision added a PID wildcard support so there is no more need to
call OPAL every time a new context is initialized. Also, since the PID
wildcard support was added, skiboot does not clear wildcard entries
in the NPU so these remain in the hardware till the system reboot.

This moves LPID and wildcard programming to the PE setup code which
executes once during the booting process so NPU2 context init/destroy
won't need to do additional configuration.

This replaces the check for FW_FEATURE_OPAL with a check for npu!=NULL as
this is the way to tell if the NPU support is present and configured.

This moves pnv_npu2_init() declaration as pseries should be able to use it.
This keeps pnv_npu2_map_lpar() in powernv as pseries is not allowed to
call that. This exports pnv_npu2_map_lpar_dev() as following patches
will use it from the VFIO driver.

While at it, replace redundant list_for_each_entry_safe() with
a simpler list_for_each_entry().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy 46a1449d9e powerpc/powernv: Move npu struct from pnv_phb to pci_controller
The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in in the pci_controller struct; and even if it did,
it would be a different data structure.

This makes npu a pointer and stores it one level higher in
the pci_controller struct.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Alexey Kardashevskiy ab7032e793 powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
The skiboot firmware has a hot reset handler which fences the NVIDIA V100
GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
https://github.com/open-power/skiboot/commit/fca2b2b839a67

Now we are going to pass V100 via VFIO which most certainly involves
KVM guests which are often terminated without getting a chance to offline
GPU RAM so we end up with a running machine with misconfigured memory.
Accessing this memory produces hardware management interrupts (HMI)
which bring the host down.

To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
via pci_disable_device() which switches NPU2 to a safe mode and prevents
HMIs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 16:20:46 +11:00
Mahesh Salgaonkar 08fb726df1 powerpc/powernv: Move opal_power_control_init() call in opal_init().
opal_power_control_init() depends on opal message notifier to be
initialized, which is done in opal_init()->opal_message_init(). But both
these initialization are called through machine initcalls and it all
depends on in which order they being called. So far these are called in
correct order (may be we got lucky) and never saw any issue. But it is
clearer to control initialization order explicitly by moving
opal_power_control_init() into opal_init().

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Oliver O'Halloran 5f639e5fad powerpc/powernv: Remove PCI_MSI ifdef checks
CONFIG_PCI_MSI was made mandatory by commit a311e738b6
("powerpc/powernv: Make PCI non-optional") so the #ifdef
checks around CONFIG_PCI_MSI here can be removed entirely.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Alexey Kardashevskiy c20577014f powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
skiboot's NPU driver does not touch the pci_error_type parameter so
it might have garbage but the powernv code analyzes it nevertheless.

This initializes pcierr and fstate to zero in all call sites.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy a25de7af34 powerpc/powernv/ioda: Reduce a number of hooks in pnv_phb
fixup_phb() is never used, this removes it.

pick_m64_pe() and reserve_m64_pe() are always defined for all powernv
PHBs: they are initialized by pnv_ioda_parse_m64_window() which is
called unconditionally from pnv_pci_init_ioda_phb() which initializes
all known PHB types on powernv so we can open code them.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy f21b0a45e4 powerpc/powernv/ioda1: Remove dead code for a single device PE
At the moment PNV_IODA_PE_DEV is only used for NPU PEs which are not
present on IODA1 machines (i.e. POWER7) so let's remove a piece of
dead code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy fa1ada7889 powerpc/powernv/npu: Remove unused headers and a macro.
The macro and few headers are not used so remove them.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy bdbf649efe powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand
The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE
table and a table with userspace addresses; the latter is used for
marking pages dirty when corresponging TCEs are unmapped from
the hardware table.

a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels
on demand") enabled on-demand allocation of the hardware table,
however it missed the other table so it has still been fully allocated
at the boot time. This fixes the issue by allocating a single level,
just like we do for the hardware table.

Fixes: a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Yangtao Li 267acedffc powerpc/powernv/vas: Use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-25 17:11:22 +11:00
Christoph Hellwig eb01d42a77 PCI: consolidate PCI config entry in drivers/pci
There is no good reason to duplicate the PCI menu in every architecture.
Instead provide a selectable HAVE_PCI symbol that indicates availability
of PCI support, and a FORCE_PCI symbol to for PCI on and the handle the
rest in drivers/pci.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-11-23 11:45:34 +09:00
Alistair Popple 3182215dd0 powerpc/powernv/npu: Remove NPU DMA ops
The NPU IOMMU is setup to mirror the parent PCIe device IOMMU
setup. Therefore it does not make sense to call dma operations such as
dma_map_page(), etc. directly on these devices. The existing dma_ops
simply print a warning if they are ever called, however this is
unnecessary and the warnings are likely to go unnoticed.

It is instead simpler to remove these operations and let the generic
DMA code print warnings (eg. via a NULL pointer deref) in cases of
buggy drivers attempting dma operations on NVLink devices.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-05 16:05:22 +11:00
David Hildenbrand 5666848774 powerpc/powernv: hold device_hotplug_lock when calling memtrace_offline_pages()
Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via sysfs
concurrently.

[david@redhat.com: take device_hotplug_lock outside of loop]
  Link: http://lkml.kernel.org/r/20180927092554.13567-6-david@redhat.com
Link: http://lkml.kernel.org/r/20180925091457.28651-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:17 -07:00
David Hildenbrand cec1680591 powerpc/powernv: hold device_hotplug_lock when calling device_online()
device_online() should be called with device_hotplug_lock() held.

Link: http://lkml.kernel.org/r/20180925091457.28651-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:17 -07:00
David Hildenbrand d15e59260f mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock.  And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g.  via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock.  So we can
have concurrent callers in online_pages().  We e.g.  touch in
online_pages() basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible.  We
would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify).  And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

This patch (of 6):

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported.  So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
	arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are not other users in the tree.

Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:17 -07:00
Mike Rapoport 7e1c4e2792 memblock: stop using implicit alignment to SMP_CACHE_BYTES
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.

Implicit alignment is done deep in the memblock allocator and it can
come as a surprise.  Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.

Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.

For the case when memblock APIs are used via helper functions, e.g.  like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.

The direct memblock APIs users were updated using the semantic patch below:

@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)

[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
  Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport 57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport eb31d559f1 memblock: remove _virt from APIs returning virtual address
The conversion is done using

sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
	$(git grep -l memblock_virt_alloc)

Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Mike Rapoport 9a8dd708d5 memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*
Make it explicit that the caller gets a physical address rather than a
virtual one.

This will also allow using meblock_alloc prefix for memblock allocations
returning virtual address, which is done in the following patches.

The conversion is done using the following semantic patch:

@@
expression e1, e2, e3;
@@
(
- memblock_alloc(e1, e2)
+ memblock_phys_alloc(e1, e2)
|
- memblock_alloc_nid(e1, e2, e3)
+ memblock_phys_alloc_nid(e1, e2, e3)
|
- memblock_alloc_try_nid(e1, e2, e3)
+ memblock_phys_alloc_try_nid(e1, e2, e3)
)

Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Sam Bobroff fef7f90552 powerpc/eeh: Cleanup eeh_ops.wait_state()
The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.

While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
  EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
  it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
  because that's what it is.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff e762bb891a powerpc/eeh: Cleanup eeh_pe_state_mark()
Currently, eeh_pe_state_mark() marks a PE (and it's children) with a
state and then performs additional processing if that state included
EEH_PE_ISOLATED.

The state parameter is always a constant at the call site, so
rearrange eeh_pe_state_mark() into two functions and just call the
appropriate one at each site.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff 80e65b0094 powerpc/eeh: Cleanup list_head field names
Instances of struct eeh_pe are placed in a tree structure using the
fields "child_list" and "child", so place these next to each other
in the definition.

The field "child" is a list entry, so remove the unnecessary and
misleading use of the list initializer, LIST_HEAD(), on it.

The eeh_dev struct contains two list entry fields, called "list" and
"rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
initializing them with LIST_HEAD().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff bffc0176e7 powerpc/eeh: Cleanup EEH_POSTPONED_PROBE
Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
message "EEH: No capable adapters found" from being displayed during
the boot of powernv systems.

It is necessary because, on powernv, the call to eeh_probe_devices()
made from eeh_init() is too early and EEH can't yet be enabled. A
second call is made later from eeh_pnv_post_init(), which succeeds.

(On pseries, the first call succeeds because PCI devices are set up
early enough and no second call is made.)

This can be simplified by moving the early call to eeh_probe_devices()
from eeh_init() (where it's seen by both platforms) to
pSeries_final_fixup(), so that each platform only calls
eeh_probe_devices() once, at a point where it can succeed.
This is slightly later in the boot sequence, but but still early
enough and it is now in the same place in the sequence for both
platforms (the pcibios_fixup hook).

The display of the message can be cleaned up as well, by moving it
into eeh_probe_devices().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Bartlomiej Zolnierkiewicz 719736e1cc powerpc: remove redundant 'default n' from Kconfig-s
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.

Also since commit f467c5640c ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:

    ...
    One side effect of (and the main motivation for) this change is making
    the following two definitions behave exactly the same:

        config FOO
                bool

        config FOO
                bool
                default n

    With this change, neither of these will generate a
    '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
    That might make it clearer to people that a bare 'default n' is
    redundant.
    ...

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Michael Ellerman 9b7e4d601b Merge branch 'fixes' into next
Merge our fixes branch. It has a few important fixes that are needed for
futher testing and also some commits that will conflict with content in
next.
2018-10-09 16:51:05 +11:00
Mark Hairgrove f86ad3e019 powerpc/powernv/npu: Remove atsd_threshold debugfs setting
This threshold is no longer used now that all invalidates issue a single
ATSD to each active NPU.

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:54 +10:00
Mark Hairgrove 3689c37d23 powerpc/powernv/npu: Use size-based ATSD invalidates
Prior to this change only two types of ATSDs were issued to the NPU:
invalidates targeting a single page and invalidates targeting the whole
address space. The crossover point happened at the configurable
atsd_threshold which defaulted to 2M. Invalidates that size or smaller
would issue per-page invalidates for the whole range.

The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all.
These invalidates target addresses aligned to their size. 2M is a common
invalidation size for GPU-enabled applications because that is a GPU
page size, so reducing the number of invalidates by 32x in that case is a
clear improvement.

ATSD latency is high in general so now we always issue a single invalidate
rather than multiple. This will over-invalidate in some cases, but for any
invalidation size over 2M it matches or improves the prior behavior.
There's also an improvement for single-page invalidates since the prior
version issued two invalidates for that case instead of one.

With this change all issued ATSDs now perform a flush, so the flush
parameter has been removed from all the helpers.

To show the benefit here are some performance numbers from a
microbenchmark which creates a 1G allocation then uses mprotect with
PROT_NONE to trigger invalidates in strides across the allocation.

One NPU (1 GPU):

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         5.3        5.6           5%
1M         39.3       57.4          46%
2M         49.7       82.6          66%
4M        286.6      285.7           0%

Two NPUs (6 GPUs):

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         6.5        7.4          13%
1M         33.4       67.9         103%
2M         38.7       93.1         141%
4M        356.7      354.6          -1%

Anything over 2M is roughly the same as before since both cases issue a
single ATSD.

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:53 +10:00
Mark Hairgrove 7ead15a144 powerpc/powernv/npu: Reduce eieio usage when issuing ATSD invalidates
There are two types of ATSDs issued to the NPU: invalidates targeting a
specific virtual address and invalidates targeting the whole address
space. In both cases prior to this change, the sequence was:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
        - EIEIO
        - Write the launch value to issue the ATSD

First, a target address is not required when invalidating the whole
address space, so that write and the EIEIO have been removed. The AP
(size) field in the launch is not needed either.

Second, for per-address invalidates the above sequence is inefficient in
the common case of multiple NPUs because an EIEIO is issued per NPU. This
unnecessarily forces the launches of later ATSDs to be ordered with the
launches of earlier ones. The new sequence only issues a single EIEIO:

    for each NPU
        - Write the target address to the XTS_ATSD_AVA register
    EIEIO
    for each NPU
        - Write the launch value to issue the ATSD

Performance results were gathered using a microbenchmark which creates a
1G allocation then uses mprotect with PROT_NONE to trigger invalidates in
strides across the allocation.

With only a single NPU active (one GPU) the difference is in the noise for
both types of invalidates (+/-1%).

With two NPUs active (on a 6-GPU system) the effect is more noticeable:

         mprotect rate (GB/s)
Stride   Before      After      Speedup
64K         5.9        6.5          10%
1M         31.2       33.4           7%
2M         36.3       38.7           7%
4M        322.6      356.7          11%

Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04 16:55:52 +10:00
Michal Suchanek 8a03e81cb1 powerpc/64s: consolidate MCE counter increment.
The code in machine_check_exception excludes 64s hvmode when
incrementing the MCE counter only to call opal_machine_check to
increment it specifically for this case.

Remove the exclusion and special case.

Fixes: a43c159042 ("powerpc/pseries: Flush SLB contents on SLB MCE
		errors.")

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:06 +10:00
Breno Leitao 62dea077f5 powerpc/powernv: Mark function as __noreturn
There is a mismatch between function pnv_platform_error_reboot() definition
and declaration regarding function modifiers. In the declaration part, it
contains the function attribute __noreturn, while function definition
itself lacks it.

This was reported by sparse tool as an error:

  arch/powerpc/platforms/powernv/opal.c:538:6: error: symbol 'pnv_platform_error_reboot' redeclared with different type (originally declared at arch/powerpc/platforms/powernv/powernv.h:11) - different modifiers

I checked and the function is already being considered as being 'noreturn'
by the compiler, thus, I understand this patch does not change any code
being generated.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:05 +10:00
Rob Herring b9ef7b4b86 powerpc: Convert to using %pOFn instead of device_node.name
In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:40:01 +10:00
Vaibhav Jain 8139046a5a powerpc/powernv: Make possible for user to force a full ipl cec reboot
Ever since fast reboot is enabled by default in opal,
opal_cec_reboot() will use fast-reset instead of full IPL to perform
system reboot. This leaves the user with no direct way to force a full
IPL reboot except changing an nvram setting that persistently disables
fast-reset for all subsequent reboots.

This patch provides a more direct way for the user to force a one-shot
full IPL reboot by passing the command line argument 'full' to the
reboot command. So the user will be able to tweak the reboot behavior
via:

  $ sudo reboot full	# Force a full ipl reboot skipping fast-reset

  or
  $ sudo reboot  	# default reboot path (usually fast-reset)

The reboot command passes the un-parsed command argument to the kernel
via the 'Reboot' syscall which is then passed on to the arch function
pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
IPL reset.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:39:45 +10:00
Alexey Kardashevskiy 7233b8cab3 powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
mpe: This was fixed originally in commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size"), but
contrary to what the merge commit says was inadvertently lost by me in
commit ce57c6610c ("Merge branch 'topic/ppc-kvm' into next") which
brought in changes that moved the code to a new file. So reapply it to
the new file.

Original commit message follows:

We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.

Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in
other words we can create a window as big as 1<<58 but DMA simply
won't work.

This reduces the upper limit from 59 to 55 bits to let the userspace
know about the hardware limits.

Fixes: ce57c6610c ("Merge branch 'topic/ppc-kvm' into next")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20 14:31:03 +10:00
Mahesh Salgaonkar a43c159042 powerpc/pseries: Flush SLB contents on SLB MCE errors.
On pseries, as of today system crashes if we get a machine check
exceptions due to SLB errors. These are soft errors and can be fixed
by flushing the SLBs so the kernel can continue to function instead of
system crash. We do this in real mode before turning on MMU. Otherwise
we would run into nested machine checks. This patch now fetches the
rtas error log in real mode and flushes the SLBs on SLB/ERAT errors.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 21:59:22 +10:00
Rashmica Gupta 3f7daf3d75 powerpc/memtrace: Remove memory in chunks
When hot-removing memory release_mem_region_adjustable() splits iomem
resources if they are not the exact size of the memory being
hot-deleted. Adding this memory back to the kernel adds a new resource.

Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from
0xf40000000 results in the single resource 0x0-0xfffffffff being split
into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff.

When we hot-add the memory back we now have three resources:
0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff.

This is an issue if we try to remove some memory that overlaps
resources. Eg when trying to remove 2GB at address 0xf40000000,
release_mem_region_adjustable() fails as it expects the chunk of memory
to be within the boundaries of a single resource. We then get the
warning: "Unable to release resource" and attempting to use memtrace
again gives us this error: "bash: echo: write error: Resource
temporarily unavailable"

This patch makes memtrace remove memory in chunks that are always the
same size from an address that is always equal to end_of_memory -
n*size, for some n. So hotremoving and hotadding memory of different
sizes will now not attempt to remove memory that spans multiple
resources.

Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19 21:58:02 +10:00
Joel Stanley b0dc0f8618 powerpc/powernv: Don't select the cpufreq governors
Deciding wich govenors should be built into the kernel can be left to
users to configure.

Fixes: 81f359027a ("cpufreq: powernv: Select CPUFreq related Kconfig options for powernv")
Signed-off-by: Joel Stanley <joel@jms.id.au>
[mpe: Update powernv/ppc64 defconfigs to enable them by default]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-17 21:16:26 +10:00
Linus Torvalds aba16dc5cf Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax
Pull IDA updates from Matthew Wilcox:
 "A better IDA API:

      id = ida_alloc(ida, GFP_xxx);
      ida_free(ida, id);

  rather than the cumbersome ida_simple_get(), ida_simple_remove().

  The new IDA API is similar to ida_simple_get() but better named.  The
  internal restructuring of the IDA code removes the bitmap
  preallocation nonsense.

  I hope the net -200 lines of code is convincing"

* 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
  ida: Change ida_get_new_above to return the id
  ida: Remove old API
  test_ida: check_ida_destroy and check_ida_alloc
  test_ida: Convert check_ida_conv to new API
  test_ida: Move ida_check_max
  test_ida: Move ida_check_leaf
  idr-test: Convert ida_check_nomem to new API
  ida: Start new test_ida module
  target/iscsi: Allocate session IDs from an IDA
  iscsi target: fix session creation failure handling
  drm/vmwgfx: Convert to new IDA API
  dmaengine: Convert to new IDA API
  ppc: Convert vas ID allocation to new IDA API
  media: Convert entity ID allocation to new IDA API
  ppc: Convert mmu context allocation to new IDA API
  Convert net_namespace to new IDA API
  cb710: Convert to new IDA API
  rsxx: Convert to new IDA API
  osd: Convert to new IDA API
  sd: Convert to new IDA API
  ...
2018-08-26 11:48:42 -07:00
Linus Torvalds aa5b1054ba powerpc fixes for 4.19 #2
- An implementation for the newly added hv_ops->flush() for the OPAL hvc
    console driver backends, I forgot to apply this after merging the hvc driver
    changes before the merge window.
 
  - Enable all PCI bridges at boot on powernv, to avoid races when multiple
    children of a bridge try to enable it simultaneously. This is a workaround
    until the PCI core can be enhanced to fix the races.
 
  - A fix to query PowerVM for the correct system topology at boot before
    initialising sched domains, seen in some configurations to cause broken
    scheduling etc.
 
  - A fix for pte_access_permitted() on "nohash" platforms.
 
  - Two commits to fix SIGBUS when using remap_pfn_range() seen on Power9 due to
    a workaround when using the nest MMU (GPUs, accelerators).
 
  - Another fix to the VFIO code used by KVM, the previous fix had some bugs
    which caused guests to not start in some configurations.
 
  - A handful of other minor fixes.
 
 Thanks to:
   Aneesh Kumar K.V, Benjamin Herrenschmidt, Christophe Leroy, Hari Bathini, Luke
   Dashjr, Mahesh Salgaonkar, Nicholas Piggin, Paul Mackerras, Srikar Dronamraju.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt//10THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLUGD/9y/MPrs+V2lNFhxP+l1jO4Ro0v8DPI
 vARjqq06WfppgQgSS1dvWfzLaMFbGe8wPRfL0T1xMOXCZg8Ts/HrgxVBVFYcv6/Q
 xBaU5bKztg6HKQbwwO+8B/gdTA3hO7yFVux6JGwsGO5Ebl8Q3UDOdbvYX6XTj3H8
 rdiicle9LaI7qodC8bxlBo1Be0YKEW0O/ag179sxXzozzvoPIyFpeX6FL1sAft++
 XlQS1MHu4hErSM2rbmyoFCm+SmyRt3CD0NTVmNd2cgw5XexPOBFlnsdgpaK1jJFc
 CYu1chP83E91ol1/8NAPkcmPWvP6MGyoqOl75RghooY2D1IJ2GtLKoz2Dvc53ay2
 ZlIpMyc2CYIa55Mj18tOV/NbGbh0Lf0Ta++BxqxbcCDt5fq0VxHkoDPqBgWh/tdp
 Po7oQc7U2VTKKC3wiLr//nSHpgtSTAWtucDt7oT7GdP8+EMxUZ8teBFIfTkwfuD2
 plroEmcYuRD3beI4FAG/iCp5POOCsnHLkKVDl7tyQPl3Yvu8hvyLY9gBS9RN4Unt
 /z94YFJtz7UD+VP7jDaPiQS26y0WEJfW8ml0tNxMZVBdksZLPIPN0/EU/04V0jWf
 GxynzKMhElxC67tK/Eb43EGiLEZAlbEnJxoOtiCnL1MK+OIJPjg6e7o6qlsmaa+Y
 zkXVpxLtkq0OoQ==
 =1AUa
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - An implementation for the newly added hv_ops->flush() for the OPAL
   hvc console driver backends, I forgot to apply this after merging the
   hvc driver changes before the merge window.

 - Enable all PCI bridges at boot on powernv, to avoid races when
   multiple children of a bridge try to enable it simultaneously. This
   is a workaround until the PCI core can be enhanced to fix the races.

 - A fix to query PowerVM for the correct system topology at boot before
   initialising sched domains, seen in some configurations to cause
   broken scheduling etc.

 - A fix for pte_access_permitted() on "nohash" platforms.

 - Two commits to fix SIGBUS when using remap_pfn_range() seen on Power9
   due to a workaround when using the nest MMU (GPUs, accelerators).

 - Another fix to the VFIO code used by KVM, the previous fix had some
   bugs which caused guests to not start in some configurations.

 - A handful of other minor fixes.

Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Christophe Leroy,
Hari Bathini, Luke Dashjr, Mahesh Salgaonkar, Nicholas Piggin, Paul
Mackerras, Srikar Dronamraju.

* tag 'powerpc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mce: Fix SLB rebolting during MCE recovery path.
  KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
  powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition
  powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.
  powerpc/nohash: fix pte_access_permitted()
  powerpc/topology: Get topology for shared processors at boot
  powerpc64/ftrace: Include ftrace.h needed for enable/disable calls
  powerpc/powernv/pci: Work around races in PCI bridge enabling
  powerpc/fadump: cleanup crash memory ranges support
  powerpc/powernv: provide a console flush operation for opal hvc driver
  powerpc/traps: Avoid rate limit messages from show unhandled signals
  powerpc/64s: Fix PACA_IRQ_HARD_DIS accounting in idle_power4()
2018-08-24 09:34:23 -07:00
Matthew Wilcox cd38049e48 ppc: Convert vas ID allocation to new IDA API
Removes a custom spinlock and simplifies the code.  Also fix an
error where we could allocate one ID too many.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21 23:54:19 -04:00
Benjamin Herrenschmidt db2173198b powerpc/powernv/pci: Work around races in PCI bridge enabling
The generic code is racy when multiple children of a PCI bridge try to
enable it simultaneously.

This leads to drivers trying to access a device through a
not-yet-enabled bridge, and this EEH errors under various
circumstances when using parallel driver probing.

There is work going on to fix that properly in the PCI core but it
will take some time.

x86 gets away with it because (outside of hotplug), the BIOS enables
all the bridges at boot time.

This patch does the same thing on powernv by enabling all bridges that
have child devices at boot time, thus avoiding subsequent races. It's
suitable for backporting to stable and distros, while the proper PCI
fix will probably be significantly more invasive.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20 20:29:17 +10:00
Nicholas Piggin 95b861a76c powerpc/powernv: provide a console flush operation for opal hvc driver
Provide the flush hv_op for the opal hvc driver. This will flush the
firmware console buffers without spinning with interrupts disabled.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20 20:19:54 +10:00
Linus Torvalds 5e2d059b52 powerpc updates for 4.19
Notable changes:
 
  - A fix for a bug in our page table fragment allocator, where a page table page
    could be freed and reallocated for something else while still in use, leading
    to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for
    a powerpc only refcount.
 
  - Fixes to our pkey support. Several are user-visible changes, but bring us in
    to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer
    for reporting many of these.
 
  - A series to improve the hvc driver & related OPAL console code, which have
    been seen to cause hardlockups at times. The hvc driver changes in particular
    have been in linux-next for ~month.
 
  - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
 
  - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use
    anywhere other than as a paper weight.
 
  - An optimised memcmp implementation using Power7-or-later VMX instructions
 
  - Support for barrier_nospec on some NXP CPUs.
 
  - Support for flushing the count cache on context switch on some IBM CPUs
    (controlled by firmware), as a Spectre v2 mitigation.
 
  - A series to enhance the information we print on unhandled signals to bring it
    into line with other arches, including showing the offending VMA and dumping
    the instructions around the fault.
 
 Thanks to:
   Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey
   Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
   Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan,
   Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza,
   Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt,
   Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian
   Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
   Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley,
   Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus
   Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael
   Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas
   Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap,
   Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff,
   Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson,
   Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao
   B, zhong jiang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt2O6cTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgC7hD/4+cj796Df7GsVsIMxzQm7SS9dklIdO
 JuKj2Nr5HRzTH59jWlXukLG9mfTNCFgFJB4gEpK1ArDOTcHTCI9RRsLZTZ/kum66
 7Pd+7T40dLYXB5uecuUs0vMXa2fI3syKh1VLzACSXv3Dh9BBIKQBwW/aD2eww4YI
 1fS5LnXZ2PSxfr6KNAC6ogZnuaiD0sHXOYrtGHq+S/TFC7+Z6ySa6+AnPS+hPVoo
 /rHDE1Khr66aj7uk+PP2IgUrCFj6Sbj6hTVlS/iAuwbMjUl9ty6712PmvX9x6wMZ
 13hJQI+g6Ci+lqLKqmqVUpXGSr6y4NJGPS/Hko4IivBTJApI+qV/tF2H9nxU+6X0
 0RqzsMHPHy13n2torA1gC7ttzOuXPI4hTvm6JWMSsfmfjTxLANJng3Dq3ejh6Bqw
 76EMowpDLexwpy7/glPpqNdsP4ySf2Qm8yq3mR7qpL4m3zJVRGs11x+s5DW8NKBL
 Fl5SqZvd01abH+sHwv6NLaLkEtayUyohxvyqu2RU3zu5M5vi7DhqstybTPjKPGu0
 icSPh7b2y10WpOUpC6lxpdi8Me8qH47mVc/trZ+SpgBrsuEmtJhGKszEnzRCOqos
 o2IhYHQv3lQv86kpaAFQlg/RO+Lv+Lo5qbJ209V+hfU5nYzXpEulZs4dx1fbA+ze
 fK8GEh+u0L4uJg==
 =PzRz
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - A fix for a bug in our page table fragment allocator, where a page
     table page could be freed and reallocated for something else while
     still in use, leading to memory corruption etc. The fix reuses
     pt_mm in struct page (x86 only) for a powerpc only refcount.

   - Fixes to our pkey support. Several are user-visible changes, but
     bring us in to line with x86 behaviour and/or fix outright bugs.
     Thanks to Florian Weimer for reporting many of these.

   - A series to improve the hvc driver & related OPAL console code,
     which have been seen to cause hardlockups at times. The hvc driver
     changes in particular have been in linux-next for ~month.

   - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.

   - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
     use anywhere other than as a paper weight.

   - An optimised memcmp implementation using Power7-or-later VMX
     instructions

   - Support for barrier_nospec on some NXP CPUs.

   - Support for flushing the count cache on context switch on some IBM
     CPUs (controlled by firmware), as a Spectre v2 mitigation.

   - A series to enhance the information we print on unhandled signals
     to bring it into line with other arches, including showing the
     offending VMA and dumping the instructions around the fault.

  Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
  Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
  Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
  Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
  Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
  Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
  Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
  Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
  Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
  Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
  Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
  Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
  Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
  Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
  Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
  Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
  Rao, zhong jiang"

* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
  powerpc/mm/book3s/radix: Add mapping statistics
  powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
  powerpc/mm/hash: Remove unnecessary do { } while(0) loop
  powerpc/64s: move machine check SLB flushing to mm/slb.c
  powerpc/powernv/idle: Fix build error
  powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
  powerpc/mm: remove warning about ‘type’ being set
  powerpc/32: Include setup.h header file to fix warnings
  powerpc: Move `path` variable inside DEBUG_PROM
  powerpc/powermac: Make some functions static
  powerpc/powermac: Remove variable x that's never read
  cxl: remove a dead branch
  powerpc/powermac: Add missing include of header pmac.h
  powerpc/kexec: Use common error handling code in setup_new_fdt()
  powerpc/xmon: Add address lookup for percpu symbols
  powerpc/mm: remove huge_pte_offset_and_shift() prototype
  powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
  powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
  powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
  powerpc/fadump: handle crash memory ranges array index overflow
  ...
2018-08-17 11:32:50 -07:00
Aneesh Kumar K.V ae24ce5e12 powerpc/powernv/idle: Fix build error
Fix the below build error using strlcpy instead of strncpy

In function 'pnv_parse_cpuidle_dt',
    inlined from 'pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:840:7,
    inlined from '__machine_initcall_powernv_pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:870:1:
arch/powerpc/platforms/powernv/idle.c:820:3: error: 'strncpy' specified bound 16 equals destination size [-Werror=stringop-truncation]
   strncpy(pnv_idle_states[i].name, temp_string[i],
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PNV_IDLE_NAME_LEN);

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:39 +10:00
Rashmica Gupta d3da701d33 powerpc/powernv: Allow memory that has been hot-removed to be hot-added
This patch allows the memory removed by memtrace to be readded to the
kernel. So now you don't have to reboot your system to add the memory
back to the kernel or to have a different amount of memory removed.

Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10 22:12:31 +10:00
Benjamin Herrenschmidt 77b5f703dc powerpc/powernv/opal: Use standard interrupts property when available
For (bad) historical reasons, OPAL used to create a non-standard pair
of properties "opal-interrupts" and "opal-interrupts-names" for
representing the list of interrupts it wants Linux to request on its
behalf.

Among other issues, the opal-interrupts doesn't have a way to carry
the type of interrupts, and they were assumed to be all level
sensitive.

This is wrong on some recent systems where some of them are edge
sensitive causing warnings in the XIVE code and possible misbehaviours
if they need to be retriggered (typically the NPU2 TCE error
interrupts).

This makes Linux switch to using the standard "interrupts" and
"interrupt-names" properties instead when they are available, using
standard of_irq helpers, which can carry all the desired type
information.

Newer versions of OPAL will generate those properties in addition to
the legacy ones.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
 of start = 0 in opal_event_shutdown() to avoid double free warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:38 +10:00
Haren Myneni 656ecc16e8 crypto/nx: Initialize 842 high and normal RxFIFO control registers
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.

VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:34 +10:00
Haren Myneni 6e708000ec powerpc/powernv: Export opal_check_token symbol
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:33 +10:00
Michael Ellerman 99d54754d3 powerpc/powernv: Query firmware for count cache flush settings
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:27 +10:00
Michael Ellerman af375eefbf powerpc/64: Call setup_barrier_nospec() from setup_arch()
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-08 00:32:23 +10:00
Benjamin Herrenschmidt e27e0a9465 powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Reza Arbab 9eab9901b0 powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.

The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.

What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.

There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.

Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 20:09:01 +10:00
Nicholas Piggin 3127692deb powernv/cpuidle: Fix idle states all being marked invalid
Commit 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into
global structure") parses dt idle states into structs, but never marks
them valid. This results in all idle states being lost.

Fixes: 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03 18:30:33 +10:00
Hari Vyas 44bda4b7d2 PCI: Fix is_added/is_busmaster race condition
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 11:27:54 -05:00
Shilpasri G Bhat 04baaf28f4 powerpc/powernv: Add support to enable sensor groups
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:45 +10:00
Akshay Adiga 9c7b185ab2 powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Christophe Leroy 2c86cd188f powerpc: clean inclusions of asm/feature-fixups.h
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy 5c35a02c54 powerpc: clean the inclusion of stringify.h
Only include linux/stringify.h is files using __stringify()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy ec0c464cdb powerpc: move ASM_CONST and stringify_in_c() into asm-const.h
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:16 +10:00
Nicholas Piggin 17cc1dd492 powerpc/powernv: implement opal_put_chars_atomic
The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin ac4ac788fd powerpc/powernv: move opal console flushing to udbg
OPAL console writes do not have to synchronously flush firmware /
hardware buffers unless they are going through the udbg path.

Remove the unconditional flushing from opal_put_chars. Flush if
there was no space in the buffer as an optimisation (callers loop
waiting for success in that case). udbg flushing is moved to
udbg_opal_putc.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin b74d2807ae powerpc/powernv: Remove OPALv1 support from opal console driver
opal_put_chars deals with partial writes because in OPALv1,
opal_console_write_buffer_space did not work correctly. That firmware
is not supported.

This reworks the opal_put_chars code to no longer deal with partial
writes by turning them into full writes. Partial write handling is still
supported in terms of what gets returned to the caller, but it may not
go to the console atomically. A warning message is printed in this
case.

This allows console flushing to be moved out of the opal_write_lock
spinlock. That could cause the lock to be held for long periods if the
console is busy (especially if it was being spammed by firmware),
which is dangerous because the lock is taken by xmon to debug the
system. Flushing outside the lock improves the situation a bit.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin d2a2262e68 powerpc/powernv: Implement and use opal_flush_console
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin e00da0f2db powerpc/powernv: opal-kmsg use flush fallback from console code
Use the more refined and tested event polling loop from opal_put_chars
as the fallback console flush in the opal-kmsg path. This loop is used
by the console driver today, whereas the opal-kmsg fallback is not
likely to have been used for years.

Use WARN_ONCE rather than a printk when the fallback is invoked to
prepare for moving the console flush into a common function.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin 3a80bfc7ea powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling
OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY,
so implement the standard OPAL_BUSY handling for it.

Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:55 +10:00
Nicholas Piggin 36d2dabc87 powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops
The OPAL console driver does not delay in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

It can't yet be made to sleep because it is called under spinlock,
but it can be changed to the standard OPAL_BUSY loop form, and a
delay added to keep it from hitting the firmware too frequently.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:55 +10:00
Nicholas Piggin bd90284cc6 powerpc/powernv: opal_put_chars partial write fix
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.

This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.

Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:54 +10:00
Mukesh Ojha b29336c0e1 powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler
Fixes: 8034f715f ("powernv/opal-dump: Convert to irq domain")

Converts all the return explicit number to a more proper IRQ_HANDLED,
which looks proper incase of interrupt handler returning case.

Here, It also removes error message like "nobody cared" which was
getting unveiled while returning -1 or 0 from handler.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:24 +10:00
Mukesh Ojha a5bbe8fd29 powerpc/powernv/opal-dump : Handles opal_dump_info properly
Moves the return value check of 'opal_dump_info' to a proper place which
was previously unnecessarily filling all the dump info even on failure.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:23 +10:00
Alistair Popple 99c3ce33a0 powerpc/powernv/npu: Add a debugfs setting to change ATSD threshold
The threshold at which it becomes more efficient to coalesce a range
of ATSDs into a single per-PID ATSD is currently not well understood
due to a lack of real-world work loads. This patch adds a debugfs
parameter allowing the threshold to be altered at runtime in order to
aid future development and refinement of the value.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-19 21:58:10 +10:00
Michael Ellerman ce57c6610c Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the KVM tree.

I manually propagated the change from commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into
pci-ioda-tce.c.

Conflicts:
        arch/powerpc/include/asm/cputable.h
        arch/powerpc/platforms/powernv/pci-ioda.c
        arch/powerpc/platforms/powernv/pci.h
2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy a68bd1267b powerpc/powernv/ioda: Allocate indirect TCE levels on demand
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we normally have
contigous memory and the guest will map entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do but if it a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds ability to allocate only first level, saving memory.

This changes iommu_table::free() to avoid allocating of an extra level;
iommu_table::set() will do this when needed.

This adds @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the usespace cache table is requested which is the case of VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover that all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy 9bc98c8a43 powerpc/powernv: Rework TCE level allocation
This moves actual pages allocation to a separate function which is going
to be reused later in on-demand TCE allocation.

While we are at it, remove unnecessary level size round up as the caller
does this already.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy 090bad39b2 powerpc/powernv: Add indirect levels to it_userspace
We want to support sparse memory and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window big enough to require 2 or more
indirect levels, and a DMA window is used to map all RAM (which is
a default case for 64bit window), we can actually save some memory by
not allocation TCE for regions which we are not going to map anyway.

The hardware tables alreary support indirect levels but we also keep
host-physical-to-userspace translation array which is allocated by
vmalloc() and is a flat array which might use quite some memory.

This converts it_userspace from vmalloc'ed array to a multi level table.

As the format becomes platform dependend, this replaces the direct access
to it_usespace with a iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; future extension will return
NULL if the level was not allocated.

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy 191c22879f powerpc/powernv: Move TCE manupulation code to its own file
Right now we have allocation code in pci-ioda.c and traversing code in
pci.c, let's keep them toghether. However both files are big enough
already so let's move this business to a new file.

While we at it, move the code which links IOMMU table groups to
IOMMU tables as it is not specific to any PNV PHB model.

These puts exported symbols from the new file together.

This fixes several warnings from checkpatch.pl like this:
"WARNING: Prefer 'unsigned int' to bare use of 'unsigned'".

As this is almost cut-n-paste, there should be no behavioral change.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:07 +10:00
Alexey Kardashevskiy da2bb0da73 powerpc/powernv: Remove useless wrapper
This gets rid of a useless wrapper around
pnv_pci_ioda2_table_free_pages().

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:47:02 +10:00
Nicholas Piggin 2bf1071a8d powerpc/64s: Remove POWER9 DD1 support
POWER9 DD1 was never a product. It is no longer supported by upstream
firmware, and it is not effectively supported in Linux due to lack of
testing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Remove arch_make_huge_pte() entirely]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 11:37:21 +10:00
Alexey Kardashevskiy 00c376fdd7 powerpc/powernv/ioda2: Add 256M IOMMU page size to the default POWER8 case
The sketchy bypass uses 256M pages so add this page size as well.

This should cause no behavioral change but will be used later.

Fixes: 477afd6ea6 "powerpc/ioda: Use ibm,supported-tce-sizes for IOMMU page size mask"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-04 22:41:09 +10:00
Alastair D'Silva 8bf6b91a51 Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"
Remove abandonned capi support for the Mellanox CX4.

This reverts commit 4361b03430.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:32 +10:00
Alastair D'Silva 0cfd7335d1 Revert "cxl: Add support for interrupts on the Mellanox CX4"
Remove abandonned capi support for the Mellanox CX4.

This reverts commit a2f67d5ee8.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:30 +10:00
Michael Neuling 7dea6f2f05 powerpc/powernv/memtrace: Remove memtrace mmap()
debugfs doesn't support mmap(), so this code is never used.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:29 +10:00
Alexey Kardashevskiy d3d4ffaae4 powerpc/powernv/ioda2: Reduce upper limit for DMA window size
We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.

Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in other
words we can create a window as big as 1<<58 but DMA simply won't work.

This reduces the upper limit from 59 to 55 bits to let the userspace know
about the hardware limits.

Fixes: 7aafac11e3 "powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:29 +10:00
Mauro S. M. Rodrigues ee8c446fed powerpc/eeh: Avoid misleading message "EEH: no capable adapters found"
Due to recent refactoring in EEH in:
commit b9fde58db7 ("powerpc/powernv: Rework EEH initialization on
powernv")
a misleading message was seen in the kernel message buffer:

[    0.108431] EEH: PowerNV platform initialized
[    0.589979] EEH: No capable adapters found

This happened due to the removal of the initialization delay for powernv
platform.

Even though the EEH infrastructure for the devices is eventually
initialized and still works just fine the eeh device probe step is
postponed in order to assure the PEs are created. Later
pnv_eeh_post_init does the probe devices job but at that point the
message was already shown right after eeh_init flow.

This patch introduces a new flag EEH_POSTPONED_PROBE to represent that
temporary state and avoid the message mentioned above and showing the
follow one instead:

[    0.107724] EEH: PowerNV platform initialized
[    4.844825] EEH: PCI Enhanced I/O Error Handling Enabled

Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Tested-by:Venkat Rao B <vrbagal1@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:26 +10:00
Kees Cook 6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds c90fca951e powerpc updates for 4.18
Notable changes:
 
  - Support for split PMD page table lock on 64-bit Book3S (Power8/9).
 
  - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live
    patching again.
 
  - Add support for patching barrier_nospec in copy_from_user() and syscall entry.
 
  - A couple of fixes for our data breakpoints on Book3S.
 
  - A series from Nick optimising TLB/mm handling with the Radix MMU.
 
  - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre.
 
  - Several series optimising various parts of the 32-bit code from Christophe Leroy.
 
  - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"),
    which is why the diffstat has so many deletions.
 
 And many other small improvements & fixes.
 
 There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and
 a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and
 fs/proc/task_mmu.c, which cleans up some details around pkey support. It was
 ack'ed/reviewed by Ingo & Dave and has been in next for several weeks.
 
 Thanks to:
   Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew
   Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh,
   Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave
   Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren
   Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf,
   Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu
   Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
   Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
   Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi
   Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher
   Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith,
   Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang,
   Yisheng Xie, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJbGQKBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBq
 TRAAioK7rz5xYMkxaM3Ng3ybobEeNAwQqOolz98xvmnB9SfDWNuc99vf8cGu0/fQ
 zc8AKZ5RcnwipOjyGlxW9oa1ZhVq0xtYnQPiYLEKMdLQmh5D+C7+KpvAd1UElweg
 ub40/xDySWfMujfuMSF9JDCWPIXyojt4Xg5nJKIVRrAm/3YMe/+i5Am7NWHuMCEb
 aQmZtlYW5Mz81XY0968hjpUO6eKFRmsaM7yFAhGTXx6+oLRpGj1PZB4AwdRIKS2L
 Ak7q/VgxtE4W+s3a0GK2s+eXIhGKeFuX9AVnx3nti+8/K1OqrqhDcLMUC/9JpCpv
 EvOtO7dxPnZujHjdu4Eai/xNoo4h6zRy7bWqve9LoBM40CP5jljKzu1lwqqb5yO0
 jC7/aXhgiSIxxcRJLjoI/TYpZPu40MifrkydmczykdPyPCnMIWEJDcj4KsRL/9Y8
 9SSbJzRNC/SgQNTbUYPZFFi6G0QaMmlcbCb628k8QT+Gn3Xkdf/ZtxzqEyoF4Irq
 46kFBsiSSK4Bu0rVlcUtJQLgdqytWULO6NKEYnD67laxYcgQd8pGFQ8SjZhRZLgU
 q5LA3HIWhoAI4M0wZhOnKXO6JfiQ1UbO8gUJLsWsfF0Fk5KAcdm+4kb4jbI1H4Qk
 Vol9WNRZwEllyaiqScZN9RuVVuH0GPOZeEH1dtWK+uWi0lM=
 =ZlBf
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for split PMD page table lock on 64-bit Book3S (Power8/9).

   - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
     live patching again.

   - Add support for patching barrier_nospec in copy_from_user() and
     syscall entry.

   - A couple of fixes for our data breakpoints on Book3S.

   - A series from Nick optimising TLB/mm handling with the Radix MMU.

   - Numerous small cleanups to squash sparse/gcc warnings from Mathieu
     Malaterre.

   - Several series optimising various parts of the 32-bit code from
     Christophe Leroy.

   - Removal of support for two old machines, "SBC834xE" and "C2K"
     ("GEFanuc,C2K"), which is why the diffstat has so many deletions.

  And many other small improvements & fixes.

  There's a few out-of-area changes. Some minor ftrace changes OK'ed by
  Steve, and a fix to our powernv cpuidle driver. Then there's a series
  touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
  around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
  been in next for several weeks.

  Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
  Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
  Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
  Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
  Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
  Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
  Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
  Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
  Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
  Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
  Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
  Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
  Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"

* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
  powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
  cpuidle: powernv: Fix promotion from snooze if next state disabled
  powerpc: fix build failure by disabling attribute-alias warning in pci_32
  ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
  powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
  powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
  powerpc/pkeys: Detach execute_only key on !PROT_EXEC
  powerpc/powernv: copy/paste - Mask SO bit in CR
  powerpc: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove support for Marvell mv64x60 i2c controller
  powerpc/boot: Remove support for Marvell MPSC serial controller
  powerpc/embedded6xx: Remove C2K board support
  powerpc/lib: optimise PPC32 memcmp
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/time: inline arch_vtime_task_switch()
  powerpc/Makefile: set -mcpu=860 flag for the 8xx
  powerpc: Implement csum_ipv6_magic in assembly
  powerpc/32: Optimise __csum_partial()
  powerpc/lib: Adjust .balign inside string functions for PPC32
  ...
2018-06-07 10:23:33 -07:00
Colin Ian King b0c4acb1dd powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
Trivial fix to spelling mistake in hmi_error_types text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-05 11:33:47 +10:00
Haren Myneni 7574364906 powerpc/powernv: copy/paste - Mask SO bit in CR
NX can set the 3rd bit in CR register for XER[SO] (Summary overflow)
which is not related to paste request. The current paste function
returns failure for a successful request when this bit is set. So mask
this bit and check the proper return status.

Fixes: 2392c8c8c0 ("powerpc/powernv/vas: Define copy/paste interfaces")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 22:58:41 +10:00
Michal Suchanek cb3d6759a9 powerpc/64s: Enable barrier_nospec based on firmware settings
Check what firmware told us and enable/disable the barrier_nospec as
appropriate.

We err on the side of enabling the barrier, as it's no-op on older
systems, see the comment for more detail.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Anju T Sudhakar 25af86b2ae powerpc/perf: Unregister thread-imc if core-imc not supported
Since thread-imc internally use the core-imc hardware infrastructure
and is depended on it, having thread-imc in the kernel in the
absence of core-imc is trivial. Patch disables thread-imc, if
core-imc is not registered.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:37 +10:00