Commit Graph

4617 Commits

Author SHA1 Message Date
Anton Blanchard 5e95235ccd powerpc: Align TOC to 256 bytes
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.

If they are bad, we die a few hundred instructions into boot.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-14 16:59:21 +10:00
Wei Yang f77ceb717d powerpc/eeh: remove unused macro IS_BRIDGE
Currently, the macro IS_BRIDGE is not used any where.
This patch just removes it.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Wei Yang 2ac3990cc3 powerpc/eeh: fix comment for wait_state()
To retrieve the PCI slot state, EEH driver would set a timeout for that.
While current comment is not aligned to what the code does.

This patch fixes those comments according to the code.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Wei Yang 3721352990 powerpc/eeh: fix start/end/flags type in struct pci_io_addr_range{}
struct pci_io_addr_range{} stores the information of pci resources. It
would be better to keep these related fields have the same type as in
struct resource{}.

This patch fixes the start/end/flags type in struct pci_io_addr_range{} to
have the same type as in struct resource{}.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Gavin Shan ec33d36e5a powerpc/eeh: Introduce eeh_pe_inject_err()
The patch defines PCI error types and functions in uapi/asm/eeh.h
and exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purpose.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 20:33:35 +10:00
Daniel Axtens ffb2d78eca powerpc/mce: fix off by one errors in mce event handling
Before 69111bac42 ("powerpc: Replace __get_cpu_var uses"), in
save_mce_event, index got the value of mce_nest_count, and
mce_nest_count was incremented *after* index was set.

However, that patch changed the behaviour so that mce_nest count was
incremented *before* setting index.

This causes an off-by-one error, as get_mce_event sets index as
mce_nest_count - 1 before reading mce_event.  Thus get_mce_event reads
bogus data, causing warnings like
"Machine Check Exception, Unknown event version 0 !"
and breaking MCEs handling.

Restore the old behaviour and unbreak MCE handling by subtracting one
from the newly incremented value.

The same broken change occured in machine_check_queue_event (which set
a queue read by machine_check_process_queued_event).  Fix that too,
unbreaking printing of MCE information.

Fixes: 69111bac42 ("powerpc: Replace __get_cpu_var uses")
CC: stable@vger.kernel.org
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: Christoph Lameter <cl@linux.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 19:44:01 +10:00
Michael Ellerman e0d0059169 powerpc/vdso: Disable building the 32-bit VDSO on little endian
The only little endian configuration we support is ppc64le. As such if
we're building little endian we don't need a 32-bit VDSO, because there
is no 32-bit userspace.

This patch is a fairly ugly mess of #ifdefs, but is the minimal logic
required to disable the 32-bit VDSO. We can hopefully clean up the
result in future with some further refactoring.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:01:02 +10:00
Michael Ellerman 6e5c077519 powerpc/vdso: Combine start/size variables
In vdso_fixup_features() we have start64/start32 and size64/size32, but
they have the same types, ie. void * and unsigned long.

They're only used to save the return value from find_sectionXX() for the
subsequent call to do_feature_fixups(), so there's no overlap in their
usage either.

So we can just consolidate them into start/size and avoid the
duplication.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:00:32 +10:00
Michael Ellerman 63da88dd48 powerpc/vdso: Remove unused debug code
It's in the git history if we ever need it back.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 20:00:32 +10:00
Michael Ellerman 5c0aebf6e1 powerpc: Show utsname->machine in boot-up banner
Currently we print "Starting Linux PPC64" at boot. But we don't mention
anywhere whether the kernel is big or little endian.

If we print the utsname->machine value instead we get either "ppc64" or
"ppc64le" which is much more informative, eg:

  Starting Linux ppc64le #1 SMP Wed Apr 15 12:12:20 AEST 2015

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-11 19:55:54 +10:00
Dan Streetman b130e7c04f powerpc: export of_get_ibm_chip_id function
Export the of_get_ibm_chip_id() function.  This will be used by the
PowerNV NX-842 driver.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-11 15:06:37 +08:00
Sam Bobroff 0aab374709 powerpc/powernv: Restore non-volatile CRs after nap
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
use non-volatile condition registers (cr2, cr3 and cr4) early in the system
reset interrupt handler (system_reset_pSeries()) before it has been determined
if state loss has occurred. If state loss has not occurred, control returns via
the power7_wakeup_noloss() path which does not restore those condition
registers, leaving them corrupted.

Fix this by restoring the condition registers in the power7_wakeup_noloss()
case.

This is apparent when running a KVM guest on hardware that does not
support winkle or sleep and the guest makes use of secondary threads. In
practice this means Power7 machines, though some early unreleased Power8
machines may also be susceptible.

The secondary CPUs are taken off line before the guest is started and
they call pnv_smp_cpu_kill_self(). This checks support for sleep
states (in this case there is no support) and power7_nap() is called.

When the CPU is woken, power7_nap() returns and because the CPU is
still off line, the main while loop executes again. The sleep states
support test is executed again, but because the tested values cannot
have changed, the compiler has optimized the test away and instead we
rely on the result of the first test, which has been left in cr3
and/or cr4. With the result overwritten, the wrong branch is taken and
power7_winkle() is called on a CPU that does not support it, leading
to it stalling.

Fixes: 7cba160ad7 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f21 ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 16:55:11 +10:00
Gavin Shan d91dafc02f powerpc/eeh: Delay probing EEH device during hotplug
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH
devices in early stage, which is reasonable to pSeries platform.
However, it's wrong for PowerNV platform because the PE# isn't
determined until the resources (IO and MMIO) are assigned to
PE in hotplug case. So we have to delay probing EEH devices
for PowerNV platform until the PE# is assigned.

Fixes: ff57b454dd ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 13:52:32 +10:00
Gavin Shan 1ae79b78bc powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
When asserting reset in pcibios_set_pcie_reset_state(), the PE
is enforced to (hardware) frozen state in order to drop unexpected
PCI transactions (except PCI config read/write) automatically by
hardware during reset, which would cause recursive EEH error.
However, the (software) frozen state EEH_PE_ISOLATED is missed.
When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED
is set in PE state retrival backend. Unfortunately, nobody (the
reset handler or the EEH recovery functinality in host) will clear
EEH_PE_ISOLATED when the PE has been passed through to guest.

The patch sets and clears EEH_PE_ISOLATED properly during reset
in function pcibios_set_pcie_reset_state() to fix the issue.

Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 13:52:09 +10:00
Michael Ellerman 68fc378ce3 Revert "powerpc/tm: Abort syscalls in active transactions"
This reverts commit feba40362b.

Although the principle of this change is good, the implementation has a
few issues.

Firstly we can sometimes fail to abort a syscall because r12 may have
been clobbered by C code if we went down the virtual CPU accounting
path, or if syscall tracing was enabled.

Secondly we have decided that it is safer to abort the syscall even
earlier in the syscall entry path, so that we avoid the syscall tracing
path when we are transactional.

So that we have time to thoroughly test those changes we have decided to
revert this for this merge window and will merge the fixed version in
the next window.

NB. Rather than reverting the selftest we just drop tm-syscall from
TEST_PROGS so that it's not run by default.

Fixes: feba40362b ("powerpc/tm: Abort syscalls in active transactions")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-30 15:24:58 +10:00
Linus Torvalds 63905bba5b powerpc fixes for 4.1
- Fix for mm_dec_nr_pmds() from Scott.
 - Fixes for oopses seen with KVM + THP from Aneesh.
 - Build fixes from Aneesh & Shreyas.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVOsk5AAoJEFHr6jzI4aWACWMP/3EaNoeA1g8VbZWZEdRaoLvX
 W7D08DI3Dt8HLxyn2JR08jYZF0gr68XrF6OiscVVki7wVXT8fbH4jSNmBkbzNH95
 d9taScJyR1CUavkhsXivnR1qEE1Fi2KA2OW9RaNfoSt1MVtdsvOK6xXklUGksuJQ
 XygzyrRr4Dj82kuMUAMO0YDMvknMlzi3a8dzyrWZBXBZOOTWavGB6bQKtCTaOQ99
 3OFGLQ10uY7lmdHDi0t0tQ99FuYfLiJpg5fTLoUni4J5tFp8JlZ+x0Gwc0apN0cy
 Ym8EO6++qWDv8FXvYEPfVUEjbF1fyPiawUgpkMnyvXgd8K5G85SIrtkGW0Ml+6sX
 GfJH8w9hpDbF5EnWlC9bn/jT7sHBHFdrxZuQUc0L4M2OtM73R2a0Xr3b7ZxFCD1q
 7RpYu8MKKcyvaIXNg7VBJjj8zL+WmUJKF6J5uX5bGU2xH0khmp0vTknyyjbwrlcF
 uHidv5ZhMt3aAI70v14jA5BTEmLyOYRu58Ei6cT/VT/DjdbpEApdK8BMAvKSEeib
 +hzh6oDFT92AM0tbg15bNmqGbGfgqtVKe4GDS2QyGaHGAFOGs1nPuSa9se1xYDcM
 CCtRyABwpzJsrCfwra2fsTU6FxlatK4ONViyWFBXa6mEjBNSZ4XmyZvdWUqlwpSC
 F5jNGppm5Ama6xxcLphA
 =6yQx
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

 - fix for mm_dec_nr_pmds() from Scott.

 - fixes for oopses seen with KVM + THP from Aneesh.

 - build fixes from Aneesh & Shreyas.

* tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
  powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error
  powerpc/mm/thp: Return pte address if we find trans_splitting.
  powerpc/mm/thp: Make page table walk safe against thp split/collapse
  KVM: PPC: Remove page table walk helpers
  KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
  powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
2015-04-26 13:23:15 -07:00
Linus Torvalds eadf16a912 This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and some
 cleanups).  But there are also bug fixes and small cleanups for ARM,
 x86 and s390.
 
 The task_migration_notifier revert and real fix is still pending review,
 but I'll send it as soon as possible after -rc1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVOONLAAoJEL/70l94x66DbsMIAIpZPsaqgXOC1sDEiZuYay+6
 rD4n4id7j8hIAzcf3AlZdyf5XgLlr6I1Zyt62s1WcoRq/CCnL7k9EljzSmw31WFX
 P2y7/J0iBdkn0et+PpoNThfL2GsgTqNRCLOOQlKgEQwMP9Dlw5fnUbtC1UchOzTg
 eAMeBIpYwufkWkXhdMw4PAD4lJ9WxUZ1eXHEBRzJb0o0ZxAATJ1tPZGrFJzoUOSM
 WsVNTuBsNd7upT02kQdvA1TUo/OPjseTOEoksHHwfcORt6bc5qvpctL3jYfcr7sk
 /L6sIhYGVNkjkuredjlKGLfT2DDJjSEdJb1k2pWrDRsY76dmottQubAE9J9cDTk=
 =OAi2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of KVM changes from Paolo Bonzini:
 "This mostly includes the PPC changes for 4.1, which this time cover
  Book3S HV only (debugging aids, minor performance improvements and
  some cleanups).  But there are also bug fixes and small cleanups for
  ARM, x86 and s390.

  The task_migration_notifier revert and real fix is still pending
  review, but I'll send it as soon as possible after -rc1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
  KVM: arm/arm64: check IRQ number on userland injection
  KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
  KVM: VMX: Preserve host CR4.MCE value while in guest mode.
  KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
  KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
  KVM: PPC: Book3S HV: Streamline guest entry and exit
  KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
  KVM: PPC: Book3S HV: Use decrementer to wake napping threads
  KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
  KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
  KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
  KVM: PPC: Book3S HV: Minor cleanups
  KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
  KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
  KVM: PPC: Book3S HV: Add ICP real mode counters
  KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
  KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
  KVM: PPC: Book3S HV: Add guest->host real mode completion counters
  KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
  ...
2015-04-26 13:06:22 -07:00
Paul Mackerras 66feed61cd KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
  rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr:     1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry:     3060.1ns (212 - 65138, 953873 samples)
  rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
  rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:34 +02:00
Paul Mackerras 7d6c40da19 KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:33 +02:00
Paul Mackerras 5d5b99cd68 KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:32 +02:00
Paul Mackerras 1f09c3ed86 KVM: PPC: Book3S HV: Minor cleanups
* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:32 +02:00
Paul Mackerras b6c295df31 KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spend in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-21 15:21:31 +02:00
Aneesh Kumar K.V 691e95fd73 powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:39 +10:00
Linus Torvalds d19d5efd8c powerpc updates for 4.1
- Numerous minor fixes, cleanups etc.
 - More EEH work from Gavin to remove its dependency on device_nodes.
 - Memory hotplug implemented entirely in the kernel from Nathan Fontenot.
 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.
 - Rewrite of VPHN parsing logic & tests from Greg Kurz.
 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.
 - Support for pstore on powernv from Hari Bathini.
 - Removal of old powerpc specific byte swap routines by David Gibson.
 - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing
   your firmware when it wasn't.
 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.
 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek.
 - Some fixes for migration from Tyrel Datwyler.
 - A new syscall to switch the cpu endian by Michael Ellerman.
 - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn.
 - A fix for the OPAL sensor driver from Cédric Le Goater.
 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.
 - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per
   machine.
 - Small patch from Sam Bobroff to explicitly abort non-suspended transactions
   on syscalls, plus a test to exercise it.
 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.
 - Small patch to enable the hard lockup detector from Anton Blanchard.
 - Fix from Dave Olson for missing L2 cache information on some CPUs.
 - Some fixes from Michael Ellerman to get Cell machines booting again.
 - Freescale updates from Scott: Highlights include BMan device tree nodes, an
   MSI erratum workaround, a couple minor performance improvements, config
   updates, and misc fixes/cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVL2cxAAoJEFHr6jzI4aWAR8cP/19VTo/CzCE4ffPSx7qR464n
 F+WFZcbNjIMXu6+B0YLuJZEsuWtKKrCit/MCg3+mSgE4iqvxmtI+HDD0445Buszj
 UD4E4HMdPrXQ+KUSUDORvRjv/FFUXIa94LSv/0g2UeMsPz/HeZlhMxEu7AkXw9Nf
 rTxsmRTsOWME85Y/c9ss7XHuWKXT3DJV7fOoK9roSaN3dJAuWTtG3WaKS0nUu0ok
 0M81D6ZczoD6ybwh2DUMPD9K6SGxLdQ4OzQwtW6vWzcQIBDfy5Pdeo0iAFhGPvXf
 T4LLPkv4cF4AwHsAC4rKDPHQNa+oZBoLlScrHClaebAlDiv+XYKNdMogawUObvSh
 h7avKmQr0Ygp1OvvZAaXLhuDJI9FJJ8lf6AOIeULgHsDR9SyKMjZWxRzPe11uarO
 Fyi0qj3oJaQu6LjazZraApu8mo+JBtQuD3z3o5GhLxeFtBBF60JXj6zAXJikufnl
 kk1/BUF10nKUhtKcDX767AMUCtMH3fp5hx8K/z9T5v+pobJB26Wup1bbdT68pNBT
 NjdKUppV6QTjZvCsA6U2/ECu6E9KeIaFtFSL2IRRoiI0dWBN5/5eYn3RGkO2ZFoL
 1NdwKA2XJcchwTPkpSRrUG70sYH0uM2AldNYyaLfjzrQqza7Y6lF699ilxWmCN/H
 OplzJAE5cQ8Am078veTW
 =03Yh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Numerous minor fixes, cleanups etc.

 - More EEH work from Gavin to remove its dependency on device_nodes.

 - Memory hotplug implemented entirely in the kernel from Nathan
   Fontenot.

 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

 - Rewrite of VPHN parsing logic & tests from Greg Kurz.

 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.

 - Support for pstore on powernv from Hari Bathini.

 - Removal of old powerpc specific byte swap routines by David Gibson.

 - Fix from Vasant Hegde to prevent the flash driver telling you it was
   flashing your firmware when it wasn't.

 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
   Stancek.

 - Some fixes for migration from Tyrel Datwyler.

 - A new syscall to switch the cpu endian by Michael Ellerman.

 - Large series from Wei Yang to implement SRIOV, reviewed and acked by
   Bjorn.

 - A fix for the OPAL sensor driver from Cédric Le Goater.

 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

 - Large series from Daniel Axtens to make our PCI hooks per PHB rather
   than per machine.

 - Small patch from Sam Bobroff to explicitly abort non-suspended
   transactions on syscalls, plus a test to exercise it.

 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

 - Small patch to enable the hard lockup detector from Anton Blanchard.

 - Fix from Dave Olson for missing L2 cache information on some CPUs.

 - Some fixes from Michael Ellerman to get Cell machines booting again.

 - Freescale updates from Scott: Highlights include BMan device tree
   nodes, an MSI erratum workaround, a couple minor performance
   improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
  powerpc/powermac: Fix build error seen with powermac smp builds
  powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
  powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
  powerpc/cell: Fix iommu breakage caused by controller_ops change
  powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
  powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
  powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
  powerpc/pseries: Correct memory hotplug locking
  powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
  powerpc: Add ppc64 hard lockup detector support
  oprofile: Disable oprofile NMI timer on ppc64
  powerpc/perf/hv-24x7: Add missing put_cpu_var()
  powerpc/perf/hv-24x7: Break up single_24x7_request
  powerpc/perf/hv-24x7: Define update_event_count()
  powerpc/perf/hv-24x7: Whitespace cleanup
  powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
  powerpc/perf/hv-24x7: Rename hv_24x7_event_update
  powerpc/perf/hv-24x7: Move debug prints to separate function
  powerpc/perf/hv-24x7: Drop event_24x7_request()
  powerpc/perf/hv-24x7: Use pr_devel() to log message
  ...

Conflicts:
	tools/testing/selftests/powerpc/Makefile
	tools/testing/selftests/powerpc/tm/Makefile
2015-04-16 13:53:32 -05:00
Linus Torvalds d0bbe0dd35 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Usual trivial tree updates.  Nothing outstanding -- mostly printk()
  and comment fixes and unused identifier removals"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  goldfish: goldfish_tty_probe() is not using 'i' any more
  powerpc: Fix comment in smu.h
  qla2xxx: Fix printks in ql_log message
  lib: correct link to the original source for div64_u64
  si2168, tda10071, m88ds3103: Fix firmware wording
  usb: storage: Fix printk in isd200_log_config()
  qla2xxx: Fix printk in qla25xx_setup_mode
  init/main: fix reset_device comment
  ipwireless: missing assignment
  goldfish: remove unreachable line of code
  coredump: Fix do_coredump() comment
  stacktrace.h: remove duplicate declaration task_struct
  smpboot.h: Remove unused function prototype
  treewide: Fix typo in printk messages
  treewide: Fix typo in printk messages
  mod_devicetable: fix comment for match_flags
2015-04-14 09:50:27 -07:00
Michael Ellerman 89a51df5ab powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
The recent change to the EEH probing causes a crash on Cell because
eeh_ops is NULL.

Check if EEH is enabled and if not bail out.

Fixes: ff57b454dd ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-14 17:13:31 +10:00
Michael Ellerman ad30cb9946 Merge branch 'next-sriov' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next
Merge Richard's work to support SR-IOV on PowerNV. All generic PCI
patches acked by Bjorn.

Some minor conflicts with Daniel's pci_controller_ops work.

Conflicts:
	arch/powerpc/include/asm/machdep.h
	arch/powerpc/platforms/powernv/pci-ioda.c
2015-04-14 09:29:23 +10:00
Dave Olson f7e9e35836 powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9 "Rewrite sysfs processor cache info code".

This caused lscpu to error out on at least e500v2 devices, eg:

  error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory

Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.

Fixes: 93197a36a9 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:28 +10:00
Anton Blanchard c54b2bf1b5 powerpc: Add ppc64 hard lockup detector support
The hard lockup detector uses a PMU event as a periodic NMI to
detect if we are stuck (where stuck means no timer interrupts have
occurred).

Ben's rework of the ppc64 soft disable code has made ppc64 PMU
exceptions a partial NMI. They can get disabled if an external
interrupt comes in, but otherwise PMU interrupts will fire in
interrupt disabled regions.

We disable the hard lockup detector by default for a few reasons:

- It breaks userspace event based branches on POWER8.
- It is likely to produce false positives on KVM guests.
- Since PMCs can only count to 2^31, counting cycles means we might
  take multiple PMU exceptions per second per hardware thread even
  if our hard lockup timeout is 10 seconds.

It can be enabled via a boot option, or via procfs.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:27 +10:00
Sam bobroff feba40362b powerpc/tm: Abort syscalls in active transactions
This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return immediately without
performing the syscall.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported. It doesn't reduce functionality because
syscalls will still be performed after tsuspend. It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c
indicate the cost of a system call increases by about 0.5%.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:19 +10:00
Daniel Axtens 467efc2e4f powerpc: Remove shims for pci_controller_ops operations
Remove shims, patch callsites to use pci_controller_ops
versions instead.

Also move back the probe mode defines, as explained in the patch
for pci_probe_mode.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:18 +10:00
Daniel Axtens 97884e00e2 powerpc: fsl_pci, swiotlb: Move controller ops from ppc_md to controller_ops
Move the installation of DMA operations out of swiotlb's subsys
initcall, and into the generic PCI controller operations struct.

These ops are installed conditionally, based on the ppc_swiotlb_enable
global. The global can be set in two places:
 - swiotlb_detect_4g, which is always called at the arch initcall level
 - setup_pci_atmu, which is called as part of the fsl_add_bridge and
fsl_pci_syscore_do_resume.

fsl_pci_syscore_do_resume is called late enough that any changes as a
result of that call will have no effect.

As such, if we test the global and set the operations as part of
fsl_add_bridge, after the call to setup_pci_atmu, we can be confident
that it will cover all the PCI implementations affected by the changes
to dma-swiotlb.c.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:17 +10:00
Daniel Axtens cd16c7ba0c powerpc: Create pci_controller_ops.reset_secondary_bus and shim
Add pci_controller_ops.reset_secondary_bus,
shadowing ppc_md.pcibios_reset_secondary_bus.
Add a shim, and changes the callsites to use the shim.

Use pcibios_reset_secondary_bus_shim, as both
pcibios_reset_secondary_bus and pci_reset_secondary_bus
are already taken.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:14 +10:00
Daniel Axtens 542070baf4 powerpc: Create pci_controller_ops.window_alignment and shim
Add pci_controller_ops.window_alignment,
shadowing ppc_md.pcibios_window_alignment.
Add a shim, and changes the callsites to use the shim.

Here, we use pci_window_alignment, as pcibios_window_alignment is
already taken.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:13 +10:00
Daniel Axtens b31e79f8d9 powerpc: Create pci_controller_ops.enable_device_hook and shim
Add pci_controller_ops.enable_device_hook,
shadowing ppc_md.pcibios_enable_device_hook.
Add a shim, and changes the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:13 +10:00
Daniel Axtens ff9df8c87d powerpc: Create pci_controller_ops.probe_mode and shim
Add pci_controller_ops.probe_mode, shadowing ppc_md.pci_probe_mode.
Add a shim, and changes the callsites to use the shim.

We also need to move the probe mode defines to pci-bridge.h from pci.h.
They are required by the shim in order to return a sensible default.
Previously, the were defined in pci.h, but pci.h includes pci-bridge.h
before the relevant #defines. This means the definitions are absent
if pci.h is included before pci-bridge.h. This occurs in some drivers.
So, move the definitons now, and move them back when we remove the shim.

Anything that wants the defines would have had to include pci.h, and
since pci.h includes pci-bridge.h, nothing will lose access to the
defines.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:12 +10:00
Daniel Axtens b122c95494 powerpc: Create pci_controller_ops.dma_bus_setup and shim
Add pci_controller_ops.dma_bus_setup, shadowing ppc_md.pci_dma_bus_setup.
Add a shim, and changes the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:11 +10:00
Daniel Axtens e02def5bce powerpc: Create pci_controller_ops.dma_dev_setup and shim
Introduces the pci_controller_ops structure.
Add pci_controller_ops.dma_dev_setup, shadowing ppc_md.pci_dma_dev_setup.
Add a shim, and change the callsites to use the shim.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:11 +10:00
Daniel Axtens c88c2a1889 powerpc: pcibios_enable_device_hook: return bool rather than int
pcibios_enable_device_hook returned an int. Every implementation
returned either -EINVAL or 0. The return value wasn't propagated by
the caller: any non-zero return value caused pcibios_enable_device
to return -EINVAL itself. Therefore, make the hook return a bool.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:10 +10:00
Daniel Axtens bdc728a849 powerpc: move find_and_init_phbs() to pSeries specific code
Previously, find_and_init_phbs() was used in both PowerNV and pSeries
setup. However, since RTAS support has been dropped from PowerNV, we
can move it into a platform-specific file.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-11 20:49:09 +10:00
Michael Ellerman 7e862d7e7d powerpc: Reword the "returning from prom_init" message
We get way too many bug reports that say "the kernel is hung in
prom_init", which stems from the fact that the last piece of output
people see is "returning from prom_init".

The kernel is almost never hung in prom_init(), it's just that it's
crashed somewhere after prom_init() but prior to the console coming up.

The existing message should give a clue to that, ie. "returning from"
indicates that prom_init() has finished, but it doesn't seem to work.
Let's try something different.

This prints:

  Quiescing Open Firmware ...
  Booting Linux via __start() ...

Which hopefully makes it clear that prom_init() is not the problem, and
although __start() probably isn't either, it's at least the right place
to begin looking.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Wistfully-Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-04-10 20:02:48 +10:00
Michael Ellerman f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Michael Ellerman bf4981a006 powerpc: Remove the celleb support
The celleb code has seen no actual development for ~7 years.

We (maintainers) have no access to test hardware, and it is highly
likely the code has bit-rotted.

As far as we're aware the hardware was never widely available, and is
certainly no longer available, and no one on the list has shown any
interest in it over the years.

So remove it. If anyone has one and cares please speak up.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-04-07 17:15:13 +10:00
Michael Ellerman 428d4d6520 Merge branch 'next-eeh' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next 2015-04-07 13:24:55 +10:00
Benjamin Herrenschmidt d4ed11aa48 Merge branch 'next-eeh' into next-sriov
Merge in Gavin EEH fixes
2015-03-31 13:11:17 +11:00
Gavin Shan 433185d2b4 powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()
The function eeh_add_parent_pe() is used to create a PE or add one
edev to its parent PE. Current code checks if PE#0 is valid for the
later case. Actually, we should validate PE#0 for both cases when
EEH core regards PE#0 as invalid one (without flag EEH_VALID_PE_ZERO).
Otherwise, not all EEH devices can be added to its parent PE#0 for
EEH on P7IOC.

The patch fixes the issue by validating PE#0 for the two cases. So far,
we don't have PE#0 for EEH on P7IOC, but it will show up when we enable
M64 for P7IOC. The patch also makes the error message more meaningful.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:10:39 +11:00
Wei Yang 781a868f31 powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device return some mmio back to
    the system, which actually no one could use it.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 5350ab3fd7 powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handle these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang 6e628c7d33 powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation.
M64 BAR is a type of hardware resource in PHB3, which could map a range of
MMIO to PE numbers on powernv platform. And this range is divided equally
by the number of total_pe with each divided range mapping to a PE number.
Also, the M64 BAR must map a MMIO range with power-of-two size.

The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 BAR will be part of other PCI device, which may already belong to other
PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, which is total_pe number of VF's BAR size.

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang c3b80fb0f2 powerpc/pci: Don't unset PCI resources for VFs
Flag PCI_REASSIGN_ALL_RSRC is used to ignore resources information setup by
firmware, so that kernel would re-assign all resources of pci devices.

On powerpc arch, this happens in a header fixup function
pcibios_fixup_resources(), which will clean up the resources if this flag
is set. This works fine for PFs, since after clean up, kernel will
re-assign the resources in pcibios_resource_survey().

Below is a simple call flow on how it works:

    pcibios_init
      pcibios_scan_phb
        pci_scan_child_bus
          ...
            pci_device_add
              pci_fixup_device(pci_fixup_header)
                pcibios_fixup_resources                     # header fixup
                  for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                    dev->resource[i].start = 0
      pcibios_resource_survey                               # re-assign
        pcibios_allocate_resources

However, the VF resources won't be re-assigned, since the VF resources are
completely determined by the PF resources, and the PF resources have
already been reassigned. This means we need to leave VF's resources
un-cleared in pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources(), if the pci_dev is a VF.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Gavin Shan a8b2f8288a powerpc/pci: Create pci_dn for VFs
pci_dn is the extension of PCI device node and is created from device node.
Unfortunately, VFs are enabled dynamically by PF's driver and they don't
have corresponding device nodes and pci_dn, which is required to access
VFs' config spaces.

The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF,
and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When
VF's pci_dn is created, it's put to the child list of the pci_dn of PF's
upstream bridge. The pci_dn is linked to pci_dev during early fixup time
to setup the fast path.

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Michael Ellerman 529d235a0e powerpc: Add a proper syscall for switching endianness
We currently have a "special" syscall for switching endianness. This is
syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
exception entry.

That has a few problems, firstly the syscall number is outside of the
usual range, which confuses various tools. For example strace doesn't
recognise the syscall at all.

Secondly it's handled explicitly as a special case in the syscall
exception entry, which is complicated enough without it.

As a first step toward removing the special syscall, we need to add a
regular syscall that implements the same functionality.

The logic is simple, it simply toggles the MSR_LE bit in the userspace
MSR. This is the same as the special syscall, with the caveat that the
special syscall clobbers fewer registers.

This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-28 22:03:40 +11:00
Tyrel Datwyler c03e73740d powerpc/pseries: Simplify check for suspendability during suspend/migration
During suspend/migration operation we must wait for the VASI state reported
by the hypervisor to become Suspending prior to making the ibm,suspend-me
RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
that exposes the VASI state to the caller. This is unnecessary as the caller
only really cares about the following three conditions; if there is an error
we should bailout, success indicating we have suspended and woken back up so
proceed to device tree update, or we are not suspendable yet so try calling
rtas_ibm_suspend_me again shortly.

This patch removes the extraneous vasi_state variable and simply uses the
return code to communicate how to proceed. We either succeed, fail, or get
-EAGAIN in which case we sleep for a second before trying to call
rtas_ibm_suspend_me again. The behaviour of ppc_rtas() remains the same,
but migrate_store() now returns the propogated error code on failure.
Previously -1 was returned from migrate_store() in the  failure case which
equates to -EPERM and was clearly wrong.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenont <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-28 12:20:39 +11:00
Gavin Shan c6406d8fbb powerpc/eeh: Remove device_node dependency
The patch removes struct eeh_dev::dn and the corresponding helper
functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead,
eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the
pdn, which might contain device_node on PowerNV platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:53 +11:00
Gavin Shan 0bd785873c powerpc/eeh: Replace device_node with pci_dn in eeh_ops
There are 3 EEH operations whose arguments contain device_node:
read_config(), write_config() and restore_config(). The patch
replaces device_node with pci_dn.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan ff57b454dd powerpc/eeh: Do probe on pci_dn
Originally, EEH core probes on device_node or pci_dev to populate
EEH devices and PEs, which conflicts with the fact: SRIOV VFs are
usually enabled and created by PF's driver and they don't have the
corresponding device_nodes. Instead, SRIOV VFs have dynamically
created pci_dn, which can be used for EEH probe.

The patch reworks EEH probe for PowerNV and pSeries platforms to
do probing based on pci_dn, instead of pci_dev or device_node any
more.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan e8e9b34cef powerpc/eeh: Create eeh_dev from pci_dn instead of device_node
The patch adds function traverse_pci_dn(), which is similar to
traverse_pci_devices() except it takes pci_dn, not device_node
as parameter. The pci_dev.c has been reworked to create eeh_dev
from pci_dn, instead of device_node.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:51 +11:00
Gavin Shan c035ff1d2e powerpc/pci: Trace more information from pci_dn
Originally, EEH probes on device_node or pci_dev and populates the
corresponding eeh_dev. In the subsequent patches, EEH will probes
on pci_dn and populates the corresponding eeh_dev. So we have to
cache some information in pci_dn, either from device_node or SRIOV
PF's enablement platform hook, to populate the eeh_dev properly.

The motivation to probe pci_dn, instead of device node or pci_dev,
to populate eeh_dev is SRIOV VFs are dynamically created and we
don't have the corresponding device nodes for them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:50 +11:00
Gavin Shan cca87d303c powerpc/pci: Refactor pci_dn
Currently, the PCI config accessors are implemented based on device node.
Unfortunately, SRIOV VFs won't have the corresponding device nodes. pci_dn
will be used in replacement with device node for SRIOV VFs. So we have to
use pci_dn in PCI config accessors.

The patch refactors pci_dn in following aspects to make it ready to be used
in PCI config accessors as we do in subsequent patch:

   * pci_dn is organized as a hierarchy tree.  PCI device's pci_dn is
     put to the child list of pci_dn of its upstream bridge or PHB. VF's
     pci_dn will be put to the child list of pci_dn of PF's bridge.

   * For one particular PCI device (VF or not), its pci_dn can be
     found from pdev->dev.archdata.pci_data, PCI_DN(devnode), or
     parent's list.  The fast path (fetching pci_dn through PCI device
     instance) is populated during early fixup time.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:49 +11:00
Mahesh Salgaonkar 44d5f6f590 powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.

This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.

Fixes: 2ba9f0d887 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org  # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 17:10:47 +11:00
Paul Mackerras f57333a767 powerpc/powernv: Fix return value from power7_nap() et al.
The power7_nap(), power7_sleep() and power7_winkle() functions are
called from pnv_smp_cpu_kill_self(), which expects them to return the
SRR1 value set by the hardware on wakeup, or 0 if no nap/sleep/winkle
occurred.  However, in the case where an interrupt needs to be
replayed, the logic in power7_powersave_common (the common code for
power7_nap et al.) doesn't set r3 to 0 in this case.  Instead what we
get as the return value is the selector for the type of power-saving
mode requested (1, 2 or 3).  In fact this should not affect the
operation of pnv_smp_cpu_kill_self(), but it is better to get this
correct, so this adds an instruction to set r3 to 0 in this case.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 15:06:50 +11:00
Hari Bathini e4a9616c54 powerpc/rtas: Make timestamp related code y2038-safe
While we are here, let us make timestamp related code y2038-safe.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:11 +11:00
Hari Bathini f7618299b4 powerpc/powernv: Add pstore support on powernv
This patch extends pstore, a generic interface to platform dependent
persistent storage, support for powernv  platform to capture certain
useful information, during dying moments. Such support is already in
place for  pseries platform. This patch re-uses most of that code.

It is a common practice to compile kernels with both CONFIG_PPC_PSERIES=y
and CONFIG_PPC_POWERNV=y. The code in nvram_init_oops_partition() routine
still works as intended, as the caller is platform specific code which
passes the appropriate value for "rtas_partition_exists" parameter.
In all other places, where CONFIG_PPC_PSERIES or CONFIG_PPC_POWERNV
flag is used in this patchset, it is to reduce the kernel size in cases
where this flag is not set and doesn't have any impact logic wise.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:10 +11:00
Hari Bathini 78989f0a55 powerpc/nvram: Move generic code for nvram and pstore
With minor checks, we can move most of the code for nvram
under pseries to a common place to be re-used by other
powerpc platforms like powernv. This patch moves such
common code to arch/powerpc/kernel/nvram_64.c file.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Move select of ZLIB_DEFLATE to PPC64 to fix the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:05:49 +11:00
Benjamin Herrenschmidt ddee09c099 powerpc: Add PVR for POWER8NVL processor
There's a new variant of POWER8 coming called "POWER8 with NVLink". The
core is identical to POWER8 but unfortunately they strapped it with a
different PVR, so we need to add an explicit entry for it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:52:27 +11:00
Paul Mackerras 755563bc79 powerpc/powernv: Fixes for hypervisor doorbell handling
Since we can now use hypervisor doorbells for host IPIs, this makes
sure we clear the host IPI flag when taking a doorbell interrupt, and
clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
already do for IPIs sent via the XICS interrupt controller).  Otherwise
if there did happen to be a leftover pending doorbell interrupt for
an offline CPU thread for any reason, it would prevent that thread from
going into a power-saving mode; it would instead keep waking up because
of the interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:51:53 +11:00
Alex Dowad 6eca8933d3 powerpc/kernel: Rename copy_thread() 'arg' argument to 'kthread_arg'
The 'arg' argument to copy_thread() is only ever used when forking a new
kernel thread. Hence, rename it to 'kthread_arg' for clarity.

Signed-off-by: Alex Dowad <alexinbeijing@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 12:41:15 +11:00
Kevin Hao 52d9962700 powerpc: kill PPC_OF
We have set CONFIG_PPC_OF to always 'y' in commit 0a498d96a3
("powerpc: set CONFIG_PPC_OF=y always for ARCH=powerpc") nine years
ago. And the arch/ppc also has gone away for many years. The OF
functionality was also moved to a common place and be used by many
archs. So it does make no sense to keep such a option in the current
kernel. Just kill it.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-17 20:04:32 +11:00
Gavin Shan 28158cd1b7 powerpc/eeh: Enhance pcibios_set_pcie_reset_state()
Function pcibios_set_pcie_reset_state() is possibly called by
pci_reset_function(), on which VFIO infrastructure depends to
issue reset. pcibios_set_pcie_reset_state() is issuing reset
on the parent PE of the indicated PCI device. The reset causes
state lost on all PCI devices except the indicated one as the
argument to pcibios_set_pcie_reset_state(). Also, sideband
MMIO access from guest when issuing reset would cause unexpected
EEH error.

For above two issues, the patch applies following enhancements
to pcibios_set_pcie_reset_state():

   * For all PCI devices except the indicated one, save their
     state prior to reset and restore state after that.
   * Explicitly freeze PE prior to reset and unfreeze it after
     that, in order to avoid unexpected EEH error.

Tested-by: Priya M. A <priyama2@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Mahesh Salgaonkar 45706bb53d powerpc/book3s: Fix flush_tlb cpu_spec hook to take a generic argument.
The flush_tlb hook in cpu_spec was introduced as a generic function hook
to invalidate TLBs. But the current implementation of flush_tlb hook
takes IS (invalidation selector) as an argument which is architecture
dependent. Hence, It is not right to have a generic routine where caller
has to pass non-generic argument.

This patch fixes this and makes flush_tlb hook as high level API.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-17 07:52:48 +11:00
Anton Blanchard c2ce6f9f3d powerpc: Change vrX register defines to vX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Masanari Iida d939be3add treewide: Fix typo in printk messages
This patch fix spelling typo in printk messages.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-06 23:05:39 +01:00
Nishanth Aravamudan 4ad04e5987 powerpc/iommu: Remove IOMMU device references via bus notifier
After d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:

        iommu_reconfig_notifier ->
                iommu_free_table ->
                        iommu_group_put
                        BUG_ON(tbl->it_group)

We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the powernv bus notifier to common
code and calling it for both powernv and pseries.

Fixes: d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-04 13:19:33 +11:00
Michael Ellerman 875ebe940d powerpc/smp: Wait until secondaries are active & online
Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

  BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

  CPU: 0
  Comm: watchdog/130

The problem is that we aren't ensuring the CPU active bit is set for the
secondary before allowing the master to continue on. The master unparks
the secondary CPU's kthreads and the scheduler looks for a CPU to run
on. It calls select_task_rq() and realises the suggested CPU is not in
the cpus_allowed mask. It then ends up in select_fallback_rq(), and
since the active bit isnt't set we choose some other CPU to run on.

This seems to have been introduced by 6acbfb9697 "sched: Fix hotplug
vs. set_cpus_allowed_ptr()", which changed from setting active before
online to setting active after online. However that was in turn fixing a
bug where other code assumed an active CPU was also online, so we can't
just revert that fix.

The simplest fix is just to spin waiting for both active & online to be
set. We already have a barrier prior to set_cpu_online() (which also
sets active), to ensure all other setup is completed before online &
active are set.

Fixes: 6acbfb9697 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-04 13:19:33 +11:00
Linus Torvalds 18a8d49973 The clock framework changes for 3.20 contain the usual driver additions,
enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
 devices. Additionaly the framework core underwent a bit of surgery with
 two major changes. The boundary between the clock core and clock
 providers (e.g clock drivers) is now more well defined with dedicated
 provider helper functions. struct clk no longer maps 1:1 with the
 hardware clock but is a true per-user cookie which helps us tracker
 users of hardware clocks and debug bad behavior. The second major change
 is the addition of rate constraints for clocks. Rate ranges are now
 supported which are analogous to the voltage ranges in the regulator
 framework. Unfortunately these changes to the core created some
 breakeage. We think we fixed it all up but for this reason there are
 lots of last minute commits trying to undo the damage.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU54D5AAoJEDqPOy9afJhJs6AQAK5YuUwjDchdpNZx9p7OnT1q
 +poehuUwE/gYjmdACqYFyaPrI/9f43iNCfFAgKGLQqmB5ZK4sm4ktzfBEhjWINR2
 iiCx9QYMQVGiKwC8KU0ddeBciglE2b/DwxB45m9TsJEjowucUeBzwLEIj5DsGxf7
 teXRoOWgXdz1MkQJ4pnA09Q3qEPQgmu8prhMfka/v75/yn7nb9VWiJ6seR2GqTKY
 sIKL9WbKjN4AzctggdqHnMSIqZoq6vew850bv2C1fPn7GiYFQfWW+jvMlVY40dp8
 nNa2ixSQSIXVw4fCtZhTIZcIvZ8puc7WVLcl8fz3mUe3VJn1VaGs0E+Yd3GexpIV
 7bwkTOIdS8gSRlsUaIPiMnUob5TUMmMqjF4KIh/AhP4dYrmVbU7Ie8ccvSxe31Ku
 lK7ww6BFv3KweTnW/58856ZXDlXLC6x3KT+Fw58L23VhPToFgYOdTxn8AVtE/LKP
 YR3UnY9BqFx6WHXVoNvg3Piyej7RH8fYmE9om8tyWc/Ab8Eo501SHs9l3b2J8snf
 w/5STd2CYxyKf1/9JLGnBvGo754O9NvdzBttRlygB14gCCtS/SDk/ELG2Ae+/a9P
 YgRk2+257h8PMD3qlp94dLidEZN4kYxP/J6oj0t1/TIkERWfZjzkg5tKn3/hEcU9
 qM97ZBTplTm6FM+Dt/Vk
 =zCVK
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux

Pull clock framework updates from Mike Turquette:
 "The clock framework changes contain the usual driver additions,
  enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
  devices.

  Additionally the framework core underwent a bit of surgery with two
  major changes:

   - The boundary between the clock core and clock providers (e.g clock
     drivers) is now more well defined with dedicated provider helper
     functions.  struct clk no longer maps 1:1 with the hardware clock
     but is a true per-user cookie which helps us tracker users of
     hardware clocks and debug bad behavior.

   - The addition of rate constraints for clocks.  Rate ranges are now
     supported which are analogous to the voltage ranges in the
     regulator framework.

  Unfortunately these changes to the core created some breakeage.  We
  think we fixed it all up but for this reason there are lots of last
  minute commits trying to undo the damage"

* tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux: (113 commits)
  clk: Only recalculate the rate if needed
  Revert "clk: mxs: Fix invalid 32-bit access to frac registers"
  clk: qoriq: Add support for the platform PLL
  powerpc/corenet: Enable CLK_QORIQ
  clk: Replace explicit clk assignment with __clk_hw_set_clk
  clk: Add __clk_hw_set_clk helper function
  clk: Don't dereference parent clock if is NULL
  MIPS: Alchemy: Remove bogus args from alchemy_clk_fgcs_detr
  clkdev: Always allocate a struct clk and call __clk_get() w/ CCF
  clk: shmobile: div6: Avoid division by zero in .round_rate()
  clk: mxs: Fix invalid 32-bit access to frac registers
  clk: omap: compile legacy omap3 clocks conditionally
  clkdev: Export clk_register_clkdev
  clk: Add rate constraints to clocks
  clk: remove clk-private.h
  pci: xgene: do not use clk-private.h
  arm: omap2+ remove dead clock code
  clk: Make clk API return per-user struct clk instances
  clk: tegra: Define PLLD_DSI and remove dsia(b)_mux
  clk: tegra: Add support for the Tegra132 CAR IP block
  ...
2015-02-21 12:30:30 -08:00
Geoff Levand b28c2ee868 kexec: add IND_FLAGS macro
Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of
all the possible kexec IND_ kimage_entry indirection flags.  Having this
macro allows for simplified code in the prosessing of the kexec
kimage_entry items.  Also, remove the local powerpc definition and use the
generic one.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Tejun Heo 0c118b7bd0 powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

* Spurious if (len > 1) test dropped from shared_cpu_map_show().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Cyril Bur 4be1b29795 powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
On POWER8 virtualised kernels the VTB register can be read to have a view
of time that only increases while the guest is running.  This will prevent
guests from seeing time jump if a guest is paused for significant amounts
of time.

On POWER7 and below virtualised kernels stolen time is subtracted from
local_clock as a best effort approximation.  This will not eliminate
spurious warnings in the case of a suspended guest but may reduce the
occurance in the case of softlockups due to host over commit.

Bare metal kernels should avoid reading the VTB as KVM does not restore
sane values when not executing, the approxmation is fine as host kernels
won't observe any stolen time.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Jones <drjones@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ben Zhang <benzh@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Andy Lutomirski f56141e3e2 all arches, signal: move restart_block to struct task_struct
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target.  This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Linus Torvalds d3f180ea1a powerpc updates for 3.20
Including:
 
 - Update of all defconfigs
 - Addition of a bunch of config options to modernise our defconfigs
 - Some PS3 updates from Geoff
 - Optimised memcmp for 64 bit from Anton
 - Fix for kprobes that allows 'perf probe' to work from Naveen
 - Several cxl updates from Ian & Ryan
 - Expanded support for the '24x7' PMU from Cody & Sukadev
 - Freescale updates from Scott:
   "Highlights include 8xx optimizations, some more work on datapath device
    tree content, e300 machine check support, t1040 corenet error reporting,
    and various cleanups and fixes."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2/LSAAoJEFHr6jzI4aWATDAQAKPU6v2Mq0sLnGst69waHU/Q
 vvpIq9hqVeSr6znHhrnazc3iQTLk0acqIdxUl/dT+5ADhi9+FxGD5Ckk+BH1DDve
 g6mQelSMlVZF9hKonHsbr4iUuTUyZyx2vj2qjdgOaRiv9Xubq6vUFNeolq3AeHxv
 J33vqRTmowj3VJ52u+V1dmzXQGfUye7DG2jHpjXoBieZsroTvyuYm5GoIPblWFO6
 zbYRh6IitALnQRtXfwIManPyWMkJti9JX8PwDkmvacr+V+MXbrksHpIOITMhNlo1
 WsVnFMpxuk80XuUfhaKZgISgBSfCqBckvKDn2QwztF2/kBnV6Su5xiOKVgouzM6B
 myy+maiMZlNJlNjqdMK5v2bqMXICP048zgfMbDN2e1K25jSSlRawt0RngoCQO2EP
 7aWmEDAlL3shgzkl68pj1fevQokxC/40C1yExIgAa9C31+bjtMz4Xb1SfN1SSveW
 7uWEY/eG9eLsrSE1CeBDvh6B8BRdyuIHgPhux4Tgc/bUtBGFQ29NuXwKh3QCeEy9
 9wWrRGx3U69eP06Ey7P5js3jPTQs80bjJewyGaiPQF5XHB89To8Dg8VfXjEV49Dx
 Pa3OLL5QsQloKfEBiEhQeGfKYImC00pVYAxc0qpmnr9T+25Ri1TLdF1EBAwriSYE
 5p9kSW+ZIht0lvzsdPNm
 =xDU3
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Update of all defconfigs

 - Addition of a bunch of config options to modernise our defconfigs

 - Some PS3 updates from Geoff

 - Optimised memcmp for 64 bit from Anton

 - Fix for kprobes that allows 'perf probe' to work from Naveen

 - Several cxl updates from Ian & Ryan

 - Expanded support for the '24x7' PMU from Cody & Sukadev

 - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
     device tree content, e300 machine check support, t1040 corenet
     error reporting, and various cleanups and fixes"

* tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
  cxl: Add missing return statement after handling AFU errror
  cxl: Fail AFU initialisation if an invalid configuration record is found
  cxl: Export optional AFU configuration record in sysfs
  powerpc/mm: Warn on flushing tlb page in kernel context
  powerpc/powernv: Add OPAL soft-poweroff routine
  powerpc/perf/hv-24x7: Document sysfs event description entries
  powerpc/perf/hv-gpci: add the remaining gpci requests
  powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
  powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
  perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
  perf: add PMU_EVENT_ATTR_STRING() helper
  perf: provide sysfs_show for struct perf_pmu_events_attr
  powerpc/kernel: Avoid initializing device-tree pointer twice
  powerpc: Remove old compile time disabled syscall tracing code
  powerpc/kernel: Make syscall_exit a local label
  cxl: Fix device_node reference counting
  powerpc/mm: bail out early when flushing TLB page
  powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
  perf/powerpc: reset event hw state when adding it to the PMU
  powerpc/qe: Use strlcpy()
  ...
2015-02-11 18:15:38 -08:00
Michael Ellerman a604c96eb0 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
2015-02-04 12:03:21 +11:00
Michael Turquette 54eea32f7e Merge branch 'clk-next' into v3.19-rc7 2015-02-02 14:59:38 -08:00
Gavin Shan fe12545e76 powerpc/kernel: Avoid initializing device-tree pointer twice
As commit 50ba08f3 ("of/fdt: Don't clear initial_boot_params
if fdt_check_header() fails") does, the device-tree pointer
"initial_boot_params" is initialized by early_init_dt_verify(),
which is called by early_init_devtree(). So we needn't explicitly
initialize that again in early_init_devtree().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:32 +11:00
Michael Ellerman a4bcbe6a41 powerpc: Remove old compile time disabled syscall tracing code
We have code to do syscall tracing which is disabled at compile time by
default. It's not been touched since the dawn of time (ie. v2.6.12).

There are now better ways to do syscall tracing, ie. using the
raw_syscall, or syscall tracepoints.

For the specific case of tracing syscalls at boot on a system that
doesn't get to userspace, you can boot with:

  trace_event=syscalls tp_printk=on

Which will trace syscalls from boot, and echo all output to the console.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:32 +11:00
Michael Ellerman 4c3b216861 powerpc/kernel: Make syscall_exit a local label
Currently when we back trace something that is in a syscall we see
something like this:

[c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
[c000000000000000] [c000000000000000] syscall_exit+0x0/0x98

Although it's entirely correct, seeing syscall_exit at the bottom can be
confusing - we were exiting from a syscall and then called SyS_read() ?

If we instead change syscall_exit to be a local label we get something
more intuitive:

[c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
[c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0

ie. we were handling a system call, and it was SyS_read().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:31 +11:00
Esben Haabendal 974ff4e2d7 powerpc: Add machine_check cpu function for e300c3 cpus
Signed-off-by: Esben Haabendal <eha@deif.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:40 -06:00
LEROY Christophe 4545ff7ed8 powerpc/8xx: Remove duplicated code in set_context()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe fde5a9057f powerpc/8xx: Optimise access to swapper_pg_dir
All accessed to PGD entries are done via 0(r11).
By using lower part of swapper_pg_dir as load index to r11, we can remove the
ori instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe 17bb312f4c powerpc/8xx: Take benefit of aligned PGDIR
L1 base address is now aligned so we can insert L1 index into r11 directly and
then preserve r10

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe 5ddb75cee5 powerpc/8xx: remove tests on PGDIR entry validity
Kernel MMU handling code handles validity of entries via _PMD_PRESENT which
corresponds to V bit in MD_TWC and MI_TWC. When the V bit is not set, MPC8xx
triggers TLBError exception. So we don't have to check that and branch ourself
to TLBError. We can set TLB entries with non present entries, remove all those
tests and let the 8xx handle it. This reduce the number of cycle when the
entries are valid which is the case most of the time, and doesn't significantly
increase the time for handling invalid entries.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:58:02 -06:00
LEROY Christophe 2374d0af29 powerpc/8xx: remove remaining unnecessary code in FixupDAR
Since commit 33fb845a6f ("powerpc/8xx: Don't use MD_TWC for walk"), MD_EPN and
MD_TWC are not writen anymore in FixupDAR so saving r3 has become useless.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:58:01 -06:00
LEROY Christophe cadbfd0146 powerpc/8xx: use _PAGE_RO instead of _PAGE_RW
On powerpc 8xx, in TLB entries, 0x400 bit is set to 1 for read-only pages
and is set to 0 for RW pages. So we should use _PAGE_RO instead of _PAGE_RW

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 20:13:10 -06:00
Michael Ellerman 8aa989b8fb powerpc: Remove some unused functions
Remove slice_set_psize() which is not used.

It was added in 3a8247cc2c "powerpc: Only demote individual slices
rather than whole process" but was never used.

Remove vsx_assist_exception() which is not used.

It was added in ce48b21007 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.

Remove generic_mach_cpu_die() which is not used.

Its last caller was removed in 375f561a41 "powerpc/powernv: Always go
into nap mode when CPU is offline".

Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.

These were introduced in c5d56332fd "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Cyril Bur 3df76a9dcc powerpc/pseries: Fix endian problems with LE migration
RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.

The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
one true RTAS call. This means that "ibm,suspend_me" is handled
specially in the ppc_rtas() syscall.

The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend_me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
gets returned in cpu endian, and is left unconverted, also a bug on
little endian systems.

rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.

migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.

This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code pathes.

For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-27 14:03:53 +11:00
Linus Torvalds 550695925d PCI updates for v3.19:
Resource management
     - Clip bridge windows to fit in upstream windows (Yinghai Lu)
 
   Virtualization
     - Mark Atheros AR93xx to avoid using bus reset (Alex Williamson)
 
   Miscellaneous
     - Update Richard Zhu's email address (Lucas Stach)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY
 oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw
 lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8
 smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43
 63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92
 HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK
 DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY
 OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj
 h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+
 3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD
 c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z
 +BTyA8t0+3jTArTs/Zid
 =HRy7
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "These are fixes for:

   - a resource management problem that causes a Radeon "Fatal error
     during GPU init" on machines where the BIOS programmed an invalid
     Root Port window.  This was a regression in v3.16.

   - an Atheros AR93xx device that doesn't handle PCI bus resets
     correctly.  This was a regression in v3.14.

   - an out-of-date email address"

* tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Update Richard Zhu's email address
  sparc/PCI: Clip bridge windows to fit in upstream windows
  powerpc/PCI: Clip bridge windows to fit in upstream windows
  parisc/PCI: Clip bridge windows to fit in upstream windows
  mn10300/PCI: Clip bridge windows to fit in upstream windows
  microblaze/PCI: Clip bridge windows to fit in upstream windows
  ia64/PCI: Clip bridge windows to fit in upstream windows
  frv/PCI: Clip bridge windows to fit in upstream windows
  alpha/PCI: Clip bridge windows to fit in upstream windows
  x86/PCI: Clip bridge windows to fit in upstream windows
  PCI: Add pci_claim_bridge_resource() to clip window if necessary
  PCI: Add pci_bus_clip_resource() to clip to fit upstream window
  PCI: Pass bridge device, not bus, when updating bridge windows
  PCI: Mark Atheros AR93xx to avoid bus reset
  PCI: Add flag for devices where we can't use bus reset
2015-01-24 10:58:47 +12:00
Gavin Shan 1b28f170d9 powerpc/eeh: Allow to set maximal frozen times
When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.

The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).

Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:54 +11:00
Gavin Shan 432227e907 powerpc/eeh: Introduce flag EEH_PE_REMOVED
The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.

Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.

Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:53 +11:00
Gavin Shan 2aa5cf9e48 powerpc/eeh: Fix missed PE#0 on P7IOC
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:

[root@ltcfbl8eb ~]# dmesg
       :
pci 0000:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20     : [PE# 002] Secondary bus 32..63 associated with PE#2
       :
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:52 +11:00
Gavin Shan 6f20e7f2e9 powerpc/kernel: Avoid memory corruption at early stage
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.

This hasn't been observed to cause any trouble in practice, but is
obviously fishy.

Fixes: 6f4441ef70 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:52 +11:00
Michael Ellerman 10ea834364 powerpc: Rename _TIF_SYSCALL_T_OR_A to _TIF_SYSCALL_DOTRACE
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.

All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:51 +11:00