Commit Graph

183 Commits

Author SHA1 Message Date
Anton Blanchard 1c53973172 powerpc: Remove mtmsrd(), use existing mtmsr()
mtmsr() does the right thing on 32bit and 64bit, so use it everywhere.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-13 15:47:28 +10:00
Paul Mackerras 755563bc79 powerpc/powernv: Fixes for hypervisor doorbell handling
Since we can now use hypervisor doorbells for host IPIs, this makes
sure we clear the host IPI flag when taking a doorbell interrupt, and
clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
already do for IPIs sent via the XICS interrupt controller).  Otherwise
if there did happen to be a leftover pending doorbell interrupt for
an offline CPU thread for any reason, it would prevent that thread from
going into a power-saving mode; it would instead keep waking up because
of the interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:51:53 +11:00
Shreyas B. Prabhu 77b54e9f21 powernv/powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:41 +11:00
Paul Mackerras 8117ac6a6c powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.

Cc: stable@vger.kernel.org
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:32 +11:00
Anton Blanchard acf620ecf5 powerpc: Rename __get_SP() to current_stack_pointer()
Michael points out that __get_SP() is a pretty horrible
function name. Let's give it a better name.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:23:20 +11:00
Anton Blanchard bfe9a2cfe9 powerpc: Reimplement __get_SP() as a function not a define
Li Zhong points out an issue with our current __get_SP()
implementation. If ftrace function tracing is enabled (ie -pg
profiling using _mcount) we spill a stack frame on 64bit all the
time.

If a function calls __get_SP() and later calls a function that is
tail call optimised, we will pop the stack frame and the value
returned by __get_SP() is no longer valid. An example from Li can
be found in save_stack_trace -> save_context_stack:

c0000000000432c0 <.save_stack_trace>:
c0000000000432c0:       mflr    r0
c0000000000432c4:       std     r0,16(r1)
c0000000000432c8:       stdu    r1,-128(r1) <-- stack frame for _mcount
c0000000000432cc:       std     r3,112(r1)
c0000000000432d0:       bl      <._mcount>
c0000000000432d4:       nop

c0000000000432d8:       mr      r4,r1 <-- __get_SP()

c0000000000432dc:       ld      r5,632(r13)
c0000000000432e0:       ld      r3,112(r1)
c0000000000432e4:       li      r6,1

c0000000000432e8:       addi    r1,r1,128 <-- pop stack frame

c0000000000432ec:       ld      r0,16(r1)
c0000000000432f0:       mtlr    r0
c0000000000432f4:       b       <.save_context_stack> <-- tail call optimized

save_context_stack ends up with a stack pointer below the current
one, and it is likely to be scribbled over.

Fix this by making __get_SP() a function which returns the
callers stack frame. Also replace inline assembly which grabs
the stack pointer in save_stack_trace and show_stack with
__get_SP().

This also fixes an issue with perf_arch_fetch_caller_regs().
It currently unwinds the stack once, which will skip a
valid stack frame on a leaf function. With the __get_SP() fixes
in this patch, we never need to unwind the stack frame to get
to the first interesting frame.

We have to export __get_SP() because perf_arch_fetch_caller_regs()
(which is used in modules) calls it from a header file.

Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:23:19 +11:00
Michael Ellerman 75d43b2d0a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
Freescale updates from Scott (27 commits):

  "Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
   FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
   support, and PrPMC PCI enumeration."
2014-10-04 08:59:06 +10:00
LEROY Christophe ae466bde19 powerpc/8xx: Declare SPRG2 as a SCRATCH register
Since commit 469d62be92, SPRG2 is used as a
scratch register just like SPRG0 and SPRG1. So Declare it as such and fix
the comment which is not valid anymore since that commit.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-09-04 19:02:11 -05:00
Nishanth Aravamudan 56758e3c3c powerpc: remove duplicate definition of TEXASR_FS
It appears that commits 7f06f21d40 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e3812150 ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:47 +10:00
Linus Torvalds 66bb0aa077 Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation,
 and with 3.16-rc changes).  Since they were all within the subsystem,
 I took care of them.
 
 Stephen Rothwell reported some snags in PPC builds, but they are all
 fixed now; the latest linux-next report was clean.
 
 New features for ARM include:
 - KVM VGIC v2 emulation on GICv3 hardware
 - Big-Endian support for arm/arm64 (guest and host)
 - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
 
 And for PPC:
 - Book3S: Good number of LE host fixes, enable HV on LE
 - Book3S HV: Add in-guest debug support
 
 This release drops support for KVM on the PPC440.  As a result, the
 PPC merge removes more lines than it adds. :)
 
 I also included an x86 change, since Davidlohr tied it to an independent
 bug report and the reporter quickly provided a Tested-by; there was no
 reason to wait for -rc2.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS
 a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO
 90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/
 qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np
 hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S
 J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE
 IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp
 2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC
 rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6
 HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC
 9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J
 irJQTYFEqec+GM9JK1rG
 =hyQP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second round of KVM changes from Paolo Bonzini:
 "Here are the PPC and ARM changes for KVM, which I separated because
  they had small conflicts (respectively within KVM documentation, and
  with 3.16-rc changes).  Since they were all within the subsystem, I
  took care of them.

  Stephen Rothwell reported some snags in PPC builds, but they are all
  fixed now; the latest linux-next report was clean.

  New features for ARM include:
   - KVM VGIC v2 emulation on GICv3 hardware
   - Big-Endian support for arm/arm64 (guest and host)
   - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)

  And for PPC:
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S HV: Add in-guest debug support

  This release drops support for KVM on the PPC440.  As a result, the
  PPC merge removes more lines than it adds.  :)

  I also included an x86 change, since Davidlohr tied it to an
  independent bug report and the reporter quickly provided a Tested-by;
  there was no reason to wait for -rc2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
  KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
  KVM: PPC: Enable IRQFD support for the XICS interrupt controller
  KVM: Give IRQFD its own separate enabling Kconfig option
  KVM: Move irq notifier implementation into eventfd.c
  KVM: Move all accesses to kvm::irq_routing into irqchip.c
  KVM: irqchip: Provide and use accessors for irq routing table
  KVM: Don't keep reference to irq routing table in irqfd struct
  KVM: PPC: drop duplicate tracepoint
  arm64: KVM: fix 64bit CP15 VM access for 32bit guests
  KVM: arm64: GICv3: mandate page-aligned GICV region
  arm64: KVM: GICv3: move system register access to msr_s/mrs_s
  KVM: PPC: PR: Handle FSCR feature deselects
  KVM: PPC: HV: Remove generic instruction emulation
  KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
  KVM: PPC: Remove DCR handling
  KVM: PPC: Expose helper functions for data/inst faults
  KVM: PPC: Separate loadstore emulation from priv emulation
  KVM: PPC: Handle magic page in kvmppc_ld/st
  ...
2014-08-07 11:35:30 -07:00
Stewart Smith 9678cdaae9 Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
 - 16 core POWER8
 - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
 - No split core/SMT
 - two guests running sysbench memory test.
   sysbench --test=memory --num-threads=8 run
 - one guest running apache bench (of default HTML page)
   ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.

benchmark 2:
 - Same VM config as benchmark 1
 - all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative affect to this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

benchmark 3:
 - same 3 guests and CPU configuration as benchmark 1 and 2.
 - two idle guests
 - 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).

benchmark 4:
 - same 3 guests as previous benchmarks
 - two guests running sysbench --memory, distinctly different cache heavy
   workload
 - one guest running STREAM benchmark.

Similar improvements to benchmark 3.

benchmark 5:
 - 1 guest, 8 VCPUs, Ubuntu 14.04
 - Host configured with split core (SMT8, subcores-per-core=4)
 - STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:23:17 +02:00
Bharat Bhushan 8c95ead603 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
Scott Wood pointed out that We are no longer using SPRG1 for vcpu pointer,
but using SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment
is not valid now.

Note: SPRN_SPRG3R is not supported (do not see any need as of now),
and if we want to support this in future then we have to shift to using
SPRG1 for VCPU pointer.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:23:15 +02:00
Aneesh Kumar K.V 8f42ab2749 KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
virtual time base register is a per VM, per cpu register that needs
to be saved and restored on vm exit and entry. Writing to VTB is not
allowed in the privileged mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28 15:21:50 +02:00
Michael Ellerman 376af5947c powerpc: Remove STAB code
Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had
a Segment Table (STAB). Now that we've dropped support for those cpus,
we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:10:22 +10:00
Linus Torvalds c5aec4c76a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "Here is the bulk of the powerpc changes for this merge window.  It got
  a bit delayed in part because I wasn't paying attention, and in part
  because I discovered I had a core PCI change without a PCI maintainer
  ack in it.  Bjorn eventually agreed it was ok to merge it though we'll
  probably improve it later and I didn't want to rebase to add his ack.

  There is going to be a bit more next week, essentially fixes that I
  still want to sort through and test.

  The biggest item this time is the support to build the ppc64 LE kernel
  with our new v2 ABI.  We previously supported v2 userspace but the
  kernel itself was a tougher nut to crack.  This is now sorted mostly
  thanks to Anton and Rusty.

  We also have a fairly big series from Cedric that add support for
  64-bit LE zImage boot wrapper.  This was made harder by the fact that
  traditionally our zImage wrapper was always 32-bit, but our new LE
  toolchains don't really support 32-bit anymore (it's somewhat there
  but not really "supported") so we didn't want to rely on it.  This
  meant more churn that just endian fixes.

  This brings some more LE bits as well, such as the ability to run in
  LE mode without a hypervisor (ie. under OPAL firmware) by doing the
  right OPAL call to reinitialize the CPU to take HV interrupts in the
  right mode and the usual pile of endian fixes.

  There's another series from Gavin adding EEH improvements (one day we
  *will* have a release with less than 20 EEH patches, I promise!).

  Another highlight is the support for the "Split core" functionality on
  P8 by Michael.  This allows a P8 core to be split into "sub cores" of
  4 threads which allows the subcores to run different guests under KVM
  (the HW still doesn't support a partition per thread).

  And then the usual misc bits and fixes ..."

[ Further delayed by gmail deciding that BenH is a dirty spammer.
  Google knows.  ]

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
  powerpc/powernv: Add missing include to LPC code
  selftests/powerpc: Test the THP bug we fixed in the previous commit
  powerpc/mm: Check paca psize is up to date for huge mappings
  powerpc/powernv: Pass buffer size to OPAL validate flash call
  powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
  powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
  powerpc/powernv: Set memory_block_size_bytes to 256MB
  powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
  powerpc/powernv: Fix endian issues in memory error handling code
  powerpc/eeh: Skip eeh sysfs when eeh is disabled
  powerpc: 64bit sendfile is capped at 2GB
  powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
  powerpc/serial: Use saner flags when creating legacy ports
  powerpc: Add cpu family documentation
  powerpc/xmon: Fix up xmon format strings
  powerpc/powernv: Add calls to support little endian host
  powerpc: Document sysfs DSCR interface
  powerpc: Fix regression of per-CPU DSCR setting
  powerpc: Split __SYSFS_SPRSETUP macro
  arch: powerpc/fadump: Cleaning up inconsistent NULL checks
  ...
2014-06-10 18:54:22 -07:00
Paul Mackerras 9bc01a9bc7 KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation.  The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.

The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't.  The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.

The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear).  The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30 14:26:29 +02:00
Michael Ellerman e2186023f2 powerpc/powernv: Add support for POWER8 split core on powernv
Upcoming POWER8 chips support a concept called split core. This is where the
core can be split into subcores that although not full cores, are able to
appear as full cores to a guest.

The splitting & unsplitting procedure is mildly complicated, and explained at
length in the comments within the patch.

One notable detail is that when splitting or unsplitting we need to pull
offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, eg:

 $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4'. And indicate respectively that
the core should be unsplit, split in half, and split in quarters. These modes
correspond to threads_per_subcore of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to
prevent the value changing while userspace is configuring the VM, and also to
prevent the mode being changed in such a way that existing guests are unable to
be run.

CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
benh.  Fix for irq race by paulus.  The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28 13:35:37 +10:00
Michael Neuling 7f06f21d40 powerpc/tm: Add checking to treclaim/trechkpt
If we do a treclaim and we are not in TM suspend mode, it results in a TM bad
thing (ie. a 0x700 program check).  Similarly if we do a trechkpt and we have
an active transaction or TEXASR Failure Summary (FS) is not set, we also take a
TM bad thing.

This should never happen, but if it does (ie. a kernel bug), the cause is
almost impossible to debug as the GPR state is mostly userspace and hence we
don't get a call chain.

This adds some checks in these cases case a BUG_ON() (in asm) in case we ever
hit these cases.  It moves the register saving around to preserve r1 till later
also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:36:51 +10:00
Rafael J. Wysocki fe10739284 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
  cpufreq: create another field .flags in cpufreq_frequency_table
  cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
  cpufreq: don't print value of .driver_data from core
  cpufreq: ia64: don't set .driver_data to index
  cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
  cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
  cpufreq: powernv: cpufreq driver for powernv platform
  cpufreq: at32ap: don't declare local variable as static
  cpufreq: loongson2_cpufreq: don't declare local variable as static
  cpufreq: unicore32: fix typo issue for 'clk'
  cpufreq: exynos: Disable on multiplatform build
2014-04-08 13:28:02 +02:00
Vaidyanathan Srinivasan b3d627a5f2 cpufreq: powernv: cpufreq driver for powernv platform
Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms.  Power management SPRs
are used to set the required PState.

This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.

PState table is obtained from OPAL v3 firmware through device
tree.

powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.

The code was originally written by svaidy@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:

 * The policy->cpus needs to be populated in a hotplug-invariant
   manner instead of using cpu_sibling_mask() which varies with
   cpu-hotplug. This is because the cpufreq core code copies this
   content into policy->related_cpus mask which should not vary on
   cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]

 * Create a helper routine that can return the cpu-frequency for the
   corresponding pstate_id. Also, cache the values of the pstate_max,
   pstate_min and pstate_nominal and nr_pstates in a static structure
   so that they can be reused in the future to perform any
   validations. [Authored by ego@linux.vnet.ibm.com]

 * Create a driver attribute named cpuinfo_nominal_freq which creates
   a sysfs read-only file named cpuinfo_nominal_freq. Export the
   frequency corresponding to the nominal_pstate through this
   interface.

     Nominal frequency is the highest non-turbo frequency for the
   platform.  This is generally used for setting governor policies
   from user space for optimal energy efficiency. [Authored by
   ego@linux.vnet.ibm.com]

 * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
   return the current operating frequency. Export this via the sysfs
   interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
   powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]

[Change log updated by ego@linux.vnet.ibm.com]

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-04-07 14:35:27 +02:00
Linus Torvalds 7cbb39d4d4 Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "PPC and ARM do not have much going on this time.  Most of the cool
  stuff, instead, is in s390 and (after a few releases) x86.

  ARM has some caching fixes and PPC has transactional memory support in
  guests.  MIPS has some fixes, with more probably coming in 3.16 as
  QEMU will soon get support for MIPS KVM.

  For x86 there are optimizations for debug registers, which trigger on
  some Windows games, and other important fixes for Windows guests.  We
  now expose to the guest Broadwell instruction set extensions and also
  Intel MPX.  There's also a fix/workaround for OS X guests, nested
  virtualization features (preemption timer), and a couple kvmclock
  refinements.

  For s390, the main news is asynchronous page faults, together with
  improvements to IRQs (floating irqs and adapter irqs) that speed up
  virtio devices"

* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
  KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
  KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
  KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
  KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
  KVM: PPC: Book3S HV: Add transactional memory support
  KVM: Specify byte order for KVM_EXIT_MMIO
  KVM: vmx: fix MPX detection
  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
  KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
  KVM: s390: clear local interrupts at cpu initial reset
  KVM: s390: Fix possible memory leak in SIGP functions
  KVM: s390: fix calculation of idle_mask array size
  KVM: s390: randomize sca address
  KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
  KVM: Bump KVM_MAX_IRQ_ROUTES for s390
  KVM: s390: irq routing for adapter interrupts.
  KVM: s390: adapter interrupt sources
  ...
2014-04-02 14:50:10 -07:00
Paolo Bonzini 7227fc0666 Merge branch 'kvm-ppchv-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next 2014-03-29 15:44:05 +01:00
Michael Neuling e4e3812150 KVM: PPC: Book3S HV: Add transactional memory support
This adds saving of the transactional memory (TM) checkpointed state
on guest entry and exit.  We only do this if we see that the guest has
an active transaction.

It also adds emulation of the TM state changes when delivering IRQs
into the guest.  According to the architecture, if we are
transactional when an IRQ occurs, the TM state is changed to
suspended, otherwise it's left unchanged.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29 19:58:02 +11:00
Benjamin Herrenschmidt cd42748535 Merge remote-tracking branch 'scott/next' into next
Freescale updates from Scott. Mostly support for critical
and machine check exceptions on 64-bit BookE, some new
PCI suspend/resume work and misc bits.
2014-03-24 10:26:10 +11:00
Michael Ellerman 76cb8a783a powerpc/perf: Enable BHRB access for EBB events
The previous commit added constraint and register handling to allow
processes using EBB (Event Based Branches) to request access to the BHRB
(Branch History Rolling Buffer).

With that in place we can allow processes using EBB to access the BHRB.
This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We
must also clear BHRBA when we are disabling.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:27 +11:00
Michael Ellerman c2e37a2626 powerpc/perf: Add lost exception workaround
Some power8 revisions have a hardware bug where we can lose a PMU
exception, this commit adds a workaround to detect the bad condition and
rectify the situation.

See the comment in the commit for a full description.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-24 09:48:25 +11:00
Scott Wood 9d378dfac8 powerpc/booke64: Use SPRG7 for VDSO
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).

In commit 8b64a9dfb0 ("powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:

 - It's incompatible with EXCEPTION_COMMON which restores r13 from the
   by-then-overwritten scratch (this cost me some debugging time).
 - It forces critical exceptions to be a special case handled
   differently from even machine check and debug level exceptions.
 - It didn't occur to me that it was possible to make this work at all
   (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
   I made (most of) this patch. :-)

It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles.  Until then, let's stick with
SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
2014-03-19 19:57:14 -05:00
Wang Dongsheng b0b7dcbdf3 powerpc/fsl: add PVR definition for E500MC and E5500
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:39:25 -05:00
Linus Torvalds e2a0f813e0 Second batch of KVM updates. Some minor x86 fixes,
two s390 guest features that need some handling in the host,
 and all the PPC changes.  The PPC changes include support for
 little-endian guests and enablement for new POWER8 features.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6UF5AAoJEBvWZb6bTYby55kP/AgTJnyu7avN653/2aSHkjkx
 KgYSMYhZPIFoY5LyZuNetXaoXFRvCykux1VYSZ6V6s35h2PZ+hdJNbHGjFYKPGTq
 FQ92xQVNuWCAPxmFCjDNuDV/0BauG5y08/Orh/jpjz+GAfH43LruUQGbtXUuyJ8u
 vf+yTHniU5gguqsAmodqjHUgbf+GoPJ1j7hmRoWwt8IWm7Ns3v/IK4l0p6G0h26a
 RjE6aK+Tm208Yr5hD/dRAqeTbBNt3c4xub+QPsKoiEMaZBSuAOiux7D3Kx+If1gp
 WsmqEQxoymihVtkZhUFO9ONLJepvmG2QwJVVyMSUW9iqxX9rraXsvVyVMwcQAhog
 JuOAYxBftH07xu6Fs4eym5KvCFghM+EaJvxxt+kgnvdD4htK1+eK5trntc2zygSi
 /qGiIrkqjXpkskW8kujLayF0eAU3CrZvFWveEPBfFgYiOGX/2wzJCtSm/bt9Jo0M
 v60qgNFK3LNqAyeEfnm9VtlwGr6ZgsAB6DHNPX4fM5s2IBjL+qloXk/e/+aVKkW0
 I3yeRdy/ExhLAab6w81JtMeR7G3YS0UNuAEVvcoxzNb5wIBY8qnpfUzTKyMxQR94
 64EVpxWEYO1s55eCCyMujWrSvc+YAwhJcWHGKgC4K7mxxLD3FVyQXX6YZvgRozMX
 HjQju+DToj9CskyrFlRL
 =yd0Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "Second batch of KVM updates.  Some minor x86 fixes, two s390 guest
  features that need some handling in the host, and all the PPC changes.

  The PPC changes include support for little-endian guests and
  enablement for new POWER8 features"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
  x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
  x86, kvm: cache the base of the KVM cpuid leaves
  kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef
  KVM: PPC: Book3S PR: Cope with doorbell interrupts
  KVM: PPC: Book3S HV: Add software abort codes for transactional memory
  KVM: PPC: Book3S HV: Add new state for transactional memory
  powerpc/Kconfig: Make TM select VSX and VMX
  KVM: PPC: Book3S HV: Basic little-endian guest support
  KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
  KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
  KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
  KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
  KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
  KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
  KVM: PPC: Book3S HV: Add handler for HV facility unavailable
  KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
  KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
  KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
  KVM: PPC: Book3S HV: Don't set DABR on POWER8
  kvm/ppc: IRQ disabling cleanup
  ...
2014-01-31 08:37:32 -08:00
Paolo Bonzini b73117c493 Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue
Conflicts:
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
	arch/powerpc/kvm/booke.c
2014-01-29 18:29:01 +01:00
Paul Mackerras 8563bf52d5 KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt.  It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts.  Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call.  This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation.  To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree.  If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface.  In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:15 +01:00
Paul Mackerras e0622bd9f2 KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest.  Set this bit.

POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts.  Allow userspace to
set this field.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:11 +01:00
Paul Mackerras aa31e84322 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
* SRR1 wake reason field for system reset interrupt on wakeup from nap
  is now a 4-bit field on P8, compared to 3 bits on P7.

* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
  will wake us up.

* Waking up from nap because of a guest doorbell interrupt is not a
  reason to exit the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:10 +01:00
Paul Mackerras 5557ae0ec7 KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor.  (Note that transactional
memory is disabled for usermode if either or both of the PCR_TM_DIS
and PCR_ARCH_206 bits are set.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:06 +01:00
Michael Neuling b005255e12 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:00 +01:00
Wang Dongsheng 71a6fa17e1 powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec
idle patches.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07 19:29:23 -06:00
LEROY Christophe ae2163be10 powerpc/8xx: mfspr SPRN_TBRx in lieu of mftb/mftbu is not supported
Commit beb2dc0a7a breaks the MPC8xx which
seems to not support using mfspr SPRN_TBRx instead of mftb/mftbu
despite what is written in the reference manual.

This patch reverts to the use of mftb/mftbu when CONFIG_8xx is
selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-11-22 16:56:48 -06:00
Linus Torvalds f080480488 Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
 is probably the most interesting change from a user point of view.
 On the x86 side there are nested virtualization improvements and a
 few bugfixes.  ARM got transparent huge page support, improved
 overcommit, and support for big endian guests.
 
 Finally, there is a new interface to connect KVM with VFIO.  This
 helps with devices that use NoSnoop PCI transactions, letting the
 driver in the guest execute WBINVD instructions.  This includes
 some nVidia cards on Windows, that fail to start without these
 patches and the corresponding userspace changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
 L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
 XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
 db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
 w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
 vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
 dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
 t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
 qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
 NTmT/cfmYp2BRkiCfCiS
 =rWNf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "Here are the 3.13 KVM changes.  There was a lot of work on the PPC
  side: the HV and emulation flavors can now coexist in a single kernel
  is probably the most interesting change from a user point of view.

  On the x86 side there are nested virtualization improvements and a few
  bugfixes.

  ARM got transparent huge page support, improved overcommit, and
  support for big endian guests.

  Finally, there is a new interface to connect KVM with VFIO.  This
  helps with devices that use NoSnoop PCI transactions, letting the
  driver in the guest execute WBINVD instructions.  This includes some
  nVidia cards on Windows, that fail to start without these patches and
  the corresponding userspace changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  kvm, vmx: Fix lazy FPU on nested guest
  arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
  arm/arm64: KVM: MMIO support for BE guest
  kvm, cpuid: Fix sparse warning
  kvm: Delete prototype for non-existent function kvm_check_iopl
  kvm: Delete prototype for non-existent function complete_pio
  hung_task: add method to reset detector
  pvclock: detect watchdog reset at pvclock read
  kvm: optimize out smp_mb after srcu_read_unlock
  srcu: API for barrier after srcu read unlock
  KVM: remove vm mmap method
  KVM: IOMMU: hva align mapping page size
  KVM: x86: trace cpuid emulation when called from emulator
  KVM: emulator: cleanup decode_register_operand() a bit
  KVM: emulator: check rex prefix inside decode_register()
  KVM: x86: fix emulation of "movzbl %bpl, %eax"
  kvm_host: typo fix
  KVM: x86: emulate SAHF instruction
  MAINTAINERS: add tree for kvm.git
  Documentation/kvm: add a 00-INDEX file
  ...
2013-11-15 13:51:36 +09:00
Paul Mackerras 388cc6e133 KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7
This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest.  In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode.  This
includes all the VSX facilities plus several other instructions such
as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the
set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
this to zero gives the full set of capabilities of the processor.
Setting it to one of the "logical" PVR values defined in PAPR puts
the vcpu into the compatibility mode for the corresponding
architecture level.  The supported values are:

0x0f000002	Architecture 2.05 (POWER6)
0x0f000003	Architecture 2.06 (POWER7)
0x0f100003	Architecture 2.06+ (POWER7+)

Since the PCR is per-core, the architecture compatibility level and
the corresponding PCR value are stored in the struct kvmppc_vcore, and
are therefore shared between all vcpus in a virtual core.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:02 +02:00
Paul Mackerras a0144e2a6b KVM: PPC: Book3S HV: Store LPCR value for each virtual core
This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:45:01 +02:00
Paul Mackerras 93b0f4dc29 KVM: PPC: Book3S HV: Implement timebase offset for guests
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  The kernel rounds up the value given
to a multiple of 2^24.

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:44:59 +02:00
Anton Blanchard ef1967ff87 powerpc: Set MSR_LE bit on little endian builds
We need to set MSR_LE in kernel and userspace for little endian builds

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:48:31 +11:00
Paul Mackerras 9f24b0c9ef powerpc: Correct FSCR bit definitions
Commit 74e400cee6 ("powerpc: Rework setting up H/FSCR bit definitions")
ended up with incorrect bit numbers for FSCR_PM_LG and FSCR_BHRB_LG.
This fixes them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-09-05 17:29:20 +10:00
Benjamin Herrenschmidt 3f1f431188 Merge branch 'merge' into next
Merge stuff that already went into Linus via "merge" which
are pre-reqs for subsequent patches
2013-08-27 15:03:30 +10:00
Scott Wood beb2dc0a7a powerpc: Convert some mftb/mftbu into mfspr
Some CPUs (such as e500v1/v2) don't implement mftb and will take a
trap.  mfspr should work on everything that has a timebase, and is the
preferred instruction according to ISA v2.06.

Currently we get away with mftb on 85xx because the assembler converts
it to mfspr due to -Wa,-me500.  However, that flag has other effects
that are undesireable for certain targets (e.g.  lwsync is converted to
sync), and is hostile to multiplatform kernels.  Thus we would like to
stop setting it for all e500-family builds.

mftb/mftbu instances which are in 85xx code or common code are
converted.  Instances which will never run on 85xx are left alone.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-20 19:33:12 -05:00
Scott Wood d52459ca30 powerpc/fsl-booke: Work around erratum A-006958
Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
half of timebase is updated before the upper half; thus, we can share
the workaround for a similar bug on Cell.  This workaround involves
looping if the lower half of timebase is zero, thus avoiding the need
for a scratch register (other than CR0).  This workaround must be
avoided when the timebase is frozen, such as during the timebase sync
code.

This deals with kernel and vdso accesses, but other userspace accesses
will of course need to be fixed elsewhere.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-08-20 15:45:49 -05:00
Anton Blanchard 594bd38303 powerpc: Wrap MSR macros with parentheses
Not having parentheses around a macro is asking for trouble.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 11:50:21 +10:00
Michael Neuling 74e400cee6 powerpc: Rework setting up H/FSCR bit definitions
This reworks the Facility Status and Control Regsiter (FSCR) config bit
definitions so that we can access the bit numbers.  This is needed for a
subsequent patch to fix the userspace DSCR handling.

HFSCR and FSCR bit definitions are the same, so reuse them.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-09 18:07:01 +10:00
Michael Neuling 33959f88fc powerpc: Add second POWER8 PVR entry
POWER8 comes with two different PVRs.  This patch enables the additional
PVR in the cputable.

The existing entry (PVR=0x4b) is renamed to POWER8E and the new entry
(PVR=0x4d) is given POWER8.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:43 +10:00
Michael Ellerman 330a1eb777 powerpc/perf: Core EBB support for 64-bit book3s
Add support for EBB (Event Based Branches) on 64-bit book3s. See the
included documentation for more details.

EBBs are a feature which allows the hardware to branch directly to a
specified user space address when a PMU event overflows. This can be
used by programs for self-monitoring with no kernel involvement in the
inner loop.

Most of the logic is in the generic book3s code, primarily to avoid a
proliferation of PMU callbacks.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:50:10 +10:00
Michael Ellerman 7a7a41f9d5 powerpc/perf: Freeze PMC5/6 if we're not using them
On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they
run all the time.

As noticed by Anshuman, we should unfreeze them when we disable the PMU
as there are legacy tools which expect them to run all the time.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:49:57 +10:00
Michael Neuling b75c100ef2 powerpc/tm: Move TM abort cause codes to uapi
These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling 6ce6c629fd powerpc/tm: Abort on emulation and alignment faults
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Michael Neuling 35f7097fce powerpc/tm: Make room for hypervisor in abort cause codes
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes.  Unfortunately we don't respect this currently.

Below fixes this to move our cause codes to below this region.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Linus Torvalds 01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Michael Ellerman 9353374b8e powerpc: Context switch the new EBB SPRs
This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:37:36 +10:00
Michael Neuling 1ddf499e1a powerpc: Turn on the EBB H/FSCR bits
This turns Event Based Branching (EBB) on in the Hypervisor Facility Status and
Control Register (HFSCR) and Facility Status and Control Register (FSCR).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:36:55 +10:00
Anshuman Khandual 53b56ca019 powerpc: Setup BHRB instructions facility in HFSCR for POWER8
Make BHRB instructions available in problem and privileged states.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:15 +10:00
Paul Mackerras 4619ac88b7 KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Michael Ellerman 8f61aa325f powerpc/perf: Add support for SIER
On power8 we have a new SIER (Sampled Instruction Event Register), which
captures information about instructions when we have random sampling
enabled.

Add support for loading the SIER into pt_regs, overloading regs->dar.
Also set the new NO_SIPR flag in regs->result if we don't have SIPR.

Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:10 +10:00
Michael Ellerman 240686c136 powerpc: Initialise PMU related regs on Power8
For both HV and guest kernels, intialise PMU regs to something sane.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:06 +10:00
Michael Neuling 04b418c97f powerpc: Add HFSCR SPR definitions
Add SPR number and bit definitions for the HFSCR (Hypervisor Facility Status
and Control Register).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:58 +10:00
Michael Neuling fa759e9b09 powerpc: Add DSCR FSCR register bit definition
This sets the DSCR (Data Stream Control Register) in the FSCR (Facility Status
& Control Register).

Also harmonise TAR (Target Address Register) FSCR bit definition too.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-03-05 16:56:30 +11:00
Linus Torvalds 89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Michael Neuling 2b0a576d15 powerpc: Add new transactional memory state to the signal context
This adds the new transactional memory archtected state to the signal context
in both 32 and 64 bit.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 17:02:23 +11:00
Michael Neuling 98ae22e15b powerpc: Add helper functions for transactional memory context switching
Here we add the helper functions to be used when context switching.  These
allow us to fully reclaim and recheckpoint a transaction.

We introduce a new paca field called tm_scratch to help us store away register
values when doing the low level tm reclaim register save.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:58:52 +11:00
Michael Neuling 97a0aac9b8 powerpc: Register defines for various transactional memory registers
Defines for MSR bits and transactional memory related SPRs TFIAR, TEXASR and
TEXASRU.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-15 16:58:51 +11:00
Bharat Bhushan ffe129ecd7 KVM: PPC: booke: use vcpu reference from thread_struct
Like other places, use thread_struct to get vcpu reference.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-02-13 12:56:39 +01:00
Ian Munsie 2468dcf641 powerpc: Add support for context switching the TAR register
This patch adds support for enabling and context switching the Target
Address Register in Power8. The TAR is a new special purpose register
that can be used for computed branches with the bctar[l] (branch
conditional to TAR) instruction in the same manner as the count and link
registers.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-08 14:05:50 +11:00
Michael Neuling 9422de3e95 powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers
This is a rewrite so that we don't assume we are using the DABR throughout the
code.  We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.

The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace.  We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:44 +11:00
Michael Neuling a8190a59e7 powerpc: Add DAWR/X SPR number definitions
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:41 +11:00
Haren Myneni 13e7a8e846 powerpc: Macros for saving/restore PPR
[PATCH 5/6] powerpc: Macros for saving/restore PPR

Several macros are defined for saving and restore user defined PPR value.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 17:01:10 +11:00
Ian Munsie 42d02b81f2 powerpc: Define differences between doorbells on book3e and book3s
There are a few key differences between doorbells on server compared
with embedded that we care about on Linux, namely:

- We have a new msgsndp instruction for directed privileged doorbells.
  msgsnd is used for directed hypervisor doorbells.
- The tag we use in the instruction is the Thread Identification
  Register of the recipient thread (since server doorbells can only
  occur between threads within a single core), and is only 7 bits wide.
- A new message type is introduced for server doorbells (none of the
  existing book3e message types are currently supported on book3s).

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10 15:09:05 +11:00
Linus Torvalds 16e024f30c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlight is probably some base POWER8 support.  There's more
  to come such as transactional memory support but that will wait for
  the next one.

  Overall it's pretty quiet, or rather I've been pretty poor at picking
  things up from patchwork and reviewing them this time around and Kumar
  no better on the FSL side it seems..."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits)
  powerpc+of: Rename and fix OF reconfig notifier error inject module
  powerpc: mpc5200: Add a3m071 board support
  powerpc/512x: don't compile any platform DIU code if the DIU is not enabled
  powerpc/mpc52xx: use module_platform_driver macro
  powerpc+of: Export of_reconfig_notifier_[register,unregister]
  powerpc/dma/raidengine: add raidengine device
  powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct
  powerpc/mpc85xx: Change spin table to cached memory
  powerpc/fsl-pci: Add PCI controller ATMU PM support
  powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI
  drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS]
  powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers
  powerpc: Disable relocation on exceptions when kexecing
  powerpc: Enable relocation on during exceptions at boot
  powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
  powerpc: Add wrappers to enable/disable relocation on exceptions
  powerpc: Add set_mode hcall
  powerpc: Setup relocation on exceptions for bare metal systems
  powerpc: Move initial mfspr LPCR out of __init_LPCR
  powerpc: Add relocation on exception vector handlers
  ...
2012-12-18 09:58:09 -08:00
Paul Mackerras 28c483b62f KVM: PPC: Book3S PR: Fix VSX handling
This fixes various issues in how we were handling the VSX registers
that exist on POWER7 machines.  First, we were running off the end
of the current->thread.fpr[] array.  Ultimately this was because the
vcpu->arch.vsr[] array is sized to be able to store both the FP
registers and the extra VSX registers (i.e. 64 entries), but PR KVM
only uses it for the extra VSX registers (i.e. 32 entries).

Secondly, calling load_up_vsx() from C code is a really bad idea,
because it jumps to fast_exception_return at the end, rather than
returning with a blr instruction.  This was causing it to jump off
to a random location with random register contents, since it was using
the largely uninitialized stack frame created by kvmppc_load_up_vsx.

In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
since giveup_fpu and load_up_fpu handle the extra VSX registers as well
as the standard FP registers on machines with VSX.  Also, since VSX
instructions can access the VMX registers and the FP registers as well
as the extra VSX registers, we have to load up the FP and VMX registers
before we can turn on the MSR_VSX bit for the guest.  Conversely, if
we save away any of the VSX or FP registers, we have to turn off MSR_VSX
for the guest.

To handle all this, it is more convenient for a single call to
kvmppc_giveup_ext() to handle all the state saving that needs to be done,
so we make it take a set of MSR bits rather than just one, and the switch
statement becomes a series of if statements.  Similarly kvmppc_handle_ext
needs to be able to load up more than one set of registers.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06 01:34:02 +01:00
Michael Neuling b0302722ee powerpc: Setup relocation on exceptions for bare metal systems
This turns on MMU on execptions via AIL field in the LPCR.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:06 +11:00
Michael Neuling 71e1849724 powerpc: POWER8 cputable entry
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 13:00:45 +11:00
sukadev@linux.vnet.ibm.com e6878835ac powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
powerpc/perf: Sample only if SIAR-Valid bit is set in P7+

On POWER7+ two new bits (mmcra[35] and mmcra[36]) indicate whether the
contents of SIAR and SDAR are valid.

For marked instructions on P7+, we must save the contents of SIAR and
SDAR registers only if these new bits are set.

This code/check for the SIAR-Valid bit is specific to P7+, so rather than
waste a CPU-feature bit use the PVR flag.

Note that Carl Love proposed a similar change for oprofile:

        https://lkml.org/lkml/2012/6/22/309

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-27 12:51:05 +10:00
Michael Neuling b92a66a65c powerpc: Add denormalisation exception handling for POWER6/7
On POWER6 and POWER7 if the input operand to an instruction is a
denormalised single precision binary floating point value we can take
a denormalisation exception where it's expected that the hypervisor
(HV=1) will fix up the inputs before the instruction is run.

This adds code to handle this denormalisation exception for POWER6 and
POWER7.

It also add a CONFIG_PPC_DENORMALISATION option and sets it in
pseries/ppc64_defconfig.

This is useful on bare metal systems only.  Based on patch from Milton
Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-17 16:31:47 +10:00
Michael Neuling 4474ef055c powerpc: Rework set_dabr so it can take a DABRX value as well
Rework set_dabr to take a DABRX value as well.

Both the pseries and PS3 hypervisors do some checks on the DABRX
values that are passed in the hcall.  This patch stops bogus values
from being passed to hypervisor.  Also, in the case where we are
clearing the breakpoint, where DABR and DABRX are zero, we modify the
DABRX value to make it valid so that the hcall won't fail.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-10 09:59:10 +10:00
sukadev@linux.vnet.ibm.com 22d8ce8879 powerpc: Define Power7+ PV constant PV_POWER7p
This definition will be used by subsequent perf and oprofile patches

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-07 10:47:17 +10:00
Mihai Caraman 8b64a9dfb0 powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int
Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
SPRG4-7 registers will be clobbered.
For bolted TLB miss exception handlers, which is the version currently
supported by KVM, use SPRN_SPRG_GEN_SCRATCH aka SPRG0 instead of
SPRN_SPRG_TLB_SCRATCH aka SPRG6. Keep using TLB PACA slots to fit in one
64-byte cache line.
For critical exception handlers use SPRG3 instead of SPRG7. Provide a routine
to store and restore user-visible SPRGs. This will be subsequently used
to restore VDSO information in SPRG3. Add EX_R13 to paca slots to free up
SPRG3 and change the critical exception epilog to use it.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:35:52 +10:00
Mihai Caraman 5473eb1c07 powerpc/booke64: Use GSRR registers in Guest Doorbell interrupts
Guest Doorbell interrupts use guest save and restore registers. Add a new
Guest Doorbell exception type to accommodate GSRR0/1 SPRs usage in exception
prolog and fix the exception handler.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:35:43 +10:00
Michael Ellerman d3dbeef657 powerpc: Rename 64-bit PVR constants to PVR_foo
We have an old FIXME in reg.h which points out that we should standardise
on PVR_foo for our PVR #defines. Currently we use PVR_ on 32-bit and PV_
on 64-bit.

So do that rename and remove the FIXME.

Seeing as we're touching all but one usage of __is_processor(), rename it
to something less ugly and more indicative of what it does, which is
simply to check the PVR version.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05 15:19:35 +10:00
Tiejun Chen b416c9a10b powerpc: Add "memory" attribute for mfmsr()
Add "memory" attribute in inline assembly language as a compiler
barrier to make sure 4.6.x GCC don't reorder mfmsr().

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
2012-07-11 15:25:45 +10:00
Anton Blanchard 18ad51dd34 powerpc: Add VDSO version of getcpu
We have a request for a fast method of getting CPU and NUMA node IDs
from userspace. This patch implements a getcpu VDSO function,
similar to x86.

Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
modified by a KVM guest, so we save the SPRG3 value in the paca and
restore it when transitioning from the guest to the host.

I have a glibc patch that implements sched_getcpu on top of this.
Testing on a POWER7:

baseline: 538 cycles
vdso:      30 cycles

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 14:18:40 +10:00
Scott Wood d30f6e4800 KVM: PPC: booke: category E.HV (GS-mode) support
Chips such as e500mc that implement category E.HV in Power ISA 2.06
provide hardware virtualization features, including a new MSR mode for
guest state.  The guest OS can perform many operations without trapping
into the hypervisor, including transitions to and from guest userspace.

Since we can use SRR1[GS] to reliably tell whether an exception came from
guest state, instead of messing around with IVPR, we use DO_KVM similarly
to book3s.

Current issues include:
 - Machine checks from guest state are not routed to the host handler.
 - The guest can cause a host oops by executing an emulated instruction
   in a page that lacks read permission.  Existing e500/4xx support has
   the same problem.

Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>,
Varun Sethi <Varun.Sethi@freescale.com>, and
Liu Yu <yu.liu@freescale.com>.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove pt_regs usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:51:19 +03:00
Linus Torvalds 2e7580b0e7 Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Avi Kivity:
 "Changes include timekeeping improvements, support for assigning host
  PCI devices that share interrupt lines, s390 user-controlled guests, a
  large ppc update, and random fixes."

This is with the sign-off's fixed, hopefully next merge window we won't
have rebased commits.

* 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: Convert intx_mask_lock to spin lock
  KVM: x86: fix kvm_write_tsc() TSC matching thinko
  x86: kvmclock: abstract save/restore sched_clock_state
  KVM: nVMX: Fix erroneous exception bitmap check
  KVM: Ignore the writes to MSR_K7_HWCR(3)
  KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask
  KVM: PMU: add proper support for fixed counter 2
  KVM: PMU: Fix raw event check
  KVM: PMU: warn when pin control is set in eventsel msr
  KVM: VMX: Fix delayed load of shared MSRs
  KVM: use correct tlbs dirty type in cmpxchg
  KVM: Allow host IRQ sharing for assigned PCI 2.3 devices
  KVM: Ensure all vcpus are consistent with in-kernel irqchip settings
  KVM: x86 emulator: Allow PM/VM86 switch during task switch
  KVM: SVM: Fix CPL updates
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: x86 emulator: Fix task switch privilege checks
  arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
  KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation
  KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
  ...
2012-03-28 14:35:31 -07:00
Benjamin Herrenschmidt fe1952fc0a powerpc: Rework runlatch code
This moves the inlines into system.h and changes the runlatch
code to use the thread local flags (non-atomic) rather than
the TIF flags (atomic) to keep track of the latch state.

The code to turn it back on in an asynchronous interrupt is
now simplified and partially inlined.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 10:55:02 +11:00
Paul Mackerras 342d3db763 KVM: PPC: Implement MMU notifiers for Book3S HV guests
This adds the infrastructure to enable us to page out pages underneath
a Book3S HV guest, on processors that support virtualized partition
memory, that is, POWER7.  Instead of pinning all the guest's pages,
we now look in the host userspace Linux page tables to find the
mapping for a given guest page.  Then, if the userspace Linux PTE
gets invalidated, kvm_unmap_hva() gets called for that address, and
we replace all the guest HPTEs that refer to that page with absent
HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
set, which will cause an HDSI when the guest tries to access them.
Finally, the page fault handler is extended to reinstantiate the
guest HPTE when the guest tries to access a page which has been paged
out.

Since we can't intercept the guest DSI and ISI interrupts on PPC970,
we still have to pin all the guest pages on PPC970.  We have a new flag,
kvm->arch.using_mmu_notifiers, that indicates whether we can page
guest pages out.  If it is not set, the MMU notifier callbacks do
nothing and everything operates as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:38 +02:00
Paul Mackerras 697d3899dc KVM: PPC: Implement MMIO emulation support for Book3S HV guests
This provides the low-level support for MMIO emulation in Book3S HV
guests.  When the guest tries to map a page which is not covered by
any memslot, that page is taken to be an MMIO emulation page.  Instead
of inserting a valid HPTE, we insert an HPTE that has the valid bit
clear but another hypervisor software-use bit set, which we call
HPTE_V_ABSENT, to indicate that this is an absent page.  An
absent page is treated much like a valid page as far as guest hcalls
(H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
an absent HPTE doesn't need to be invalidated with tlbie since it
was never valid as far as the hardware is concerned.

When the guest accesses a page for which there is an absent HPTE, it
will take a hypervisor data storage interrupt (HDSI) since we now set
the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
looks up the hash table and if it finds an absent HPTE mapping the
requested virtual address, will switch to kernel mode and handle the
fault in kvmppc_book3s_hv_page_fault(), which at present just calls
kvmppc_hv_emulate_mmio() to set up the MMIO emulation.

This is based on an earlier patch by Benjamin Herrenschmidt, but since
heavily reworked.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Paul Mackerras da9d1d7f28 KVM: PPC: Allow use of small pages to back Book3S HV guests
This relaxes the requirement that the guest memory be provided as
16MB huge pages, allowing it to be provided as normal memory, i.e.
in pages of PAGE_SIZE bytes (4k or 64k).  To allow this, we index
the kvm->arch.slot_phys[] arrays with a small page index, even if
huge pages are being used, and use the low-order 5 bits of each
entry to store the order of the enclosing page with respect to
normal pages, i.e. log_2(enclosing_page_size / PAGE_SIZE).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:37 +02:00
Tony Breeds df777bd39a powerpc/476fpe: Add 476fpe SoC code
Based on original work by David 'Shaggy' Kleikamp.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-12-09 07:51:02 -05:00
Peter Zijlstra 501d238633 ppc: Remove duplicate definition of PV_POWER7
One definition of PV_POWER7 seems enough to me.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:55 +10:00
Scott Wood 326ed6a9bc powerpc: mtspr/mtmsr should take an unsigned long
Add a cast in case the caller passes in a different type, as it would
if mtspr/mtmsr were functions.

Previously, if a 64-bit type was passed in on 32-bit, GCC would bind the
constraint to a pair of registers, and would substitute the first register
in the pair in the asm code.  This corresponds to the upper half of the
64-bit register, which is generally not the desired behavior.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:54 +10:00
Linus Torvalds 184475029a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits)
  drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c
  powerpc/85xx: fix mpic configuration in CAMP mode
  powerpc: Copy back TIF flags on return from softirq stack
  powerpc/64: Make server perfmon only built on ppc64 server devices
  powerpc/pseries: Fix hvc_vio.c build due to recent changes
  powerpc: Exporting boot_cpuid_phys
  powerpc: Add CFAR to oops output
  hvc_console: Add kdb support
  powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon
  powerpc/irq: Quieten irq mapping printks
  powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs
  powerpc: Add mpt2sas driver to pseries and ppc64 defconfig
  powerpc: Disable IRQs off tracer in ppc64 defconfig
  powerpc: Sync pseries and ppc64 defconfigs
  powerpc/pseries/hvconsole: Fix dropped console output
  hvc_console: Improve tty/console put_chars handling
  powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
  powerpc/mm: Fix output of total_ram.
  powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards
  powerpc: Correct annotations of pmu registration functions
  ...

Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and
drivers/cpufreq
2011-07-25 22:59:39 -07:00
Paul Mackerras 969391c58a powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits
This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
indicate that we have a usable hypervisor mode, and another to indicate
that the processor conforms to PowerISA version 2.06.  We also add
another bit to indicate that the processor conforms to ISA version 2.01
and set that for PPC970 and derivatives.

Some PPC970 chips (specifically those in Apple machines) have a
hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
is not useful in the sense that there is no way to run any code in
supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
bits in HID4 are always 0, and we use that as a way of detecting that
hypervisor mode is not useful.

Where we have a feature section in assembly code around code that
only applies on POWER7 in hypervisor mode, we use a construct like

END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

The definition of END_FTR_SECTION_IFSET is such that the code will
be enabled (not overwritten with nops) only if all bits in the
provided mask are set.

Note that the CPU feature check in __tlbie() only needs to check the
ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
if we are running bare-metal, i.e. in hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:58 +03:00
Paul Mackerras aa04b4cc5b KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility.  These processors require a physically
contiguous, aligned area of memory for each guest.  When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access.  The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator.  The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs.  The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA.  It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace.  To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory.  Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region.  Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB.  However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest.  This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
Paul Mackerras de56a948b9 KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode.  Using hypervisor mode means
that the guest can use the processor's supervisor mode.  That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host.  This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses.  That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification.  In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest.  We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount.  Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition.  MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest.  At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management.  This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:54 +03:00
Paul Mackerras 923c53caea powerpc: Set up LPCR for running guest partitions
In hypervisor mode, the LPCR controls several aspects of guest
partitions, including virtual partition memory mode, and also controls
whether the hypervisor decrementer interrupts are enabled.  This sets
up LPCR at boot time so that guest partitions will use a virtual real
memory area (VRMA) composed of 16MB large pages, and hypervisor
decrementer interrupts are disabled.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:52 +03:00