The lppaca, slb_shadow and dtl_entry hypervisor structures are
big endian, so we have to byte swap them in little endian builds.
LE KVM hosts will also need to be fixed but for now add an #error
to remind us.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On entering a PR KVM guest, we invalidate the whole SLB before loading
up the guest entries. We do this using an slbia instruction, which
invalidates all entries except entry 0, followed by an slbie to
invalidate entry 0. However, the slbie turns out to be ineffective
in some circumstances (specifically when the host linear mapping uses
64k pages) because of errors in computing the parameter to the slbie.
The result is that the guest kernel hangs very early in boot because
it takes a DSI the first time it tries to access kernel data using
a linear mapping address in real mode.
Currently we construct bits 36 - 43 (big-endian numbering) of the slbie
parameter by taking bits 56 - 63 of the SLB VSID doubleword. These bits
for the tlbie are C (class, 1 bit), B (segment size, 2 bits) and 5
reserved bits. For the SLB VSID doubleword these are C (class, 1 bit),
reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits.
Thus we are not setting the B field correctly, and when LP = 01 as
it is for 64k pages, we are setting a reserved bit.
Rather than add more instructions to calculate the slbie parameter
correctly, this takes a simpler approach, which is to set entry 0 to
zeroes explicitly. Normally slbmte should not be used to invalidate
an entry, since it doesn't invalidate the ERATs, but it is OK to use
it to invalidate an entry if it is immediately followed by slbia,
which does invalidate the ERATs. (This has been confirmed with the
Power architects.) This approach takes fewer instructions and will
work whatever the contents of entry 0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.
With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>
This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest. We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.
kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.
This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt. Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on. The handler will then return to the virtual-mode code
(potentially in the module).
Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.
The net result is that we no longer need any code addresses to be
stored in vcpu->arch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
We just introduced generic segment switching code that only needs to call
small macros to do the actual switching, but keeps most of the entry / exit
code generic.
So let's move the SLB switching code over to use this new mechanism.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.
Welcome to the Big Endian world.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.
To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:
A small helper in linear mapped memory that does mtmsr with IR=0 and
then RFIs info the actual handler.
Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.
Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.
To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.
That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.
After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.
That way we can drastically improve the readability of the code, make it
less racy and less complex.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.
Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This is the really low level of guest entry/exit code.
Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.
The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.
So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.
That way we get a really clean way of switching the SLB.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>