Commit Graph

81605 Commits

Author SHA1 Message Date
Denis Cheng 0fe410d3f3 dlm: static initialization improvements
also change name_prefix from char pointer to char array.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland dbcfc34733 dlm: clean ups
A couple small clean-ups.  Remove unnecessary wrapper-functions in
rcom.c, and remove unnecessary casting and an unnecessary ASSERT in
util.c.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
Patrick Caulfeld 2a79289e87 dlm: Sanity check namelen before copying it
The 32/64 compatibility code in the DLM does not check the validity of
the lock name length passed into it, so it can easily overwrite memory
if the value is rubbish (as early versions of libdlm can cause with
unlock calls, it doesn't zero the field).

This patch restricts the length of the name to the amount of data
actually passed into the call.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland 85f0379aa0 dlm: keep cached master rsbs during recovery
To prevent the master of an rsb from changing rapidly, an unused rsb is kept
on the "toss list" for a period of time to be reused.  The toss list was
being cleared completely for each recovery, which is unnecessary.  Much of
the benefit of the toss list can be maintained if nodes keep rsb's in their
toss list that they are the master of.  These rsb's need to be included
when the resource directory is rebuilt during recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland 594199ebaa dlm: change error message to debug
The invalid lockspace messages are normal and can appear relatively
often.  They should be suppressed without debugging enabled.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:43 -06:00
David Teigland ce5246b972 dlm: fix possible use-after-free
The dlm_put_lkb() can free the lkb and its associated ua structure,
so we can't depend on using the ua struct after the put.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 755b5eb8ba dlm: limit dir lookup loop
In a rare case we may need to repeat a local resource directory lookup
due to a race with removing the rsb and removing the resdir record.
We'll never need to do more than a single additional lookup, though,
so the infinite loop around the lookup can be removed.  In addition
to being unnecessary, the infinite loop is dangerous since some other
unknown condition may appear causing the loop to never break.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 42dc1601a9 dlm: reject normal unlock when lock is waiting for lookup
Non-forced unlocks should be rejected if the lock is waiting on the
rsb_lookup list for another lock to establish the master node.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland c54e04b00f dlm: validate messages before processing
There was some hit and miss validation of messages that has now been
cleaned up and unified.  Before processing a message, the new
validate_message() function checks that the lkb is the appropriate type,
process-copy or master-copy, and that the message is from the correct
nodeid for the the given lkb.  Other checks and assertions on the
lkb type and nodeid have been removed.  The assertions were particularly
bad since they would panic the machine instead of just ignoring the bad
message.

Although other recent patches have made processing old message unlikely,
it still may be possible for an old message to be processed and caught
by these checks.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 46b43eed70 dlm: reject messages from non-members
Messages from nodes that are no longer members of the lockspace should be
ignored.  When nodes are removed from the lockspace, recovery can
sometimes complete quickly enough that messages arrive from a removed node
after recovery has completed.  When processed, these messages would often
cause an error message, and could in some cases change some state, causing
problems.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland aec64e1be2 dlm: another call to confirm_master in receive_request_reply
When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it
to complete.  A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 601342ce02 dlm: recover locks waiting for overlap replies
When recovery looks at locks waiting for replies, it fails to consider
locks that have already received a reply for their first remote operation,
but not received a reply for secondary, overlapping unlock/cancel.  The
appropriate stub reply needs to be called for these waiters.

Appears when we start doing recovery in the presence of a many overlapping
unlock/cancel ops.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 8a358ca8e7 dlm: clear ast_type when removing from astqueue
The lkb_ast_type field indicates whether the lkb is on the astqueue list.
When clearing locks for a process, lkb's were being removed from the astqueue
list without clearing the field.  If release_lockspace then happened
immediately afterward, it could try to remove the lkb from the list a second
time.

Appears when process calls libdlm dlm_release_lockspace() which first
closes the ls dev triggering clear_proc_locks, and then removes the ls
(a write to control dev) causing release_lockspace().

Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
David Teigland 861e2369e9 dlm: use fixed errno values in messages
Some errno values differ across platforms. So if we return things like
-EINPROGRESS from one node it can get misinterpreted or rejected on
another one.

This patch fixes up the errno values passed on the wire so that they
match the x86 ones (so as not to break the protocol), and re-instates
the platform-specific ones at the other end.

Many thanks to Fabio for testing this patch.
Initial patch from Patrick.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:42 -06:00
Fabio M. Di Nitto 550283e30c dlm: swap bytes for rcom lock reply
DLM_RCOM_LOCK_REPLY messages need byte swapping.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:41 -06:00
Fabio M. Di Nitto e7847d35ac dlm: align midcomms message buffer
gcc does not guarantee that an auto buffer is 64bit aligned.
This change allows sparc64 to work.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30 11:04:25 -06:00
Avi Kivity 2f52d58c92 KVM: Move apic timer migration away from critical section
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port.  Move migration to the regular
vcpu execution path, triggered by a new bitflag.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Glauber de Oliveira Costa a03d7f4b54 KVM: Put kvm_para.h include outside __KERNEL__
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Avi Kivity 6c14280125 KVM: Fix unbounded preemption latency
When preparing to enter the guest, if an interrupt comes in while
preemption is disabled but interrupts are still enabled, we miss a
preemption point.  Fix by explicitly checking whether we need to
reschedule.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Avi Kivity 97db56ce6c KVM: Initialize the mmu caches only after verifying cpu support
Otherwise we re-initialize the mmu caches, which will fail since the
caches are already registered, which will cause us to deinitialize said caches.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Izik Eidus 75e68e6078 KVM: MMU: Fix dirty page setting for pages removed from rmap
Right now rmap_remove won't set the page as dirty if the shadow pte
pointed to this page had write access and then it became readonly.
This patches fixes that, by setting the page as dirty for spte changes from
write to readonly access.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Christian Ehrhardt 6f723c7911 KVM: Portability: Move kvm_fpu to asm-x86/kvm.h
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:22 +02:00
Sheng Yang 571008dacc KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.

After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.

In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Dong, Eddie 5882842f9b KVM: MMU: Merge shadow level check in FNAME(fetch)
Remove the redundant level check when fetching
shadow pte for present & non-present spte.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Avi Kivity eb787d10af KVM: MMU: Move kvm_free_some_pages() into critical section
If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages.  Fix by moving the check into the
same critical section as the allocation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Marcelo Tosatti aaee2c94f7 KVM: MMU: Switch to mmu spinlock
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.

Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Avi Kivity d7824fff89 KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).

[marcelo: avoid recursive locking of mmap_sem]

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:21 +02:00
Marcelo Tosatti 7ec5458821 KVM: Add kvm_read_guest_atomic()
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Marcelo Tosatti 10589a4699 KVM: MMU: Concurrent guest walkers
Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.

Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.

And get rid of the lockless __gfn_to_page.

[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity 774ead3ad9 KVM: Disable vapic support on Intel machines with FlexPriority
FlexPriority accelerates the tpr without any patching.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity b93463aa59 KVM: Accelerated apic support
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity b209749f52 KVM: local APIC TPR access reporting facility
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel.  This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity 565f1fbd9d KVM: Print data for unimplemented wrmsr
This can help diagnosing what the guest is trying to do.  In many cases
we can get away with partial emulation of msrs.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:20 +02:00
Avi Kivity dfc5aa00cb KVM: MMU: Add cache miss statistic
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Eddie Dong caa5b8a5ed KVM: MMU: Coalesce remote tlb flushes
Host side TLB flush can be merged together if multiple
spte need to be write-protected.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Zhang Xiantao ec10f4750d KVM: Expose ioapic to ia64 save/restore APIs
IA64 also needs to see ioapic structure in irqchip.

Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Zhang Xiantao 5736199afb KVM: Move kvm_vcpu_kick() to x86.c
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Zhang Xiantao 0eb8f49848 KVM: Move ioapic code to common directory.
Move ioapic code to common, since IA64 also needs it.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Zhang Xiantao 82470196fa KVM: Move irqchip declarations into new ioapic.h and lapic.h
This allows reuse of ioapic in ia64.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:19 +02:00
Avi Kivity 0fce5623ba KVM: Move drivers/kvm/* to virt/kvm/
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:18 +02:00
Avi Kivity edf884172e KVM: Move arch dependent files to new directory arch/x86/kvm/
This paves the way for multiple architecture support.  Note that while
ioapic.c could potentially be shared with ia64, it is also moved.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 18:01:18 +02:00
Ryan Harper 9584bf2c93 KVM: VMX: Add printk_ratelimit in vmx_intr_assist
Add printk_ratelimit check in front of printk.  This prevents spamming
of the message during 32-bit ubuntu 6.06server install.  Previously, it
would hang during the partition formatting stage.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao 0711456c0d KVM: Portability: Move kvm_vm_stat to x86.h
This patch moves kvm_vm_stat to x86.h, and every arch
can define its own kvm_vm_stat in $arch.h

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao bfc6d222bd KVM: Portability: Move round_robin_prev_vcpu and tss_addr to kvm_arch
This patches moves two fields round_robin_prev_vcpu and tss to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao d7deeeb02c KVM: Portability: move vpic and vioapic to kvm_arch
This patches moves two fields vpid and vioapic to kvm_arch

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao f05e70ac03 KVM: Portability: Move mmu-related fields to kvm_arch
This patches moves mmu-related fields to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao d69fb81f05 KVM: Portability: Move memslot aliases to new struct kvm_arch
This patches create kvm_arch to hold arch-specific kvm fileds
and moves fields naliases and aliases to kvm_arch.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao 77b4c255af KVM: Portability: Move kvm_vcpu_stat to x86.h
This patches moves kvm_vcpu_stat to x86.h, so every
arch can define its own kvm_vcpu_stat structure.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:10 +02:00
Zhang Xiantao d17fbbf738 KVM: Portability: Expand the KVM_VCPU_COMM in kvm_vcpu structure.
This patches removes KVM_COMM macro, original it is hold
kvm_vcpu common fields.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:09 +02:00
Zhang Xiantao d657a98e3c KVM: Portability: Move kvm_vcpu definition back to kvm.h
This patches moves kvm_vcpu definition to kvm.h, and finally
kvm.h includes x86.h.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:58:09 +02:00