Commit Graph

60023 Commits

Author SHA1 Message Date
Eddie Dong 2cc51560ae KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit
MSR_EFER.LME/LMA bits are automatically save/restored by VMX
hardware, KVM only needs to save NX/SCE bits at time of heavy
weight VM Exit. But clearing NX bits in host envirnment may
cause system hang if the host page table is using EXB bits,
thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we
can do guest page table EXB bits check before inserting a shadow
pte (though no guest is expecting to see this kind of gp fault).
If host NX=0, we present guest no Execute-Disable feature to guest,
thus no host NX=0, guest NX=1 combination.

This patch reduces raw vmexit time by ~27%.

Me: fix compile warnings on i386.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Eddie Dong f2be4dd654 KVM: VMX: Cleanup redundant code in MSR set
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:42 +03:00
Eddie Dong a75beee6e4 KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit
In a lightweight exit (where we exit and reenter the guest without
scheduling or exiting to userspace in between), we don't need various
msrs on the host, and avoiding shuffling them around reduces raw exit
time by 8%.

i386 compile fix by Daniel Hecken <dh@bahntechnik.de>.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Nitin A Kamble b3f37707b0 KVM: VMX: Handle #SS faults from real mode
Instructions with address size override prefix opcode 0x67
Cause the #SS fault with 0 error code in VM86 mode.  Forward
them to the emulator.

Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity cd2276a795 KVM: VMX: Use local labels in inline assembly
This makes oprofile dumps and disassebly easier to read.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity cd0536d7cb KVM: Fix vmx I/O bitmap initialization on highmem systems
kunmap() expects a struct page, not a virtual address.  Fixes an oops loading
kvm-intel.ko on i386 with CONFIG_HIGHMEM.

Thanks to Michael Ivanov <deruhu@peterstar.ru> for reporting.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity 653e3108b7 KVM: Avoid corrupting tr in real mode
The real mode tr needs to be set to a specific tss so that I/O
instructions can function.  Divert the new tr values to the real
mode save area from where they will be restored on transition to
protected mode.

This fixes some crashes on reboot when the bios accesses an I/O
instruction.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity eff708bc2b KVM: VMX: Only reload guest msrs if they are already loaded
If we set an msr via an ioctl() instead of by handling a guest exit, we
have the host state loaded, so reloading the msrs would clobber host
state instead of guest state.

This fixes a host oops (and loss of a cpu) on a guest reboot.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:41 +03:00
Avi Kivity 47ad8e689b KVM: MMU: Store shadow page tables as kernel virtual addresses, not physical
Simpifies things a bit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity 4b02d6daa1 KVM: MMU: Simplify kvm_mmu_free_page() a tiny bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Matthew Gregan 2dc7094b56 KVM: Implement IA32_EBL_CR_POWERON msr
Attempting to boot the default 'bsd' kernel of OpenBSD 4.1 i386 in a guest
fails early in the kernel init inside p3_get_bus_clock while trying to read
the IA32_EBL_CR_POWERON MSR.  KVM logs an 'unhandled MSR' message and the
guest kernel faults.

This patch is sufficient to allow OpenBSD to boot, after which it seems to
run fine.  I'm not sure if this is the correct solution for dealing with
this particular MSR, but it works for me.

Signed-off-by: Matthew Gregan <kinetik@flim.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity a3a0636725 KVM: Set cr0.mp for guests
This allows fwait instructions to be trapped when the guest fpu is not
loaded.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:40 +03:00
Avi Kivity 5fd86fcfc0 KVM: Consolidate guest fpu activation and deactivation
Easier to keep track of where the fpu is this way.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity abd3f2d622 KVM: Rationalize exception bitmap usage
Everyone owns a piece of the exception bitmap, but they happily write to
the entire thing like there's no tomorrow.  Centralize handling in
update_exception_bitmap() and have everyone call that.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity 707c087430 KVM: Move some more msr mangling into vmx_save_host_state()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity 33ed632921 KVM: Fix potential guest state leak into host
The lightweight vmexit path avoids saving and reloading certain host
state.  However in certain cases lightweight vmexit handling can schedule()
which requires reloading the host state.

So we store the host state in the vcpu structure, and reloaded it if we
relinquish the vcpu.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity 7494c0ccbb KVM: Increase mmu shadow cache to 1024 pages
This improves kbuild times by about 10%, bringing it within a respectable
25% of native.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity 0028425f64 KVM: Update shadow pte on write to guest pte
A typical demand page/copy on write pattern is:

- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues

So, three vmexits for a single guest page fault.  But if instead of clearing
the page table entry, we update to correspond to the value that the guest
has just written, we eliminate the third vmexit.

This patch does exactly that, reducing kbuild time by about 10%.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity fce0657ff9 KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.

Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page.  Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.

Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header.  All we need is
to ignore shadow pages from the wrong quadrants.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:39 +03:00
Avi Kivity 09072daf37 KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()
Instead of calling two functions and repeating expensive checks, call one
function and provide it with before/after information.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity 621358455a KVM: Be more careful restoring fs on lightweight vmexit
i386 wants fs for accessing the pda even on a lightweight exit, so ensure
we can always restore it.  This fixes a regression on i386 introduced by
the lightweight vmexit patch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity a25f7e1f8c KVM: Reduce misfirings of the fork detector
The kvm mmu tries to detects forks by looking for repeated writes to a
page table.  If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.

However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.

Fix by resetting the fork detector if we detect a demand fault.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity 05e0c8c344 KVM: Unindent some code
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity e6adf28365 KVM: Avoid saving and restoring some host CPU state on lightweight vmexit
Many msrs and the like will only be used by the host if we schedule() or
return to userspace.  Therefore, we avoid saving them if we handle the
exit within the kernel, and if a reschedule is not requested.

Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of
fixes by me.

Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Avi Kivity e925c5ba93 KVM: Assume that writes smaller than 4 bytes are to non-pagetable pages
This allows us to remove write protection earlier than otherwise.  Should
some mad OS choose to use byte writes to update pagetables, it will suffer
a performance hit, but still work correctly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:38 +03:00
Anthony Liguori c86813393f KVM: SVM: Allow direct guest access to PC debug port
The PC debug port is used for IO delay and does not require emulation.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:37 +03:00
He, Qing fdef3ad1b3 KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs
This patch enables IO bitmaps control on vmx and unmask the 0x80 port to
avoid VMEXITs caused by accessing port 0x80. 0x80 is used as delays (see
include/asm/io.h), and handling VMEXITs on its access is unnecessary but
slows things down. This patch improves kernel build test at around
3%~5%.
	Because every VM uses the same io bitmap, it is shared between
all VMs rather than a per-VM data structure.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:37 +03:00
Linus Torvalds 8f41958bdd Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
  git-battery vs git-acpi
  Power supply class and drivers: remove non obligatory return statements
  pda_power: clean up irq, timer
  MAINTAINERS: Add maintainers for power supply subsystem and drivers

Fixed up trivial conflict in drivers/w1/slaves/w1_ds2760.c manually
2007-07-15 16:56:12 -07:00
Linus Torvalds bc06cffdec Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
  [SCSI] ibmvscsi: convert to use the data buffer accessors
  [SCSI] dc395x: convert to use the data buffer accessors
  [SCSI] ncr53c8xx: convert to use the data buffer accessors
  [SCSI] sym53c8xx: convert to use the data buffer accessors
  [SCSI] ppa: coding police and printk levels
  [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
  [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
  [SCSI] remove the dead CYBERSTORMIII_SCSI option
  [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
  [SCSI] Clean up scsi_add_lun a bit
  [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
  [SCSI] sni_53c710: Cleanup
  [SCSI] qla4xxx: Fix underrun/overrun conditions
  [SCSI] megaraid_mbox: use mutex instead of semaphore
  [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
  [SCSI] qla2xxx: update version to 8.02.00-k1.
  [SCSI] qla2xxx: add support for NPIV
  [SCSI] stex: use resid for xfer len information
  [SCSI] Add Brownie 1200U3P to blacklist
  [SCSI] scsi.c: convert to use the data buffer accessors
  ...
2007-07-15 16:51:54 -07:00
Linus Torvalds d3502d7f25 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  [TCP]: Verify the presence of RETRANS bit when leaving FRTO
  [IPV6]: Call inet6addr_chain notifiers on link down
  [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
  [NET_SCHED]: act_api: qdisc internal reclassify support
  [NET_SCHED]: sch_dsmark: act_api support
  [NET_SCHED]: sch_atm: act_api support
  [NET_SCHED]: sch_atm: Lindent
  [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
  [IPV4]: Cleanup call to __neigh_lookup()
  [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
  [NETFILTER]: nf_conntrack: UDPLITE support
  [NETFILTER]: nf_conntrack: mark protocols __read_mostly
  [NETFILTER]: x_tables: add connlimit match
  [NETFILTER]: Lower *tables printk severity
  [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
  [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
  [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
  [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
  [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
  [AF_IUCV]: Add lock when updating accept_q
  ...
2007-07-15 16:50:46 -07:00
Linus Torvalds d2a9a8ded4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix a race condition bug in umount which caused a segfault
  9p: re-enable mount time debug option
  9p: cache meta-data when cache=loose
  net/9p: set error to EREMOTEIO if trans->write returns zero
  net/9p: change net/9p module name to 9pnet
  9p: Reorganization of 9p file system code
2007-07-15 16:44:53 -07:00
Linus Torvalds 2d896c780d Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits)
  [XFS] Fix lockdep annotations for xfs_lock_inodes
  [LIB]: export radix_tree_preload()
  [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
  [XFS] Compat ioctl handler for handle operations
  [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
  [XFS] Clean up function name handling in tracing code
  [XFS] Quota inode has no parent.
  [XFS] Concurrent Multi-File Data Streams
  [XFS] Use uninitialized_var macro to stop warning about rtx
  [XFS] XFS should not be looking at filp reference counts
  [XFS] Use is_power_of_2 instead of open coding checks
  [XFS] Reduce shouting by removing unnecessary macros from dir2 code.
  [XFS] Simplify XFS min/max macros.
  [XFS] Kill off xfs_count_bits
  [XFS] Cancel transactions on xfs_itruncate_start error.
  [XFS] Use do_div() on 64 bit types.
  [XFS] Fix remount,readonly path to flush everything correctly.
  [XFS] Cleanup inode extent size hint extraction
  [XFS] Prevent ENOSPC from aborting transactions that need to succeed
  [XFS] Prevent deadlock when flushing inodes on unmount
  ...
2007-07-15 16:43:43 -07:00
Al Viro 2a9915c8a2 make i2c-acorn tristate
It depends on tristate I2C and it's trivial to make modular.  The
current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at
all; alternatives are dependency on I2C=y and making I2C_ACORN
itself a tristate.  The latter is the right thing to do...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-07-15 16:40:52 -07:00
Al Viro ba5b55d049 icside: devm_iounmap() needs linux/io.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-07-15 16:40:52 -07:00
Al Viro 05bd711ea2 missing argument in bin_attribute ->read()/->write()
Fallout from commit 91a6902958 ('sysfs:
add parameter "struct bin_attribute *" ...')

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro ececfdee1c fallout from constified seq_operations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro 8ca7ee6bcc fallout from Auke's pci ->revision patch
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro 2832e856fb ax88796: dev_dbg() wants device, not platform device
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro 22bb3e9e24 pass -msize-long to sparse on s390
s390 is the only 32bit with unsigned long for size_t (usual for those
is unsigned int).  Tell sparse...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro b4a06918c2 frv: missing __clear_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro c248725b61 zd1211rw: too early inclusion of asm/unaligned.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:52 -07:00
Al Viro 4381ca3c23 fix return type of skb_checksum_complete()
It returns __sum16, not unsigned int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Al Viro 5f17c70fe6 PDA_POWER depends on having request_irq()
... so all proud owners of s390-based PDAs will have to live without that one

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Al Viro 51ec138c64 ieee1394: forgotten dereference...
Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Al Viro 0e81c666db the wrong variable checked after request_irq()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Al Viro 7c9e3c2e3b wrong order of arguments of ->readdir()
Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Al Viro 53b6795002 minimal fixes for drivers/usb/gadget/m66592-udc.c
still looks racy (and definitely leaks)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 16:40:51 -07:00
Andrew Morton 0909fca513 git-battery vs git-acpi
drivers/w1/slaves/w1_ds2760.c:85: warning: initialization from incompatible pointer type

The ACPI guys changed the bin_attr APIs
(commit 91a6902958)

Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-07-15 22:37:03 +04:00
Anton Vorontsov 7b3d54a8c3 Power supply class and drivers: remove non obligatory return statements
Per Jeff Garzik request.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Anton Vorontsov <cbou@mail.ru>
2007-07-15 22:32:38 +04:00
Jeff Garzik 5ebf6e6a96 pda_power: clean up irq, timer
Clean up pda_power interrupt handling:

Prior to this patch, the driver would pass information it needed
to the interrupt handler dev_id pointer, and then prompt forget it
ever did so, recreating that same information after a couple passes
through the timer-based state machine.

This patch removes the redundant checks by passing the
pda_power_supply[] pointer through the state machine.  The current
code passed 'irq' through the state machine, as an index to recreate
the pointer, when we could more simply pass around the pointer itself.

This patch makes it easier to remove the 'irq' argument in the future,
in addition to cleaning up the driver today.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-15 22:32:03 +04:00