This patch adds MOUT_DMC0 and SCLK_DMC0 for checking the dmc0 clock
in CPUFREQ driver.
Signed-off-by: Jaecheol Lee <jc.lee@samsung.com>
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Current fout_apll has fixed rate value. So CPUFREQ driver gets
incorrect value when finding current CPU frequency. Because some
operation level need to change APLL.
Added get_rate function for fout_apll can give correct frequency
value when calling get_rate function.
Signed-off-by: Jaecheol Lee <jc.lee@samsung.com>
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
S5PV310 and S5PC210 support more I2C devices than previous SoCs.
Add the device support code for them.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
S5PV310 and S5PC210 support total 8 (+ 1 dedicated for HDMI) I2C devices.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The name of the I2C2 and I2C3 interrupt should be IIC2 and IIC3
instead of CAN0 and CAN1.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes I2C2 and I2C3 interrupt name from IRQ_CANX to IRQ_IICX
according other SoCs' I2C interrupt naming rule.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Basically S5P SoCs use the Samsung common VA address mapping where
plat-samsung and use plat-s5p's mapping also. The later is a little
mess. So this patch cleans it up.
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Camera devices use the I2C0 and Gyro uese the I2C1 on universal board.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[kgene.kim@samsung.com: minor title fixes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
OneNAND device support for Universal board.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[kgene.kim@samsung.com: minor title fixes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds clock types into platform data to support
external clock divider instead of internal clock divider.
It is defined that what kinds of clock type is used in machine.
Signed-off-by: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds to change bus width and host capability of HSMMC,
when HSMMC is only configured with another value of bus width
and host capability from default one.
Signed-off-by: Hyuk Lee <hyuk1.lee@samsung.com>
Signed-off-by: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support HSMMC for S5PV310(SMDKV310) and
S5PC210(SMDKC210).
Signed-off-by: Hyuk Lee <hyuk1.lee@samsung.com>
Signed-off-by: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds initialization HSMMC device information.
And HSMMC platform data like card detect, data bus width
and capability is configured.
Signed-off-by: Hyuk Lee <hyuk1.lee@samsung.com>
Signed-off-by: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support HSMMC for S5PV310 and S5PC210 and
setup for HSMMC host controller and also related GPIO.
At most 4 channel can be used at the same time.
A user can configure SDHCI data bus as 8bit or 4bit.
Signed-off-by: Hyuk Lee <hyuk1.lee@samsung.com>
Signed-off-by: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch clean up the GPIO code and removes useless GPIO addresses.
It can be calculated with offset, the 'base' member of s3c_gpio_chip
is also initialized in the init function.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
When merged patches, missed IRQ_EINT_BIT() definition from commit ea31fd43
(ARM: S5PV210: Add Power Management Support). The IRQ_EINT_BIT() is used
in the Power Management operation (plat-samsung/pm.c).
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Stephen Rothwell reported this build warning:
arch/x86/oprofile/op_model_amd.c: In function 'ibs_eilvt_valid':
arch/x86/oprofile/op_model_amd.c:289: warning: 'offset' may be used uninitialized in this function
And correctly observed that indeed the variable is used uninitialized in
this function. The result of this bug can be a debug printk with a bogus
value.
Also fix a few more small details that made this function hard to read
and which probably contributed to the bug being introduced to begin with:
- Use more symmetric error conditions
- Remove the !0 obfuscation
- Add newlines to the printk output
- Remove bogus linebreaks in printk strings and elsewhere
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20101025115736.41d51abe.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add platform data, necessary to use Sony IMX074 camera sensor with AP4EVB.
Note: this does not work reliably without an API to reserve contiguous and
coherent memory for V4L DMA buffers. Such memory reservation has to be added as
soon as a suitable API becomes available.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Just as it says, if this is merged along with the other patch, the
driver supports the U300 NAND flash interface.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch adds support for parsing Broadcom BCM63xx image tag format and
creating MTD partitions accordingly. This driver is a platform_device which
can be instantiated accordingly by bcm63xx board support code.
Signed-off-by: Daniel Dickinson <cshore@csolve.net>
Signed-off-by: Mike Albon <malbon@openwrt.org>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Adding a new flash definition would need less code.
Keep the platform passing flash definition method.
If one flash is both defined in platform data and builtin table,
driver would select the one from platform data first.
By this way, platform could select the timing most suit for itself,
not need to follow the common settings.
Signed-off-by: Lei Wen <leiwen@marvell.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
KVM: Fix signature of kvm_iommu_map_pages stub
KVM: MCE: Send SRAR SIGBUS directly
KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
KVM: fix typo in copyright notice
KVM: Disable interrupts around get_kernel_ns()
KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
KVM: MMU: move access code parsing to FNAME(walk_addr) function
KVM: MMU: audit: check whether have unsync sps after root sync
KVM: MMU: audit: introduce audit_printk to cleanup audit code
KVM: MMU: audit: unregister audit tracepoints before module unloaded
KVM: MMU: audit: fix vcpu's spte walking
KVM: MMU: set access bit for direct mapping
KVM: MMU: cleanup for error mask set while walk guest page table
KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
KVM: x86: Fix constant type in kvm_get_time_scale
KVM: VMX: Add AX to list of registers clobbered by guest switch
KVM guest: Move a printk that's using the clock before it's ready
KVM: x86: TSC catchup mode
...
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: Makefile - replace the use of <module>-objs with <module>-y
crypto: hifn_795x - use cancel_delayed_work_sync()
crypto: talitos - sparse check endian fixes
crypto: talitos - fix checkpatch warning
crypto: talitos - fix warning: 'alg' may be used uninitialized in this function
crypto: cryptd - Adding the AEAD interface type support to cryptd
crypto: n2_crypto - Niagara2 driver needs to depend upon CRYPTO_DES
crypto: Kconfig - update broken web addresses
crypto: omap-sham - Adjust DMA parameters
crypto: fips - FIPS requires algorithm self-tests
crypto: omap-aes - OMAP2/3 AES hw accelerator driver
crypto: updates to enable omap aes
padata: add missing __percpu markup in include/linux/padata.h
MAINTAINERS: Add maintainer entries for padata/pcrypt
Originally, SRAR SIGBUS is sent to QEMU-KVM via touching the poisoned
page. But commit 9605456919 prevents the
signal from being sent. So now the signal is sent via
force_sig_info_fault directly.
[marcelo: use send_sig_info instead]
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Now we have MCG_SER_P (and corresponding SRAO/SRAR MCE) support in
kernel and QEMU-KVM, the MCG_SER_P should be added into
KVM_MCE_CAP_SUPPORTED to make all these code really works.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
get_kernel_ns() wants preemption disabled. It doesn't make a lot of sense
during the get/set ioctls (no way to make them non-racy) but the callee wants
it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Move access code parsing from caller site to FNAME(walk_addr) function
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After root synced, all unsync sps are synced, this patch add a check to make
sure it's no unsync sps in VCPU's page table
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce audit_printk, and record audit point instead audit name
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After nested nested paging, it may using long mode to shadow 32/PAE paging
guest, so this patch fix it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Set access bit while setup up direct page table if it's nonpaing or npt enabled,
it's good for CPU's speculate access
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can update
'root_hpa' out of the loop.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Eliminate:
arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this
function
on gcc 4.1.2
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Older gcc versions complain about the improper type (for x86-32), 4.5
seems to fix this silently. However, we should better use the right type
initially.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
By chance this caused no harm so far. We overwrite AX during switch
to/from guest context, so we must declare this.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490db
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.
KVM-Stable-Tag.
Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Negate the effects of AN TYM spell while kvm thread is preempted by tracking
conversion factor to the highest TSC rate and catching the TSC up when it has
fallen behind the kernel view of time. Note that once triggered, we don't
turn off catchup mode.
A slightly more clever version of this is possible, which only does catchup
when TSC rate drops, and which specifically targets only CPUs with broken
TSC, but since these all are considered unstable_tsc(), this patch covers
all necessary cases.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This just changes some names to better reflect the usage they
will be given. Separated out to keep confusion to a minimum.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The math in kvm_get_time_scale relies on the fact that
NSEC_PER_SEC < 2^32. To use the same function to compute
arbitrary time scales, we must extend the first reduction
step to shrink the base rate to a 32-bit value, and
possibly reduce the scaled rate into a 32-bit as well.
Note we must take care to avoid an arithmetic overflow
when scaling up the tps32 value (this could not happen
with the fixed scaled value of NSEC_PER_SEC, but can
happen with scaled rates above 2^31.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If an interrupt is pending, we need to stop emulation so we
can inject it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Replace the inject-as-software-interrupt hack we currently have with
emulated injection.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This adds a wrapper function kvm_inject_realmode_interrupt() around the
emulator function emulate_int_real() to allow real mode interrupt injection.
[avi: initialize operand and address sizes before emulating interrupts]
[avi: initialize rip for real mode interrupt injection]
[avi: clear interrupt pending flag after emulating interrupt injection]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Nested SVM checks for external interrupt after injecting nested exception.
In case there is external interrupt pending the code generates "external
interrupt exit" and overwrites previous exit info. If previously injected
exception already generated exit it will be lost.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The PIC code used to be called from preempt_disable() context, which
wasn't very good for PREEMPT_RT. That is no longer the case, so move
back from raw_spinlock_t to spinlock_t.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If preempted after kvmclock values are updated, but before hardware
virtualization is entered, the last tsc time as read by the guest is
never set. It underflows the next time kvmclock is updated if there
has not yet been a successful entry / exit into hardware virt.
Fix this by simply setting last_tsc to the newly read tsc value so
that any computed nsec advance of kvmclock is nulled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch moves the detection whether a page-fault was
nested or not out of the error code and moves it into a
separate variable in the fault struct.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Change the interrupt injection code to work from preemptible, interrupts
enabled context. This works by adding a ->cancel_injection() operation
that undoes an injection in case we were not able to actually enter the guest
(this condition could never happen with atomic injection).
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently vmx_complete_interrupts() can decode event information from vmx
exit fields into the generic kvm event queues. Make it able to decode
the information from the entry fields as well by parametrizing it.
Signed-off-by: Avi Kivity <avi@redhat.com>
vmx_complete_interrupts() does too much, split it up:
- vmx_vcpu_run() gets the "cache important vmcs fields" part
- a new vmx_complete_atomic_exit() gets the parts that must be done atomically
- a new vmx_recover_nmi_blocking() does what its name says
- vmx_complete_interrupts() retains the event injection recovery code
This helps in reducing the work done in atomic context.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of blindly attempting to inject an event before each guest entry,
check for a possible event first in vcpu->requests. Sites that can trigger
event injection are modified to set KVM_REQ_EVENT:
- interrupt, nmi window opening
- ppr updates
- i8259 output changes
- local apic irr changes
- rflags updates
- gif flag set
- event set on exit
This improves non-injecting entry performance, and sets the stage for
non-atomic injection.
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit "KVM: MMU: Make tdp_enabled a mmu-context parameter" made real-mode
set ->direct_map, and changed the code that merges in the memory type depend
on direct_map instead of tdp_enabled. However, in this case what really
matters is tdp, not direct_map, since tdp changes the pte format regardless
of whether the mapping is direct or not.
As a result, real-mode shadow mappings got corrupted with ept memory types.
The result was a huge slowdown, likely due to the cache being disabled.
Change it back as the simplest fix for the regression (real fix is to move
all that to vmx code, and not use tdp_enabled as a synonym for ept).
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch fixes a bug in KVM where it _always_ reports the
support of the SVM feature to userspace. But KVM only
supports SVM on AMD hardware and only when it is enabled in
the kernel module. This patch fixes the wrong reporting.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the reporting of the nested paging
feature support to userspace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds INTR and NMI intercepts to the list of
expected intercepts with an exit_int_info set. While this
can't happen on bare metal it is architectural legal and may
happen with KVMs SVM emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds code to initialize the Nested Nested Paging
MMU context when the L1 guest executes a VMRUN instruction
and has nested paging enabled in its VMCB.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the helper functions which will be used in
the mmu context for handling nested nested page faults.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
With Nested Paging emulation the NX state between the two
MMU contexts may differ. To make sure that always the right
fault error code is recorded this patch moves the NX state
into struct kvm_mmu so that the code can distinguish between
L1 and L2 NX state.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently the KVM softmmu implementation can not shadow a 32
bit legacy or PAE page table with a long mode page table.
This is a required feature for nested paging emulation
because the nested page table must alway be in host format.
So this patch implements the missing pieces to allow long
mode page tables for page table types.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch factors out the direct-mapping paths of the
mmu_alloc_roots function into a seperate function. This
makes it a lot easier to avoid all the unnecessary checks
done in the shadow path which may break when running direct.
In fact, this patch already fixes a problem when running PAE
guests on a PAE shadow page table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This function is implemented to load the pdptr pointers of
the currently running guest (l1 or l2 guest). Therefore it
takes care about the current paging mode and can read pdptrs
out of l2 guest physical memory.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This function need to be able to load the pdptrs from any
mmu context currently in use. So change this function to
take an kvm_mmu parameter to fit these needs.
As a side effect this patch also moves the cached pdptrs
from vcpu_arch into the kvm_mmu struct.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM currently ignores fetch faults in the instruction
emulator. With nested-npt we could have such faults. This
patch adds the code to handle these.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements logic to make sure that either a
page-fault/page-fault-vmexit or a nested-page-fault-vmexit
is propagated back to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces the init_kvm_nested_mmu() function
which is used to re-initialize the nested mmu when the l2
guest changes its paging mode.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces the kvm_read_guest_page_x86 function
which reads from the physical memory of the guest. If the
guest is running in guest-mode itself with nested paging
enabled it will read from the guest's guest physical memory
instead.
The patch also changes changes the code to use this function
where it is necessary.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch uses kvm_read_guest_page_tdp to make the
walk_addr_generic functions suitable for two-level page
table walking.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds a function which can read from the guests
physical memory or from the guest's guest physical memory.
This will be used in the two-dimensional page table walker.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces the walk_mmu pointer which points to
the mmu-context currently used for gva_to_gpa translations.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This is the first patch in the series towards a generic
walk_addr implementation which could walk two-dimensional
page tables in the end. In this first step the walk_addr
function is renamed into walk_addr_generic which takes a
mmu context as an additional parameter.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces a struct with two new fields in
vcpu_arch for x86:
* fault.address
* fault.error_code
This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault-address is different from the original
address that should be walked. So we need to keep track
about the real fault-address.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch changes is_rsvd_bits_set() function prototype to
take only a kvm_mmu context instead of a full vcpu.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a seperate function and export it.
Also make the whole init path suitable for more than one mmu
context.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which will be used to inject a page
fault. This will be used later when Nested Nested Paging is
implemented.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This function pointer in the MMU context is required to
implement Nested Nested Paging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tpd enabled mmu
contexts. This allows to remove some hacks from svm code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch changes the tdp_enabled flag from its global
meaning to the mmu-context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging where we need an extra MMU context to shadow
the Nested Nested Page Table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The walk_addr function checks for !is_long_mode in its 64
bit version. But what is meant here is a check for pae
paging. Change the condition to really check for pae paging
so that it also works with nested nested paging.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some operating systems store data about the host processor at the
time of installation, and when booted on a more uptodate cpu tries
to read MSR_EBC_FREQUENCY_ID. This has been found with XP.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch changes the rip handling in the vmrun emulation
path from using next_rip to the generic kvm register access
functions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch implements restoring of the correct rip, rsp, and
rax after the svm emulation in KVM injected a selective_cr0
write intercept into the guest hypervisor. The problem was
that the vmexit is emulated in the instruction emulation
which later commits the registers right after the write-cr0
instruction. So the l1 guest will continue to run with the
l2 rip, rsp and rax resulting in unpredictable behavior.
This patch is not the final word, it is just an easy patch
to fix the issue. The real fix will be done when the
instruction emulator is made aware of nested virtualization.
Until this is done this patch fixes the issue and provides
an easy way to fix this in -stable too.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes 32 bit legacy paging with NPT enabled. The
mmu_check_root call on the top-level of the loop causes
root_gfn to take values (in the tdp_enabled path) which are
outside of guest memory. So the mmu_check_root call fails at
some point in the loop interation causing the guest to
tiple-fault.
This patch changes the mmu_check_root calls to the places
where they are really necessary. As a side-effect it
introduces a check for the root of a pae page table too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We have to protect the include for linux/of.h by __KERNEL__ so it doesn't
accidently get referenced outside.
This patch fixes this and makes the tree compile again.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500_tlb.c file didn't compile for me due to the following error:
arch/powerpc/kvm/e500_tlb.c: In function ‘kvmppc_e500_shadow_map’:
arch/powerpc/kvm/e500_tlb.c:300: error: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘gfn_t’
So let's explicitly cast the argument to make printk happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvmppc_e500_stlbe_invalidate() function was trying to pass too many
parameters to trace_kvm_stlb_inval(). This appears to be a bad
copy-paste from a call to trace_kvm_stlb_write().
Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
BookE also wants to support level based interrupts, so let's implement
all the necessary logic there. We need to trick a bit here because the
irqprios are 1:1 assigned to architecture defined values. But since there
is some space left there, we can just pick a random one and move it later
on - it's internal anyways.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!
Signed-off-by: Alexander Graf <agraf@suse.de>
The current interrupt logic is just completely broken. We get a notification
from user space, telling us that an interrupt is there. But then user space
expects us that we just acknowledge an interrupt once we deliver it to the
guest.
This is not how real hardware works though. On real hardware, the interrupt
controller pulls the external interrupt line until it gets notified that the
interrupt was received.
So in reality we have two events: pulling and letting go of the interrupt line.
To maintain backwards compatibility, I added a new request for the pulling
part. The letting go part was implemented earlier already.
With this in place, we can now finally start guests that do not randomly stall
and stop to work at random times.
This patch implements above logic for Book3S.
Signed-off-by: Alexander Graf <agraf@suse.de>
Before I incorrectly enabled napping also for BookE, which would result in
needless dcache flushes. Since we only need to force enable napping on
Book3s_64 because it doesn't go into MSR_POW otherwise, we can just #ifdef
that code to this particular platform.
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Match only the first part of cur_cpu_spec->platform.
440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Developers can now tell at a glace the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be waked up on the next interrupt.
Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:
r = mfmsr();
/* r containts MSR_POW */
mtmsr(r | MSR_EE);
This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.
This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.
The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.
To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code be actually correct
and shouldn't hurt performance too much.
Signed-off-by: Alexander Graf <agraf@suse.de>
There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.
Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.
Signed-off-by: Alexander Graf <agraf@suse.de>
We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!
Signed-off-by: Alexander Graf <agraf@suse.de>
When having a decrementor interrupt pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.
This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
Signed-off-by: Alexander Graf <agraf@suse.de>
So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.
So let's instead handle all registers gracefully and get rid of that
stupid limitation
Signed-off-by: Alexander Graf <agraf@suse.de>
This is the guest side of the mtsr acceleration. Using this a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Right now we're examining the contents of Book3s_32's segment registers when
the register is written and put the interpreted contents into a struct.
There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more ram. But the more important part is that with
segment registers being interpreted from their raw values, we can put them in
the shared page, allowing guests to mess with them directly.
This patch makes the internal representation of SRs be u32s.
Signed-off-by: Alexander Graf <agraf@suse.de>
The current approach duplicates the spr->bat finding logic and makes it harder
to reuse the actually used variables. So let's move everything down to the spr
handler.
Signed-off-by: Alexander Graf <agraf@suse.de>
We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.
This patch adds a second return value to the magic mapping that
indicated to the guest which features are available.
Signed-off-by: Alexander Graf <agraf@suse.de>
It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.
This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.
Signed-off-by: Alexander Graf <agraf@suse.de>
There is a race condition in the pte invalidation code path where we can't
be sure if a pte was invalidated already. So let's move the spin lock around
to get rid of the race.
Signed-off-by: Alexander Graf <agraf@suse.de>
When hitting a no-execute or read-only data/inst storage interrupt we were
flushing the respective PTE so we're sure it gets properly overwritten next.
According to the spec, this is unnecessary though. The guest issues a tlbie
anyways, so we're safe to just keep the PTE around and have it manually removed
from the guest, saving us a flush.
Signed-off-by: Alexander Graf <agraf@suse.de>
When the guest jumps into kernel mode and has the magic page mapped, theres a
very high chance that it will also use it. So let's detect that scenario and
map the segment accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
The different ways of flusing shadow ptes have their own debug prints which use
stupid old printk.
Let's move them to tracepoints, making them easier available, faster and
possible to activate on demand
Signed-off-by: Alexander Graf <agraf@suse.de>
After a flush the sid map contained lots of entries with 0 for their gvsid and
hvsid value. Unfortunately, 0 can be a real value the guest searches for when
looking up a vsid so it would incorrectly find the host's 0 hvsid mapping which
doesn't belong to our sid space.
So let's also check for the valid bit that indicated that the sid we're
looking at actually contains useful data.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.
This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>
The following patch
commit 57ce1659316f4ca298919649f9b1b55862ac3826
KVM: x86: In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
ignored the fact that kvm_irq_delivery_to_apic() was also used by ia64.
We define kvm_lapic_enabled() to fix a compile error caused by this.
This will have the same effect as reverting the problematic patch for ia64.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
The audit is very high overhead, so we need lower the frequency to assure
the guest is running.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page
table, so we can do these checking in a spte walking
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Both audit_rmap() and audit_write_protection() need to walk all active sp, so
we can do these checking in a sp walking
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a r/w module parameter named 'mmu_audit', it can control audit
enable/disable:
enable:
echo 1 > /sys/module/kvm/parameters/mmu_audit
disable:
echo 0 > /sys/module/kvm/parameters/mmu_audit
This patch not change the logic
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant
on said old AMD CPU models. This change returns the expected value,
which the Linux kernel is expecting to avoid writing back the MSR,
plus it ignores all writes to the MSR.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
ICW is not a full reset, instead it resets a limited number of registers
in the PIC. Change ICW1 emulation to only reset those registers.
Signed-off-by: Avi Kivity <avi@redhat.com>
x86_emulate_insn() is full of things like
if (rc != X86EMUL_CONTINUE)
goto done;
break;
consolidate all of those at the end of the switch statement.
Signed-off-by: Avi Kivity <avi@redhat.com>
Otherwise EFER_LMA bit is retained across a SIPI reset.
Fixes guest cpu onlining.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since commit aad827034e no mmu reinitialization is performed
via init_vmcb.
Zero vcpu->arch.cr0 and pass the reset value as a parameter to
kvm_set_cr0.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Nothing is checked in count_rmaps(), so remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection).
This patch fix it by:
- introduce gfn_to_pfn_atomic instead of gfn_to_pfn
- get the mapping gfn from kvm_mmu_page_get_gfn()
And it adds 'notrap' ptes check in unsync/direct sps
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The audit code reports some sp not write protected in current code, it's just the
bug in audit_write_protection(), since:
- the invalid sp not need write protected
- using uninitialize local variable('gfn')
- call kvm_mmu_audit() out of mmu_lock's protection
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The read-only spte also has reverse mapping, so fix the code to check them,
also modify the function name to fit its doing
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
fix:
arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’:
arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’:
arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘set_spte’:
arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’
arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’:
arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Pit interrupt injection was done by workqueue, so no need to check
pending pit timer in vcpu thread which could lead unnecessary
unblocking of vcpu.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When CONFIG_KVM_GUEST is selected, but CONFIG_KVM is not, we were missing
some defines in asm-offsets.c and included too many headers at other places.
This patch makes above configuration work.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The ALU opcode block is very regular; introduce D6ALU() to define decode
flags for 6 instructions at a time.
Suggested by Paolo Bonzini.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Many x86 instructions come in byte and word variants distinguished with bit
0 of the opcode. Add macros to aid in defining them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
SrcMemFAddr is not defined with the modrm operand designating a register
instead of a memory address.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
x86_emulate_insn() will return 1 if instruction can be restarted
without re-entering a guest.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
As suggested by Christian, we should expose headers to user space with
information that might be valuable there. The s390 virtio interface is
one of those cases. It defines an ABI between hypervisor and guest, so
it should be exposed to user space.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The one big missing feature in s390-virtio was hotplugging. This is no more.
This patch implements hotplug add support, so you can on the fly add new devices
in the guest.
Keep in mind that this needs a patch for qemu to actually leverage the
functionality.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currenty the ext_param field only distinguishes between "config change" and
"vring interrupt". We can do a lot more with it though, so let's enable a
full byte of possible values and constants to #defines while at it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Support prefetch ptes when intercept guest #PF, avoid to #PF by later
access
If we meet any failure in the prefetch path, we will exit it and
not try other ptes to avoid become heavy path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Kernel time, which advances in discrete steps may progress much slower
than TSC. As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).
We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The scale_delta function for shift / multiply with 31-bit
precision moves to a common header so it can be used by both
kernel and kvm module.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If there are active VCPUs which are marked as belonging to
a particular hardware CPU, request a clock sync for them when
enabling hardware; the TSC could be desynchronized on a newly
arriving CPU, and we need to recompute guests system time
relative to boot after a suspend event.
This covers both cases.
Note that it is acceptable to take the spinlock, as either
no other tasks will be running and no locks held (BSP after
resume), or other tasks will be guaranteed to drop the lock
relatively quickly (AP on CPU_STARTING).
Noting we now get clock synchronization requests for VCPUs
which are starting up (or restarting), it is tempting to
attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers
at this time, however it is not correct to do so; they are
required for systems with non-constant TSC as the frequency
may not be known immediately after the processor has started
until the cpufreq driver has had a chance to run and query
the chipset.
Updated: implement better locking semantics for hardware_enable
Removed the hack of dropping and retaking the lock by adding the
semantic that we always hold kvm_lock when hardware_enable is
called. The one place that doesn't need to worry about it is
resume, as resuming a frozen CPU, the spinlock won't be taken.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add a helper function to compute the kernel time and convert nanoseconds
back to CPU specific cycles. Note that these must not be called in preemptible
context, as that would mean the kernel could enter software suspend state,
which would cause non-atomic operation.
Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel
time helper, these should be bootbased as well.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When CPUs with unstable TSCs enter deep C-state, TSC may stop
running. This causes us to require resynchronization. Since
we can't tell when this may potentially happen, we assume the
worst by forcing re-compensation for it at every point the VCPU
task is descheduled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops. Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If creating an SMP guest with unstable host TSC, issue a warning
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This simplifies much of the init code; we can now simply always
call tsc_khz_changed, optionally passing it a new value, or letting
it figure out the existing value (while interrupts are disabled, and
thus, by inference from the rule, not raceful against CPU hotplug or
frequency updates, which will issue IPIs to the local CPU to perform
this very same task).
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Attempt to synchronize TSCs which are reset to the same value. In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock. While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Change svm / vmx to be the same internally and write TSC offset
instead of bare TSC in helper functions. Isolated as a single
patch to contain code movement.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
commit ad05c88266b4cce1c820928ce8a0fb7690912ba1
(KVM: create aggregate kvm_total_used_mmu_pages value)
introduce percpu counter kvm_total_used_mmu_pages but never
destroy it, this may cause oops when rmmod & modprobe.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/
kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called
by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or
kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(),
This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever.
Moving kvm_mmu_commit_zap_page would make the while loop performs as normal.
Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
aquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Doing this makes the code much more readable. That's
borne out by the fact that this patch removes code. "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called. Keeping this
value as opposed to free makes the next patch simpler.
So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'. That
means they start out zeroed. I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages(). But, that only happens
via kvm ioctls.
Another benefit of storing 'used' intead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated". But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.
It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages. This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
"free" is a poor name for this value. In this context, it means,
"the number of mmu pages which this kvm instance should be able to
allocate." But "free" implies much more that the objects are there
and ready for use. "available" is a much better description, especially
when you see how it is calculated.
In this patch, we abstract its use into a function. We'll soon
replace the function's contents by calculating the value in a
different way.
All of the reads of n_free_mmu_pages are taken care of in this
patch. The modification sites will be handled in a patch
later in the series.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Most x86 two operand instructions allow the destination to be a memory operand,
but IMUL (for example) requires that the destination be a register. Change
____emulate_2op() to take a register for both source and destination so we
can invoke IMUL.
Signed-off-by: Avi Kivity <avi@redhat.com>
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.
Signed-off-by: Avi Kivity <avi@redhat.com>
Change OUT instruction to use dst instead of src, so we can
reuse those code for all out instructions.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce DstImmUByte for dst operand decode, which
will be used for out instruction.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce function write_register_operand() to write back the
register operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add kvm_release_page_clean() after is_error_page() to avoid
leakage of error page.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The code for initializing the emulation context is duplicated at two
locations (emulate_instruction() and kvm_task_switch()). Separate it
in a separate function and call it from there.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch lets emulate_grp3() return X86EMUL_* return codes instead
of hardcoded ones.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mask group 8 instruction as BitOp, so we can share the
code for adjust the source operand.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
adjust the dst address for a register source but not adjust the
address for an immediate source.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If bit offset operands is a negative number, BitOp instruction
will return wrong value. This patch fix it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch change to disable writeback when decode dest
operand if the dest type is ImplicitOps or not specified.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This adds support for int instructions to the emulator.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The patch adds a new member get_idt() to x86_emulate_ops.
It also adds a function to get the idt in order to be used by the emulator.
This is needed for real mode interrupt injection and the emulation of int
instructions.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Two-byte opcode always start with 0x0F and the decode flags
of opcode 0xF0 is always 0, so remove dup check.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When using a relocatable kernel we need to make sure that the trampline code
and the interrupt handlers are both copied to low memory. The only way to do
this reliably is to put them in the copied section.
This patch should make relocated kernels work with KVM.
KVM-Stable-Tag
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On Book3S KVM we directly expose some asm pointers to C code as
variables. These need to be relocated and thus break on relocatable
kernels.
To make sure we can at least build, let's mark them as long instead
of u32 where 64bit relocations don't work.
This fixes the following build error:
WARNING: 2 bad relocations^M
> c000000000008590 R_PPC64_ADDR32 .text+0x4000000000008460^M
> c000000000008594 R_PPC64_ADDR32 .text+0x4000000000008598^M
Please keep in mind that actually using KVM on a relocated kernel
might still break. This only fixes the compile problem.
Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Book3S_32 requires MSR_DR to be disabled during load_up_xxx while on Book3S_64
it's supposed to be enabled. I misread the code and disabled it in both cases,
potentially breaking the PS3 which has a really small RMA.
This patch makes KVM work on the PS3 again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On Book3s_32 the tlbie instruction flushed effective addresses by the mask
0x0ffff000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
to speed up that target we should also keep a special hash around for it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On failure gfn_to_pfn returns bad_page so use correct function to check
for that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
So far we've been running all code without locking of any sort. This wasn't
really an issue because I didn't see any parallel access to the shadow MMU
code coming.
But then I started to implement dirty bitmapping to MOL which has the video
code in its own thread, so suddenly we had the dirty bitmap code run in
parallel to the shadow mmu code. And with that came trouble.
So I went ahead and made the MMU modifying functions as parallelizable as
I could think of. I hope I didn't screw up too much RCU logic :-). If you
know your way around RCU and locking and what needs to be done when, please
take a look at this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly
when enabling debugging.
This patch repairs the broken code paths, making it possible to define DEBUG_MMU
and friends again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to tell the guest the opcodes that make up a hypercall through
interfaces that are controlled by userspace. So we need to add a call
for userspace to allow it to query those opcodes so it can pass them
on.
This is required because the hypercall opcodes can change based on
the hypervisor conditions. If we're running in hardware accelerated
hypervisor mode, a hypercall looks different from when we're running
without hardware acceleration.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On BookE the preferred way to write the EE bit is the wrteei instruction. It
already encodes the EE bit in the instruction.
So in order to get BookE some speedups as well, let's also PV'nize thati
instruction.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There is also a form of mtmsr where all bits need to be addressed. While the
PPC64 Linux kernel behaves resonably well here, on PPC32 we do not have an
L=1 form. It does mtmsr even for simple things like only changing EE.
So we need to hook into that one as well and check for a mask of bits that we
deem safe to change from within guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The PowerPC ISA has a special instruction for mtmsr that only changes the EE
and RI bits, namely the L=1 form.
Since that one is reasonably often occuring and simple to implement, let's
go with this first. Writing EE=0 is always just a store. Doing EE=1 also
requires us to check for pending interrupts and if necessary exit back to the
hypervisor.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.
To make the later patches and code in general easier readable, let's introduce
a set of defines that save and restore r30, r31 and cr. Let's also define some
helpers to read the lower 32 bits of a 64 bit field on 32 bit systems.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will need to patch several instruction streams over to a different
code path, so we need a way to patch a single instruction with a branch
somewhere else.
This patch adds a helper to facilitate this patching.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will soon require more sophisticated methods to replace single instructions
with multiple instructions. We do that by branching to a memory region where we
write replacement code for the instruction to.
This region needs to be within 32 MB of the patched instruction though, because
that's the furthest we can jump with immediate branches.
So we keep 1MB of free space around in bss. After we're done initing we can just
tell the mm system that the unused pages are free, but until then we have enough
space to fit all our code in.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
With our current MMU scheme we don't need to know about the tlbsync instruction.
So we can just nop it out.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some instructions can simply be replaced by load and store instructions to
or from the magic page.
This patch replaces often called instructions that fall into the above category.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will soon start and replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code out.
This patch still only contains stubs. But at least it loops through the
text section :).
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have all the hypervisor pieces in place now, but the guest parts are still
missing.
This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.
So let's move the x86 specific definition of that function over to the x86
specfic header file.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Now that we have the shared page in place and the MMU code knows about
the magic page, we can expose that capability to the guest!
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to override EA as well as PA lookups for the magic page. When the guest
tells us to project it, the magic page overrides any guest mappings.
In order to reflect that, we need to hook into all the MMU layers of KVM to
force map the magic page if necessary.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will be introducing a method to project the shared page in guest context.
As soon as we're talking about this coupling, the shared page is colled magic
page.
This patch introduces simple defines, so the follow-up patches are easier to
read.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On PowerPC it's very normal to not support all of the physical RAM in real mode.
To check if we're matching on the shared page or not, we need to know the limits
so we can restrain ourselves to that range.
So let's make it a define instead of open-coding it. And while at it, let's also
increase it.
Signed-off-by: Alexander Graf <agraf@suse.de>
v2 -> v3:
- RMO -> PAM (non-magic page)
Signed-off-by: Avi Kivity <avi@redhat.com>
When the guest turns on interrupts again, it needs to know if we have an
interrupt pending for it. Because if so, it should rather get out of guest
context and get the interrupt.
So we introduce a new field in the shared page that we use to tell the guest
that there's a pending interrupt lying around.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
While running in hooked code we need to store register contents out because
we must not clobber any registers.
So let's add some fields to the shared page we can just happily write to.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When running in hooked code we need a way to disable interrupts without
clobbering any interrupts or exiting out to the hypervisor.
To achieve this, we have an additional critical field in the shared page. If
that field is equal to the r1 register of the guest, it tells the hypervisor
that we're in such a critical section and thus may not receive any interrupts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.
This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.
This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When in kernel mode there are 4 additional registers available that are
simple data storage. Instead of exiting to the hypervisor to read and
write those, we can just share them with the guest using the page.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The SRR0 and SRR1 registers contain cached values of the PC and MSR
respectively. They get written to by the hypervisor when an interrupt
occurs or directly by the kernel. They are also used to tell the rfi(d)
instruction where to jump to.
Because it only gets touched on defined events that, it's very simple to
share with the guest. Hypervisor and guest both have full r/w access.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DAR register contains the address a data page fault occured at. This
register behaves pretty much like a simple data storage register that gets
written to on data faults. There is no hypervisor interaction required on
read or write.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DSISR register contains information about a data page fault. It is fully
read/write from inside the guest context and we don't need to worry about
interacting based on writes of this register.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
One of the most obvious registers to share with the guest directly is the
MSR. The MSR contains the "interrupts enabled" flag which the guest has to
toggle in critical sections.
So in order to bring the overhead of interrupt en- and disabling down, let's
put msr into the shared page. Keep in mind that even though you can fully read
its contents, writing to it doesn't always update all state. There are a few
safe fields that don't require hypervisor interaction. See the documentation
for a list of MSR bits that are safe to be set from inside the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
For transparent variable sharing between the hypervisor and guest, I introduce
a shared page. This shared page will contain all the registers the guest can
read and write safely without exiting guest context.
This patch only implements the stubs required for the basic structure of the
shared page. The actual register moving follows.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
If a nop instruction is encountered, we jump directly to the done label.
This skip updating rip. Break from the switch case instead
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since modrm operand can be either register or memory, decoding it into
a 'struct operand', which can represent both, is simpler.
Signed-off-by: Avi Kivity <avi@redhat.com>
The operands for these instructions are 32 bits or 64 bits, depending on
long mode, and ignoring REX prefixes, or the operand size prefix.
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently we use a void pointer for memory addresses. That's wrong since
these are guest virtual addresses which are not directly dereferencable by
the host.
Use the correct type, unsigned long.
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch lets a nested vmrun fail if the L1 hypervisor
left the asid zero. This fixes the asid_zero unit test.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch lets the nested vmrun fail if the L1 hypervisor
has not intercepted vmrun. This fixes the "vmrun intercept
check" unit test.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mark page dirty only when this page is really written, it's more exacter,
and also can fix dirty page marking in speculation path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce spte_has_volatile_bits() function to judge whether spte
bits will miss, it's more readable and can help us to cleanup code
later
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It's a small cleanup that using using kvm_set_pfn_accessed() instead
of mark_page_accessed()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
No need to update vcpu state since instruction is in the middle of the
emulation.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Needed for repeating instructions with execution functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of looking up the opcode twice (once for decode flags, once for
the big execution switch) look up both flags and function in the decode tables.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
It doesn't ever change, so we don't need to pass it around everywhere.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Now that the group index no longer exists, the space is free.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of having a group number, store the group table pointer directly in
the opcode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We'll be using that to distinguish between new-style and old-style groups.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Once 'struct opcode' grows, its initializer will become more complicated.
Wrap the simple initializers in a D() macro, and replace the empty initializers
with an even simpler N macro.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This will hold all the information known about the opcode. Currently, this
is just the decode flags.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The parenthese make is impossible to use the macros with initializers that
require braces.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Ths patch adds IRET instruction (opcode 0xcf).
Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Reviewed-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch implements the emulations of the svm next_rip
feature in the nested svm implementation in kvm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes a bug in a nested hypervisor that heavily
switches between real-mode and long-mode. The problem is
fixed by syncing back efer into the guest vmcb on emulated
vmexit.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
After commit 53383eaad08d, the '*spte' has updated before call
rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so
remove this information from error message
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Now that we have the host gdt conveniently stored in a variable, make use
of it instead of querying the cpu.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Use just one group table for byte (F6) and word (F7) opcodes.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Move operand decoding to the opcode table, keep lock decoding in the group
table. This allows us to get consolidate the four variants of Group 1 into one
group.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Allow bits that are common to all members of a group to be specified in the
opcode table instead of the group table. This allows some simplification
of the decode tables.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add a decode flag to indicate the instruction is invalid. Will come in useful
later, when we mix decode bits from the opcode and group table.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently group bits are stored in bits 0:7, where operand bits are stored.
Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can
mix group and operand bits.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Some instructions are repetitive in the opcode space, add macros for
consolidating them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If an instruction is present in the decode tables but not in the execution
switch, it will be emulated as a NOP. An example is IRET (0xcf).
Fix by adding default: labels to the execution switches.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
vlan: Calling vlan_hwaccel_do_receive() is always valid.
tproxy: use the interface primary IP address as a default value for --on-ip
tproxy: added IPv6 support to the socket match
cxgb3: function namespace cleanup
tproxy: added IPv6 support to the TPROXY target
tproxy: added IPv6 socket lookup function to nf_tproxy_core
be2net: Changes to use only priority codes allowed by f/w
tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
tproxy: added tproxy sockopt interface in the IPV6 layer
tproxy: added udp6_lib_lookup function
tproxy: added const specifiers to udp lookup functions
tproxy: split off ipv6 defragmentation to a separate module
l2tp: small cleanup
nf_nat: restrict ICMP translation for embedded header
can: mcp251x: fix generation of error frames
can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
can-raw: add msg_flags to distinguish local traffic
9p: client code cleanup
rds: make local functions/variables static
...
Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
The stack output currently looks like this:
7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046
<0> ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0
<0> ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78
The superfluous <0> are caused by recent printk KERN_CONT
change. <*> is now ignored in printk unless some text follows
the level and even then it still has to be the first in the
format message.
Note that the log_lvl parameter is now completely ignored in
show_stack_log_lvl and the stack is dumped with the default
level (like for quite some time already). It behaves the same as
the rest of the dump, function traces are dumped in the very
same manner. Only Code and maybe some lines are printed with
EMERG level.
Unfortunately I see no way how to fix this conceptually to have
the whole oops/BUG/panic output with the same level, so this
removed only the superfluous characters for the time being.
Just for illustration:
<4>Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100)
<0>Stack:
<4> ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046
<4> ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000
<4> ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68
<0>Call Trace:
<0> <IRQ>
<4> [<ffffffff8102fc4c>] ? call_softirq+0x1c/0x30 ...
<0>Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ...
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: jirislaby@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1287586131-16222-1-git-send-email-jslaby@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softirqs: Make wakeup_softirqd static
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, asm: Restore parentheses around one pushl_cfi argument
x86, asm: Fix ancient-GAS workaround
x86, asm: Fix CFI macro invocations to deal with shortcomings in gas
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: HPET force enable for CX700 / VIA Epia LT
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: Use string copy operation to optimze copy in kernel compression
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()
* 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, mm: Add an initial page table for core bootstrapping
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kdb,debug_core: adjust master cpu switch logic against new debug_core locking
debug_core: refactor locking for master/slave cpus
x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter()
kdb,kgdb: fix sparse fixups
kdb: Fix oops in kdb_unregister
kdb,ftdump: Remove reference to internal kdb include
kdb: Allow kernel loadable modules to add kdb shell functions
debug_core: stop rcu warnings on kernel resume
debug_core: move all watch dog syncs to a single function
x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (141 commits)
USB: mct_u232: fix broken close
USB: gadget: amd5536udc.c: fix error path
USB: imx21-hcd - fix off by one resource size calculation
usb: gadget: fix Kconfig warning
usb: r8a66597-udc: Add processing when USB was removed.
mxc_udc: add workaround for ENGcm09152 for i.MX35
USB: ftdi_sio: add device ids for ScienceScope
USB: musb: AM35x: Workaround for fifo read issue
USB: musb: add musb support for AM35x
USB: AM35x: Add musb support
usb: Fix linker errors with CONFIG_PM=n
USB: ohci-sh - use resource_size instead of defining its own resource_len macro
USB: isp1362-hcd - use resource_size instead of defining its own resource_len macro
USB: isp116x-hcd - use resource_size instead of defining its own resource_len macro
USB: xhci: Fix compile error when CONFIG_PM=n
USB: accept some invalid ep0-maxpacket values
USB: xHCI: PCI power management implementation
USB: xHCI: bus power management implementation
USB: xHCI: port remote wakeup implementation
USB: xHCI: port power management implementation
...
Manually fix up (non-data) conflict: the SCSI merge gad renamed the
'hw_sector_size' member to 'physical_block_size', and the USB tree
brought a new use of it.
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (49 commits)
serial8250: ratelimit "too much work" error
serial: bfin_sport_uart: speed up sport RX sample rate to be 3% faster
serial: abstraction for 8250 legacy ports
serial/imx: check that the buffer is non-empty before sending it out
serial: mfd: add more baud rates support
jsm: Remove the uart port on errors
Alchemy: Add UART PM methods.
8250: allow platforms to override PM hook.
altera_uart: Don't use plain integer as NULL pointer
altera_uart: Fix missing prototype for registering an early console
altera_uart: Fixup type usage of port flags
altera_uart: Make it possible to use Altera UART and 8250 ports together
altera_uart: Add support for different address strides
altera_uart: Add support for getting mapbase and IRQ from resources
altera_uart: Add support for polling mode (IRQ-less)
serial: Factor out uart_poll_timeout() from 8250 driver
serial: mark the 8250 driver as maintained
serial: 8250: Don't delay after transmitter is ready.
tty: MAINTAINERS: add drivers/serial/jsm/ as maintained driver
vcs: invoke the vt update callback when /dev/vcs* is written to
...
When converting to use s3c_gpio_cfgpin_range() the function for the
IISv4 block appears to have been typoed as 4 (the keypad) rather than
5 as it should be.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes code setting special-function and no pull-up
to use the s3c_gpio_cfgrange_nopull() wrapper.
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes code setting special-function and no pull-up
to use the s3c_gpio_cfgrange_nopull() wrapper.
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes code setting special-function and no pull-up
to use the s3c_gpio_cfgrange_nopull() wrapper.
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change code setting special-function and no pull-up to use
the s3c_gpio_cfgrange_nopull() wrapper.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change code setting special-function and no pull-up to use
the s3c_gpio_cfgrange_nopull() wrapper.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change code setting special-function and no pull-up to use
the s3c_gpio_cfgrange_nopull() wrapper.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
A number of the SDHCI code configure a GPIO to a special function
and remove any pull-up, so add s3c_gpio_cfgrange_nopull() as a
wrapper to the s3c_gpio_cfgall_range() to make the code that
calls it fit on one line.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgpin_range().
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgpin_range().
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgpin_range().
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgpin_range().
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Mop up a few missed s3c_gpio_cfgpin_range() changes.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Mop up a few missed s3c_gpio_cfgpin_range() changes.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Mop up a few missed s3c_gpio_cfgpin_range() changes.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting a range of GPIO pins' configuration and
pull state to use the recently introduced s3c_gpio_cfgall_range().
Mop up a few missed s3c_gpio_cfgpin_range() changes.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: Fix small comments]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Add a function to configure a range of GPIOs function and
pull in one go, mainly for the SDHCI and framebuffer helpers
which tend to do this.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: Fix small comments]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting ranges of GPIO pins in mach-s5pv310 using
s3c_gpio_cfgpin() to use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting ranges of GPIO pins using s3c_gpio_cfgpin() to
use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: coding-style fixes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting ranges of GPIO pins using s3c_gpio_cfgpin() to
use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: Fixed wrong change]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch changes the code setting ranges of GPIO pins in mach-s5p64x0 using
s3c_gpio_cfgpin() to use the recently introduced s3c_gpio_cfgpin_range().
NOTE: This is for missed things from the previous patch.
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting ranges of GPIO pins using s3c_gpio_cfgpin() to
use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: modified to s5p64x0 from s5p6440]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting ranges of GPIO pins using s3c_gpio_cfgpin() to
use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
[kgene.kim@samsung.com: fixed wrong change]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Change the code setting ranges of GPIO pins using s3c_gpio_cfgpin() to
use the recently introduced s3c_gpio_cfgpin_range().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
To aide in changing the gpio code, remove the use of pin-specific configs
and move to using the S3C_GPIO_SFN() versions.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Add s3c_gpio_cfgpin_range() to configure a range of pins to the given
value. This is useful for a number of blocks where the pins are in order
and saves multiple calls to s3c_gpio_cfgpin().
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: update comments to reflect that percpu allocations are always zero-filled
percpu: Optimize __get_cpu_var()
x86, percpu: Optimize this_cpu_ptr
percpu: clear memory allocated with the km allocator
percpu: fix build breakage on s390 and cleanup build configuration tests
percpu: use percpu allocator on UP too
percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
vmalloc: pcpu_get/free_vm_areas() aren't needed on UP
Fixed up trivial conflicts in include/linux/percpu.h
With the v4l2_i2c_new_subdev* functions now supporting loading modules
based on modaliases, remove the module names hardcoded in platform data
and pass a NULL module name to those functions.
All corresponding I2C modules have been checked, and all of them include
a module aliases table with names corresponding to what the soc_camera
platform data uses.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With the v4l2_i2c_new_subdev* functions now supporting loading modules
based on modaliases, remove the module names hardcoded in platform data
and pass a NULL module name to those functions.
All corresponding I2C modules have been checked, and all of them include
a module aliases table with names corresponding to what the sh_vou
platform data uses.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The kernel debug_core invokes hw breakpoint install and removal via
call backs. The architecture specific kgdb stubs only need to
implement the call backs and not actually call the functions.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
Fix the following sparse warnings:
kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static?
kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static?
kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces)
kgdb.c:652:26: expected void const *ptr
kgdb.c:652:26: got struct perf_event *[noderef] <asn:3>*pev
The one in kgdb.c required the (void * __force) because of the return
code from register_wide_hw_breakpoint looking like:
return (void __percpu __force *)ERR_PTR(err);
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
HW breakpoints events stopped working correctly with kgdb as a result
of commit: 018cbffe68 (Merge commit
'v2.6.33' into perf/core), later commit:
ba773f7c51 (x86,kgdb: Fix hw breakpoint
regression) allowed breakpoints to propagate to the debugger core but
did not completely address the original regression in functionality
found in 2.6.35.
When the DR_STEP flag is set in dr6 along with any of the DR_TRAP
bits, the kgdb exception handler will enter once from the
hw_breakpoint API call back and again from the die notifier for
do_debug(), which causes the debugger to stop twice and also for the
kgdb regression tests to fail running under kvm with:
echo V2I1 > /sys/module/kgdbts/parameters/kgdbts
To address the problem, the kgdb overflow handler needs to implement
the same logic as the ptrace overflow handler call back with respect
to updating the virtual copy of dr6. This will allow the kgdb
do_debug() die notifier to properly handle the exception and the
attached debugger, or kgdb test suite, will only receive a single
notification.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: x86@kernel.org
Start unifying the PPI/EPPI peripheral structures in one place. This
may be used by camera/video/fpga/high speed devices.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This matches all the other Blackfin ports and keep us from having to write
bf561-specific code in many places.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Now that the common 8250 serial driver supports an "irqflags" field,
we don't need to patch in a custom define into the code.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Different arches use different names, so make sure we define both so
common code (like MTD_XIP) "just works".
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mainline version of git merged support for Blackfin parts, but we now
need to propagate the gcc arch define to make it work.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
We have to use ioctl numbers that don't collide with common code.
Otherwise, these ones never even get called because the common fs
code swalled all invocations.
Reported-by: Kay Duenzer <kduenzer@maku.eu>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This board has a SSM2603 codec, so make sure we have the right resources
declared for it.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The RX/TX address is always the same regardless of the size of the access.
That means there is no dedicated "16bit" or "32bit" MMR. Trying to use
these currently leads to compile errors. So change everything to use the
right MMR define.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
We don't want Linux to think that the cpu supports MTRRs when running
under Xen because MTRR operations could only be performed through
hypercalls.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
add the direct mapping area for ISA bus access when running as initial
domain
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Implement xen_create_msi_irq to create an msi and remap it as pirq.
Use xen_create_msi_irq to implement an initial domain specific version
of setup_msi_irqs.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Implement xen_register_gsi to setup the correct triggering and polarity
properties of a gsi.
Implement xen_register_pirq to register a particular gsi as pirq and
receive interrupts as events.
Call xen_setup_pirqs to register all the legacy ISA irqs as pirqs.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add XEN_DOM0 to arch/x86/xen/Kconfig as a silent compile time option
that gets enabled when xen and basic x86, acpi and pci support are
selected.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq
number in the MSI destination id field.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Disable pcifront when running on HVM: it is meant to be used with pv
guests that don't have PCI bus.
Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Rather than using a tree of conditionals, use function pointer
for acpi_register_gsi.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and
receive the interrupt as an event channel from that point on.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
FB_OMAP2 can work without VRFB, but currently does not build. Fix this.
Signed-off-by: Senthilvadivu Guruswamy <svadivu@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Simplify conditional: (a || (!a && !b)) => (a || !b)
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
Calls init functions of dss_features during dss_probe, and the following
features are made omapxxxx independent:
- number of managers, overlays
- supported color modes for each overlay
- supported displays for each manager
- global aplha, and restriction of global alpha for video1 pipeline
- The register field ranges : FIRHINC, FIRVINC, FIFOHIGHTHRESHOLD
FIFOLOWTHRESHOLD and FIFOSIZE
Signed-off-by: Archit Taneja <archit@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
get_fbmem_region() is only called by omapfb_reserve_sdram_memblock() and
omapfb_reserve_sram() that both live in .init.text. So get_fbmem_region
can go there, too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
This function is only called by omap_detect_sram which lives in .init.text,
too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
omap_init_fb() is only called as arch_initcall and so can live in
.init.text.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
When running as initial domain, get the real physical memory map from
xen using the XENMEM_machine_memory_map hypercall and use it to setup
the e820 regions.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Otherwise the second migration attempt fails because the mfn_list_list
still refers to all the old mfns.
We need to update the entires in both p2m_top_mfn and the mid_mfn
pages which p2m_top_mfn refers to.
In order to do this we need to keep track of the virtual addresses
mapping the p2m_mid_mfn pages since we cannot rely on
mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still
contain the old MFN after a migration, which may now belong to another
domain and hence have a different mapping in the m2p.
Therefore add and maintain a third top level page, p2m_top_mfn_p[],
which tracks the virtual addresses of the mfns contained in
p2m_top_mfn[].
We also need to update the content of the p2m_mid_missing_mfn page on
resume to refer to the page's new mfn.
p2m_missing does not need updating since the migration process takes
care of the leaf p2m pages for us.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If an E820 region is entirely beyond mem_end, don't attempt to truncate
it and add the truncated pages to extra_pages, as they will be negative.
Also, make sure the extra memory region starts after all BIOS provided
E820 regions (and in the case of RAM regions, post-clipping).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Convert Linux PAT entries into Xen ones when constructing ptes. Linux
doesn't use _PAGE_PAT for ptes, so the only difference in the first 4
entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default)
use it for WT.
xen_pte_val does the inverse conversion.
We hard-code assumptions about Linux's current PAT layout, but a
warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems.
If necessary we could go to a more general table-based conversion between
Linux and Xen PAT entries.
hugetlbfs poses a problem at the moment, the x86 architecture uses the
same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending
on which pagetable level we're using. At the moment this should be OK
so long as nobody tries to do a pte_val on a hugetlbfs pte.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keep xen_max_p2m_pfn up to date with the end of the extra memory
we're adding. It is possible that it will be too high since memory
may be truncated by a "mem=" option on the kernel command line, but
that won't matter.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If extra memory is very much larger than the base memory size
then all of the base memory can be filled with structures reserved to
describe the extra memory, leaving no space for anything else.
Even at the maximum ratio there will be little space for anything else,
but this change is intended to at least allow the system to boot rather
than crash mysteriously.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If an entire E820 RAM region is beyond mem_end, still add its
pages to the extra area so that space can be used by the kernel.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If Xen gives us non-RAM E820 entries (dom0 only, typically), then
make sure the extra RAM region is beyond them. It's OK for
the extra space to grow into E820 regions, however.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When using the e820 map to get the initial pseudo-physical address space,
look for either Xen-provided memory which doesn't lie within an E820
region, or an E820 RAM region which extends beyond the Xen-provided
memory range.
Count these pages, and add them to a new "extra memory" range. This range
has an E820 RAM range to describe it - so the kernel will allocate page
structures for it - but it is also marked reserved so that the kernel
will not attempt to use it.
The balloon driver can then add this range as a set of currently
ballooned-out pages, which can be used to extend the domain beyond its
original size.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Rather than simply using a flat memory map from Xen, use its provided
E820 map. This allows the domain builder to tell the domain to reserve
space for more pages than those initially provided at domain-build time.
It also allows the host to specify holes in the address space (for
PCI-passthrough, for example).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit). Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure. It can only fail if setting
the mfn for a pfn in previously unused address space. It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make the p2m structure a 3 level tree which covers the full possible
physical space.
The p2m structure contains mappings from the domain's pfns to system-wide
mfns. The structure has 3 levels and two roots. The first root is for
the domain's own use, and is linked with virtual addresses. The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.
At boot, the domain builder provides a simple flat p2m array for all the
initially present pages. We construct the two levels above that using
the early_brk allocator. After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).
Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size. However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Commit ab69bcd66f (arm: remove
machine_desc.io_pg_offst and .phys_io) could not update
the new boards in the omap tree. This causes the build of
omap2plus_defconfig to fail. Fix this.
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Eric Miao <eric.miao at canonical.com>
[tony@atomide.com: updated description]
Signed-off-by: Tony Lindgren <tony@atomide.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic/io.h: allow people to override individual funcs
bitops: remove duplicated extern declarations
bitops: make asm-generic/bitops/find.h more generic
asm-generic: kdebug.h: Checkpatch cleanup
asm-generic: fcntl: make exported headers use strict posix types
asm-generic: cmpxchg does not handle non-long arguments
asm-generic: make atomic_add_unless a function
On OMAP24xx, UART2 WKEN and WKST registers are in PM_WKEN2_CORE and
PM_WKST2_CORE respecitvely. Fix the OMAP2 register init to use the
correct registers on OMAP24xx.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This fixes a warning
warning: (IMX_HAVE_PLATFORM_FLEXCAN && ARCH_MXC) selects HAVE_CAN_FLEXCAN which has unmet direct dependencies (NET && CAN)
(The warning in general exists since
246cf9c (kbuild: Warn on selecting symbols with unmet direct dependencies)
which was reverted between 71ebc01 and b1f7d6e.)
Reported-and-introduced-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
* 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
BKL: introduce CONFIG_BKL.
dabusb: remove the BKL
sunrpc: remove the big kernel lock
init/main.c: remove BKL notations
blktrace: remove the big kernel lock
rtmutex-tester: make it build without BKL
dvb-core: kill the big kernel lock
dvb/bt8xx: kill the big kernel lock
tlclk: remove big kernel lock
fix rawctl compat ioctls breakage on amd64 and itanic
uml: kill big kernel lock
parisc: remove big kernel lock
cris: autoconvert trivial BKL users
alpha: kill big kernel lock
isapnp: BKL removal
s390/block: kill the big kernel lock
hpet: kill BKL, add compat_ioctl
this patch gives the possibility to workaround bug ENGcm09152
on i.MX35 when the hardware workaround is also implemented on
the board.
It covers the workaround described on page 25 of the following Errata :
http://cache.freescale.com/files/dsp/doc/errata/IMX35CE.pdf
Signed-off-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
AM35x has musb interface (version 1.8) and uses CPPI41 DMA engine.
It has USB phy built inside the IP itself.
Signed-off-by: Ajay Kumar Gupta <ajay.gupta@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Replace FSL USB platform code by simple platform driver for
creation of FSL USB platform devices.
The driver creates platform devices based on the information
from USB nodes in the flat device tree. This is the replacement
for old arch fsl_soc usb code removed by this patch. The driver
uses usual of-style binding, available EHCI-HCD and UDC
drivers can be bound to the created devices. The new of-style
driver additionaly instantiates USB OTG platform device, as the
appropriate USB OTG driver will be added soon.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Custom UART PM hook for Alchemy chips: do standard UART pm and
additionally en/disable uart block clocks as needed.
This allows to get rid of a debug port PM hack in the Alchemy pm code.
Tested on Db1200.
Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Again basically cut and paste
Convert the main driver set to use the hooks for GICOUNT
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch converts s390 to use asm-generic/ioctls.h instead of its
own version.
The differences between the arch-specific version and the generic
version are as follows:
- S390 defines its own value for FIOQSIZE, asm-generic/ioctls.h keeps it
- The generic version adds TIOCGRS485 and TIOCGRS485, which are unused
by any driver available on this architecture
- The generic version adds support for termiox
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch converts mn10300 to use asm-generic/ioctls.h instead of its
own version.
The differences between the arch-specific version and the generic
version are as follows:
- The generic version provides TIOCGRS485 and TIOCSRS485 but they
are unused by any driver available for this architecture.
- The generic version adds support for termiox
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch converts m68k to use asm-generic/ioctls.h instead of its
own version.
The differences between the arch-specific version and the generic
version are as follows:
- m68k defines its own value for FIOQSIZE, asm-generic/ioctls.h keeps it
- The generic version adds TIOCSRS485 and TIOCGRS485m which are unused
by any driver available on this architecture.
- The generic version adds support for termiox
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>