Add an ioctl to the /dev/xen/xenbus_backend device allowing the xenbus
backend to be started after the kernel has booted. This allows xenstore
to run in a different domain from the dom0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This file depends on <xen/xen.h>, but the dependency was hidden due
to: <asm/acpi.h> -> <asm/trampoline.h> -> <asm/io.h> -> <xen/xen.h>
With the removal of <asm/trampoline.h>, this exposed the missing
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch is a significant performance improvement for the
m2p_override: about 6% using the gntdev device.
Each m2p_add/remove_override call issues a MULTI_grant_table_op and a
__flush_tlb_single if kmap_op != NULL. Batching all the calls together
is a great performance benefit because it means issuing one hypercall
total rather than two hypercall per page.
If paravirt_lazy_mode is set PARAVIRT_LAZY_MMU, all these calls are
going to be batched together, otherwise they are issued one at a time.
Adding arch_enter_lazy_mmu_mode/arch_leave_lazy_mmu_mode around the
m2p_add/remove_override calls forces paravirt_lazy_mode to
PARAVIRT_LAZY_MMU, therefore makes sure that they are always batched.
However it is not safe to call arch_enter_lazy_mmu_mode if we are in
interrupt context or if we are already in PARAVIRT_LAZY_MMU mode, so
check for both conditions before doing so.
Changes in v4:
- rebased on 3.4-rc4: all the m2p_override users call gnttab_unmap_refs
and gnttab_map_refs;
- check whether we are in interrupt context and the lazy_mode we are in
before calling arch_enter/leave_lazy_mmu_mode.
Changes in v3:
- do not call arch_enter/leave_lazy_mmu_mode in xen_blkbk_unmap, that
can be called in interrupt context.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v5: s/int lazy/bool lazy/]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Provide the registration callback to call in the Xen's
ACPI sleep functionality. This means that during S3/S5
we make a hypercall XENPF_enter_acpi_sleep with the
proper PM1A/PM1B registers.
Based of Ke Yu's <ke.yu@intel.com> initial idea.
[ From http://xenbits.xensource.com/linux-2.6.18-xen.hg
change c68699484a65 ]
[v1: Added Copyright and license]
[v2: Added check if PM1A/B the 16-bits MSB contain something. The spec
only uses 16-bits but might have more in future]
Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fit it into 80 columns so that it is readable in menuconfig.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We did a similar check for the P-states but did not do it for
the C-states. What we want to do is ignore cases where the DSDT
has definition for sixteen CPUs, but the machine only has eight
CPUs and we get:
xen-acpi-processor: (CX): Hypervisor error (-22) for ACPI CPU14
Reported-by: Tobias Geiger <tobias.geiger@vido.info>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In pirq_check_eoi_map use the pirq number rather than the Linux irq
number to check whether an eoi is needed in the pirq_eoi_map.
The reason is that the irq number is not always identical to the
pirq number so if we wrongly use the irq number to check the
pirq_eoi_map we are going to check for the wrong pirq to EOI.
As a consequence some interrupts might not be EOI'ed by the
guest correctly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Tobias Geiger <tobias.geiger@vido.info>
[v1: Added some extra wording to git commit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
linux/drivers/xen/manage.c: In function 'do_suspend':
linux/drivers/xen/manage.c:160:5: warning: 'si.cancelled' may be used uninitialized in this function
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
A rather annoying and common case is when booting a PVonHVM guest
and exposing the PV KBD and PV VFB - as broken toolstacks don't
always initialize the backends correctly.
Normally The HVM guest is using the VGA driver and the emulated
keyboard for this (though upstream version of QEMU implements
PV KBD, but still uses a VGA driver). We provide a very basic
two-stage wait mechanism - where we wait for 30 seconds for all
devices, and then for 270 for all them except the two mentioned.
That allows us to wait for the essential devices, like network
or disk for the full 6 minutes.
To trigger this, put this in your guest config:
vfb = [ 'vnc=1, vnclisten=0.0.0.0 ,vncunused=1']
instead of this:
vnc=1
vnclisten="0.0.0.0"
CC: stable@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v3: Split delay in non-essential (30 seconds) and essential
devices per Ian and Stefano suggestion]
[v4: Added comments per Stefano suggestion]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81': (14566 commits)
cpufreq: OMAP: fix build errors: depends on ARCH_OMAP2PLUS
sparc64: Eliminate obsolete __handle_softirq() function
sparc64: Fix bootup crash on sun4v.
kconfig: delete last traces of __enabled_ from autoconf.h
Revert "kconfig: fix __enabled_ macros definition for invisible and un-selected symbols"
kconfig: fix IS_ENABLED to not require all options to be defined
irq_domain: fix type mismatch in debugfs output format
staging: android: fix mem leaks in __persistent_ram_init()
staging: vt6656: Don't leak memory in drivers/staging/vt6656/ioctl.c::private_ioctl()
staging: iio: hmc5843: Fix crash in probe function.
panic: fix stack dump print on direct call to panic()
drivers/rtc/rtc-pl031.c: enable clock on all ST variants
Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"
hugetlb: fix race condition in hugetlb_fault()
drivers/rtc/rtc-twl.c: use static register while reading time
drivers/rtc/rtc-s3c.c: add placeholder for driver private data
drivers/rtc/rtc-s3c.c: fix compilation error
MAINTAINERS: add PCDP console maintainer
memcg: do not open code accesses to res_counter members
drivers/rtc/rtc-efi.c: fix section mismatch warning
...
Rather than just leaking pages that can't be freed at the point where
access permission for the backend domain gets revoked, put them on a
list and run a timer to (infrequently) retry freeing them. (This can
particularly happen when unloading a frontend driver when devices are
still present, and the backend still has them in non-closed state or
hasn't finished closing them yet.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since we are using the m2p_override we do have struct pages
corresponding to the user vma mmap'ed by gntdev.
Removing the VM_PFNMAP flag makes get_user_pages work on that vma.
An example test case would be using a Xen userspace block backend
(QDISK) on a file on NFS using O_DIRECT.
CC: stable@kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jump to the label ini_nomem as done on the failure of the page allocations
above.
The code at ini_nomem is modified to accommodate different return values.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree,
* the other is to fix drivers to load on PV (a previous patch made them only
load in PVonHVM mode).
The rest are just minor fixes in the various drivers and some cleanup in the
core code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPfyVUAAoJEFjIrFwIi8fJUjUH/jbY5JavRqSlNELZW2A4Ta76
8p00LqLHw/C56iHZcWKke8mqtWNb+ZfcQt7ZYcxDIYa4QWBL28x0OLAO2tOBIt37
ZjYESWSdFJaJvmpADluWtFyGyZ9TYJllDTBm/jWj1ZtKSZvR1YkhuMXCS0f4AmGQ
xFzSWJZUDdiOAqpN+VQD8wP00gfR8knQLg16XE2fvFdQo4XwpCtqLfHV/5pMMGdy
Cs/ep6rq/7cdv/nshKOcBnw7RW8l3Xoi/28ht8k3DvAQ2VtFq1Tugv2G9pcCHwQG
DIBkB3SOU6/v6P5at5+egKS5xR1fJetCWlkMd8kkbcdz2NPI4UDMkvOW6Q8yQls=
=6Ve+
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen fixes from Konrad Rzeszutek Wilk:
"Two fixes for regressions:
* one is a workaround that will be removed in v3.5 with proper fix in
the tip/x86 tree,
* the other is to fix drivers to load on PV (a previous patch made
them only load in PVonHVM mode).
The rest are just minor fixes in the various drivers and some cleanup
in the core code."
* tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
xen/pciback: fix XEN_PCI_OP_enable_msix result
xen/smp: Remove unnecessary call to smp_processor_id()
xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
xen: only check xen_platform_pci_unplug if hvm
Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
positive value to indicate the number of vectors (less than the amount
requested) that can be set up for a given device. Returning this as an
operation value (secondary result) is fine, but (primary) operation
results are expected to be negative (error) or zero (success) according
to the protocol. With the frontend fixed to match the XenoLinux
behavior, the backend can now validly return zero (success) here,
passing the upper limit on the number of vectors in op->value.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull DMA mapping branch from Marek Szyprowski:
"Short summary for the whole series:
A few limitations have been identified in the current dma-mapping
design and its implementations for various architectures. There exist
more than one function for allocating and freeing the buffers:
currently these 3 are used dma_{alloc, free}_coherent,
dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.
For most of the systems these calls are almost equivalent and can be
interchanged. For others, especially the truly non-coherent ones
(like ARM), the difference can be easily noticed in overall driver
performance. Sadly not all architectures provide implementations for
all of them, so the drivers might need to be adapted and cannot be
easily shared between different architectures. The provided patches
unify all these functions and hide the differences under the already
existing dma attributes concept. The thread with more references is
available here:
http://www.spinics.net/lists/linux-sh/msg09777.html
These patches are also a prerequisite for unifying DMA-mapping
implementation on ARM architecture with the common one provided by
dma_map_ops structure and extending it with IOMMU support. More
information is available in the following thread:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
More works on dma-mapping framework are planned, especially in the
area of buffer sharing and managing the shared mappings (together with
the recently introduced dma_buf interface: commit d15bd7ee44
"dma-buf: Introduce dma buffer sharing mechanism").
The patches in the current set introduce a new alloc/free methods
(with support for memory attributes) in dma_map_ops structure, which
will later replace dma_alloc_coherent and dma_alloc_writecombine
functions."
People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
common: DMA-mapping: add NON-CONSISTENT attribute
common: DMA-mapping: add WRITE_COMBINE attribute
common: dma-mapping: introduce mmap method
common: dma-mapping: remove old alloc_coherent and free_coherent methods
Hexagon: adapt for dma_map_ops changes
Unicore32: adapt for dma_map_ops changes
Microblaze: adapt for dma_map_ops changes
SH: adapt for dma_map_ops changes
Alpha: adapt for dma_map_ops changes
SPARC: adapt for dma_map_ops changes
PowerPC: adapt for dma_map_ops changes
MIPS: adapt for dma_map_ops changes
X86 & IA64: adapt for dma_map_ops changes
common: dma-mapping: introduce generic alloc() and free() methods
Adapt core x86 and IA64 architecture code for dma_map_ops changes: replace
alloc/free_coherent with generic alloc/free methods.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
[removed swiotlb related changes and replaced it with wrappers,
merged with IA64 patch to avoid inter-patch dependences in intel-iommu code]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Luck <tony.luck@intel.com>
* Add fast-EOI acking of interrupts (clear a bit instead of hypercall)
And bug-fixes:
* Fix CPU bring-up code missing a call to notify other subsystems.
* Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded.
* In Xen ACPI processor driver: remove too verbose WARN messages, fix up
the Kconfig dependency to be a module by default, and add dependency on
CPU_FREQ.
* Disable CPU frequency drivers from loading when booting under Xen
(as we want the Xen ACPI processor to be used instead).
* Cleanups in tmem code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPbc3DAAoJEFjIrFwIi8fJTQkIAMnH2fPhcHAb4mNaz+3gdmsZ
Flo6V1gMBcO8xKZlUkFgKKPYoOm7lLmvoceXLVSH5oOKSnSJo1zSinzKmcdJQo/D
kPo4/EguNwtzcAcQh2dmT6/IM9O3ihMKUli7Oajif9PLCFFFqTaG3Y3YNBo/rxTY
D3HAnJrIfmIyG0NpLnaFCWhCzUvcB4M7ysutECqcF8l5gnbHxRVeCKD0blM+n9GH
Wyum00dQCwo6h6wTduhPOAxHAM4rncyR3heOB2vDxq9YJHSUhhcva5QCgQ+tdUVt
6U2TQT1L2Px8iXXzr2w9YBpepOVajZReoKhajLjJ5VbkpBZFz5dVNfJ8LpF8RV8=
=z8IB
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull more xen updates from Konrad Rzeszutek Wilk:
"One tiny feature that accidentally got lost in the initial git pull:
* Add fast-EOI acking of interrupts (clear a bit instead of
hypercall)
And bug-fixes:
* Fix CPU bring-up code missing a call to notify other subsystems.
* Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded.
* In Xen ACPI processor driver: remove too verbose WARN messages, fix
up the Kconfig dependency to be a module by default, and add
dependency on CPU_FREQ.
* Disable CPU frequency drivers from loading when booting under Xen
(as we want the Xen ACPI processor to be used instead).
* Cleanups in tmem code."
* tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/acpi: Fix Kconfig dependency on CPU_FREQ
xen: initialize platform-pci even if xen_emul_unplug=never
xen/smp: Fix bringup bug in AP code.
xen/acpi: Remove the WARN's as they just create noise.
xen/tmem: cleanup
xen: support pirq_eoi_map
xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
provide disable_cpufreq() function to disable the API.
The functions: "acpi_processor_*" sound like they depend on CONFIG_ACPI_PROCESSOR
but in reality they are exposed when CONFIG_CPU_FREQ=[y|m]. As such
update the Kconfig to have this dependency and fix compile issues:
ERROR: "acpi_processor_unregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_notify_smm" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_register_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_preregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
Note: We still need the CONFIG_ACPI
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- PV multiconsole support, so that there can be hvc1, hvc2, etc;
- P-state and C-state power management driver that uploads said
power management data to the hypervisor. It also inhibits cpufreq
scaling drivers to load so that only the hypervisor can make power
management decisions - fixing a weird perf bug.
- Function Level Reset (FLR) support in the Xen PCI backend.
Fixes:
- Kconfig dependencies for Xen PV keyboard and video
- Compile warnings and constify fixes
- Change over to use percpu_xxx instead of this_cpu_xxx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPZ0qkAAoJEFjIrFwIi8fJjCgH/jeJ39E8ML8DP9tCS2HQnMqM
uTEjLcqvoJ7sEhHvtBLPeG2p0jyBvOWjLbSc7P8nESBAMPvSYol8L6WqfWrdSU4r
lHrma2sg9UYzRog5NyxAgkp7bBsBBFOnhVL3Cxb5Ig78cPWzeSWGpqGZ8M/d51Wf
1iE0tHuU4DpN+fg1SZqPqEm8ecEJ/eSrVTnyTx/Qo2Ak+Zw98SqzX7SV5lo8mudd
WFL1F2K9FyTNk79ndGhqFt36x6nEbFgMLbmCDWumLuWN6bMd1Uq0wNkCqW4F1h28
3yqnY+rfQh4y3eXK1B9nttCUTs+/66U5ZWrT6B1IJumGTAIqcWfgeUX/Vn/HVC4=
=tfMc
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen updates from Konrad Rzeszutek Wilk:
"which has three neat features:
- PV multiconsole support, so that there can be hvc1, hvc2, etc; This
can be used in HVM and in PV mode.
- P-state and C-state power management driver that uploads said power
management data to the hypervisor. It also inhibits cpufreq
scaling drivers to load so that only the hypervisor can make power
management decisions - fixing a weird perf bug.
There is one thing in the Kconfig that you won't like: "default y
if (X86_ACPI_CPUFREQ = y || X86_POWERNOW_K8 = y)" (note, that it
all depends on CONFIG_XEN which depends on CONFIG_PARAVIRT which by
default is off). I've a fix to convert that boolean expression
into "default m" which I am going to post after the cpufreq git
pull - as the two patches to make this work depend on a fix in Dave
Jones's tree.
- Function Level Reset (FLR) support in the Xen PCI backend.
Fixes:
- Kconfig dependencies for Xen PV keyboard and video
- Compile warnings and constify fixes
- Change over to use percpu_xxx instead of this_cpu_xxx"
Fix up trivial conflicts in drivers/tty/hvc/hvc_xen.c due to changes to
a removed commit.
* tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
xen: constify all instances of "struct attribute_group"
xen/xenbus: ignore console/0
hvc_xen: introduce HVC_XEN_FRONTEND
hvc_xen: implement multiconsole support
hvc_xen: support PV on HVM consoles
xenbus: don't free other end details too early
xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
xenbus: address compiler warnings
xen: use this_cpu_xxx replace percpu_xxx funcs
xen/pciback: Support pci_reset_function, aka FLR or D3 support.
pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
xen: Utilize the restore_msi_irqs hook.
into debugfs, and use __read_mostly as neccessary.
Also add a MAINTAINER file for cleancache API files.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPZ1sjAAoJEFjIrFwIi8fJCSkIALqBBCkuPSusvdebT59Oq145
EW2eEwJzVft0vlvS/IPl+W37TH5VWhBVCW8frAYMpLaY5X/W1ZwzFpx76T4acAiM
wkXwsyYXs6N13OOFoH5gmf3cwproAioRxbEeALucxeMR4rK5Yw+oZXT+hSu2KaLh
1LtHyx+u+NLb0QOseAuDcmyjY9r6aLBA1HMcVD2z+4UW1n/9NWexQP3ShYP9uMDs
GnyDzZ8vWXTcoc4Auj0rpaNsT5d47ltGegKASZmmvS3QyMHbZ4sk2HECnQY4wSee
alPJwDyb6mOJmQ3e4s940onjfZPgd8/cW/DVEUrveH+dz6Eqjxqz41dVRVXiLz8=
=tUNC
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm
Pull cleancache changes from Konrad Rzeszutek Wilk:
"This has some patches for the cleancache API that should have been
submitted a _long_ time ago. They are basically cleanups:
- rename of flush to invalidate
- moving reporting of statistics into debugfs
- use __read_mostly as necessary.
Oh, and also the MAINTAINERS file change. The files (except the
MAINTAINERS file) have been in #linux-next for months now. The late
addition of MAINTAINERS file is a brain-fart on my side - didn't
realize I needed that just until I was typing this up - and I based
that patch on v3.3 - so the tree is on top of v3.3."
* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
MAINTAINERS: Adding cleancache API to the list.
mm: cleancache: Use __read_mostly as appropiate.
mm: cleancache: report statistics via debugfs instead of sysfs.
mm: zcache/tmem/cleancache: s/flush/invalidate/
mm: cleancache: s/flush/invalidate/
When xen_emul_unplug=never is specified on kernel command line
reading files from /sys/hypervisor is broken (returns -EBUSY).
It is caused by xen_bus dependency on platform-pci and
platform-pci isn't initialized when xen_emul_unplug=never is
specified.
Fix it by allowing platform-pci to ignore xen_emul_unplug=never,
and do not intialize xen_[blk|net]front instead.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When booting the kernel under machines that do not have P-states
we would end up with:
------------[ cut here ]------------
WARNING: at drivers/xen/xen-acpi-processor.c:504
xen_acpi_processor_init+0x286/0
x2e0()
Hardware name: ProLiant BL460c G6
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.39-200.0.3.el5uek #1
Call Trace:
[<ffffffff8191d056>] ? xen_acpi_processor_init+0x286/0x2e0
[<ffffffff81068300>] warn_slowpath_common+0x90/0xc0
[<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
[<ffffffff8106834a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8191d056>] xen_acpi_processor_init+0x286/0x2e0
[<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
[<ffffffff81002168>] do_one_initcall+0xe8/0x130
.. snip..
Which is OK - the machines do not have P-states, so we fail to register
to process the _PXX states. But there is no need to WARN the user
of it.
Oracle BZ# 13871288
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Use 'bool' for boolean variables. Do proper section placement.
Eliminate an unnecessary export.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The pirq_eoi_map is a bitmap offered by Xen to check which pirqs need to
be EOI'd without having to issue an hypercall every time.
We use PHYSDEVOP_pirq_eoi_gmfn_v2 to map the bitmap, then if we
succeed we use pirq_eoi_map to check whether pirqs need eoi.
Changes in v3:
- explicitly use PHYSDEVOP_pirq_eoi_gmfn_v2 rather than
PHYSDEVOP_pirq_eoi_gmfn;
- introduce pirq_check_eoi_map, a function to check if a pirq needs an
eoi using the map;
-rename pirq_needs_eoi into pirq_needs_eoi_flag;
- introduce a function pointer called pirq_needs_eoi that is going to be
set to the right implementation depending on the availability of
PHYSDEVOP_pirq_eoi_gmfn_v2.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
With patch "xen/cpufreq: Disable the cpu frequency scaling drivers
from loading." we do not have to worry about said drivers loading
themselves before the xen-acpi-processor driver. Hence we can remove
the default selection (=y if CPU frequency drivers were built-in, or
=m if CPU frequency drivers were built as modules), and just
select =m for the default case.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This driver solves three problems:
1). Parse and upload ACPI0007 (or PROCESSOR_TYPE) information to the
hypervisor - aka P-states (cpufreq data).
2). Upload the the Cx state information (cpuidle data).
3). Inhibit CPU frequency scaling drivers from loading.
The reason for wanting to solve 1) and 2) is such that the Xen hypervisor
is the only one that knows the CPU usage of different guests and can
make the proper decision of when to put CPUs and packages in proper states.
Unfortunately the hypervisor has no support to parse ACPI DSDT tables, hence it
needs help from the initial domain to provide this information. The reason
for 3) is that we do not want the initial domain to change P-states while the
hypervisor is doing it as well - it causes rather some funny cases of P-states
transitions.
For this to work, the driver parses the Power Management data and uploads said
information to the Xen hypervisor. It also calls acpi_processor_notify_smm()
to inhibit the other CPU frequency scaling drivers from being loaded.
Everything revolves around the 'struct acpi_processor' structure which
gets updated during the bootup cycle in different stages. At the startup, when
the ACPI parser starts, the C-state information is processed (processor_idle)
and saved in said structure as 'power' element. Later on, the CPU frequency
scaling driver (powernow-k8 or acpi_cpufreq), would call the the
acpi_processor_* (processor_perflib functions) to parse P-states information
and populate in the said structure the 'performance' element.
Since we do not want the CPU frequency scaling drivers from loading
we have to call the acpi_processor_* functions to parse the P-states and
call "acpi_processor_notify_smm" to stop them from loading.
There is also one oddity in this driver which is that under Xen, the
physical online CPU count can be different from the virtual online CPU count.
Meaning that the macros 'for_[online|possible]_cpu' would process only
up to virtual online CPU count. We on the other hand want to process
the full amount of physical CPUs. For that, the driver checks if the ACPI IDs
count is different from the APIC ID count - which can happen if the user
choose to use dom0_max_vcpu argument. In such a case a backup of the PM
structure is used and uploaded to the hypervisor.
[v1-v2: Initial RFC implementations that were posted]
[v3: Changed the name to passthru suggested by Pasi Kärkkäinen <pasik@iki.fi>]
[v4: Added vCPU != pCPU support - aka dom0_max_vcpus support]
[v5: Cleaned up the driver, fix bug under Athlon XP]
[v6: Changed the driver to a CPU frequency governor]
[v7: Jan Beulich <jbeulich@suse.com> suggestion to make it a cpufreq scaling driver
made me rework it as driver that inhibits cpufreq scaling driver]
[v8: Per Jan's review comments, fixed up the driver]
[v9: Allow to continue even if acpi_processor_preregister_perf.. fails]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The functions these get passed to have been taking pointers to const
since at least 2.6.16.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Unfortunately xend creates a bogus console/0 frotend/backend entry pair
on xenstore that console backends cannot properly cope with.
Any guest behavior that is not completely ignoring console/0 is going
to either cause problems with xenconsoled or qemu.
Returning 0 or -ENODEV from xencons_probe is not enough because it is
going to cause the frontend state to become 4 or 6 respectively.
The best possible thing we can do here is just ignore the entry from
xenbus_probe_frontend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The individual drivers' remove functions could legitimately attempt to
access this information (for logging messages if nothing else). Note
that I did not in fact observe a problem anywhere, but I came across
this while looking into the reasons for what turned out to need the
fix at https://lkml.org/lkml/2012/3/5/336 to vsprintf().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- casting pointers to integer types of different size is being warned on
- an uninitialized variable warning occurred on certain gcc versions
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
So far only the watch path was checked to be zero terminated, while
the watch token was merely assumed to be.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
.. as the rest of the kernel is using that format.
Suggested-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When the initial domain starts, it prints (depending on the
amount of CPUs) a slew of
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state
which provide no useful information - as the error is a valid
issue - but not on the initial domain. The reason is that the
XenStore is not accessible at that time (it is after all the
first guest) so the CPU hotplug watch cannot parse "availability/cpu"
attribute.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The current device suspend/resume phases during system-wide power
transitions appear to be insufficient for some platforms that want
to use the same callback routines for saving device states and
related operations during runtime suspend/resume as well as during
system suspend/resume. In principle, they could point their
.suspend_noirq() and .resume_noirq() to the same callback routines
as their .runtime_suspend() and .runtime_resume(), respectively,
but at least some of them require device interrupts to be enabled
while the code in those routines is running.
It also makes sense to have device suspend-resume callbacks that will
be executed with runtime PM disabled and with device interrupts
enabled in case someone needs to run some special code in that
context during system-wide power transitions.
Apart from this, .suspend_noirq() and .resume_noirq() were introduced
as a workaround for drivers using shared interrupts and failing to
prevent their interrupt handlers from accessing suspended hardware.
It appears to be better not to use them for other porposes, or we may
have to deal with some serious confusion (which seems to be happening
already).
For the above reasons, introduce new device suspend/resume phases,
"late suspend" and "early resume" (and analogously for hibernation)
whose callback will be executed with runtime PM disabled and with
device interrupts enabled and whose callback pointers generally may
point to runtime suspend/resume routines.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Complete the renaming from "flush" to "invalidate" across
both tmem frontends (cleancache and frontswap) and both tmem backends
(Xen and zcache), as required by akpm.
This change is completely cosmetic.
[v10: no change]
[v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@novell.com>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Rik Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
[v11: Remove the frontswap part]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fix these warnings:
drivers/xen/biomerge.c:14:1: warning: data definition has no type or storage class [enabled by default]
drivers/xen/biomerge.c:14:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int]
drivers/xen/biomerge.c:14:1: warning: parameter names (without types) in function declaration [enabled by default]
And this build error:
ERROR: "xen_biovec_phys_mergeable" [drivers/block/nvme.ko] undefined!
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/users/willy/linux-nvme: (105 commits)
NVMe: Set number of queues correctly
NVMe: Version 0.8
NVMe: Set queue flags correctly
NVMe: Simplify nvme_unmap_user_pages
NVMe: Mark the end of the sg list
NVMe: Fix DMA mapping for admin commands
NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT
NVMe: Merge the nvme_bio and nvme_prp data structures
NVMe: Change nvme_completion_fn to take a dev
NVMe: Change get_nvmeq to take a dev instead of a namespace
NVMe: Simplify completion handling
NVMe: Update Identify Controller data structure
NVMe: Implement doorbell stride capability
NVMe: Version 0.7
NVMe: Don't probe namespace 0
Fix calculation of number of pages in a PRP List
NVMe: Create nvme_identify and nvme_get_features functions
NVMe: Fix memory leak in nvme_dev_add()
NVMe: Fix calls to dma_unmap_sg
NVMe: Correct sg list setup in nvme_map_user_pages
...
* 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/balloon: Move the registration from device to subsystem.
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the __pci_reset_function_locked to perform the action.
Also on attaching ("bind") and detaching ("unbind") we save and
restore the configuration states. When the device is disconnected
from a guest we use the "pci_reset_function" to also reset the
device before being passed to another guest.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
With git commit 0706802183
"xen-balloon: convert sysdev_class to a regular subsystem" we would
end up with the attributes being put in:
/sys/devices/xen_memory0/target_kb
instead of
/sys/devices/system/xen_memory/xen_memory0/target_kb
Making the tools inable to deflate the kernel to make more space
for launching another guest and printing:
Error: Failed to query current memory allocation of dom0
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Suggested-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* commit '070680218379e15c1901f4bf21b98e3cbf12b527': (50 commits)
xen-balloon: convert sysdev_class to a regular subsystem
clocksource: convert sysdev_class to a regular subsystem
ibm_rtl: convert sysdev_class to a regular subsystem
edac: convert sysdev_class to a regular subsystem
rtmutex-tester: convert sysdev_class to a regular subsystem
driver-core: implement 'sysdev' functionality for regular devices and buses
kref: fix up the kfree build problems
kref: Remove the memory barriers
kref: Implement kref_put in terms of kref_sub
kref: Inline all functions
Drivers: hv: Get rid of an unnecessary check in hv.c
Drivers: hv: Make the vmbus driver unloadable
Drivers: hv: Fix a memory leak
Documentation: Update stable address
MAINTAINERS: stable: Update address
w1: add fast search for single slave bus
driver-core: skip uevent generation when nobody is listening
drivers: hv: Don't OOPS when you cannot init vmbus
firmware: google: fix gsmi.c build warning
drivers_base: make argument to platform_device_register_full const
...
* 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (307 commits)
drm/nouveau/pm: fix build with HWMON off
gma500: silence gcc warnings in mid_get_vbt_data()
drm/ttm: fix condition (and vs or)
drm/radeon: double lock typo in radeon_vm_bo_rmv()
drm/radeon: use after free in radeon_vm_bo_add()
drm/sis|via: don't return stack garbage from free_mem ioctl
drm/radeon/kms: remove pointless CS flags priority struct
drm/radeon/kms: check if vm is supported in VA ioctl
drm: introduce drm_can_sleep and use in intel/radeon drivers. (v2)
radeon: Fix disabling PCI bus mastering on big endian hosts.
ttm: fix agp since ttm tt rework
agp: Fix multi-line warning message whitespace
drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page and don't try to free freed pages.
drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool.
drm/radeon/kms: sync across multiple rings when doing bo moves v3
drm/radeon/kms: Add support for multi-ring sync in CS ioctl (v2)
drm/radeon: GPU virtual memory support v22
drm: make DRM_UNLOCKED ioctls with their own mutex
drm: no need to hold global mutex for static data
drm/radeon/benchmark: common modes sweep ignores 640x480@32
...
Fix up trivial conflicts in radeon/evergreen.c and vmwgfx/vmwgfx_kms.c
* 'stable/for-linus-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (37 commits)
xen/pciback: Expand the warning message to include domain id.
xen/pciback: Fix "device has been assigned to X domain!" warning
xen/pciback: Move the PCI_DEV_FLAGS_ASSIGNED ops to the "[un|]bind"
xen/xenbus: don't reimplement kvasprintf via a fixed size buffer
xenbus: maximum buffer size is XENSTORE_PAYLOAD_MAX
xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX.
Xen: consolidate and simplify struct xenbus_driver instantiation
xen-gntalloc: introduce missing kfree
xen/xenbus: Fix compile error - missing header for xen_initial_domain()
xen/netback: Enable netback on HVM guests
xen/grant-table: Support mappings required by blkback
xenbus: Use grant-table wrapper functions
xenbus: Support HVM backends
xen/xenbus-frontend: Fix compile error with randconfig
xen/xenbus-frontend: Make error message more clear
xen/privcmd: Remove unused support for arch specific privcmp mmap
xen: Add xenbus_backend device
xen: Add xenbus device driver
xen: Add privcmd device driver
xen/gntalloc: fix reference counts on multi-page mappings
...
When a PCI device is transferred to another domain and it is still
in usage (from the internal perspective), mention which other
domain is using it to aid in debugging.
[v2: Truncate the verbose message per Jan Beulich suggestion]
[v3: Suggestions from Ian Campbell on the wording]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
The full warning is:
"pciback 0000:05:00.0: device has been assigned to 2 domain! Over-writting the ownership, but beware."
which is correct - the previous domain that was using the device
forgot to unregister the ownership. This patch fixes this by
calling the unregister ownership function when the PCI device is
relinquished from the guest domain.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
operation instead of doing it per guest creation/disconnection. Without
this we could have potentially unloaded the vf driver from the
xen pciback control even if the driver was binded to the xen-pciback.
This will hold on to it until the user "unbind"s the PCI device using
SysFS.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use this now that it is defined even though it happens to be == PAGE_SIZE.
The code which takes requests from userspace already validates against the size
of this buffer so no further checks are required to ensure that userspace
requests comply with the protocol in this respect.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Haogang Chen found out that:
There is a potential integer overflow in process_msg() that could result
in cross-domain attack.
body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
call to xb_read() would write to a zero-length buffer.
The other end of this connection is always the xenstore backend daemon
so there is no guest (malicious or otherwise) which can do this. The
xenstore daemon is a trusted component in the system.
However this seem like a reasonable robustness improvement so we should
have it.
And Ian when read the API docs found that:
The payload length (len field of the header) is limited to 4096
(XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the
limit, its xenstored connection will be immediately killed by
xenstored, which is usually catastrophic from the client's point of
view. Clients (particularly domains, which cannot just reconnect)
should avoid this.
so this patch checks against that instead.
This also avoids a potential integer overflow pointed out by Haogang Chen.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Haogang Chen <haogangchen@gmail.com>
CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The 'name', 'owner', and 'mod_name' members are redundant with the
identically named fields in the 'driver' sub-structure. Rather than
switching each instance to specify these fields explicitly, introduce
a macro to simplify this.
Eliminate further redundancy by allowing the drvname argument to
DEFINE_XENBUS_DRIVER() to be blank (in which case the first entry from
the ID table will be used for .driver.name).
Also eliminate the questionable xenbus_register_{back,front}end()
wrappers - their sole remaining purpose was the checking of the
'owner' field, proper setting of which shouldn't be an issue anymore
when the macro gets used.
v2: Restore DRV_NAME for the driver name in xen-pciback.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Error handling code following a kmalloc should free the allocated data.
Out_unlock is used on both success and failure, so free vm_priv before
jumping to that label.
A simplified version of the semantic match that finds the problem is as
follows: (http://coccinelle.lip6.fr)
// <smpl>
@r exists@
local idexpression x;
statement S;
identifier f1;
position p1,p2;
expression *ptr != NULL;
@@
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...x...+> }
x->f1
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
[v1: Altered the description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
drivers/xen/xenbus/xenbus_dev_backend.c:74:2: error: implicit declaration of function 'xen_initial_domain'
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add support for mappings without GNTMAP_contains_pte. This was not
supported because the unmap operation assumed that this flag was being
used; adding a parameter to the unmap operation to allow the PTE
clearing to be disabled is sufficient to make unmap capable of
supporting either mapping type.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
[v1: Fix cleanpatch warnings]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
For xenbus_{map,unmap}_ring to work on HVM, the grant table operations
must be set up using the gnttab_set_{map,unmap}_op functions instead of
directly populating the fields of gnttab_map_grant_ref. These functions
simply populate the structure on paravirtualized Xen; however, on HVM
they must call __pa() on vaddr when populating op->host_addr because the
hypervisor cannot directly interpret guest-virtual addresses.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
[v1: Fixed cleanpatch error]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
that ring mappings can be done without using GNTMAP_contains_pte which
is not supported on HVM. This also removes the need to use vmlist_lock
on PV by tracking the allocated xenbus rings.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
[v1: Fix compile error when XENBUS_FRONTEND is defined as module]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* commit 'v3.2-rc3': (412 commits)
Linux 3.2-rc3
virtio-pci: make reset operation safer
virtio-mmio: Correct the name of the guest features selector
virtio: add HAS_IOMEM dependency to MMIO platform bus driver
eCryptfs: Extend array bounds for all filename chars
eCryptfs: Flush file in vma close
eCryptfs: Prevent file create race condition
regulator: TPS65910: Fix VDD1/2 voltage selector count
i2c: Make i2cdev_notifier_call static
i2c: Delete ANY_I2C_BUS
i2c: Fix device name for 10-bit slave address
i2c-algo-bit: Generate correct i2c address sequence for 10-bit target
drm: integer overflow in drm_mode_dirtyfb_ioctl()
Revert "of/irq: of_irq_find_parent: check for parent equal to child"
drivers/gpu/vga/vgaarb.c: add missing kfree
drm/radeon/kms/atom: unify i2c gpio table handling
drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real
ttm: Don't return the bo reserved on error path
mount_subtree() pointless use-after-free
iio: fix a leak due to improper use of anon_inode_getfd()
...
drivers/xen/xenbus/xenbus_dev_frontend.c: In function 'xenbus_init':
drivers/xen/xenbus/xenbus_dev_frontend.c:609:2: error: implicit declaration of function 'xen_domain'
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Merge in the upstream tree to bring in the mainline fixes.
Conflicts:
drivers/gpu/drm/exynos/exynos_drm_fbdev.c
drivers/gpu/drm/nouveau/nouveau_sgdma.c
This reverts commit ddacf5ef68.
As when booting the kernel under Amazon EC2 as an HVM guest it ends up
hanging during startup. Reverting this we loose the fix for kexec
booting to the crash kernels.
Fixes Canonical BZ #901305 (http://bugs.launchpad.net/bugs/901305)
Tested-by: Alessandro Salvatori <sandr8@gmail.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add the work frontend to the error message because we now also have a
backend device.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This was used for ia64. But there is no working ia64 support in sight,
so remove it for now.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Access for xenstored to the event channel and pre-allocated ring is
managed via xenfs. This adds its own character device featuring mmap
for the ring and an ioctl for the event channel.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Access to xenbus is currently handled via xenfs. This adds a device
driver for xenbus and makes xenfs use this code.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Access to arbitrary hypercalls is currently provided via xenfs. This
adds a standard character device to handle this. The support in xenfs
remains for backward compatibility and uses the device driver code.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When a multi-page mapping of gntalloc is created, the reference counts
of all pages in the vma are incremented. However, the vma open/close
operations only adjusted the reference count of the first page in the
mapping, leaking the other pages. Store a struct in the vm_private_data
to track the original page count to properly free the pages when the
last reference to the vma is closed.
Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
gnttab_end_foreign_access_ref does not return the grant reference it is
passed to the free list; gnttab_free_grant_reference needs to be
explicitly called. While gnttab_end_foreign_access provides a wrapper
for this, it is unsuitable because it does not return errors.
Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The event channel number provided to evtchn_get can be provided by
userspace, so needs to be checked against the maximum number of event
channels prior to using it to index into evtchn_to_irq.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
These allow a domain A which has been granted access on a page of domain B's
memory to issue domain C with a copy-grant on the same page. This is useful
e.g. for forwarding packets between domains.
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- They can't be used to map the page (so can only be used in a GNTTABOP_copy
hypercall).
- It's possible to grant access with a finer granularity than whole pages.
- Xen guarantees that they can be revoked quickly (a normal map grant can
only be revoked with the cooperation of the domain which has been granted
access).
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This adds the needed include file for xen-selfballoon.c to fix the build
error reported by Stephen Rothwell.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:
ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO
The issue was the Xen-SWIOTLB was allocated such as that the end of
buffer was stradling a page (and also above 4GB). The fix was
spotted by Kalev Leonid which was to piggyback on git commit
e79f86b2ef "swiotlb: Use page alignment
for early buffer allocation" which:
We could call free_bootmem_late() if swiotlb is not used, and
it will shrink to page alignment.
So alloc them with page alignment at first, to avoid lose two pages
And doing that fixes the outstanding issue.
CC: stable@kernel.org
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As a mechanism to detect whether SWIOTLB is enabled or not.
We also fix the spelling - it was swioltb instead of
swiotlb.
CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Receiver-side copying of packets is based on this implementation, it gives
better performance and better CPU accounting. It totally supports three types:
full-page, sub-page and transitive grants.
However this patch does not cover sub-page and transitive grants, it mainly
focus on Full-page part and implements grant table V2 interfaces corresponding
to what already exists in grant table V1, such as: grant table V2
initialization, mapping, releasing and exported interfaces.
Each guest can only supports one type of grant table type, every entry in grant
table should be the same version. It is necessary to set V1 or V2 version before
initializing the grant table.
Grant table exported interfaces of V2 are same with those of V1, Xen is
responsible to judge what grant table version guests are using in every grant
operation.
V2 fulfills the same role of V1, and it is totally backwards compitable with V1.
If dom0 support grant table V2, the guests runing on it can run with either V1
or V2.
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
[v1: Modified alloc_vm_area call (new parameters), indentation, and cleanpatch
warnings]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This patch introduces new structures of grant table V2, grant table V2 is an
extension from V1. Grant table is shared between guest and Xen, and Xen is
responsible to do corresponding work for grant operations, such as: figure
out guest's grant table version, perform different actions based on
different grant table version, etc. Although full-page structure of V2
is different from V1, it play the same role as V1.
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Annie Li <annie.li@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Event channels exposed to userspace by the evtchn module may be used by
other modules in an asynchronous manner, which requires that reference
counting be used to prevent the event channel from being closed before
the signals are delivered.
The reference count on new event channels defaults to -1 which indicates
the event channel is not referenced outside the kernel; evtchn_get fails
if called on such an event channel. The event channels made visible to
userspace by evtchn have a normal reference count.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When using the unmap notify ioctl, the event channel used for
notification needs to be reserved to avoid it being deallocated prior to
sending the notification.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The event channel release function cannot be called under a spinlock
because it can attempt to acquire a mutex due to the event channel
reference acquired when setting up unmap notifications.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
gref->gref_id is unsigned so the error handling didn't work.
gnttab_grant_foreign_access() returns an int type, so we can add a
cast here, and it doesn't cause any problems.
gnttab_grant_foreign_access() can return a variety of errors
including -ENOSPC, -ENOSYS and -ENOMEM.
CC: stable@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
On 32 bit systems a high value of op.count could lead to an integer
overflow in the kzalloc() and gref_ids would be smaller than
expected. If the you triggered another integer overflow in
"if (gref_size + op.count > limit)" then you'd probably get memory
corruption inside add_grefs().
CC: stable@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The multiplications here can overflow resulting in smaller buffer
sizes than expected. "count" comes from a copy_from_user().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If highmem pages are requested from the balloon on a system without
highmem, the implementation of alloc_xenballooned_pages will allocate
all available memory trying to find highmem pages to return. Allow
low memory to be returned when highmem pages are requested to avoid
this loop.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).
After the page is mapped, the usual fault mechanism can be used to
update additional MMs. This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
net: xen-netback: use API provided by xenbus module to map rings
block: xen-blkback: use API provided by xenbus module to map rings
xen: use generic functions instead of xen_{alloc, free}_vm_area()
When Xen is enabled, using BIOVEC_PHYS_MERGEABLE in a module
causes xen_biovec_phys_mergeable to be referenced, so it needs
to be exported.
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Things like THIS_MODULE and EXPORT_SYMBOL were simply everywhere
because module.h was also everywhere. But we are fixing the latter.
So we need to call out the real users in advance.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Previously these drivers just got module.h implicitly, but we
are cleaning that up and it will be no longer. Call out the
real users of it.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
genirq: Fix fatfinered fixup really
genirq: percpu: allow interrupt type to be set at enable time
genirq: Add support for per-cpu dev_id interrupts
genirq: Add IRQCHIP_SKIP_SET_WAKE flag
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
dp83640: free packet queues on remove
dp83640: use proper function to free transmit time stamping packets
ipv6: Do not use routes from locally generated RAs
|PATCH net-next] tg3: add tx_dropped counter
be2net: don't create multiple RX/TX rings in multi channel mode
be2net: don't create multiple TXQs in BE2
be2net: refactor VF setup/teardown code into be_vf_setup/clear()
be2net: add vlan/rx-mode/flow-control config to be_setup()
net_sched: cls_flow: use skb_header_pointer()
ipv4: avoid useless call of the function check_peer_pmtu
TCP: remove TCP_DEBUG
net: Fix driver name for mdio-gpio.c
ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
ipv4: fix ipsec forward performance regression
jme: fix irq storm after suspend/resume
route: fix ICMP redirect validation
net: hold sock reference while processing tx timestamps
tcp: md5: add more const attributes
Add ethtool -g support to virtio_net
...
Fix up conflicts in:
- drivers/net/Kconfig:
The split-up generated a trivial conflict with removal of a
stale reference to Documentation/networking/net-modules.txt.
Remove it from the new location instead.
- fs/sysfs/dir.c:
Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
with Eric Biederman's changes for tagged directories.
* 'stable/drivers-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xenbus: don't rely on xen_initial_domain to detect local xenstore
xenbus: Fix loopback event channel assuming domain 0
xen/pv-on-hvm:kexec: Fix implicit declaration of function 'xen_hvm_domain'
xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel
xen/pv-on-hvm kexec: update xs_wire.h:xsd_sockmsg_type from xen-unstable
xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive
* 'stable/drivers.bugfixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pciback: Check if the device is found instead of blindly assuming so.
xen/pciback: Do not dereference psdev during printk when it is NULL.
xen: remove XEN_PLATFORM_PCI config option
xen: XEN_PVHVM depends on PCI
xen/pciback: double lock typo
xen/pciback: use mutex rather than spinlock in vpci backend
xen/pciback: Use mutexes when working with Xenbus state transitions.
xen/pciback: miscellaneous adjustments
xen/pciback: use mutex rather than spinlock in passthrough backend
xen/pciback: use resource_size()
* 'stable/pci.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: support multi-segment systems
xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.
xen/pci: make bus notifier handler return sane values
xen-swiotlb: fix printk and panic args
xen-swiotlb: Fix wrong panic.
xen-swiotlb: Retry up three times to allocate Xen-SWIOTLB
xen-pcifront: Update warning comment to use 'e820_host' option.
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m/debugfs: Make type_name more obvious.
xen/p2m/debugfs: Fix potential pointer exception.
xen/enlighten: Fix compile warnings and set cx to known value.
xen/xenbus: Remove the unnecessary check.
xen/irq: If we fail during msi_capability_init return proper error code.
xen/events: Don't check the info for NULL as it is already done.
xen/events: BUG() when we can't allocate our event->irq array.
* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: Fix selfballooning and ensure it doesn't go too far
xen/gntdev: Fix sleep-inside-spinlock
xen: modify kernel mappings corresponding to granted pages
xen: add an "highmem" parameter to alloc_xenballooned_pages
xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
xen/p2m: Make debug/xen/mmu/p2m visible again.
Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
.. we check whether 'xdev' is NULL - but there is no need for
it as the 'dev' check is done before. The 'dev' is embedded in
the 'xdev' so having xdev != NULL with dev being being checked
is not going to happen.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There are three different modes: PV, HVM, and initial domain 0. In all
the cases we would return -1 for failure instead of a proper error code.
Fix this by propagating the error code from the generic IRQ code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The list operation checks whether the 'info' structure that is
retrieved from the list is NULL (otherwise it would not been able
to retrieve it). This check is not neccessary.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In case we can't allocate we are doomed. We should BUG_ON
instead of trying to dereference it later on.
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[v1: Use BUG_ON instead of BUG]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Just in case it is not found, don't try to dereference it.
[v1: Added WARN_ON, suggested by Jan Beulich <JBeulich@suse.com>]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
.. instead use BUG_ON() as all the callers of the kill_domain_by_device
check for psdev.
Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.
Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.
This issue was introduced by 676dc3cf5b "xen: Use IRQF_FORCE_RESUME".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
Cc: stable@kernel.org (at least to 2.6.32.y)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The balloon driver's "current_pages" is very different from
totalram_pages. Self-ballooning needs to be driven by
the latter. Also, Committed_AS doesn't account for pages
used by the kernel so:
1) Add totalreserve_pages to Committed_AS for the normal target.
2) Enforce a floor for when there are little or no user-space threads
using memory (e.g. single-user mode) to avoid OOMs. The floor
function includes a "min_usable_mb" tuneable in case we discover
later that the floor function is still too aggressive in some
workloads, though likely it will not be needed.
Changes since version 4:
- change floor calculation so that it is not as aggressive; this version
uses a piecewise linear function similar to minimum_target in the 2.6.18
balloon driver, but modified to add to totalreserve_pages instead of
subtract from max_pfn, the 2.6.18 version causes OOMs on recent kernels
because the kernel has expanded over time
- change safety_margin to min_usable_mb and comment on its use
- since committed_as does NOT include kernel space (and other reserved
pages), totalreserve_pages is now added to committed_as. The result is
less aggressive self-ballooning, but theoretically more appropriate.
Changes since version 3:
- missing include causes compile problem when CONFIG_FRONTSWAP is disabled
- add comments after includes
Changes since version 2:
- missing include causes compile problem only on 32-bit
Changes since version 1:
- tuneable safety margin added
[v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs]
[v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled]
[v3: guru.anbalagane@oracle.com: fix 32-bit compile]
[v2: konrad.wilk@oracle.com: make safety margin tuneable]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v1: Altered description and added an extra include]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm
1 lock held by qemu-dm/3256:
#0: (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5
Pid: 3256, comm: qemu-dm Tainted: G W 3.1.0-rc8+ #5
Call Trace:
[<ffffffff81054594>] __might_sleep+0x131/0x135
[<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45
[<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1
[<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb
[<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a
[<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5
[<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5
[<ffffffff81007d62>] ? check_events+0x12/0x20
[<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7
[<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff8109168b>] ? lock_release+0x21c/0x229
[<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32
[<ffffffff81143452>] sys_ioctl+0x47/0x6a
[<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b
gntdev_put_map tries to acquire a mutex when freeing pages back to the
xenballoon pool, so it cannot be called with a spinlock held. In
gntdev_release, the spinlock is not needed as we are freeing the
structure later; in the ioctl, only the list manipulation needs to be
under the lock.
Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenstore daemon does not have to run in the xen initial domain;
however, Linux currently uses xen_initial_domain to test if a loopback
event channel should be used instead of the event channel provided in
Xen's start_info structure. Instead, if the event channel passed in the
start_info structure is not valid, assume that this domain will run
xenstored locally and set up the event channel.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenbus event channel established in xenbus_init is intended to be a
loopback channel, but the remote domain was hardcoded to 0; this will
cause the channel to be unusable when xenstore is not being run in
domain 0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Replace calls to the Xen-specific xen_alloc_vm_area() and
xen_free_vm_area() functions with the generic equivalent
(alloc_vm_area() and free_vm_area()).
On x86, these were identical already.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory. This will allow platforms to provide
(for example) a region of low memory and a region of high memory.
The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata. This is safe
as both xen_memory_setup() and balloon_init() are in __init.
The balloon regions themselves are not altered (i.e., there is still
only the one region).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When initializing the balloon only max_pfn needs to be checked
(max_pfn will always be <= e820_end_of_ram_pfn()) and improve the
confusing comment.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen. This reduces the domain's current page count in
the hypervisor. The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this. If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.
This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.
Fix this by accouting for the released pages when starting the balloon
driver.
If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Xen PVHVM needs xen-platform-pci, on the other hand xen-platform-pci is
useless in any other cases.
Therefore remove the XEN_PLATFORM_PCI config option and compile
xen-platform-pci built-in if XEN_PVHVM is selected.
Changes to v1:
- remove xen-platform-pci.o and just use platform-pci.o since it is not
externally visible anymore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We called mutex_lock() twice instead of unlocking.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.
In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.
Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add an highmem parameter to alloc_xenballooned_pages, to allow callers to
request lowmem or highmem pages.
Fix the code style of free_xenballooned_pages' prototype.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Device drivers that create and destroy SR-IOV virtual functions via
calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
failures if they attempt to destroy VFs while they are assigned to
guest virtual machines. By adding a flag for use by the Xen PCI back
to indicate that a device is assigned a device driver can check that
flag and avoid destroying VFs while they are assigned and avoid system
failures.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy found a compile error when using make randconfig to trigger
drivers/xen/xenbus/xenbus_xs.c:909:2: error: implicit declaration of function 'xen_hvm_domain'
it is unclear which of the CONFIG options triggered this. This
patch fixes the error.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add new xs_reset_watches function to shutdown watches from old kernel after
kexec boot. The old kernel does not unregister all watches in the
shutdown path. They are still active, the double registration can not
be detected by the new kernel. When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL). An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.
With this change the xenstored is instructed to wipe all active watches
for the guest. However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.
[v5: use xs_single instead of passing a dummy string to xs_talkv]
[v4: ignore -EEXIST in xs_reset_watches]
[v3: use XS_RESET_WATCHES instead of XS_INTRODUCE]
[v2: move all code which deals with XS_INTRODUCE into xs_introduce()
(based on feedback from Ian Campbell); remove casts from kvec assignment]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v1: Redid the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Now that the hypercall interface changes are in -unstable, make the
kernel side code not ignore the segment (aka domain) number anymore
(which results in pretty odd behavior on such systems). Rather, if
only the old interfaces are available, don't call them for devices on
non-zero segments at all.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Edited git description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Similar to the "xen/pciback: use mutex rather than spinlock in passthrough backend"
this patch converts the vpci backend to use a mutex instead of
a spinlock. Note that the code taking the lock won't ever get called
from non-sleepable context
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The caller that orchestrates the state changes is xenwatch_thread
and it takes a mutex. In our processing of Xenbus states we can take
the luxery of going to sleep on a mutex, so lets do that and
also fix this bug:
BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 32, name: xenwatch
2 locks held by xenwatch/32:
#0: (xenwatch_mutex){......}, at: [<ffffffff813856ab>] xenwatch_thread+0x4b/0x180
#1: (&(&pdev->dev_lock)->rlock){......}, at: [<ffffffff8138f05b>] xen_pcibk_disconnect+0x1b/0x80
Pid: 32, comm: xenwatch Not tainted 3.1.0-rc6-00015-g3ce340d #2
Call Trace:
[<ffffffff810892b2>] __might_sleep+0x102/0x130
[<ffffffff8163b90f>] mutex_lock_nested+0x2f/0x50
[<ffffffff81382c1c>] unbind_from_irq+0x2c/0x1b0
[<ffffffff8110da66>] ? free_irq+0x56/0xb0
[<ffffffff81382dbc>] unbind_from_irqhandler+0x1c/0x30
[<ffffffff8138f06b>] xen_pcibk_disconnect+0x2b/0x80
[<ffffffff81390348>] xen_pcibk_frontend_changed+0xe8/0x140
[<ffffffff81387ac2>] xenbus_otherend_changed+0xd2/0x150
[<ffffffff810895c1>] ? get_parent_ip+0x11/0x50
[<ffffffff81387de0>] frontend_changed+0x10/0x20
[<ffffffff81385712>] xenwatch_thread+0xb2/0x180
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a minor bugfix and a set of small cleanups; as it is not clear
whether this needs splitting into pieces (and if so, at what
granularity), it is a single combined patch.
- add a missing return statement to an error path in
kill_domain_by_device()
- use pci_is_enabled() rather than raw atomic_read()
- remove a bogus attempt to zero-terminate an already zero-terminated
string
- #define DRV_NAME once uniformly in the shared local header
- make DRIVER_ATTR() variables static
- eliminate a pointless use of list_for_each_entry_safe()
- add MODULE_ALIAS()
- a little bit of constification
- adjust a few messages
- remove stray semicolons from inline function definitions
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Dropped the resource_size fix, altered the description]
[v2: Fixed cleanpatch.pl comments]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To accommodate the call to the callback function from
__xen_pcibk_publish_pci_roots(), which so far dropped and the re-
acquired the lock without checking that the list didn't actually
change, convert the code to use a mutex instead (observing that the
code taking the lock won't ever get called from non-sleepable
context).
As a result, drop the bogus use of list_for_each_entry_safe() and
remove the inappropriate dropping of the lock.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Use resource_size function on resource object
instead of explicit computation.
The semantic patch that makes this output is available
in scripts/coccinelle/api/resource_size.cocci.
More information about semantic patching is available at
http://coccinelle.lip6.fr/
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When we allocate/change the IRQ informations, we do not
need to use spinlocks. We can use a mutex (which is
what the generic IRQ code does for allocations/changes).
Fixes a slew of:
BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3216, name: xenstored
2 locks held by xenstored/3216:
#0: (&u->bind_mutex){......}, at: [<ffffffffa02e0920>] evtchn_ioctl+0x30/0x3a0 [xen_evtchn]
#1: (irq_mapping_update_lock){......}, at: [<ffffffff8138b274>] bind_evtchn_to_irq+0x24/0x90
Pid: 3216, comm: xenstored Not tainted 3.1.0-rc6-00021-g437a3d1 #2
Call Trace:
[<ffffffff81088d10>] __might_sleep+0x100/0x130
[<ffffffff81645c2f>] mutex_lock_nested+0x2f/0x50
[<ffffffff81627529>] __irq_alloc_descs+0x49/0x200
[<ffffffffa02e0920>] ? evtchn_ioctl+0x30/0x3a0 [xen_evtchn]
[<ffffffff8138b214>] xen_allocate_irq_dynamic+0x34/0x70
[<ffffffff8138b2ad>] bind_evtchn_to_irq+0x5d/0x90
[<ffffffffa02e03c0>] ? evtchn_bind_to_user+0x60/0x60 [xen_evtchn]
[<ffffffff8138c282>] bind_evtchn_to_irqhandler+0x32/0x80
[<ffffffffa02e03a9>] evtchn_bind_to_user+0x49/0x60 [xen_evtchn]
[<ffffffffa02e0a34>] evtchn_ioctl+0x144/0x3a0 [xen_evtchn]
[<ffffffff811b4070>] ? vfsmount_lock_local_unlock+0x50/0x80
[<ffffffff811a6a1a>] do_vfs_ioctl+0x9a/0x5e0
[<ffffffff811b476f>] ? mntput+0x1f/0x30
[<ffffffff81196259>] ? fput+0x199/0x240
[<ffffffff811a7001>] sys_ioctl+0xa1/0xb0
[<ffffffff8164ea82>] system_call_fastpath+0x16/0x1b
Reported-by: Jim Burns <jim_burn@bellsouth.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
After triggering a crash dump in a HVM guest, the PV backend drivers
will remain in Connected state. When the kdump kernel starts the PV
drivers will skip such devices. As a result, no root device is found and
the vmcore cant be saved.
A similar situation happens after a kexec boot, here the devices will be
in the Closed state.
With this change all frontend devices with state XenbusStateConnected or
XenbusStateClosed will be reset by changing the state file to Closing ->
Closed -> Initializing. This will trigger a disconnect in the backend
drivers. Now the frontend drivers will find the backend drivers in state
Initwait and can connect.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v2:
- add timeout when waiting for backend state change
(based on feedback from Ian Campell)
- extent printk message to include backend string
- add comment to fall-through case in xenbus_reset_frontend]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
During a kexec boot some virqs such as timer and debugirq were already
registered by the old kernel. The hypervisor will return -EEXISTS from
the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
triggers. Catch the -EEXISTS error and loop through all possible ports to find
what port belongs to the virq/cpu combo.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v2:
- use NR_EVENT_CHANNELS instead of private MAX_EVTCHNS]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
During repeated kexec boots xenwatch_thread() can crash because
xenbus_watch->callback is cleared by xenbus_watch_path() if a node/token
combo for a new watch happens to match an already registered watch from
an old kernel. In this case xs_watch returns -EEXISTS, then
register_xenbus_watch() does not remove the to-be-registered watch from
the list of active watches but returns the -EEXISTS to the caller
anyway.
Because the watch is still active in xenstored it will cause an event
which will arrive in the new kernel. process_msg() will find the
encapsulated struct xenbus_watch in its list of registered watches and
puts the "empty" watch handle in the queue for xenwatch_thread().
xenwatch_thread() then calls ->callback which was cleared earlier by
xenbus_watch_path().
To prevent that crash in a guest running on an old xen toolstack remove
the special -EEXIST handling.
v2:
- remove the EEXIST handing in register_xenbus_watch() instead of
checking for ->callback in process_msg()
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
The process to swizzle a Machine Frame Number (MFN) is not always
necessary. Especially if we know that we actually do not have to do it.
In this patch we check the MFN against the device's coherent
DMA mask and if the requested page(s) are contingous. If it all checks
out we will just return the bus addr without doing the memory
swizzle.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Notifier functions are expected to return NOTIFY_* codes, not -E...
ones. In particular, since the respective hypercalls failing is not
fatal to the operation of the Dom0 kernel, it must be avoided to
return negative values here as those would make it appear as if
NOTIFY_STOP_MASK wa set, suppressing further notification calls to
other interested parties (which is also why we don't want to use
notifier_from_errno() here).
While at it, also notify the user of a failed hypercall.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Added dev_err and the disable MSI/MSI-X call]
[v2: Removed the disable MSI/MSI-X call]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fix printk() and panic() args [swap them] to fix build warnings:
drivers/xen/swiotlb-xen.c:201: warning: format '%s' expects type 'char *', but argument 2 has type 'int'
drivers/xen/swiotlb-xen.c:201: warning: format '%d' expects type 'int', but argument 3 has type 'char *'
drivers/xen/swiotlb-xen.c:202: warning: format '%s' expects type 'char *', but argument 2 has type 'int'
drivers/xen/swiotlb-xen.c:202: warning: format '%d' expects type 'int', but argument 3 has type 'char *'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Propagate the baremetal git commit "swiotlb: fix wrong panic"
(fba99fa38b) in the Xen-SWIOTLB version.
wherein swiotlb's map_page wrongly calls panic() when it can't find
a buffer fit for device's dma mask. It should return an error instead.
Devices with an odd dma mask (i.e. under 4G) like b44 network card hit
this bug (the system crashes):
http://marc.info/?l=linux-kernel&m=129648943830106&w=2
If xen-swiotlb returns an error, b44 driver can use the own bouncing
mechanism.
CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We can fail seting up Xen-SWIOTLB if:
- The host does not have enough contiguous DMA32 memory available
(can happen on a machine that has fragmented memory from starting,
stopping many guests).
- Not enough low memory (almost never happens).
We retry allocating and exchanging the swath of contiguous memory
up to three times. Each time we decrease the amount we need - the
minimum being of 2MB.
If we compleltly fail, we will print the reason for failure on the Xen
console on top of doing it to earlyprintk=xen console.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fix build errors (found when CONFIG_SYSFS is not enabled):
drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration
drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant
drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix build errors (found when CONFIG_SYSFS is not enabled):
drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration
drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant
drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Without enabling CONFIG_XEN_TMEM we get this:
drivers/xen/xen-selfballoon.c:461: undefined reference to `tmem_enabled'
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
With a specific enough .config file compile errors show
for missing workqueue declarations.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There is no need to use dynamic initializaion, it just confuses the reader.
Switch to static initializers like its used in other files.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v2: Rebased on v3.0]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the xenbus_probe.c file that fixed up braces and tabs errors found by the checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the xenbus_probe.h file that fixed up braces errors found by the checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the pci.c file that fixed up whitespaces, tabs warnings found by the checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the gntdev.c and grant-table.c files that fixed up
braces errors found by the checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the events.c file that fixed up
whitespaces, tabs and braces errors found by the
checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is a patch to the balloon.c file that fixed up
whitespaces, tabs errors found by the checkpatch.pl tools.
Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Memory hotplug support for Xen balloon driver. It should be mentioned
that hotplugged memory is not onlined automatically. It should be onlined
by user through standard sysfs interface.
Memory could be hotplugged in following steps:
1) dom0: xl mem-max <domU> <maxmem>
where <maxmem> is >= requested memory size,
2) dom0: xl mem-set <domU> <memory>
where <memory> is requested memory size; alternatively memory
could be added by writing proper value to
/sys/devices/system/xen_memory/xen_memory0/target or
/sys/devices/system/xen_memory/xen_memory0/target_kb on dumU,
3) domU: for i in /sys/devices/system/memory/memory*/state; do \
[ "`cat "$i"`" = offline ] && echo online > "$i"; done
Memory could be onlined automatically on domU by adding following line to
udev rules:
SUBSYSTEM=="memory", ACTION=="add", RUN+="/bin/sh -c '[ -f /sys$devpath/state ] && echo online > /sys$devpath/state'"
In that case step 3 should be omitted.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI
xen/pciback: Remove the DEBUG option.
xen/pciback: Drop two backends, squash and cleanup some code.
xen/pciback: Print out the MSI/MSI-X (PIRQ) values
xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices.
xen: rename pciback module to xen-pciback.
xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases.
xen/pciback: Allocate IRQ handler for device that is shared with guest.
xen/pciback: Disable MSI/MSI-X when reseting a device
xen/pciback: guest SR-IOV support for PV guest
xen/pciback: Register the owner (domain) of the PCI device.
xen/pciback: Cleanup the driver based on checkpatch warnings and errors.
xen/pciback: xen pci backend driver.
xen: tmem: self-ballooning and frontswap-selfshrinking
xen: Add module alias to autoload backend drivers
xen: Populate xenbus device attributes
xen: Add __attribute__((format(printf... where appropriate
xen: prepare tmem shim to handle frontswap
xen: allow enable use of VGA console on dom0
* stable/xen-pciback-0.6.3:
xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI
xen/pciback: Remove the DEBUG option.
xen/pciback: Drop two backends, squash and cleanup some code.
xen/pciback: Print out the MSI/MSI-X (PIRQ) values
xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices.
xen: rename pciback module to xen-pciback.
xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases.
xen/pciback: Allocate IRQ handler for device that is shared with guest.
xen/pciback: Disable MSI/MSI-X when reseting a device
xen/pciback: guest SR-IOV support for PV guest
xen/pciback: Register the owner (domain) of the PCI device.
xen/pciback: Cleanup the driver based on checkpatch warnings and errors.
xen/pciback: xen pci backend driver.
Conflicts:
drivers/xen/Kconfig
.. compile options. This way the user can decide during runtime whether they
want the default 'vpci' (virtual pci passthrough) or where the PCI devices
are passed in without any BDF renumbering. The option 'passthrough' allows
the user to toggle the it from 0 (vpci) to 1 (passthrough).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The latter is easily fixed - by the developer compiling the
module with -DDEBUG. And during runtime - the loglvl provides
quite a lot of useful data.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- Remove the slot and controller controller backend as they
are not used.
- Document the find pciback_[read|write]_config_[byte|word|dword]
to make it easier to find.
- Collapse the code from conf_space_capability_msi into pciback_ops.c
- Collapse conf_space_capability_[pm|vpd].c in conf_space_capability.c
[and remove the conf_space_capability.h file]
- Rename all visible functions from pciback to xen_pcibk.
- Rename all the printk/pr_info, etc that use the "pciback" to say
"xen-pciback".
- Convert functions that are not referenced outside the code to be
static to save on name space.
- Do the same thing for structures that are internal to the driver.
- Run checkpatch.pl after the renames and fixup its warnings and
fix any compile errors caused by the variable rename
- Cleanup any structs that checkpath.pl commented about or just
look odd.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the verbose_request is set (and loglevel high enough), print out
the MSI/MSI-X values that are sent to the guest. This should aid in
debugging issues.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If we try to setup an fake IRQ handler for legacy interrupts
for devices that only have MSI-X (most if not all SR-IOV cards),
we will fail with this:
pciback[0000:01:10.0]: failed to install fake IRQ handler for IRQ 0! (rc:-38)
Since those cards don't have anything in dev->irq.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
pciback is rather generic for a modular distro style kernel.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
We were using coarse spinlocks that could end up with a deadlock.
This patch fixes that and makes the spinlocks much more fine-grained.
We also drop be->watchding state spinlocks as they are already
guarded by the xenwatch_thread against multiple customers. Without
that we would trigger the BUG: scheduling while atomic.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the device that is to be shared with a guest is a level device and
the IRQ is shared with the initial domain we need to take actions.
Mainly we install a dummy IRQ handler that will ACK on the interrupt
line so as to not have the initial domain disable the interrupt line.
This dummy IRQ handler is not enabled when the device MSI/MSI-X lines
are set, nor for edge interrupts. And also not for level interrupts
that are not shared amongst devices. Lastly, if the user passes
to the guest all of the PCI devices on the shared line the we won't
install the dummy handler either.
There is also SysFS instrumentation to check its state and turn
IRQ ACKing on/off if necessary.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In cases where the guest is abruptly killed and has not disabled
MSI/MSI-X interrupts we want to do it for it.
Otherwise when the guest is started up and enables MSI, we would
get a WARN() that the device already had been enabled.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
These changes are for PV guest to use Virtual Function. Because the VF's
vendor, device registers in cfg space are 0xffff, which are invalid and
ignored by PCI device scan. Values in 'struct pci_dev' are fixed up by
SR-IOV code, and using these values will present correct VID and DID to
PV guest kernel.
And command registers in the cfg space are read only 0, which means we
have to emulate MMIO enable bit (VF only uses MMIO resource) so PV
kernel can work properly.
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When the front-end and back-end start negotiating we register
the domain that will use the PCI device. Furthermore during shutdown
of guest or unbinding of the PCI device (and unloading of module)
from pciback we unregister the domain owner.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Checkpatch found some extra warnings and errors. This mega
patch fixes them all in one big swoop. We also spruce
up the pcistub_ids to use DEFINE_PCI_DEVICE_TABLE macro
(suggested by Jan Beulich).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is the host side counterpart to the frontend driver in
drivers/pci/xen-pcifront.c. The PV protocol is also implemented by
frontend drivers in other OSes too, such as the BSDs.
The PV protocol is rather simple. There is page shared with the guest,
which has the 'struct xen_pci_sharedinfo' embossed in it. The backend
has a thread that is kicked every-time the structure is changed and
based on the operation field it performs specific tasks:
XEN_PCI_OP_conf_[read|write]:
Read/Write 0xCF8/0xCFC filtered data. (conf_space*.c)
Based on which field is probed, we either enable/disable the PCI
device, change power state, read VPD, etc. The major goal of this
call is to provide a Physical IRQ (PIRQ) to the guest.
The PIRQ is Xen hypervisor global IRQ value irrespective of the IRQ
is tied in to the IO-APIC, or is a vector. For GSI type
interrupts, the PIRQ==GSI holds. For MSI/MSI-X the
PIRQ value != Linux IRQ number (thought PIRQ==vector).
Please note, that with Xen, all interrupts (except those level shared ones)
are injected directly to the guest - there is no host interaction.
XEN_PCI_OP_[enable|disable]_msi[|x] (pciback_ops.c)
Enables/disables the MSI/MSI-X capability of the device. These operations
setup the MSI/MSI-X vectors for the guest and pass them to the frontend.
When the device is activated, the interrupts are directly injected in the
guest without involving the host.
XEN_PCI_OP_aer_[detected|resume|mmio|slotreset]: In case of failure,
perform the appropriate AER commands on the guest. Right now that is
a cop-out - we just kill the guest.
Besides implementing those commands, it can also
- hide a PCI device from the host. When booting up, the user can specify
xen-pciback.hide=(1:0:0)(BDF..) so that host does not try to use the
device.
The driver was lifted from linux-2.6.18.hg tree and fixed up
so that it could compile under v3.0. Per suggestion from Jesse Barnes
moved the driver to drivers/xen/xen-pciback.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
In the past (2.6.38) the 'xen_allocate_pirq_gsi' would allocate
an entry in a Linux IRQ -> {XEN_IRQ, type, event, ..} array. All
of that has been removed in 2.6.39 and the Xen IRQ subsystem uses
an linked list that is populated when the call to
'xen_allocate_irq_gsi' (universally done from any of the xen_bind_*
calls) is done. The 'xen_allocate_pirq_gsi' is a NOP and there is
no need for it anymore so lets remove it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since they are only called once and the rest of the pci_xen_*
functions follow the same pattern of setup.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This patch introduces two in-kernel drivers for Xen transcendent memory
("tmem") functionality that complement cleancache and frontswap. Both
use control theory to dynamically adjust and optimize memory utilization.
Selfballooning controls the in-kernel Xen balloon driver, targeting a goal
value (vm_committed_as), thus pushing less frequently used clean
page cache pages (through the cleancache code) into Xen tmem where
Xen can balance needs across all VMs residing on the physical machine.
Frontswap-selfshrinking controls the number of pages in frontswap,
driving it towards zero (effectively doing a partial swapoff) when
in-kernel memory pressure subsides, freeing up RAM for other VMs.
More detail is provided in the header comment of xen-selfballooning.c.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v8: konrad.wilk@oracle.com: set default enablement depending on frontswap]
[v7: konrad.wilk@oracle.com: fix capitalization and punctuation in comments]
[v6: fix frontswap-selfshrinking initialization]
[v6: konrad.wilk@oracle.com: fix init pr_infos; add comments about swap]
[v5: konrad.wilk@oracle.com: add NULL to attr list; move inits up to decls]
[v4: dkiper@net-space.pl: use strict_strtoul plus a few syntactic nits]
[v3: konrad.wilk@oracle.com: fix potential divides-by-zero]
[v3: konrad.wilk@oracle.com: add many more comments, fix nits]
[v2: rebased to linux-3.0-rc1]
[v2: Ian.Campbell@citrix.com: reorganize as new file (xen-selfballoon.c)]
[v2: dkiper@net-space.pl: proper access to vm_committed_as]
[v2: dkiper@net-space.pl: accounting fixes]
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: <xen-devel@lists.xensource.com>
All the Xen backend drivers are assigned to a special bus type
xen-backend. This patch exports xen-backend:* names through modalias and
uevent to autoload them.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenbus bus type uses device_create_file to assign all used device
attributes. However it does not remove them when the device goes away.
This patch uses the dev_attrs field of the bus type to specify default
attributes for all devices.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start.
xen: When calling power_off, don't call the halt function.
xen: Fix compile warning when CONFIG_SMP is not defined.
xen: support CONFIG_MAXSMP
xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"
Provide the shim code for frontswap for Xen tmem even if the
frontswap patchset is not present yet. (The egg is before
the chicken.)
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Konrad Wilk <konrad.wilk@oracle.com>
* 'stable/xen-swiotlb.bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6:
swiotlb: Export swioltb_nr_tbl and utilize it as appropiate.
By default the io_tlb_nslabs is set to zero, and gets set to
whatever value is passed in via swiotlb_init_with_tbl function.
The default value passed in is 64MB. However, if the user provides
the 'swiotlb=<nslabs>' the default value is ignored and
the value provided by the user is used... Except when the SWIOTLB
is used under Xen - there the default value of 64MB is used and
the Xen-SWIOTLB has no mechanism to get the 'io_tlb_nslabs' filled
out by setup_io_tlb_npages functions. This patch provides a function
for the Xen-SWIOTLB to call to see if the io_tlb_nslabs is set
and if so use that value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Do not use pirq_needs_eoi to decide which irq handler to use because Xen
always returns true if the guest does not support pirq_eoi_map.
Use the trigger information we already have from MP-tables and ACPI.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reported-by: Thomas Goetz <tom.goetz@virtualcomputer.com>
Tested-by: Thomas Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
xen: cleancache shim to Xen Transcendent Memory
ocfs2: add cleancache support
ext4: add cleancache support
btrfs: add cleancache support
ext3: add cleancache support
mm/fs: add hooks to support cleancache
mm: cleancache core ops functions and config
fs: add field to superblock to support cleancache
mm/fs: cleancache documentation
Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
This patch provides a shim between the kernel-internal cleancache
API (see Documentation/mm/cleancache.txt) and the Xen Transcendent
Memory ABI (see http://oss.oracle.com/projects/tmem).
Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented
pseudo-RAM store for cleancache pages, shared cleancache pages,
and frontswap pages. Tmem provides enterprise-quality concurrency,
full save/restore and live migration support, compression
and deduplication.
A presentation showing up to 8% faster performance and up to 52%
reduction in sectors read on a kernel compile workload, despite
aggressive in-kernel page reclamation ("self-ballooning") can be
found at:
http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
b43: fix comment typo reqest -> request
Haavard Skinnemoen has left Atmel
cris: typo in mach-fs Makefile
Kconfig: fix copy/paste-ism for dell-wmi-aio driver
doc: timers-howto: fix a typo ("unsgined")
perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
treewide: fix a few typos in comments
regulator: change debug statement be consistent with the style of the rest
Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
audit: acquire creds selectively to reduce atomic op overhead
rtlwifi: don't touch with treewide double semicolon removal
treewide: cleanup continuations and remove logging message whitespace
ath9k_hw: don't touch with treewide double semicolon removal
include/linux/leds-regulator.h: fix syntax in example code
tty: fix typo in descripton of tty_termios_encode_baud_rate
xtensa: remove obsolete BKL kernel option from defconfig
m68k: fix comment typo 'occcured'
arch:Kconfig.locks Remove unused config option.
treewide: remove extra semicolons
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (34 commits)
PM: Introduce generic prepare and complete callbacks for subsystems
PM: Allow drivers to allocate memory from .prepare() callbacks safely
PM: Remove CONFIG_PM_VERBOSE
Revert "PM / Hibernate: Reduce autotuned default image size"
PM / Hibernate: Add sysfs knob to control size of memory for drivers
PM / Wakeup: Remove useless synchronize_rcu() call
kmod: always provide usermodehelper_disable()
PM / ACPI: Remove acpi_sleep=s4_nonvs
PM / Wakeup: Fix build warning related to the "wakeup" sysfs file
PM: Print a warning if firmware is requested when tasks are frozen
PM / Runtime: Rework runtime PM handling during driver removal
Freezer: Use SMP barriers
PM / Suspend: Do not ignore error codes returned by suspend_enter()
PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
OMAP1 / PM: Use generic clock manipulation routines for runtime PM
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
...
* 'stable/irq' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not clear and mask evtchns in __xen_evtchn_do_upcall
* 'stable/p2m.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings
* 'stable/e820.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit
xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings.
* 'stable/mmu.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen mmu: fix a race window causing leave_mm BUG()
* 'stable/backend.base.v3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Fix compiler error when CONFIG_XEN_PRIVILEGED_GUEST is not set.
xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
xen/irq: The Xen hypervisor cleans up the PIRQs if the other domain forgot.
xen/irq: Export 'xen_pirq_from_irq' function.
xen/irq: Add support to check if IRQ line is shared with other domains.
xen/irq: Check if the PCI device is owned by a domain different than DOMID_SELF.
xen/pci: Add xen_[find|register|unregister]_device_domain_owner functions.
* 'stable/gntalloc.v7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/gntdev,gntalloc: Remove unneeded VM flags
Various merges over time have led to rather a mish-mash of indentation.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel@lists.xensource.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them. Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly. This reduces kernel code size quite a bit and reduces
its complexity.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the irq handler of evtchns and pirqs that don't need EOI (pirqs
that correspond to physical edge interrupts) to handle_edge_irq.
Use handle_fasteoi_irq for pirqs that need eoi (they generally
correspond to level triggered irqs), no risk in loosing interrupts
because we have to EOI the irq anyway.
This change has the following benefits:
- it uses the very same handlers that Linux would use on native for the
same irqs (handle_edge_irq for edge irqs and msis, and
handle_fasteoi_irq for everything else);
- it uses these handlers in the same way native code would use them: it
let Linux mask\unmask and ack the irq when Linux want to mask\unmask
and ack the irq;
- it fixes a problem occurring when a driver calls disable_irq() in its
handler: the old code was unconditionally unmasking the evtchn even if
the irq is disabled when irq_eoi was called.
See Documentation/DocBook/genericirq.tmpl for more informations.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Fixed space/tab issues]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature. However, commit 40dc166cb5
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.
To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:
(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]----
and crashing us.
This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.
On the m2p_remove_override path we provide the same parameter.
Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.
[v1: Added documentation details, made it return -EOPNOTSUPP instead
of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
And if the other domain forgot to clean up its PIRQs we don't need
to fail the operation. Just take a note of it and continue on.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We need this to find the real Xen PIRQ value for a device
that requests an MSI or MSI-X. In the past we used
'xen_gsi_from_irq' since that function would return
an Xen PIRQ or GSI depending on the provided IRQ. Now that
we have seperated that we need to use the correct
function.
[v2: Deal with rebase on stable/irq.cleanup]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We do this via the PHYSDEVOP_irq_status_query support hypervisor call.
We will get a positive value if another domain has binded its
PIRQ to the specified GSI (IRQ line).
[v2: Deal with v2.6.37-rc1 rebase fallout]
[v3: Deal with stable/irq.cleanup fallout]
[v4: xen_ignore_irq->xen_test_irq_shared]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>