Commit Graph

68 Commits

Author SHA1 Message Date
Vitaly Kuznetsov 87dffe86d4 xen/manage: don't complain about an empty value in control/sysrq node
When guest receives a sysrq request from the host it acknowledges it by
writing '\0' to control/sysrq xenstore node. This, however, make xenstore
watch fire again but xenbus_scanf() fails to parse empty value with "%c"
format string:

 sysrq: SysRq : Emergency Sync
 Emergency Sync complete
 xen:manage: Error -34 reading sysrq code in control/sysrq

Ignore -ERANGE the same way we already ignore -ENOENT, empty value in
control/sysrq is totally legal.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-09-14 08:51:10 -04:00
Zhouyang Jia 84c029a733 xen: add error handling for xenbus_printf
When xenbus_printf fails, the lack of error-handling code may
cause unexpected results.

This patch adds error-handling code after calling xenbus_printf.

Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2018-06-19 14:27:41 +02:00
Dongli Zhang 5e25f5db6a xen/time: do not decrease steal time after live migration on xen
After guest live migration on xen, steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease because steal returned by
xen_steal_lock() might be less than this_rq()->prev_steal_time which is
derived from previous return value of xen_steal_clock().

For instance, steal time of each vcpu is 335 before live migration.

cpu  198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0

After live migration, steal time is reduced to 312.

cpu  200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0

Since runstate times are cumulative and cleared during xen live migration
by xen hypervisor, the idea of this patch is to accumulate runstate times
to global percpu variables before live migration suspend. Once guest VM is
resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new
runstate times and previously accumulated times stored in global percpu
variables.

Comment above HYPERVISOR_suspend() has been removed as it is inaccurate:
the call can return an error code (e.g., possibly -EPERM in the future).

Similar and more severe issue would impact prior linux 4.8-4.10 as
discussed by Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
which would overflow steal time and lead to 100% st usage in top command
for linux 4.8-4.10. A backport of this patch would fix that issue.

[boris: added linux/slab.h to driver/xen/time.c, slightly reformatted
        commit message]

References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-11-02 16:49:41 -04:00
Linus Torvalds 6e6c5b9606 xen: features and fixes for 4.13-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJZXdVXAAoJELDendYovxMvVA0IAITmvH21SDTFiilKCOrxhCv0
 W3q3cOhZA4D+UtTqqIm/os/et08n72864s0mUFoY4PxETaUsb1jBav7z7Tod2c6B
 wh26UgIAhVO3ZewFSmpdPYoW0l3elC5JUMkVMfwSvHkROaU+YDEYUsLWGuIHZiiy
 V/kIskcKe08HLObU//BMjfFusmMHmQSg+TruyqRWodlWj4Rwm7q5fNZ/xaap1UCM
 O7GcHyq1k699w5YYTlIEkLWsX/pGM+auGSlT1xdjJEc2bpjH8ps0xbvAn6dsAKsE
 yoDyxQWtX2wBUXCqF0hXYAB2r1iFx2aFfLQjwc7p+V6BvxpWwSsC7Ur4QIDnm3E=
 =OLb7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Other than fixes and cleanups it contains:

   - support > 32 VCPUs at domain restore

   - support for new sysfs nodes related to Xen

   - some performance tuning for Linux running as Xen guest"

* tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: allow userspace access during hypercalls
  x86: xen: remove unnecessary variable in xen_foreach_remap_area()
  xen: allocate page for shared info page from low memory
  xen: avoid deadlock in xenbus driver
  xen: add sysfs node for hypervisor build id
  xen: sync include/xen/interface/version.h
  xen: add sysfs node for guest type
  doc,xen: document hypervisor sysfs nodes for xen
  xen/vcpu: Handle xen_vcpu_setup() failure at boot
  xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
  xen/pv: Fix OOPS on restore for a PV, !SMP domain
  xen/pvh*: Support > 32 VCPUs at domain restore
  xen/vcpu: Simplify xen_vcpu related code
  xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
  xen: avoid type warning in xchg_xen_ulong
  xen: fix HYPERVISOR_dm_op() prototype
  xen: don't print error message in case of missing Xenstore entry
  arm/xen: Adjust one function call together with a variable assignment
  arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
  arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-06 19:11:24 -07:00
Juergen Gross 4e93b6481c xen: don't print error message in case of missing Xenstore entry
When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-07 09:48:02 +02:00
Thomas Gleixner 69a78ff226 init: Introduce SYSTEM_SCHEDULING state
might_sleep() debugging and smp_processor_id() debugging should be active
right after the scheduler starts working. The init task can invoke
smp_processor_id() from preemptible context as it is pinned on the boot cpu
until sched_smp_init() removes the pinning and lets it schedule on all non
isolated cpus.

Add a new state which allows to enable those checks earlier and add it to
the xen do_poweroff() function.

No functional change.

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170516184736.196214622@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-23 10:01:38 +02:00
Juergen Gross 5584ea250a xen: modify xenstore watch event interface
Today a Xenstore watch event is delivered via a callback function
declared as:

void (*callback)(struct xenbus_watch *,
                 const char **vec, unsigned int len);

As all watch events only ever come with two parameters (path and token)
changing the prototype to:

void (*callback)(struct xenbus_watch *,
                 const char *path, const char *token);

is the natural thing to do.

Apply this change and adapt all users.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-02-09 11:26:49 -05:00
Jan Beulich 4fed1b125e xen/manage: correct return value check on xenbus_scanf()
A negative return value indicates an error; in fact the function at
present won't ever return zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-02-03 11:26:40 -05:00
Juergen Gross 44b3c7af02 xenbus: advertise control feature flags
The Xen docs specify several flags which a guest can set to advertise
which values of the xenstore control/shutdown key it will recognize.
This patch adds code to write all the relevant feature-flag keys.

Based-on-patch-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-10-24 15:48:40 +01:00
Julien Grall 0df4f266b3 xen: Use correctly the Xen memory terminologies
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.

For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.

For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.

Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.

Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08 18:03:49 +01:00
Julien Grall a9fd60e268 xen: Include xen/page.h rather than asm/xen/page.h
Using xen/page.h will be necessary later for using common xen page
helpers.

As xen/page.h already include asm/xen/page.h, always use the later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17 16:14:18 +01:00
Boris Ostrovsky 2b953a5e99 xen: Suspend ticks on all CPUs during suspend
Commit 77e32c89a7 ("clockevents: Manage device's state separately for
the core") decouples clockevent device's modes from states. With this
change when a Xen guest tries to resume, it won't be calling its
set_mode op which needs to be done on each VCPU in order to make the
hypervisor aware that we are in oneshot mode.

This happens because clockevents_tick_resume() (which is an intermediate
step of resuming ticks on a processor) doesn't call clockevents_set_state()
anymore and because during suspend clockevent devices on all VCPUs (except
for the one doing the suspend) are left in ONESHOT state. As result, during
resume the clockevents state machine will assume that device is already
where it should be and doesn't need to be updated.

To avoid this problem we should suspend ticks on all VCPUs during
suspend.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-04-29 17:10:05 +01:00
Ross Lagerwall 72978b2fe2 xen/manage: Fix USB interaction issues when resuming
Commit 61a734d305 ("xen/manage: Always freeze/thaw processes when
suspend/resuming") ensured that userspace processes were always frozen
before suspending to reduce interaction issues when resuming devices.
However, freeze_processes() does not freeze kernel threads.  Freeze
kernel threads as well to prevent deadlocks with the khubd thread when
resuming devices.

This is what native suspend and resume does.

Example deadlock:
[ 7279.648010]  [<ffffffff81446bde>] ? xen_poll_irq_timeout+0x3e/0x50
[ 7279.648010]  [<ffffffff81448d60>] xen_poll_irq+0x10/0x20
[ 7279.648010]  [<ffffffff81011723>] xen_lock_spinning+0xb3/0x120
[ 7279.648010]  [<ffffffff810115d1>] __raw_callee_save_xen_lock_spinning+0x11/0x20
[ 7279.648010]  [<ffffffff815620b6>] ? usb_control_msg+0xe6/0x120
[ 7279.648010]  [<ffffffff81747e50>] ? _raw_spin_lock_irq+0x50/0x60
[ 7279.648010]  [<ffffffff8174522c>] wait_for_completion+0xac/0x160
[ 7279.648010]  [<ffffffff8109c520>] ? try_to_wake_up+0x2c0/0x2c0
[ 7279.648010]  [<ffffffff814b60f2>] dpm_wait+0x32/0x40
[ 7279.648010]  [<ffffffff814b6eb0>] device_resume+0x90/0x210
[ 7279.648010]  [<ffffffff814b7d71>] dpm_resume+0x121/0x250
[ 7279.648010]  [<ffffffff8144c570>] ? xenbus_dev_request_and_reply+0xc0/0xc0
[ 7279.648010]  [<ffffffff814b80d5>] dpm_resume_end+0x15/0x30
[ 7279.648010]  [<ffffffff81449fba>] do_suspend+0x10a/0x200
[ 7279.648010]  [<ffffffff8144a2f0>] ? xen_pre_suspend+0x20/0x20
[ 7279.648010]  [<ffffffff8144a1d0>] shutdown_handler+0x120/0x150
[ 7279.648010]  [<ffffffff8144c60f>] xenwatch_thread+0x9f/0x160
[ 7279.648010]  [<ffffffff810ac510>] ? finish_wait+0x80/0x80
[ 7279.648010]  [<ffffffff8108d189>] kthread+0xc9/0xe0
[ 7279.648010]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
[ 7279.648010]  [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0
[ 7279.648010]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80

[ 7441.216287] INFO: task khubd:89 blocked for more than 120 seconds.
[ 7441.219457]       Tainted: G            X 3.13.11-ckt12.kz #1
[ 7441.222176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7441.225827] khubd           D ffff88003f433440     0    89      2 0x00000000
[ 7441.229258]  ffff88003ceb9b98 0000000000000046 ffff88003ce83000 0000000000013440
[ 7441.232959]  ffff88003ceb9fd8 0000000000013440 ffff88003cd13000 ffff88003ce83000
[ 7441.236658]  0000000000000286 ffff88003d3e0000 ffff88003ceb9bd0 00000001001aa01e
[ 7441.240415] Call Trace:
[ 7441.241614]  [<ffffffff817442f9>] schedule+0x29/0x70
[ 7441.243930]  [<ffffffff81743406>] schedule_timeout+0x166/0x2c0
[ 7441.246681]  [<ffffffff81075b80>] ? call_timer_fn+0x110/0x110
[ 7441.249339]  [<ffffffff8174357e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 7441.252644]  [<ffffffff81077710>] msleep+0x20/0x30
[ 7441.254812]  [<ffffffff81555f00>] hub_port_reset+0xf0/0x580
[ 7441.257400]  [<ffffffff81558465>] hub_port_init+0x75/0xb40
[ 7441.259981]  [<ffffffff814bb3c9>] ? update_autosuspend+0x39/0x60
[ 7441.262817]  [<ffffffff814bb4f0>] ? pm_runtime_set_autosuspend_delay+0x50/0xa0
[ 7441.266212]  [<ffffffff8155a64a>] hub_thread+0x71a/0x1750
[ 7441.268728]  [<ffffffff810ac510>] ? finish_wait+0x80/0x80
[ 7441.271272]  [<ffffffff81559f30>] ? usb_port_resume+0x670/0x670
[ 7441.274067]  [<ffffffff8108d189>] kthread+0xc9/0xe0
[ 7441.276305]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
[ 7441.279131]  [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0
[ 7441.281659]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-02-06 15:49:09 +00:00
Ross Lagerwall 61a734d305 xen/manage: Always freeze/thaw processes when suspend/resuming
Always freeze processes when suspending and thaw processes when resuming
to prevent a race noticeable with HVM guests.

This prevents a deadlock where the khubd kthread (which is designed to
be freezable) acquires a usb device lock and then tries to allocate
memory which requires the disk which hasn't been resumed yet.
Meanwhile, the xenwatch thread deadlocks waiting for the usb device
lock.

Freezing processes fixes this because the khubd thread is only thawed
after the xenwatch thread finishes resuming all the devices.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: stable@vger.kernel.org
2014-09-02 15:36:59 +01:00
David Vrabel 1b6478231c xen/manage: fix potential deadlock when resuming the console
Calling xen_console_resume() in xen_suspend() causes a warning because
it locks irq_mapping_update_lock (a mutex) and this may sleep.  If a
userspace process is using the evtchn device then this mutex may be
locked at the point of the stop_machine() call and
xen_console_resume() would then deadlock.

Resuming the console after stop_machine() returns avoids this
deadlock.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org>
2014-07-03 11:02:28 +01:00
David Vrabel aa8532c322 xen: refactor suspend pre/post hooks
New architectures currently have to provide implementations of 5 different
functions: xen_arch_pre_suspend(), xen_arch_post_suspend(),
xen_arch_hvm_post_suspend(), xen_mm_pin_all(), and xen_mm_unpin_all().

Refactor the suspend code to only require xen_arch_pre_suspend() and
xen_arch_post_suspend().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2014-05-12 17:19:56 +01:00
Konrad Rzeszutek Wilk eb47f71200 xen/manage: Poweroff forcefully if user-space is not yet up.
The user can launch the guest in this sequence:

xl create -p /vm.cfg	[launch, but pause it]
xl shutdown latest	[sets control/shutdown=poweroff]
xl unpause latest
xl console latest	[and see that the guest has completely
ignored the shutdown request]

In reality the guest hasn't ignored it. It registers a watch
and gets a notification that there is value. It then calls
the shutdown_handler which ends up calling orderly_shutdown.

Unfortunately that is so early in the bootup that there
are no user-space. Which means that the orderly_shutdown fails.
But since the force flag was set to false it continues on without
reporting.

What we really want to is to use the force when we are in the
SYSTEM_BOOTING state and not use the 'force' when SYSTEM_RUNNING.

However, if we are in the running state - and the shutdown command
has been given before the user-space has been setup, there is nothing
we can do. Worst yet, we stop ignoring the 'xl shutdown' requests!

As such, the other part of this patch is to only stop ignoring
the 'xl shutdown' when we are truly in the power off sequence.

That means the user can do multiple 'xl shutdown' and we will try
to act on them instead of ignoring them.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
Reported-by:  Alex Bligh <alex@alex.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15 17:41:29 +01:00
Stanislaw Gruszka cd979883b9 xen/acpi-processor: fix enabling interrupts on syscore_resume
syscore->resume() callback is expected to do not enable interrupts,
it generates warning like below otherwise:

[ 9386.365390] WARNING: CPU: 0 PID: 6733 at drivers/base/syscore.c:104 syscore_resume+0x9a/0xe0()
[ 9386.365403] Interrupts enabled after xen_acpi_processor_resume+0x0/0x34 [xen_acpi_processor]
...
[ 9386.365429] Call Trace:
[ 9386.365434]  [<ffffffff81667a8b>] dump_stack+0x45/0x56
[ 9386.365437]  [<ffffffff8106921d>] warn_slowpath_common+0x7d/0xa0
[ 9386.365439]  [<ffffffff8106928c>] warn_slowpath_fmt+0x4c/0x50
[ 9386.365442]  [<ffffffffa0261bb0>] ? xen_upload_processor_pm_data+0x300/0x300 [xen_acpi_processor]
[ 9386.365443]  [<ffffffff814055fa>] syscore_resume+0x9a/0xe0
[ 9386.365445]  [<ffffffff810aef42>] suspend_devices_and_enter+0x402/0x470
[ 9386.365447]  [<ffffffff810af128>] pm_suspend+0x178/0x260

On xen_acpi_processor_resume() we call various procedures, which are
non atomic and can enable interrupts. To prevent the issue introduce
separate resume notify called after we enable interrupts on resume
and before we call other drivers resume callbacks.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-03-18 14:40:20 +00:00
Linus Torvalds 21884a83b2 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
 "The timer changes contain:

   - posix timer code consolidation and fixes for odd corner cases

   - sched_clock implementation moved from ARM to core code to avoid
     duplication by other architectures

   - alarm timer updates

   - clocksource and clockevents unregistration facilities

   - clocksource/events support for new hardware

   - precise nanoseconds RTC readout (Xen feature)

   - generic support for Xen suspend/resume oddities

   - the usual lot of fixes and cleanups all over the place

  The parts which touch other areas (ARM/XEN) have been coordinated with
  the relevant maintainers.  Though this results in an handful of
  trivial to solve merge conflicts, which we preferred over nasty cross
  tree merge dependencies.

  The patches which have been committed in the last few days are bug
  fixes plus the posix timer lot.  The latter was in akpms queue and
  next for quite some time; they just got forgotten and Frederic
  collected them last minute."

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  hrtimer: Remove unused variable
  hrtimers: Move SMP function call to thread context
  clocksource: Reselect clocksource when watchdog validated high-res capability
  posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  posix_timers: fix racy timer delta caching on task exit
  posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
  selftests: add basic posix timers selftests
  posix_cpu_timers: consolidate expired timers check
  posix_cpu_timers: consolidate timer list cleanups
  posix_cpu_timer: consolidate expiry time type
  tick: Sanitize broadcast control logic
  tick: Prevent uncontrolled switch to oneshot mode
  tick: Make oneshot broadcast robust vs. CPU offlining
  x86: xen: Sync the CMOS RTC as well as the Xen wallclock
  x86: xen: Sync the wallclock when the system time is set
  timekeeping: Indicate that clock was set in the pvclock gtod notifier
  timekeeping: Pass flags instead of multiple bools to timekeeping_update()
  xen: Remove clock_was_set() call in the resume path
  hrtimers: Support resuming with two or more CPUs online (but stopped)
  timer: Fix jiffies wrap behavior of round_jiffies_common()
  ...
2013-07-06 14:09:38 -07:00
David Vrabel 0eb0716514 xen: Remove clock_was_set() call in the resume path
commit 359cdd3f866(xen: maintain clock offset over save/restore) added
a clock_was_set() call into the xen resume code to propagate the
system time changes. With the modified hrtimer resume code, which
makes sure that all cpus are notified this call is not longer necessary.

[ tglx: Separated it from the hrtimer change ]

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk  <konrad.wilk@oracle.com>
Cc: John Stultz  <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28 23:15:06 +02:00
Joe Perches 283c0972d5 xen: Convert printks to pr_<level>
Convert printks to pr_<level> (excludes printk(KERN_DEBUG...)
to be more consistent throughout the xen subsystem.

Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME
Coalesce formats and add missing word spaces
Add missing newlines
Align arguments and reflow to 80 columns
Remove DRV_NAME from formats as pr_fmt adds the same content

This does change some of the prefixes of these messages
but it also does make them more consistent.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-28 11:19:58 -04:00
Stefano Stabellini 65e053a703 xen: ifdef CONFIG_HIBERNATE_CALLBACKS xen_*_suspend
xen_hvm_post_suspend, xen_pre_suspend, xen_post_suspend are only used if
CONFIG_HIBERNATE_CALLBACKS is defined, resulting in:

drivers/xen/manage.c:46:13: warning: ‘xen_hvm_post_suspend’ defined but not used [-Wunused-function]
drivers/xen/manage.c:52:13: warning: ‘xen_pre_suspend’ defined but not used [-Wunused-function]
drivers/xen/manage.c:59:13: warning: ‘xen_post_suspend’ defined but not used [-Wunused-function]

If the kernel config is missing CONFIG_HIBERNATE_CALLBACKS.

Simply ifdef CONFIG_HIBERNATE_CALLBACKS the three functions.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-28 11:19:50 -04:00
Konrad Rzeszutek Wilk 186bab1ce0 xen/resume: Fix compile warnings.
linux/drivers/xen/manage.c: In function 'do_suspend':
linux/drivers/xen/manage.c:160:5: warning: 'si.cancelled' may be used uninitialized in this function

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-19 15:12:49 -04:00
Rafael J. Wysocki cf579dfb82 PM / Sleep: Introduce "late suspend" and "early resume" of devices
The current device suspend/resume phases during system-wide power
transitions appear to be insufficient for some platforms that want
to use the same callback routines for saving device states and
related operations during runtime suspend/resume as well as during
system suspend/resume.  In principle, they could point their
.suspend_noirq() and .resume_noirq() to the same callback routines
as their .runtime_suspend() and .runtime_resume(), respectively,
but at least some of them require device interrupts to be enabled
while the code in those routines is running.

It also makes sense to have device suspend-resume callbacks that will
be executed with runtime PM disabled and with device interrupts
enabled in case someone needs to run some special code in that
context during system-wide power transitions.

Apart from this, .suspend_noirq() and .resume_noirq() were introduced
as a workaround for drivers using shared interrupts and failing to
prevent their interrupt handlers from accessing suspended hardware.
It appears to be better not to use them for other porposes, or we may
have to deal with some serious confusion (which seems to be happening
already).

For the above reasons, introduce new device suspend/resume phases,
"late suspend" and "early resume" (and analogously for hibernation)
whose callback will be executed with runtime PM disabled and with
device interrupts enabled and whose callback pointers generally may
point to runtime suspend/resume routines.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
2012-01-29 20:38:29 +01:00
Paul Gortmaker 63c9744b9a xen: Add export.h for THIS_MODULE/EXPORT_SYMBOL to various xen users.
Things like THIS_MODULE and EXPORT_SYMBOL were simply everywhere
because module.h was also everywhere.  But we are fixing the latter.
So we need to call out the real users in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:11 -04:00
Rafael J. Wysocki 2e711c04db PM: Remove sysdev suspend, resume and shutdown operations
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them.  Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly.  This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Rafael J. Wysocki 19234c0819 PM: Add missing syscore_suspend() and syscore_resume() calls
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature.  However, commit 40dc166cb5
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.

To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
2011-04-20 00:36:11 +02:00
Rafael J. Wysocki 1f112cee07 PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKS
Xen save/restore is going to use hibernate device callbacks for
quiescing devices and putting them back to normal operations and it
would need to select CONFIG_HIBERNATION for this purpose.  However,
that also would cause the hibernate interfaces for user space to be
enabled, which might confuse user space, because the Xen kernels
don't support hibernation.  Moreover, it would be wasteful, as it
would make the Xen kernels include a substantial amount of code that
they would never use.

To address this issue introduce new power management Kconfig option
CONFIG_HIBERNATE_CALLBACKS, such that it will only select the code
that is necessary for the hibernate device callbacks to work and make
CONFIG_HIBERNATION select it.  Then, Xen save/restore will be able to
select CONFIG_HIBERNATE_CALLBACKS without dragging the entire
hibernate code along with it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
2011-04-11 22:54:42 +02:00
Shriram Rajagopalan b3e96c0c75 xen: use freeze/restore/thaw PM events for suspend/resume/chkpt
Use PM_FREEZE, PM_THAW and PM_RESTORE power events for
suspend/resume/checkpoint functionality, instead of PM_SUSPEND
and PM_RESUME. Use of these pm events fixes the Xen Guest hangup
when taking checkpoints. When a suspend event is cancelled
(while taking checkpoints once/continuously), we use PM_THAW
instead of PM_RESUME. PM_RESTORE is used when suspend is not
cancelled. See Documentation/power/devices.txt and linux/pm.h
for more info about freeze, thaw and restore. The sequence of
pm events in a suspend-resume scenario is shown below.

        dpm_suspend_start(PMSG_FREEZE);

                dpm_suspend_noirq(PMSG_FREEZE);

                       sysdev_suspend(PMSG_FREEZE);
                       cancelled = suspend_hypercall()
                       sysdev_resume();

               dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);

       dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-03-16 13:42:56 -04:00
Ian Campbell b056b6a014 xen: suspend: remove xen_hvm_suspend
It is now identical to xen_suspend, the differences are encapsulated
in the suspend_info struct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:14 +00:00
Ian Campbell 55fb4acef7 xen: suspend: pull pre/post suspend hooks out into suspend_info
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:13 +00:00
Ian Campbell 07af38102f xen: suspend: move arch specific pre/post suspend hooks into generic hooks
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:13 +00:00
Ian Campbell 82043bb60d xen: suspend: refactor non-arch specific pre/post suspend hooks
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:12 +00:00
Ian Campbell 03c8142bd2 xen: suspend: add "arch" to pre/post suspend hooks
xen_pre_device_suspend is unused on ia64.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:12 +00:00
Ian Campbell 36b401e2c2 xen: suspend: pass extra hypercall argument via suspend_info struct
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:11 +00:00
Ian Campbell ceb1802947 xen: suspend: refactor cancellation flag into a structure
Will add extra fields in subsequent patches.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:11 +00:00
Ian Campbell bd1c0ad284 xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:11 +00:00
Ian Campbell 552717231e xen: do not respond to unknown xenstore control requests
The PV xenbus control/shutdown node is written by the toolstack as a
request to the guest to perform a particular action (shutdown, reboot,
suspend etc). The guest is expected to acknowledge that it will
complete a request by clearing the control node.

Previously it would acknowledge any request, even if it did not know
what to do with it. Specifically in the case where CONFIG_PM_SLEEP is
not enabled the kernel would acknowledge a suspend request even though
it was not actually going to do anything.

Instead make the kernel only acknowledge requests if it is actually
going to do something with it. This will improve the toolstack's
ability to diagnose and deal with failures.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-25 16:43:09 +00:00
Stefano Stabellini 702d4eb9b3 xen: no need to delay xen_setup_shutdown_event for hvm guests anymore
Now that xenstore_ready is used correctly for PV on HVM guests too, we
don't need to delay the initialization of xen_setup_shutdown_event
anymore.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-02-25 16:43:03 +00:00
Ian Campbell 8dd38383a5 xen: suspend and resume system devices when running PVHVM
Otherwise we fail to properly suspend/resume all of the emulated devices.

Something between 2.6.38-rc2 and rc3 appears to have exposed this
issue, but it's always been wrong not to do this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-02-17 10:31:20 +00:00
Stefano Stabellini 6411fe69b8 xen: resume the pv console for hvm guests too
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-12-02 14:40:48 +00:00
Dmitry Torokhov f335397d17 Input: sysrq - drop tty argument form handle_sysrq()
Sysrq operations do not accept tty argument anymore so no need to pass
it to us.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
 caused by sysrq using bool but not including linux/types.h]

[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr
 driver]

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-21 00:34:45 -07:00
Stefano Stabellini 016b6f5fe8 xen: Add suspend/resume support for PV on HVM guests.
Suspend/resume requires few different things on HVM: the suspend
hypercall is different; we don't need to save/restore memory related
settings; except the shared info page and the callback mechanism.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:46:21 -07:00
Stefano Stabellini 183d03cc4f xen: Xen PCI platform device driver.
Add the xen pci platform device driver that is responsible
for initializing the grant table and xenbus in PV on HVM mode.
Few changes to xenbus and grant table are necessary to allow the delayed
initialization in HVM mode.
Grant table needs few additional modifications to work in HVM mode.

The Xen PCI platform device raises an irq every time an event has been
delivered to us. However these interrupts are only delivered to vcpu 0.
The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
that is a little wrapper around __xen_evtchn_do_upcall, the traditional
Xen upcall handler, the very same used with traditional PV guests.

When running on HVM the event channel upcall is never called while in
progress because it is a normal Linux irq handler (and we cannot switch
the irq chip wholesale to the Xen PV ones as we are running QEMU and
might have passed in PCI devices), therefore we cannot be sure that
evtchn_upcall_pending is 0 when returning.
For this reason if evtchn_upcall_pending is set by Xen we need to loop
again on the event channels set pending otherwise we might loose some
event channel deliveries.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:46:09 -07:00
Randy Dunlap f3bc3189a0 xen: fix build when SYSRQ is disabled
Fix build error when CONFIG_MAGIC_SYSRQ is not enabled:

drivers/xen/manage.c:223: error: implicit declaration of function 'handle_sysrq'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:07 -07:00
Tejun Heo 3fc1f1e27a stop_machine: reimplement using cpu_stop
Reimplement stop_machine using cpu_stop.  As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.

With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler.  Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.

stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.

The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines.  According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines.  cpu_stop resources are preallocated for all online
cpus and should have the same effect.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
2010-05-06 18:49:20 +02:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Ian Campbell c5cae661d6 xen: fix hang on suspend.
In 65f63384 "xen: improve error handling in do_suspend" I said:
    - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
      nested in the obvious way.
and changed the ordering of the calls as so:
    BEFORE		AFTER
    xs_suspend		dpm_suspend_noirq
    dpm_suspend_noirq	xs_suspend
    *SUSPEND*		*SUSPEND*
    dpm_resume_noirq	dpm_resume_noirq
    xs_resume		xs_resume
Clearly this is not an improvement and I was talking rubbish.

In particular the new ordering is susceptible to a hang if a xenstore write is
in progress at the point at which the suspend kicks in. When the suspend
process calls xs_suspend it tries to take the request_mutex but if a write is
in progress it could be looping in xenbus_xs.c:read_reply() waiting for
something to arrive on &xs_state.reply_list while holding the request_mutex
(taken in the caller of read_reply).

However if we have done dpm_suspend_noirq before xs_suspend then we won't get
any more xenstore interrupts and process_msg() will never be woken up to add
anything to the reply_list.

Fix this by calling xs_suspend before dpm_suspend_noirq. If dpm_suspend_noirq
fails then make sure we go through the xs_suspend_cancel() code path.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2010-01-13 10:01:35 +00:00
Ian Campbell b4606f2165 xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
I have observed cases where the implicit stop_machine_destroy() done by
stop_machine() hangs while destroying the workqueues, specifically in
kthread_stop(). This seems to be because timer ticks are not restarted
until after stop_machine() returns.

Fortunately stop_machine provides a facility to pre-create/post-destroy
the workqueues so use this to ensure that workqueues are only destroyed
after everything is really up and running again.

I only actually observed this failure with 2.6.30. It seems that newer
kernels are somehow more robust against doing kthread_stop() without timer
interrupts (I tried some backports of some likely looking candidates but
did not track down the commit which added this robustness). However this
change seems like a reasonable belt&braces thing to do.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:56 -08:00
Ian Campbell 65f63384b3 xen: improve error handling in do_suspend.
The existing error handling has a few issues:
- If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
- If dpm_suspend_noirq() fails it exits without resuming xenbus.
- If stop_machine() fails it exits without resuming xenbus or calling
  dpm_resume_end().
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
  nested in the obvious way.

Fix by ensuring each failure case goto's the correct label. Treat a failure of
stop_machine() as a cancelled suspend in order to follow the correct resume
path.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:56 -08:00