Commit Graph

738 Commits

Author SHA1 Message Date
Daniel Lezcano 30dc72c6fa ARM: kirkwood: cpuidle: use init/exit common routine
Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:23 +02:00
Daniel Lezcano 0b210d96a6 ARM: calxeda: cpuidle: use init/exit common routine
Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:23 +02:00
Daniel Lezcano 4c637b2175 cpuidle: make a single register function for all
The usual scheme to initialize a cpuidle driver on a SMP is:

	cpuidle_register_driver(drv);
	for_each_possible_cpu(cpu) {
		device = &per_cpu(cpuidle_dev, cpu);
		cpuidle_register_device(device);
	}

This code is duplicated in each cpuidle driver.

On UP systems, it is done this way:

	cpuidle_register_driver(drv);
	device = &per_cpu(cpuidle_dev, cpu);
	cpuidle_register_device(device);

On UP, the macro 'for_each_cpu' does one iteration:

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

Hence, the initialization loop is the same for UP than SMP.

Beside, we saw different bugs / mis-initialization / return code unchecked in
the different drivers, the code is duplicated including bugs. After fixing all
these ones, it appears the initialization pattern is the same for everyone.

Please note, some drivers are doing dev->state_count = drv->state_count. This is
not necessary because it is done by the cpuidle_enable_device function in the
cpuidle framework. This is true, until you have the same states for all your
devices. Otherwise, the 'low level' API should be used instead with the specific
initialization for the driver.

Let's add a wrapper function doing this initialization with a cpumask parameter
for the coupled idle states and use it for all the drivers.

That will save a lot of LOC, consolidate the code, and the modifications in the
future could be done in a single place. Another benefit is the consolidation of
the cpuidle_device variable which is now in the cpuidle framework and no longer
spread accross the different arch specific drivers.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:22 +02:00
Daniel Lezcano 554c06ba3e cpuidle: remove en_core_tk_irqen flag
The en_core_tk_irqen flag is set in all the cpuidle driver which
means it is not necessary to specify this flag.

Remove the flag and the code related to it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:22 +02:00
Daniel Lezcano a06df062a1 cpuidle: initialize the broadcast timer framework
The commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced
the CPUIDLE_FLAG_TIMER_STOP flag where we specify a specific idle
state stops the local timer.

Now use this flag to check at init time if one state will need
the broadcast timer and, in this case, setup the broadcast timer
framework. That prevents multiple code duplication in the drivers.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:28 +02:00
Silviu-Mihai Popescu 488540bf41 cpuidle: kirkwood: fix coccicheck warnings
Convert all uses of devm_request_and_ioremap() to the newly introduced
devm_ioremap_resource() which provides more consistent error handling.

devm_ioremap_resource() provides its own error messages so all explicit
error messages can be removed from the failure code paths.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Daniel Lezcano 9a23fe65cf cpuidle / kirkwood: remove redundant Kconfig option
When the CPU_IDLE and the ARCH_KIRKWOOD options are set it is
pointless to define a new option CPU_IDLE_KIRKWOOD because it
is redundant.

The Makefile drivers directory contains a condition to compile
the cpuidle drivers:

obj-$(CONFIG_CPU_IDLE)          += cpuidle/

Hence, if CPU_IDLE is not set we won't enter this directory.

This patch removes the useless Kconfig option and replaces the
condition in the Makefile by CONFIG_ARCH_KIRKWOOD.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Daniel Lezcano b60e6a0eb0 cpuidle : handle clockevent notify from the cpuidle framework
When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.

The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
when the idle state stops the local timer.

Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Linus Torvalds bab588fcfb arm-soc: soc-specific updates
This is a larger set of new functionality for the existing SoC families,
 including:
 
 * vt8500 gains support for new CPU cores, notably the Cortex-A9 based wm8850
 * prima2 gains support for the "marco" SoC family, its SMP based cousin
 * tegra gains support for the new Tegra4 (Tegra114) family
 * socfpga now supports a newer version of the hardware including SMP
 * i.mx31 and bcm2835 are now using DT probing for their clocks
 * lots of updates for sh-mobile
 * OMAP updates for clocks, power management and USB
 * i.mx6q and tegra now support cpuidle
 * kirkwood now supports PCIe hot plugging
 * tegra clock support is updated
 * tegra USB PHY probing gets implemented diffently
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUSUyPGCrR//JCVInAQI4YA/+Nb0FaA7qMmTPuJhm7aZNfnwBcGxZ7IZp
 s2xByEl3r5zbLKlKGNGE0x7Q7ETHV4y9tohzi9ZduH2b60dMRYgII06CEmDPu6/h
 4vBap2oLzfWfs9hwpCIh7N9wNzxSj/R42vlXHhNmspHlw7cFk1yw5EeJ+ocxmZPq
 H9lyjAxsGErkZyM/xstNQ1Uvhc8XHAFSUzWrg8hvf6AVVR8hwpIqVzfIizv6Vpk6
 ryBoUBHfdTztAOrafK54CdRc7l6kVMomRodKGzMyasnBK3ZfFca3IR7elnxLyEFJ
 uPDu5DKOdYrjXC8X2dPM6kYiE41YFuqOV2ahBt9HqRe6liNBLHQ6NAH7f7+jBWSI
 eeWe84c2vFaqhAGlci/xm4GaP0ud5ZLudtiVPlDY5tYIADqLygNcx1HIt/5sT7QI
 h34LMjc4+/TGVWTVf5yRmIzTrCXZv5YoAak3UWFoM4nVBo/eYVyNLEt5g9YsfjrC
 P/GWrXJJvOCB3gAi31pgGYJzZg8K7kTTAh/dgxjqzU4f6nGRm5PBydiJe18/lWkH
 qtfNE0RbhxCi3JEBnxW48AIEndVSRbd7jf8upC/s9rPURtFSVXp4APTHVyNUKCip
 gojBxcRYtesyG/53nrwdTyiyHx6GocmWnMNZJoDo0UQEkog2dOef+StdC3zhc2Vm
 9EttcFqWJ+E=
 =PRrg
 -----END PGP SIGNATURE-----

Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC-specific updates from Arnd Bergmann:
 "This is a larger set of new functionality for the existing SoC
  families, including:

   - vt8500 gains support for new CPU cores, notably the Cortex-A9 based
     wm8850

   - prima2 gains support for the "marco" SoC family, its SMP based
     cousin

   - tegra gains support for the new Tegra4 (Tegra114) family

   - socfpga now supports a newer version of the hardware including SMP

   - i.mx31 and bcm2835 are now using DT probing for their clocks

   - lots of updates for sh-mobile

   - OMAP updates for clocks, power management and USB

   - i.mx6q and tegra now support cpuidle

   - kirkwood now supports PCIe hot plugging

   - tegra clock support is updated

   - tegra USB PHY probing gets implemented diffently"

* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (148 commits)
  ARM: prima2: remove duplicate v7_invalidate_l1
  ARM: shmobile: r8a7779: Correct TMU clock support again
  ARM: prima2: fix __init section for cpu hotplug
  ARM: OMAP: Consolidate OMAP USB-HS platform data (part 3/3)
  ARM: OMAP: Consolidate OMAP USB-HS platform data (part 1/3)
  arm: socfpga: Add SMP support for actual socfpga harware
  arm: Add v7_invalidate_l1 to cache-v7.S
  arm: socfpga: Add entries to enable make dtbs socfpga
  arm: socfpga: Add new device tree source for actual socfpga HW
  ARM: tegra: sort Kconfig selects for Tegra114
  ARM: tegra: enable ARCH_REQUIRE_GPIOLIB for Tegra114
  ARM: tegra: Fix build error w/ ARCH_TEGRA_114_SOC w/o ARCH_TEGRA_3x_SOC
  ARM: tegra: Fix build error for gic update
  ARM: tegra: remove empty tegra_smp_init_cpus()
  ARM: shmobile: Register ARM architected timer
  ARM: MARCO: fix the build issue due to gic-vic-to-irqchip move
  ARM: shmobile: r8a7779: Correct TMU clock support
  ARM: mxs_defconfig: Select CONFIG_DEVTMPFS_MOUNT
  ARM: mxs: decrease mxs_clockevent_device.min_delta_ns to 2 clock cycles
  ARM: mxs: use apbx bus clock to drive the timers on timrotv2
  ...
2013-02-21 15:27:22 -08:00
Andrew Lunn 9cfc94eb0f cpuidle: kirkwood: Move out of mach directory
Move the Kirkwood cpuidle driver out of arch/arm/mach-kirkwood and
into drivers/cpuidle. Convert the driver into a platform driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2013-01-31 17:01:37 +00:00
Paul Gortmaker 43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Daniel Lezcano 8aef33a7cf cpuidle: remove the power_specified field in the driver
We realized that the power usage field is never filled and when it
is filled for tegra, the power_specified flag is not set causing all
of these values to be reset when the driver is initialized with
set_power_state().

However, the power_specified flag can be simply removed under the
assumption that the states are always backward sorted, which is the
case with the current code.

This change allows the menu governor select function and the
cpuidle_play_dead() to be simplified.  Moreover, the
set_power_states() function can removed as it does not make sense
any more.

Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.

As a consequence, this also fixes the bug where on the dynamic
C-states system, the power fields are not initialized.

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-15 14:18:04 +01:00
Krzysztof Mazur 392370e7aa cpuidle: fix number of initialized/destroyed states
Commit bf4d1b5ddb (cpuidle: support
multiple drivers) changed the number of initialized state kobjects
in cpuidle_add_state_sysfs() from device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs().  The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example.  Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
the original behavior of it).

[rjw: Changelog]
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-11 23:20:09 +01:00
Daniel Lezcano ac34d7c8c8 cpuidle: fix lock contention in the idle path
Commit bf4d1b5 (cpuidle: support multiple drivers) introduced
locking in cpuidle_get_cpu_driver(), which is used in the
idle_call() function.

This leads to a contention problem with a large number of CPUs,
because they all try to run the idle routine at the same time.

The lock can be safely removed because of how is used the cpuidle
API.  Namely, cpuidle_register_driver() is called first, but the
cpuidle idle function is not entered before cpuidle_register_device()
is called, because the cpuidle device is not enabled then. Moreover,
cpuidle_unregister_driver(), which would reset the driver value to
NULL, is not called before cpuidle_unregister_device().

All of the cpuidle drivers use the API in the same way.

In general, a cleanup around the lock is necessary and a proper
refcounting mechanism should be used to ensure the consistency in the
API (for example, cpuidle_unregister_driver() should fail if the
driver's refcount is not 0). However, these modifications will require
some code reorganization and rewrite which will be too intrusive for
a fix.

For this reason, fix the contention problem introduced by commit
bf4d1b5 by simply removing the locking from cpuidle_get_cpu_driver(),
which restores the original behavior of that routine.

[rjw: Changelog.]
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:06 +01:00
Sivaram Nair 92638e2fac cpuidle / coupled: fix ready counter decrement
The ready_waiting_counts atomic variable is compared against the wrong
online cpu count. The latter is computed incorrectly using logical-OR
instead of bit-OR. This patch fixes that.

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Colin Cross <ccross@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:05 +01:00
Sivaram Nair 0e5537b30d cpuidle: Fix finding state with min power_usage
Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
of -1) to init the local copies so that functions that tries to find
cpuidle states with minimum power usage works correctly even if they use
non-negative values.

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:05 +01:00
Linus Torvalds d027db132b ARM: arm-soc: SoC updates for 3.8
This contains the bulk of new SoC development for this merge window.
 
 Two new platforms have been added, the sunxi platforms (Allwinner A1x
 SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
 series of ARMv7 platforms from them, where the hope is that we can
 keep the platform code generic enough to have them all share one mach
 directory. The new Broadcom platform is contributed by Christian Daudt.
 
 Highbank has grown support for Calxeda's next generation of hardware,
 ECX-2000.
 
 clps711x has seen a lot of cleanup from Alexander Shiyan, and he's also
 taken on maintainership of the platform.
 
 Beyond this there has been a bunch of work from a number of people on
 converting more platforms to IRQ domains, pinctrl conversion, cleanup
 and general feature enablement across most of the active platforms.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQyLCjAAoJEIwa5zzehBx3AdQP/R+L3+EQMjiEWt/p7g/ql5Em
 0SnP92CcGzrjgLTg9z1FeOazfOsGnkZAYUlDRkqfKobH3VqkhYFFtt1/0x0KMahm
 xcowHgMBOyimFdWT9vLK3J8U6DLui5XrEG9LGH2VL+lqmfjIyP/OOF3mVc0/+pV9
 WTLAsYswdBRSeiNuF43kqlfrOwF6xsPLgiNMlc82w6BzHqoHu6dOif5M9MqWaApS
 V74DPmwLD371Tyit6aHqt3JOqpgiPSHlmxkzomK+5idcW3Pa7HnzzFYmx85dk/eN
 J2siqIkoOu7tEfjIbNZTL2MYoX4tUUKv4qZZ3IOl3YSWaV3P5ilMApF01XVrkk8E
 DWOMhzte9hC7L90W+/kCPLF1VyeAhCem2KQWUitO71fKur3r+3ZaUokNVvWzkJIL
 7aduxAJOV2hfLgEqbjbjF3o4S8p63OV3kzivFJM1And15zDJo4+qqOh67+bPo4jj
 +R4du+SqzXriw4i3tDLGVpdjDffk4D41tbLzgkWAtvGyoP45yeYfHAzAh0pDFPRv
 ASfZVmZ5PhwAUAkIMnpC2sjgmxMYff3SYqmDgnsqXES7rbDH/hG+teymtHFTyUQp
 m+f60DNotSMcMvkLdvruLSB4aeTiwbfOqPn/g+aXYUlPuNMq1fVWgN7EJKWkamK4
 nRwaJmLwx1/ojcVbpy2G
 =YMKB
 -----END PGP SIGNATURE-----

Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC updates from Olof Johansson:
 "This contains the bulk of new SoC development for this merge window.

  Two new platforms have been added, the sunxi platforms (Allwinner A1x
  SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
  series of ARMv7 platforms from them, where the hope is that we can
  keep the platform code generic enough to have them all share one mach
  directory.  The new Broadcom platform is contributed by Christian
  Daudt.

  Highbank has grown support for Calxeda's next generation of hardware,
  ECX-2000.

  clps711x has seen a lot of cleanup from Alexander Shiyan, and he's
  also taken on maintainership of the platform.

  Beyond this there has been a bunch of work from a number of people on
  converting more platforms to IRQ domains, pinctrl conversion, cleanup
  and general feature enablement across most of the active platforms."

Fix up trivial conflicts as per Olof.

* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits)
  mfd: vexpress-sysreg: Remove LEDs code
  irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids
  clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids
  irq: versatile: delete dangling variable
  ARM: sunxi: add missing include for mdelay()
  ARM: EXYNOS: Avoid early use of of_machine_is_compatible()
  ARM: dts: add node for PL330 MDMA1 controller for exynos4
  ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412
  ARM: EXYNOS: add UART3 to DEBUG_LL ports
  ARM: S3C24XX: Add clkdev entry for camif-upll clock
  ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers
  ARM: sunxi: Add missing sun4i.dtsi file
  pinctrl: samsung: Do not initialise statics to 0
  ARM i.MX6: remove gate_mask from pllv3
  ARM i.MX6: Fix ethernet PLL clocks
  ARM i.MX6: rename PLLs according to datasheet
  ARM i.MX6: Add pwm support
  ARM i.MX51: Add pwm support
  ARM i.MX53: Add pwm support
  ARM: mx5: Replace clk_register_clkdev with clock DT lookup
  ...
2012-12-12 12:05:15 -08:00
Julius Werner a474a51549 cpuidle: Measure idle state durations with monotonic clock
Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.

If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.

This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.

Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-27 14:17:58 +01:00
Li Zhong a093b93ee0 cpuidle: fix a suspicious RCU usage in menu governor
I saw this suspicious RCU usage on the next tree of 11/15

[   67.123404] ===============================
[   67.123413] [ INFO: suspicious RCU usage. ]
[   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
[   67.123434] -------------------------------
[   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
[   67.123458]
[   67.123458] other info that might help us debug this:
[   67.123458]
[   67.123474]
[   67.123474] RCU used illegally from idle CPU!
[   67.123474] rcu_scheduler_active = 1, debug_locks = 0
[   67.123493] RCU used illegally from extended quiescent state!
[   67.123507] 1 lock held by swapper/1/0:
[   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
[   67.123555]
[   67.123555] stack backtrace:
[   67.123566] Call Trace:
[   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
[   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
[   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
[   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
[   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
[   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
[   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
[   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
[   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
[   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14

hrtimer_start was added in 198fd638 and ae515197. The patch below tries
to use RCU_NONIDLE around it to avoid the above report.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-23 00:05:03 +01:00
Daniel Lezcano bf4d1b5ddb cpuidle: support multiple drivers
With the tegra3 and the big.LITTLE [1] new architectures, several cpus
with different characteristics (latencies and states) can co-exists on the
system.

The cpuidle framework has the limitation of handling only identical cpus.

This patch removes this limitation by introducing the multiple driver support
for cpuidle.

This option is configurable at compile time and should be enabled for the
architectures mentioned above. So there is no impact for the other platforms
if the option is disabled. The option defaults to 'n'. Note the multiple drivers
support is also compatible with the existing drivers, even if just one driver is
needed, all the cpu will be tied to this driver using an extra small chunk of
processor memory.

The multiple driver support use a per-cpu driver pointer instead of a global
variable and the accessor to this variable are done from a cpu context.

In order to keep the compatibility with the existing drivers, the function
'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register
the specified driver for all the cpus.

The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver
remains the same except the driver name will be related to the current cpu.

The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added
allowing to read the per cpu driver name.

[1] http://lwn.net/Articles/481055/

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:23 +01:00
Daniel Lezcano 13dd52f11a cpuidle: prepare the cpuidle core to handle multiple drivers
This patch is a preparation for the multiple cpuidle drivers support.

As the next patch will introduce the multiple drivers with the Kconfig
option and we want to keep the code clean and understandable, this patch
defines a set of functions for encapsulating some common parts and splits
what should be done under a lock from the rest.

[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 4168203271 cpuidle: move driver checking within the lock section
The code is racy and the check with cpuidle_curr_driver should be
done under the lock.

I don't find a path in the different drivers where that could happen
because the arch specific drivers are written in such way it is not
possible to register a driver while it is unregistered, except maybe
in a very improbable case when "intel_idle" and "processor_idle" are
competing. One could unregister a driver, while the other one is
registering.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 42f67f2aca cpuidle: move driver's refcount to cpuidle
We want to support different cpuidle drivers co-existing together.
In this case we should move the refcount to the cpuidle_driver
structure to handle several drivers at a time.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 8f3e9953e1 cpuidle: fixup device.h header in cpuidle.h
The "struct device" is only used in sysfs.c.

The other .c files including the private header "cpuidle.h"
do not need to pull the entire headers tree from there as they
don't manipulate the "struct device".

This patch fixes this by moving the header inclusion to sysfs.c
and adding a forward declaration for the struct device.

The number of lines generated by the preprocesor:
Without this patch : 17269 loc
With this patch : 16446 loc

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:21 +01:00
Daniel Lezcano 349631e0e4 cpuidle / sysfs: move structure declaration into the sysfs.c file
The structure cpuidle_state_kobj is not used anywhere except
in the sysfs.c file. The definition of this structure is not
needed in the cpuidle header file. This patch moves it to the
sysfs.c file in order to encapsulate the code a bit more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:21 +01:00
Youquan Song c96ca4fb76 cpuidle: Get typical recent sleep interval
The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).

Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.

This should do something sane with a sleep interval series like:

	200 180 210 10000 30 1000 170 200

The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.

The original patch come from Rik van Riel <riel@redhat.com>.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song d73d68dc49 cpuidle: Set residency to 0 if target Cstate not enter
When cpuidle governor choose a C-state to enter for idle CPU, but it notice that
there is tasks request to be executed. So the idle CPU will not really enter
the target C-state and go to run task.

In this situation, it will use the residency of previous really entered target
C-states. Obviously, it is not reasonable.

So, this patch fix it by set the target C-state residency to 0.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song e11538d1f0 cpuidle: Quickly notice prediction failure in general case
The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

The patch extends to general case that prediction logic get a small predicted
residency, so it choose a shallow C-state though the expected residency is large
. Once the prediction will be fail, the CPU will keep staying at shallow C-state
for a long time. Acutally, the CPU has change enter into deep C-state.
So when the expected residency is long enough but governor choose a shallow
C-state, an timer will be added in order to monitor if the prediction failure.

When C-state is waken up prior to the adding timer, the timer will be cancelled
initiatively. When the timer is triggered and menu governor will quickly notice
prediction failure and re-evaluates deeper C-states possibility.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song 69a37beabf cpuidle: Quickly notice prediction failure for repeat mode
The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

cpuidle menu governor has a method to predict the repeat pattern if there are 8
C-states residency which are continuous and the same or very close, so it will
predict the next C-states residency will keep same residency time.

There is a real case that turbostat utility (tools/power/x86/turbostat)
at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
 governor will predict it is repeat mode and there is another IPI wake up idle
 CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
idle. However, in the turbostat, following 10 registers reading is sleep 5
seconds by default, so the idle CPU will keep at C1 for a long time though it is
 idle until break event occurs.
In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
deep C-state stays at >99.98%.

In the patch, a timer is added when menu governor detects a repeat mode and
choose a shallow C-state. The timer is set to a time out value that greater
than predicted time, and we conclude repeat mode prediction failure if timer is
triggered. When repeat mode happens as expected, the timer is not triggered
and CPU waken up from C-states and it will cancel the timer initiatively.
When repeat mode does not happen, the timer will be time out and menu governor
will quickly notice that the repeat mode prediction fails and then re-evaluates
deeper C-states possibility.

Below is another case which will clearly show the patch much benefit:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help	-h  Print this help\n"
		"  --thread	-n  Thread number\n"
		"  --loop     	-l  Loop times in shallow Cstate\n"
		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
	int idle_num = 1;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}

}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];

	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
			case 'n':
				thread_num = atoi(optarg);
				break;
			case 'l':
				loop = atoi(optarg);
				break;
			case 't':
				delay = atoi(optarg);
				break;
			case 'h':
			default:
				usage();
				exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for(i = 0; i < thread_num ; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
After build the above test application, then run it.
Test plaform can be Intel Sandybridge or other recent platforms.
#./idle_predict -l 10 &
#./powertop

We will find that deep C-state will dangle between 40%~100% and much time spent
on C1 state. It is because menu governor wrongly predict that repeat mode
is kept, so it will choose the C1 shallow C-state even though it has chance to
sleep 1 second in deep C-state.

While after patched the kernel, we find that deep C-state will keep >99.6%.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Daniel Lezcano e45a00d679 cpuidle / sysfs: move kobj initialization in the syfs file
Move the kobj initialization and completion in the sysfs.c
and encapsulate the code more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Daniel Lezcano 1aef40e288 cpuidle / sysfs: change function parameter
The function needs the cpuidle_device which is initially passed to the
caller.

The current code gets the struct device from the struct cpuidle_device,
pass it the cpuidle_add_sysfs function. This function calls
per_cpu(cpuidle_devices, cpu) to get the cpuidle_device.

This patch pass the cpuidle_device instead and simplify the code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Rob Herring be6a98d3f0 cpuidle: add Calxeda SOC idle support
Add support for core powergating on Calxeda platforms. Initially, this
supports ECX-1000 (highbank), but support will be added for ECX-2000
later.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
2012-11-07 17:15:36 -06:00
Srivatsa S. Bhat cf31cd1a0c ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug
On a KVM guest, when a CPU is taken offline and brought back online, we hit
the following NULL pointer dereference:

[   45.400843] Unregister pv shared memory for cpu 1
[   45.412331] smpboot: CPU 1 is now offline
[   45.529894] SMP alternatives: lockdep: fixing up alternatives
[   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
[   45.571370] KVM setup async PF for cpu 1
[   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
[   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
[   45.576017] Oops: 0000 [#1] SMP
[   45.576017] Modules linked in:
[   45.576017] CPU 0
[   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
[   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
[   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
[   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
[   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
[   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
[   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
[   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
[   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
[   45.576017] Stack:
[   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
[   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
[   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
[   45.576017] Call Trace:
[   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
[   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
[   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
[   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
[   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
[   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
[   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
[   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
[   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
[   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
[   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
[   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
[   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
[   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
[   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
[   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017]  RSP <ffff880005d93ce8>
[   45.576017] CR2: 0000000000000000
[   45.656079] ---[ end trace 433d6c9ac0b02cef ]---

Analysis:
Commit 3d339dc (cpuidle / ACPI : move cpuidle_device field out of the
acpi_processor_power structure()) made the allocation of the dev structure
(struct cpuidle) of a CPU dynamic, whereas previously it was statically
allocated. And this dynamic allocation occurs in acpi_processor_power_init()
if pr->flags.power evaluates to non-zero.

On KVM guests, pr->flags.power evaluates to zero, hence dev is never
allocated. This causes the NULL pointer (dev) dereference in
cpuidle_disable_device() during a subsequent CPU online operation. Fix this
by ensuring that dev is non-NULL before dereferencing.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-10-08 22:52:54 -04:00
Daniel Lezcano ed953472d1 cpuidle: rename function name "__cpuidle_register_driver", v2
The function __cpuidle_register_driver name is confusing because it
suggests, conforming to the coding style of the kernel, it registers
the driver without taking a lock. Actually, it just fill the different
power field states with a decresing value if the power has not been
specified.

Clarify the purpose of the function by changing its name and
move the condition out of this function.

This patch fix nothing and does not change the behavior of the
function. It is just for the sake of clarity.

IHMO, reading in the code:

+       if (!drv->power_specified)
+               set_power_states(drv);

is much more explicit than:

-       __cpuidle_register_driver(drv);

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-22 00:38:32 +02:00
Daniel Lezcano a77de28662 cpuidle: remove some empty lines
This mindless patch is just about removing some trailing
carriage returns.

[rjw: Changed the subject.]

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-19 21:59:42 +02:00
Rafael J. Wysocki 66804c13f7 PM / cpuidle: Make ladder governor use the "disabled" state flag
For the mechanism introduced by commit cbc9ef0 (PM / Domains: Add
preliminary support for cpuidle, v2) to work with the ladder
governor, that governor should respect the "disabled" state flag
added by that commit.  Change the ladder governor accordingly.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-04 01:35:45 +02:00
Carsten Emde 62d6ae880e Honor state disabling in the cpuidle ladder governor
There are two cpuidle governors ladder and menu. While the ladder
governor is always available, if CONFIG_CPU_IDLE is selected, the
menu governor additionally requires CONFIG_NO_HZ.

A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes default and
always will walk through all sleep states - irrespective of whether the
C state was disabled via sysfs or not. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open as long as this setting was required - not very
practical and not suitable for setting a single core in an SMP system.

With this patch, the ladder governor only will promote to the next
C state, if it has not been disabled, and it will demote, if the
current C state was disabled.

Note that the patch does not make the setting of the sysfs variable
"disable" coherent, i.e. if one is disabling a light state, then all
deeper states are disabled as well, but the "disable" variable does not
reflect it. Likewise, if one enables a deep state but a lighter state
still is disabled, then this has no effect. A related section has been
added to the documentation.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-04 01:35:44 +02:00
Jon Medhurst (Tixy) 5fbbb90dfd cpuidle: Prevent null pointer dereference in cpuidle_coupled_cpu_notify
When a kernel is built to support multiple hardware types it's possible
that CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is set but the hardware the
kernel is run on doesn't support cpuidle and therefore doesn't load a
driver for it. In this case, when the system is shut down,
cpuidle_coupled_cpu_notify() gets called with cpuidle_devices set to
NULL. There are quite possibly other circumstances where this
situation can also occur and we should check for it.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-17 19:37:08 +02:00
Colin Cross 63c6ba4352 cpuidle: coupled: fix sleeping while atomic in cpu notifier
The cpu hotplug notifier gets called in both atomic and non-atomic
contexts, it is not always safe to lock a mutex.  Filter out all events
except the six necessary ones, which are all sleepable, before taking
the mutex.

Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-17 19:37:01 +02:00
Linus Torvalds 476525004a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI & power management update from Len Brown:
 "Re-write of the turbostat tool.
     lower overhead was necessary for measuring very large system when
     they are very idle.

  IVB support in intel_idle
     It's what I run on my IVB, others should be able to also:-)

  ACPICA core update
     We have found some bugs due to divergence between Linux and the
     upstream ACPICA base.  Most of these patches are to reduce that
     divergence to reduce the risk of future bugs.

  Some cpuidle updates, mostly for non-Intel
     More will be coming, as they depend on this part.

  Some thermal management changes needed by non-ACPI systems.

  Some _OST (OS Status Indication) updates for hot ACPI hot-plug."

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
  Thermal: Documentation update
  Thermal: Add Hysteresis attributes
  Thermal: Make Thermal trip points writeable
  ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
  tools/power: turbostat: fix large c1% issue
  tools/power: turbostat v2 - re-write for efficiency
  ACPICA: Update to version 20120711
  ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
  ACPICA: Update header files copyrights to 2012
  ACPICA: Add new ACPI table load/unload external interfaces
  ACPICA: Split file: tbxface.c -> tbxfload.c
  ACPICA: Add PCC address space to space ID decode function
  ACPICA: Fix some comment fields
  ACPICA: Table manager: deploy new firmware error/warning interfaces
  ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
  ACPICA: Split exception code utilities to a new file, utexcep.c
  ACPI: acpi_pad: tune round_robin_time
  ACPICA: Update to version 20120620
  ACPICA: Add support for implicit notify on multiple devices
  ACPICA: Update comments; no functional change
  ...
2012-07-26 14:28:55 -07:00
Len Brown ec033d0a02 Merge branches 'acpi_pad', 'acpica', 'apei-bugzilla-43282', 'battery', 'cpuidle-coupled', 'cpuidle-tweaks', 'intel_idle-ivb', 'ost', 'red-hat-bz-772730', 'thermal', 'thermal-spear' and 'turbostat-v2' into release 2012-07-26 00:03:58 -04:00
Rafael J. Wysocki 7791bd230c Merge branch 'pm-domains'
* pm-domains:
  PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
  PM / Domains: Replace plain integer with NULL pointer in domain.c file
  PM / Domains: Add missing static storage class specifier in domain.c file
  PM / Domains: Allow device callbacks to be added at any time
  PM / Domains: Add device domain data reference counter
  PM / Domains: Add preliminary support for cpuidle, v2
  PM / Domains: Do not stop devices after restoring their states
  PM / Domains: Use subsystem runtime suspend/resume callbacks by default
2012-07-19 00:03:17 +02:00
Preeti U Murthy 8651f97bd9 PM / cpuidle: System resume hang fix with cpuidle
On certain bios, resume hangs if cpus are allowed to enter idle states
during suspend [1].

This was fixed in apci idle driver [2].But intel_idle driver does not
have this fix. Thus instead of replicating the fix in both the idle
drivers, or in more platform specific idle drivers if needed, the
more general cpuidle infrastructure could handle this.

A suspend callback in cpuidle_driver could handle this fix. But
a cpuidle_driver provides only basic functionalities like platform idle
state detection capability and mechanisms to support entry and exit
into CPU idle states. All other cpuidle functions are found in the
cpuidle generic infrastructure for good reason that all cpuidle
drivers, irrepective of their platforms will support these functions.

One option therefore would be to register a suspend callback in cpuidle
which handles this fix. This could be called through a PM_SUSPEND_PREPARE
notifier. But this is too generic a notfier for a driver to handle.

Also, ideally the job of cpuidle is not to handle side effects of suspend.
It should expose the interfaces which "handle cpuidle 'during' suspend"
or any other operation, which the subsystems call during that respective
operation.

The fix demands that during suspend, no cpus should be allowed to enter
deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
ensures that. Not just that it also kicks all the cpus which are already
in idle out of their idle states which was being done during cpu hotplug
through a CPU_DYING_FROZEN callbacks.

Now the question arises about when during suspend should
cpuidle_uninstall_idle_handler() be called. Since we are dealing with
drivers it seems best to call this function during dpm_suspend().
Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg
operations. In dpm_suspend_noirq(), it would be wise to place this call
before suspend_device_irqs() to avoid ugly interactions with the same.

Ananlogously, during resume.

References:
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
[2] http://marc.info/?l=linux-pm&m=133958534231884&w=2

Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-10 21:34:49 +02:00
Rafael J. Wysocki cbc9ef0287 PM / Domains: Add preliminary support for cpuidle, v2
On some systems there are CPU cores located in the same power
domains as I/O devices.  Then, power can only be removed from the
domain if all I/O devices in it are not in use and the CPU core
is idle.  Add preliminary support for that to the generic PM domains
framework.

First, the platform is expected to provide a cpuidle driver with one
extra state designated for use with the generic PM domains code.
This state should be initially disabled and its exit_latency value
should be set to whatever time is needed to bring up the CPU core
itself after restoring power to it, not including the domain's
power on latency.  Its .enter() callback should point to a procedure
that will remove power from the domain containing the CPU core at
the end of the CPU power transition.

The remaining characteristics of the extra cpuidle state, referred to
as the "domain" cpuidle state below, (e.g. power usage, target
residency) should be populated in accordance with the properties of
the hardware.

Next, the platform should execute genpd_attach_cpuidle() on the PM
domain containing the CPU core.  That will cause the generic PM
domains framework to treat that domain in a special way such that:

 * When all devices in the domain have been suspended and it is about
   to be turned off, the states of the devices will be saved, but
   power will not be removed from the domain.  Instead, the "domain"
   cpuidle state will be enabled so that power can be removed from
   the domain when the CPU core is idle and the state has been chosen
   as the target by the cpuidle governor.

 * When the first I/O device in the domain is resumed and
   __pm_genpd_poweron(() is called for the first time after
   power has been removed from the domain, the "domain" cpuidle
   state will be disabled to avoid subsequent surprise power removals
   via cpuidle.

The effective exit_latency value of the "domain" cpuidle state
depends on the time needed to bring up the CPU core itself after
restoring power to it as well as on the power on latency of the
domain containing the CPU core.  Thus the "domain" cpuidle state's
exit_latency has to be recomputed every time the domain's power on
latency is updated, which may happen every time power is restored
to the domain, if the measured power on latency is greater than
the latency stored in the corresponding generic_pm_domain structure.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Kevin Hilman <khilman@ti.com>
2012-07-03 19:07:42 +02:00
Rafael J. Wysocki 6e797a0788 PM / cpuidle: Add driver reference counter
Add a reference counter for the cpuidle driver, so that it can't
be unregistered when it is in use.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-03 19:06:25 +02:00
ShuoX Liu dc7fd275ae cpuidle: move field disable from per-driver to per-cpu
Andrew J.Schorr raises a question.  When he changes the disable setting on
a single CPU, it affects all the other CPUs.  Basically, currently, the
disable field is per-driver instead of per-cpu.  All the C states of the
same driver are shared by all CPU in the same machine.

The patch changes the `disable' field to per-cpu, so we could set this
separately for each cpu.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-03 19:05:31 +02:00
Colin Cross 20ff51a36b cpuidle: coupled: add parallel barrier function
Adds cpuidle_coupled_parallel_barrier, which can be used by coupled
cpuidle state enter functions to handle resynchronization after
determining if any cpu needs to abort.  The normal use case will
be:

static bool abort_flag;
static atomic_t abort_barrier;

int arch_cpuidle_enter(struct cpuidle_device *dev, ...)
{
	if (arch_turn_off_irq_controller()) {
	   	/* returns an error if an irq is pending and would be lost
		   if idle continued and turned off power */
		abort_flag = true;
	}

	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);

	if (abort_flag) {
	   	/* One of the cpus didn't turn off it's irq controller */
	   	arch_turn_on_irq_controller();
		return -EINTR;
	}

	/* continue with idle */
	...
}

This will cause all cpus to abort idle together if one of them needs
to abort.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:49:36 -04:00
Colin Cross 4126c0197b cpuidle: add support for states that affect multiple cpus
On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
cpus cannot be independently powered down, either due to
sequencing restrictions (on Tegra 2, cpu 0 must be the last to
power down), or due to HW bugs (on OMAP4460, a cpu powering up
will corrupt the gic state unless the other cpu runs a work
around).  Each cpu has a power state that it can enter without
coordinating with the other cpu (usually Wait For Interrupt, or
WFI), and one or more "coupled" power states that affect blocks
shared between the cpus (L2 cache, interrupt controller, and
sometimes the whole SoC).  Entering a coupled power state must
be tightly controlled on both cpus.

The easiest solution to implementing coupled cpu power states is
to hotplug all but one cpu whenever possible, usually using a
cpufreq governor that looks at cpu load to determine when to
enable the secondary cpus.  This causes problems, as hotplug is an
expensive operation, so the number of hotplug transitions must be
minimized, leading to very slow response to loads, often on the
order of seconds.

This file implements an alternative solution, where each cpu will
wait in the WFI state until all cpus are ready to enter a coupled
state, at which point the coupled state function will be called
on all cpus at approximately the same time.

Once all cpus are ready to enter idle, they are woken by an smp
cross call.  At this point, there is a chance that one of the
cpus will find work to do, and choose not to enter idle.  A
final pass is needed to guarantee that all cpus will call the
power state enter function at the same time.  During this pass,
each cpu will increment the ready counter, and continue once the
ready counter matches the number of online coupled cpus.  If any
cpu exits idle, the other cpus will decrement their counter and
retry.

To use coupled cpuidle states, a cpuidle driver must:

   Set struct cpuidle_device.coupled_cpus to the mask of all
   coupled cpus, usually the same as cpu_possible_mask if all cpus
   are part of the same cluster.  The coupled_cpus mask must be
   set in the struct cpuidle_device for each cpu.

   Set struct cpuidle_device.safe_state to a state that is not a
   coupled state.  This is usually WFI.

   Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
   state that affects multiple cpus.

   Provide a struct cpuidle_state.enter function for each state
   that affects multiple cpus.  This function is guaranteed to be
   called on all cpus at approximately the same time.  The driver
   should ensure that the cpus all abort together if any cpu tries
   to abort once the function is called.

update1:

cpuidle: coupled: fix count of online cpus

online_count was never incremented on boot, and was also counting
cpus that were not part of the coupled set.  Fix both issues by
introducting a new function that counts online coupled cpus, and
call it from register as well as the hotplug notifier.

update2:

cpuidle: coupled: fix decrementing ready count

cpuidle_coupled_set_not_ready sometimes refuses to decrement the
ready count in order to prevent a race condition.  This makes it
unsuitable for use when finished with idle.  Add a new function
cpuidle_coupled_set_done that decrements both the ready count and
waiting count, and call it after idle is complete.

Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:49:09 -04:00
Colin Cross 3af272ab75 cpuidle: fix error handling in __cpuidle_register_device
Fix the error handling in __cpuidle_register_device to include
the missing list_del.  Move it to a label, which will simplify
the error handling when coupled states are added.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:48:49 -04:00
Colin Cross 56cfbf74a1 cpuidle: refactor out cpuidle_enter_state
Split the code to enter a state and update the stats into a helper
function, cpuidle_enter_state, and export it.  This function will
be called by the coupled state code to handle entering the safe
state and the final coupled state.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:48:31 -04:00
Srivatsa S. Bhat 1b0a0e9a15 cpuidle: add checks to avoid NULL pointer dereference
The existing check for dev == NULL in __cpuidle_register_device() is
rendered useless because dev is dereferenced before the check itself.
Moreover, correctly speaking, it is the job of the callers of this
function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
also happen to be exported functions) to ensure that
__cpuidle_register_device() is called with a non-NULL dev.

So add the necessary dev == NULL checks in the two callers and remove the
(useless) check from __cpuidle_register_device().

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-01 16:07:23 -04:00
Sergey Senozhatsky 0aeb9cac6f cpuidle: remove unused hrtimer_peek_ahead_timers() call
commit 9a6558371b
  Author: Arjan van de Ven <arjan@linux.intel.com>
  Date:   Sun Nov 9 12:45:10 2008 -0800

     regression: disable timer peek-ahead for 2.6.28

     It's showing up as regressions; disabling it very likely just papers
     over an underlying issue, but time is running out for 2.6.28, lets get
     back to this for 2.6.29

 Many years has passed since 2008, so it seems ok to remove whole `#if 0' block.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-01 16:06:48 -04:00
Thomas Gleixner 4a1625133d cpuidle: Use kick_all_cpus_sync()
kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
which is copied all over the arch code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de
2012-05-08 12:35:06 +02:00
Len Brown eeaab2d8af Merge branches 'idle-fix' and 'misc' into release 2012-04-06 21:48:59 -04:00
Toshi Kani ee01e66337 cpuidle: Fix panic in CPU off-lining with no idle driver
Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered.  A cpuidle
driver may be registered at boot-time based on CPU type.  This patch
allows an off-lined CPU to enter HLT-based idle in this condition.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06 15:01:25 -04:00
Linus Torvalds a335750b9a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI & Power Management changes from Len Brown:
 - ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
 - cpuidle evolving, more ARM use
 - thermal sub-system evolving, ditto
 - assorted other PM bits

Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
  ACPI throttling: fix endian bug in acpi_read_throttling_status()
  Disable MCP limit exceeded messages from Intel IPS driver
  ACPI video: Don't start video device until its associated input device has been allocated
  ACPI video: Harden video bus adding.
  ACPI: Add support for exposing BGRT data
  ACPI: export acpi_kobj
  ACPI: Fix logic for removing mappings in 'acpi_unmap'
  CPER failed to handle generic error records with multiple sections
  ACPI: Clean redundant codes in scan.c
  ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
  ACPI: consistently use should_use_kmap()
  PNPACPI: Fix device ref leaking in acpi_pnp_match
  ACPI: Fix use-after-free in acpi_map_lsapic
  ACPI: processor_driver: add missing kfree
  ACPI, APEI: Fix incorrect APEI register bit width check and usage
  Update documentation for parameter *notrigger* in einj.txt
  ACPI, APEI, EINJ, new parameter to control trigger action
  ACPI, APEI, EINJ, limit the range of einj_param
  ACPI, APEI, Fix ERST header length check
  cpuidle: power_usage should be declared signed integer
  ...
2012-03-30 16:45:39 -07:00
Boris Ostrovsky 02401c06b7 cpuidle: power_usage should be declared signed integer
power_usage is always assigned a negative value and should be declared
a signed integer

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-30 03:23:30 -04:00
Boris Ostrovsky 1a022e3f1b idle, x86: Allow off-lined CPU to enter deeper C states
Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-30 03:23:01 -04:00
Daniel Lezcano fc850f39ea cpuidle: use the driver's state_count as default
If the state_count is not initialized for the device use
the driver's state count as the default. That will prevent
to add it manually in the cpuidle driver initialization
routine and will save us from duplicate line of code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-30 01:55:04 -04:00
ShuoX Liu 3a53396b03 cpuidle: add a sysfs entry to disable specific C state for debug purpose.
Some C states of new CPU might be not good.  One reason is BIOS might
configure them incorrectly.  To help developers root cause it quickly, the
patch adds a new sysfs entry, so developers could disable specific C state
manually.

In addition, C state might have much impact on performance tuning, as it
takes much time to enter/exit C states, which might delay interrupt
processing.  With the new debug option, developers could check if a deep C
state could impact performance and how much impact it could cause.

Also add this option in Documentation/cpuidle/sysfs.txt.

[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Reviewed-and-Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-30 01:52:58 -04:00
Robert Lee e1689795a7 cpuidle: Add common time keeping and irq enabling
Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code.  This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follows the form in this common code.

Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Tested-by: Amit Daniel <amit.kachhap@linaro.org>
Tested-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-21 01:59:40 -04:00
Ingo Molnar 737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
Benjamin Herrenschmidt aa491ad3d4 cpuidle: Default y on powerpc pSeries
We moved all our pSeries idle loops to the cpu idle framework
so we really want it to come up by default.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:51 +11:00
Steven Rostedt 76027ea863 cpuidle/tracing: Denote the tracepoints as being in rcu_idle_exit() section
As the tracepoints in the cpuidle code are called when rcu_idle_exit() is in
effect, the _rcuidle() version must be used, otherwise the rcu_read_lock()s
that protect the tracepoint will not be honored.

Cc: Len Brown <len.brown@intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-13 09:14:46 -05:00
Kay Sievers 8a25a2fd12 cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.

Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21 14:29:42 -08:00
Linus Torvalds 3c00303206 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  cpuidle: Single/Global registration of idle states
  cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
  cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
  cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
  ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
  ACPI: Export FADT pm_profile integer value to userspace
  thermal: Prevent polling from happening during system suspend
  ACPI: Drop ACPI_NO_HARDWARE_INIT
  ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
  PNPACPI: Simplify disabled resource registration
  ACPI: Fix possible recursive locking in hwregs.c
  ACPI: use kstrdup()
  mrst pmu: update comment
  tools/power turbostat: less verbose debugging
2011-11-07 10:13:52 -08:00
Deepthi Dharwar 46bcfad7a8 cpuidle: Single/Global registration of idle states
This patch makes the cpuidle_states structure global (single copy)
instead of per-cpu. The statistics needed on per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem as state registration is done by single cpu only.
Having single copy of cpuidle_states saves memory. Rare case
of asymmetric C-states can be handled within the cpuidle driver
and architectures such as POWER do not have asymmetric C-states.

Having single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu  would disable all the devices,
re-populate the states and later enable all the devices,
irrespective of the cpu that would receive the notification first.

Reference:
https://lkml.org/lkml/2011/4/25/83

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:58 -05:00
Deepthi Dharwar 4202735e8a cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:49 -05:00
Deepthi Dharwar b25edc42bf cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to chose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also this removes per-cpu flags from cpuidle_state enabling
it to be made global.

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:43 -05:00
Deepthi Dharwar e978aa7d7d cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.

Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:30 -05:00
Paul Gortmaker 70e522a028 cpuidle: ladder.c needs module.h and not just moduleparam.h
This file has module_init/exit and MODULE_LICENSE, and so it
needs the full module.h header.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:31 -04:00
Paul Gortmaker 884b17e109 cpuidle: Add module.h to drivers/cpuidle files as required.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:30 -04:00
Jean Pihet e8db0be124 PM QoS: Move and rename the implementation files
The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.

The PM QoS support is compiled under the CONFIG_PM option.

Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-25 15:35:03 +02:00
Len Brown a0bfa13738 cpuidle: stop depending on pm_idle
cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.

Architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

  my_arch_cpu_idle()
	...
	if(cpuidle_call_idle())
		pm_idle();

cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 19:06:37 -04:00
Len Brown d91ee5863b cpuidle: replace xen access to x86 pm_idle and default_idle
When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables.  While it parses the idle tables
for the hypervisor's beneift, it uses HLT for its own idle.

Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.

cc: xen-devel@lists.xensource.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 19:06:36 -04:00
Len Brown 62027aea23 cpuidle: create bootparam "cpuidle.off=1"
useful for disabling cpuidle to fall back
to architecture-default idle loop

cpuidle drivers and governors will fail to register.
on x86 they'll say so:

intel_idle: intel_idle yielding to (null)
ACPI: acpi_idle yielding to (null)

Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 19:06:36 -04:00
Linus Torvalds f310642123 Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
  x86 idle: deprecate "no-hlt" cmdline param
  x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
  x86 idle floppy: deprecate disable_hlt()
  x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
  x86 idle: clarify AMD erratum 400 workaround
  idle governor: Avoid lock acquisition to read pm_qos before entering idle
  cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29 11:18:09 -07:00
Tero Kristo 7467571f44 cpuidle: menu: fixed wrapping timers at 4.294 seconds
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.

cc: stable@kernel.org # .32.x through .39.x
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 00:35:47 -04:00
Jiri Kosina 0a9d59a246 Merge branch 'master' into for-next 2011-02-15 10:24:31 +01:00
Jesper Juhl 42b16b3fbb Kill off warning: ‘inline’ is not at beginning of declaration
Fix a bunch of
	warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.

These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).

A previous patch to do this for a 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-19 15:43:08 +01:00
Len Brown 43952886f0 Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
Len Brown 56dbed129d Merge branch 'linus' into idle-test 2011-01-12 18:06:06 -05:00
Thomas Renninger f77cfe4ea2 cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.

This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 18:05:16 -05:00
Len Brown d247632c08 cpuidle: delete NOP CPUIDLE_FLAG_POLL
it serves no purpose

Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:31 -05:00
Thomas Renninger 720f1c3010 cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
C0 means and is well know as "not idle".
All documentation out there uses this term as "running"/"not idle"
state. Also Linux userspace tools (e.g. cpufreq-aperf and turbostat)
show C0 residency which there is correct, but means something totally
else than cpuidle "POLL" state.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:31 -05:00
Rafael J. Wysocki d8c216cfa5 cpuidle: Make cpuidle_enable_device() call poll_idle_init()
The following scenario is possible with the current cpuidle code and
the ACPI cpuidle driver:
(1) acpi_processor_cst_has_changed() is called,
(2) cpuidle_disable_device() is called,
(3) cpuidle_remove_state_sysfs() is called to remove the (presumably
    outdated) states info from sysfs,
(3) acpi_processor_get_power_info() is called, the first entry in the
    pr->power.states[] table is filled with zeros,
(4) acpi_processor_setup_cpuidle() is called and it doesn't fill the
    first entry in pr->power.states[],
(5) cpuidle_enable_device() is called,
(6) __cpuidle_register_device() is _not_ called, since the device has
    already been registered,
(7) Consequently, poll_idle_init() is _not_ called either,
(8) cpuidle_add_state_sysfs() is called to create the sysfs attributes
    for the new states and it uses the bogus first table entry from
    acpi_processor_get_power_info() for creating state0.

This problem is avoided if cpuidle_enable_device()
unconditionally calls poll_idle_init().

Reported-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
cc: stable@kernel.org
2011-01-12 12:47:30 -05:00
Linus Torvalds 72eb6a7914 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
  gameport: use this_cpu_read instead of lookup
  x86: udelay: Use this_cpu_read to avoid address calculation
  x86: Use this_cpu_inc_return for nmi counter
  x86: Replace uses of current_cpu_data with this_cpu ops
  x86: Use this_cpu_ops to optimize code
  vmstat: User per cpu atomics to avoid interrupt disable / enable
  irq_work: Use per cpu atomics instead of regular atomics
  cpuops: Use cmpxchg for xchg to avoid lock semantics
  x86: this_cpu_cmpxchg and this_cpu_xchg operations
  percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
  percpu,x86: relocate this_cpu_add_return() and friends
  connector: Use this_cpu operations
  xen: Use this_cpu_inc_return
  taskstats: Use this_cpu_ops
  random: Use this_cpu_inc_return
  fs: Use this_cpu_inc_return in buffer.c
  highmem: Use this_cpu_xx_return() operations
  vmstat: Use this_cpu_inc_return for vm statistics
  x86: Support for this_cpu_add, sub, dec, inc_return
  percpu: Generic support for this_cpu_add, sub, dec, inc_return
  ...

Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
2011-01-07 17:02:58 -08:00
Thomas Renninger 25e41933b5 perf: Clean up power events by introducing new, more generic ones
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Christoph Lameter 4a6f4fe837 drivers: Replace __get_cpu_var with __this_cpu_read if not used for an address.
__get_cpu_var() can be replaced with this_cpu_read and will then use a single
read instruction with implied address calculation to access the correct per cpu
instance.

However, the address of a per cpu variable passed to __this_cpu_read() cannot be
determed (since its an implied address conversion through segment prefixes).
Therefore apply this only to uses of __get_cpu_var where the addres of the
variable is not used.

V3->V4:
	- Move one instance of this_cpu_inc_return to a later patch
	  so that this one can go in without percpu infrastructrure
	  changes.

Sedat: fixed compile failure caused by an extra ')'.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-17 15:07:18 +01:00
Lucas De Marchi 20e3341bb1 cpuidle: Fix typos
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 23:30:38 -04:00
Ai Li 71abbbf856 cpuidle: extend cpuidle and menu governor to handle dynamic states
On some SoC chips, HW resources may be in use during any particular idle
period.  As a consequence, the cpuidle states that the SoC is safe to
enter can change from idle period to idle period.  In addition, the
latency and threshold of each cpuidle state can vary, depending on the
operating condition when the CPU becomes idle, e.g.  the current cpu
frequency, the current state of the HW blocks, etc.

cpuidle core and the menu governor, in the current form, are geared
towards cpuidle states that are static, i.e.  the availabiltiy of the
states, their latencies, their thresholds are non-changing during run
time.  cpuidle does not provide any hook that cpuidle drivers can use to
adjust those values on the fly for the current idle period before the menu
governor selects the target cpuidle state.

This patch extends cpuidle core and the menu governor to handle states
that are dynamic.  There are three additions in the patch and the patch
maintains backwards-compatibility with existing cpuidle drivers.

1) add prepare() to struct cpuidle_device.  A cpuidle driver can hook
   into the callback and cpuidle will call prepare() before calling the
   governor's select function.  The callback gives the cpuidle driver a
   chance to update the dynamic information of the cpuidle states for the
   current idle period, e.g.  state availability, latencies, thresholds,
   power values, etc.

2) add CPUIDLE_FLAG_IGNORE as one of the state flags.  In the prepare()
   function, a cpuidle driver can set/clear the flag to indicate to the
   menu governor whether a cpuidle state should be ignored, i.e.  not
   available, during the current idle period.

3) add power_specified bit to struct cpuidle_device.  The menu governor
   currently assumes that the cpuidle states are arranged in the order of
   increasing latency, threshold, and power savings.  This is true or can
   be made true for static states.  Once the state parameters are dynamic,
   the latencies, thresholds, and power savings for the cpuidle states can
   increase or decrease by different amounts from idle period to idle
   period.  So the assumption of increasing latency, threshold, and power
   savings from Cn to C(n+1) can no longer be guaranteed.

It can be straightforward to calculate the power consumption of each
available state and to specify it in power_usage for the idle period.
Using the power_usage fields, the menu governor then selects the state
that has the lowest power consumption and that still satisfies all other
critieria.  The power_specified bit defaults to 0.  For existing cpuidle
drivers, cpuidle detects that power_specified is 0 and fills in a dummy
set of power_usage values.

Signed-off-by: Ai Li <aili@codeaurora.org>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:04 -07:00
Thomas Renninger 6f4f2723d0 [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Peter Zijlstra 8c215bd389 sched: Cure nr_iowait_cpu() users
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: <1277968037.1868.120.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-01 09:39:48 +02:00
Linus Torvalds e4f2e5eaac Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: native hardware cpuidle driver for latest Intel processors
  ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
  acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
  sched: clarify commment for TS_POLLING
  ACPI: allow a native cpuidle driver to displace ACPI
  cpuidle: make cpuidle_curr_driver static
  cpuidle: add cpuidle_unregister_driver() error check
  cpuidle: fail to register if !CONFIG_CPU_IDLE
2010-05-28 16:14:17 -07:00
Len Brown 752138df0d cpuidle: make cpuidle_curr_driver static
cpuidle_register_driver() sets cpuidle_curr_driver
cpuidle_unregister_driver() clears cpuidle_curr_driver

We should't expose cpuidle_curr_driver to
potential modification except via these interfaces.
So make it static and create cpuidle_get_driver() to observe it.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-27 21:06:58 -04:00
Len Brown c0d64cb031 cpuidle: add cpuidle_unregister_driver() error check
Assure that cpuidle_unregister_driver() will not clobber
the registered driver if unregistered by somebody else.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-27 13:04:04 -04:00
Arjan van de Ven 1f85f87d4f cpuidle: add a repeating pattern detector to the menu governor
Currently, the menu governor uses the (corrected) next timer as key item
for predicting the idle duration.

It turns out that there are specific cases where this breaks down: There
are cases where we have a very repetitive pattern of idle durations, where
the idle period is pretty much the same, for reasons completely unrelated
to the next timer event.  Examples of such repeating patterns are network
loads with irq mitigation, the mouse moving but in theory also the wifi
beacons.

This patch adds a relatively simple detector for such repeating patterns,
where the standard deviation of the last 8 idle periods is compared to a
threshold.

With this extra predictor in place, measurements show that the DECAY
factor can now be increased (the decaying average will now decay slower)
to get an even more stable result.

[arjan@infradead.org: fix bug identified by Frank]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Mark Gross ed77134bfc PM QOS update
This patch changes the string based list management to a handle base
implementation to help with the hot path use of pm-qos, it also renames
much of the API to use "request" as opposed to "requirement" that was
used in the initial implementation.  I did this because request more
accurately represents what it actually does.

Also, I added a string based ABI for users wanting to use a string
interface.  So if the user writes 0xDDDDDDDD formatted hex it will be
accepted by the interface.  (someone asked me for it and I don't think
it hurts anything.)

This patch updates some documentation input I got from Randy.

Signed-off-by: markgross <mgross@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-05-10 23:08:19 +02:00
Arjan van de Ven 1c6fe0364f cpuidle: Fix incorrect optimization
commit 672917dcc7 ("cpuidle: menu governor: reduce latency on exit")
added an optimization, where the analysis on the past idle period moved
from the end of idle, to the beginning of the new idle.

Unfortunately, this optimization had a bug where it zeroed one key
variable for new use, that is needed for the analysis.  The fix is
simple, zero the variable after doing the work from the previous idle.

During the audit of the code that found this issue, another issue was
also found; the ->measured_us data structure member is never set, a
local variable is always used instead.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-09 18:35:36 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Andi Kleen c9be0a36f9 sysdev: Pass attribute in sysdev_class attributes show/store
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
an own function for every piece of data.

Also drivers can extend the attributes with own data fields
and use that in the low level function.

Similar to sysdev_attributes and normal attributes.

This is a tree-wide sweep, converting everything in one go.

No functional changes in this patch other than passing the new
argument everywhere.

Tested on x86, the non x86 parts are uncompiled.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:47 -08:00
Richard Kennedy 56e6943b41 cpuidle menu: remove 8 bytes of padding on 64 bit builds
Reorder struct menu_device to remove 8 bytes of padding on 64 bit builds.
Size drops from 136 to 128 bytes, so possibly needing one fewer cache
lines.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:28 -08:00
Stephen Hemminger 5787536edf drivers/cpuidle/governors/menu.c: fix undefined reference to `__udivdi3'
menu: use proper 64 bit math

The new menu governor is incorrectly doing a 64 bit divide.  Compile
tested only

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11 09:34:07 -08:00
Julia Lawall faa7b7ddca drivers/cpuidle: Move dereference after NULL test
It does not seem possible that ldev can be NULL, so drop the unnecessary
test.  If ldev can somehow be NULL, then the initialization of last_idx
should be moved below the test.

A simplified version of the semantic match that detects this problem is as
follows (http://coccinelle.lip6.fr/):

// <smpl>
@match exists@
expression x, E;
identifier fld;
@@

* x->fld
  ... when != \(x = E\|&x\)
* x == NULL
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:25 -08:00
Uwe Kleine-Knig 21ae2956ce tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed"
This patch was generated by

	git grep -E -i -l '[Aa]quire' | xargs -r perl -p -i -e 's/([Aa])quire/$1cquire/'

and the cumsumed was found by checking the diff for aquire.

Signed-off-by: Uwe Kleine-Knig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-09 09:40:57 +01:00
Kevin Hilman 246eb7f0ed cpuidle: always return with interrupts enabled
In the case where cpuidle_idle_call() returns before changing state due to
a need_resched(), it was returning with IRQs disabled.

The idle path assumes that the platform specific idle code returns with
interrupts enabled (although this too is undocumented AFAICT) and on ARM
we have a WARN_ON(!(irqs_disabled()) when returning from the idle loop, so
the user-visible effects were only a warning since interrupts were
eventually re-enabled later.

On x86, this same problem exists, but there is no WARN_ON() to detect it.
As on ARM, the interrupts are eventually re-enabled, so I'm not sure of
any actual bugs triggered by this.  It's primarily a
correctness/consistency fix.

This patch ensures IRQs are (re)enabled before returning.

Reported-by: Hemanth V <hemanthv@ti.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Tested-by: Martin Michlmayr <tbm@cyrius.com>
Cc: <stable@kernel.org>		[2.6.31.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:31 -07:00
Corrado Zoccolo 672917dcc7 cpuidle: menu governor: reduce latency on exit
Move the state residency accounting and statistics computation off the hot
exit path.

On exit, the need to recompute statistics is recorded, and new statistics
will be computed when menu_select is called again.

The expected effect is to reduce processor wakeup latency from sleep
(C-states).  We are speaking of few hundreds of cycles reduction out of a
several microseconds latency (determined by the hardware transition), so
it is difficult to measure.

Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Adam Belay <abelay@novell.com
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:45 -07:00
Arjan van de Ven 69d25870f2 cpuidle: fix the menu governor to boost IO performance
Fix the menu idle governor which balances power savings, energy efficiency
and performance impact.

The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems.

To show this I'm sure Andrew wants to see benchmark results:
(benchmark is "fio", "no cstates" is using "idle=poll")

		no cstates	current linux	new algorithm
1 disk		107 Mb/s	85 Mb/s		105 Mb/s
2 disks		215 Mb/s	123 Mb/s	209 Mb/s
12 disks	590 Mb/s	320 Mb/s	585 Mb/s

In various power benchmark measurements, no degredation was found by our
measurement&diagnostics team.  Obviously a small percentage more power was
used in the "fio" benchmark, due to the much higher performance.

While it would be a novel idea to describe the new algorithm in this
commit message, I cheaped out and described it in comments in the code
instead.

[changes since first post: spelling fixes from akpm, review feedback,
folded menu-tng into menu.c]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:45 -07:00
Arjan van de Ven 288f023e70 tracing, x86, cpuidle: Move the end point of a C state in the power tracer
The "end of a C state" trace point currently happens before
the code runs that corrects the TSC for having stopped during idle.

The result of this is that the timestamp of the end-of-C-state event
is garbage on cpus where the TSC stops during idle.

This patch moves the end point of the C state to after the timekeeping
engine of the kernel has been corrected.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090919133533.139c2a46@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-19 18:57:52 +02:00
Pallipadi, Venkatesh 816bb611e4 cpuidle: Add decaying history logic to menu idle predictor
Add decaying history of predicted idle time, instead of using the last early
wakeup. This logic helps menu governor do better job of predicting idle time.

With this change, we also measured noticable (~8%) power savings on
a DP server system with CPUs supporting deep C states, when system
was lightly loaded. There was no change to power or perf on other load
conditions.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-12-30 18:48:01 -05:00
Arjan van de Ven 9a6558371b regression: disable timer peek-ahead for 2.6.28
It's showing up as regressions; disabling it very likely just papers
over an underlying issue, but time is running out for 2.6.28, lets get
back to this for 2.6.29

Fixes: #11826 and #11893

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-09 16:28:42 -08:00
Linus Torvalds 1f6d6e8ebe Merge branch 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
  hrtimers: add missing docbook comments to struct hrtimer
  hrtimers: simplify hrtimer_peek_ahead_timers()
  hrtimers: fix docbook comments
  DECLARE_PER_CPU needs linux/percpu.h
  hrtimers: fix typo
  rangetimers: fix the bug reported by Ingo for real
  rangetimer: fix BUG_ON reported by Ingo
  rangetimer: fix x86 build failure for the !HRTIMERS case
  select: fix alpha OSF wrapper
  select: fix alpha OSF wrapper
  hrtimer: peek at the timer queue just before going idle
  hrtimer: make the futex() system call use the per process slack value
  hrtimer: make the nanosleep() syscall use the per process slack
  hrtimer: fix signed/unsigned bug in slack estimator
  hrtimer: show the timer ranges in /proc/timer_list
  hrtimer: incorporate feedback from Peter Zijlstra
  hrtimer: add a hrtimer_start_range() function
  hrtimer: another build fix
  hrtimer: fix build bug found by Ingo
  hrtimer: make select() and poll() use the hrtimer range feature
  ...
2008-10-23 10:53:02 -07:00
Venkatesh Pallipadi 89cedfefca cpuidle: upon BIOS bug, default to default_idle rather than polling
http://bugzilla.kernel.org/show_bug.cgi?id=11345

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-16 19:00:08 -04:00
Venkatesh Pallipadi 887e301aa1 cpuidle: use last_state which can reflect the actual state entered
cpuidle accounts the idle time for the C-state it was trying to enter and
not to the actual state that the driver eventually entered. The driver may
select a different state than the one chosen by cpuidle due to
constraints like bus-mastering, etc.

Change the time acounting code to look at the dev->last_state after
returning from target_state->enter(). Driver can modify dev->last_state
internally, inside the enter routine to reflect the actual C-state
entered.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Tested-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-16 17:59:44 -04:00
Arjan van de Ven 2e94d1f71f hrtimer: peek at the timer queue just before going idle
As part of going idle, we already look at the time of the next timer event to determine
which C-state to select etc.

This patch adds functionality that causes the timers that are past their
soft expire time, to fire at this time, before we calculate the next wakeup
time. This functionality will thus avoid wakeups by running timers before
going idle rather than specially waking up for it.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-09-11 07:17:49 -07:00
venkatesh.pallipadi@intel.com 06d9e908b2 cpuidle: Make ladder governor honor latency requirements fully
ladder governor only honored latency requirement when promoting C-states.
Instead. it should check for latency requirement on each idle call,
and demote to appropriate C-state when there is a latency requirement change.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-08-15 21:25:35 +02:00
venkatesh.pallipadi@intel.com 320eee7763 cpuidle: Menu governor fix wrong usage of measured_us
There is a bug in menu governor where we have
		if (data->elapsed_us < data->elapsed_us + measured_us)

with measured_us already having elapsed_us added in tickless case here
	unsigned int measured_us =
		cpuidle_get_last_residency(dev) + data->elapsed_us;

Also, it should be last_residency, not measured_us, that need to be used to
do comparing and distinguish between expected & non-expected events.

Refactor menu_reflect() to fix these two problems.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-08-15 21:25:25 +02:00
venkatesh.pallipadi@intel.com a2bd920233 cpuidle: Do not use poll_idle unless user asks for it
poll_idle was added to CPUIDLE, just as a low latency idle handler, to be
used in cases when user desires CPUs not to enter any idle state at all. It
was supposed to be a run time idle=poll option to the user. But, it was indeed
getting used during normal menu and ladder governor default case, with no
special user setting (Reported by Linus Torvalds).

Change below ensures that poll_idle will not be used unless user explicitly
asks pm_qos infrastructure for zero latency requirement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-08-15 21:25:25 +02:00
Rabin Vincent 66198f36aa cpuidle: make sysfs attributes sysdev class attributes
These attributes are really sysdev class attributes.  The incorrect
definition leads to an oops because of recent changes which make sysdev
attributes use a different prototype.

Based on Andi's f718cd4add ("sched: make
scheduler sysfs attributes sysdev class devices")

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-12 16:07:28 -07:00
Thomas Gleixner b032bf70df ACPI/CPUIDLE: prevent setting pm_idle to NULL
pm_idle_save resp. pm_idle_old can be NULL when the restore code in
acpi_processor_cst_has_changed() resp. cpuidle_uninstall_idle_handler()
is called. This can set pm_idle unconditinally to NULL, which causes the
kernel to panic when calling pm_idle in the x86 idle code. This was
covered by an extra check for !pm_idle in the x86 idle code, which was
removed during the x86 idle code refactoring.

Instead of restoring the pm_idle check in the x86 code prevent the
acpi/cpuidle code to set pm_idle to NULL.

Reported by: Dhaval Giani http://lkml.org/lkml/2008/7/2/309
Based on a debug patch from Ingo Molnar

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 08:31:58 -07:00
Andi Kleen 4a0b2b4dbe sysdev: Pass the attribute to the low level sysdev show/store function
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Jens Axboe 8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Venkatesh Pallipadi dcb84f335b cpuidle acpi driver: fix oops on AC<->DC
cpuidle and acpi driver interaction bug with the way cpuidle_register_driver()
is called. Due to this bug, there will be oops on
AC<->DC on some systems, where they support C-states in one DC and not in AC.

The current code does
ON BOOT:
	Look at CST and other C-state info to see whether more than C1 is
	supported. If it is, then acpi processor_idle does a
	cpuidle_register_driver() call, which internally enables the device.

ON CST change notification (AC<->DC) and on suspend-resume:
	acpi driver temporarily disables device, updates the device with
	any new C-states, and reenables the device.

The problem is is on boot, there are no C2, C3 states supported and we skip
the register. Later on AC<->DC, we may get a CST notification and we try
to reevaluate CST and enabled the device, without actually registering it.
This causes breakage as we try to create /sys fs sub directory, without the
parent directory which is created at register time.

Thanks to Sanjeev for reporting the problem here.
http://bugzilla.kernel.org/show_bug.cgi?id=10394

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:45 -04:00
Venki Pallipadi 8e92b6605d cpuidle: fix 100% C0 statistics regression
commit 9b12e18cdc
'ACPI: cpuidle: Support C1 idle time accounting'
was implicated in a 100% C0 idle regression.
http://bugzilla.kernel.org/show_bug.cgi?id=10076

It pointed out a potential problem where the menu governor
may get confused by the C-state residency time from poll
idle or C1 idle, where this timing info is not accurate.
This inaccuracy is due to interrupts being handled
before we account for C-state exit.

Do not mark TIME_VALID for CO poll state.
Mark C1 time as valid only with the MWAIT (CSTATE_FFH) entry method.

This makes governors use the timing information only when it is correct and
eliminates any wrong policy decisions that may result from invalid timing
information.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-03-26 00:58:19 -04:00
Yi Yang 8b78cf602f cpuidle: fix cpuidle time and usage overflow
cpuidle C-state sysfs node time and usage are very easy to overflow because
they are all of unsigned int type, time will overflow within about two hours,
usage will take longer time to overflow, but they are increasing for ever.

This patch will convert them to unsigned long long.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-03-26 00:45:26 -04:00
Venkatesh Pallipadi 4fcb2fcd4d ACPI, cpuidle: Clarify C-state description in sysfs
Add a new sysfs entry under cpuidle states. desc - can be used by driver to
communicate to userspace any specific information about the state.
This helps in identifying the exact hardware C-states behind the ACPI C-state
definition.

Idea is to export this through powertop, which will help to map the C-state
reported by powertop to actual hardware C-state.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-14 00:09:55 -05:00
Venki Pallipadi a6869cc4cf cpuidle: build fix for non-x86
The last posted version of this patch gave compile error
on IA64. So, here goes yet another rewrite of the patch.

Convert cpu_idle_wait() to cpuidle_kick_cpus() which is
SMP-only, and gives error on non supported CPU.

Changes from last patch sent by Kevin:
Moved the definition of kick_cpus back to cpuidle.c from cpuidle.h:
* Having it in .h gives #error on archs which includes the header file without
  actually having CPU_IDLE configured. To make it work in .h, we need one more
  #ifdef around that code which makes it messy.
* Also, the function is only called from one file. So, it can be in declared
  statically in .c rather than making it available to everyone who includes
  the .h file.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-09 03:33:40 -05:00
Len Brown 9b71315421 Revert "cpuidle: build fix for non-x86"
This reverts commit f757397097.
which ironically broke the ia64 build
2008-02-07 04:16:34 -05:00
Len Brown acf63867ae Merge branches 'release', 'cpuidle-2.6.25' and 'idle' into release 2008-02-07 03:11:05 -05:00
venkatesh.pallipadi@intel.com 9a0b841586 cpuidle: Add a poll_idle method
Add a default poll idle state with 0 latency. Provides an option to users
to use poll_idle by using 0 as the latency requirement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-07 02:20:15 -05:00
Mark Gross d82b35186e pm qos infrastructure and interface
The following patch is a generalization of the latency.c implementation done
by Arjan last year.  It provides infrastructure for more than one parameter,
and exposes a user mode interface for processes to register pm_qos
expectations of processes.

This interface provides a kernel and user mode interface for registering
performance expectations by drivers, subsystems and user space applications on
one of the parameters.

Currently we have {cpu_dma_latency, network_latency, network_throughput} as
the initial set of pm_qos parameters.

The infrastructure exposes multiple misc device nodes one per implemented
parameter.  The set of parameters implement is defined by pm_qos_power_init()
and pm_qos_params.h.  This is done because having the available parameters
being runtime configurable or changeable from a driver was seen as too easy to
abuse.

For each parameter a list of performance requirements is maintained along with
an aggregated target value.  The aggregated target value is updated with
changes to the requirement list or elements of the list.  Typically the
aggregated target value is simply the max or min of the requirement values
held in the parameter list elements.

>From kernel mode the use of this interface is simple:

pm_qos_add_requirement(param_id, name, target_value):

  Will insert a named element in the list for that identified PM_QOS
  parameter with the target value.  Upon change to this list the new target is
  recomputed and any registered notifiers are called only if the target value
  is now different.

pm_qos_update_requirement(param_id, name, new_target_value):

  Will search the list identified by the param_id for the named list element
  and then update its target value, calling the notification tree if the
  aggregated target is changed.  with that name is already registered.

pm_qos_remove_requirement(param_id, name):

  Will search the identified list for the named element and remove it, after
  removal it will update the aggregate target and call the notification tree
  if the target was changed as a result of removing the named requirement.

>From user mode:

  Only processes can register a pm_qos requirement.  To provide for
  automatic cleanup for process the interface requires the process to register
  its parameter requirements in the following way:

  To register the default pm_qos target for the specific parameter, the
  process must open one of /dev/[cpu_dma_latency, network_latency,
  network_throughput]

  As long as the device node is held open that process has a registered
  requirement on the parameter.  The name of the requirement is
  "process_<PID>" derived from the current->pid from within the open system
  call.

  To change the requested target value the process needs to write a s32
  value to the open device node.  This translates to a
  pm_qos_update_requirement call.

  To remove the user mode request for a target value simply close the device
  node.

[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix build again]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: mark gross <mgross@linux.intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Adam Belay <abelay@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:22 -08:00
Kevin Hilman f757397097 cpuidle: build fix for non-x86
Convert cpu_idle_wait() to cpuidle_kick_cpus() macro which is
SMP-only, and gives error on non supported CPU.

Signed-off-by: Kevin Hilman <khilman@mvista.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-01-31 22:50:48 -05:00
Greg Kroah-Hartman c10997f657 Kobject: convert drivers/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman 94f57f3368 Kobject: change drivers/cpuidle/sysfs.c to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:28 -08:00
len.brown@intel.com 60555e371d ACPI: CONFIG_CPU_IDLE=ACPI by default
In Linux-2.6.24, CPU_IDLE went upstream, default =n.
For Linux-2.6.25, enable it by default on ACPI systems.
For Linux-2.6.26, we plan to enable it always on ACPI systems.

Signed-off-by: Len Brown <len.brown@intel.com>
2007-11-19 22:22:37 -05:00
Adrian Bunk 83788c0cae cpuidle: remove unused exports
This patch removes the following unused exports:
- cpuidle_devices
- cpuidle_register_governor
- cpuidle_unregister_governor

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-10-29 17:27:50 -04:00
Len Brown 4f86d3a8e2 cpuidle: consolidate 2.6.22 cpuidle branch into one patch
commit e5a16b1f9eec0af7cfa0830304b41c1c0833cf9f
Author: Len Brown <len.brown@intel.com>
Date:   Tue Oct 2 23:44:44 2007 -0400

    cpuidle: shrink diff

    processor_idle.c |  440 +++++++++++++++++++++++++++++++++++++++++--
    1 file changed, 429 insertions(+), 11 deletions(-)

    Signed-off-by: Len Brown <len.brown@intel.com>

commit dfbb9d5aedfb18848a3e0d6f6e3e4969febb209c
Author: Len Brown <len.brown@intel.com>
Date:   Wed Sep 26 02:17:55 2007 -0400

    cpuidle: reduce diff size

    Reduces the cpuidle processor_idle.c diff vs 2.6.22 from this
     processor_idle.c | 2006 ++++++++++++++++++++++++++-----------------
     1 file changed, 1219 insertions(+), 787 deletions(-)

    to this:
     processor_idle.c |  502 +++++++++++++++++++++++++++++++++++++++----
     1 file changed, 458 insertions(+), 44 deletions(-)

    ...for the purpose of making the cpuilde patch less invasive
    and easier to review.

    no functional changes.  build tested only.

    Signed-off-by: Len Brown <len.brown@intel.com>

commit 889172fc915f5a7fe20f35b133cbd205ce69bf6c
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Sep 13 13:40:05 2007 -0700

    cpuidle: Retain old ACPI policy for !CONFIG_CPU_IDLE

    Retain the old policy in processor_idle, so that when CPU_IDLE is not
    configured, old C-state policy will still be used. This provides a
    clean gradual migration path from old ACPI policy to new cpuidle
    based policy.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 9544a8181edc7ecc33b3bfd69271571f98ed08bc
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Sep 13 13:39:17 2007 -0700

    cpuidle: Configure governors by default

    Quoting Len "Do not give an option to users to shoot themselves in the foot".

    Remove the configurability of ladder and menu governors as they are
    needed for default policy of cpuidle. That way users will not be able to
    have cpuidle without any policy loosing all C-state power savings.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 8975059a2c1e56cfe83d1bcf031bcf4cb39be743
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:27:07 2007 -0400

    CPUIDLE: load ACPI properly when CPUIDLE is disabled

    Change the registration return codes for when CPUIDLE
    support is not compiled into the kernel.  As a result, the ACPI
    processor driver will load properly even if CPUIDLE is unavailable.
    However, it may be possible to cleanup the ACPI processor driver further
    and eliminate some dead code paths.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit e0322e2b58dd1b12ec669bf84693efe0dc2414a8
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:26:06 2007 -0400

    CPUIDLE: remove cpuidle_get_bm_activity()

    Remove cpuidle_get_bm_activity() and updates governors
    accordingly.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 18a6e770d5c82ba26653e53d240caa617e09e9ab
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:25:58 2007 -0400

    CPUIDLE: max_cstate fix

    Currently max_cstate is limited to 0, resulting in no idle processor
    power management on ACPI platforms.  This patch restores the value to
    the array size.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 1fdc0887286179b40ce24bcdbde663172e205ef0
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:25:40 2007 -0400

    CPUIDLE: handle BM detection inside the ACPI Processor driver

    Update the ACPI processor driver to detect BM activity and
    limit state entry depth internally, rather than exposing such
    requirements to CPUIDLE.  As a result, CPUIDLE can drop this
    ACPI-specific interface and become more platform independent.  BM
    activity is now handled much more aggressively than it was in the
    original implementation, so some testing coverage may be needed to
    verify that this doesn't introduce any DMA buffer under-run issues.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 0ef38840db666f48e3cdd2b769da676c57228dd9
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:25:14 2007 -0400

    CPUIDLE: menu governor updates

    Tweak the menu governor to more effectively handle non-timer
    break events.  Non-timer break events are detected by comparing the
    actual sleep time to the expected sleep time.  In future revisions, it
    may be more reliable to use the timer data structures directly.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit bb4d74fca63fa96cf3ace644b15ae0f12b7df5a1
Author: Adam Belay <abelay@novell.com>
Date:   Tue Aug 21 18:24:40 2007 -0400

    CPUIDLE: fix 'current_governor' sysfs entry

    Allow the "current_governor" sysfs entry to properly handle
    input terminated with '\n'.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit df3c71559bb69b125f1a48971bf0d17f78bbdf47
Author: Len Brown <len.brown@intel.com>
Date:   Sun Aug 12 02:00:45 2007 -0400

    cpuidle: fix IA64 build (again)

    Signed-off-by: Len Brown <len.brown@intel.com>

commit a02064579e3f9530fd31baae16b1fc46b5a7bca8
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Sun Aug 12 01:39:27 2007 -0400

    cpuidle: Remove support for runtime changing of max_cstate

    Remove support for runtime changeability of max_cstate. Drivers can use
    use latency APIs.

    max_cstate can still be used as a boot time option and dmi override.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 0912a44b13adf22f5e3f607d263aed23b4910d7e
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Sun Aug 12 01:39:16 2007 -0400

    cpuidle: Remove ACPI cstate_limit calls from ipw2100

    ipw2100 already has code to use accetable_latency interfaces to limit the
    C-state. Remove the calls to acpi_set_cstate_limit and acpi_get_cstate_limit
    as they are redundant.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit c649a76e76be6bff1fd770d0a775798813a3f6e0
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Sun Aug 12 01:35:39 2007 -0400

    cpuidle: compile fix for pause and resume functions

    Fix the compilation failure when cpuidle is not compiled in.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Acked-by: Adam Belay <adam.belay@novell.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 2305a5920fb8ee6ccec1c62ade05aa8351091d71
Author: Adam Belay <abelay@novell.com>
Date:   Thu Jul 19 00:49:00 2007 -0400

    cpuidle: re-write

    Some portions have been rewritten to make the code cleaner and lighter
    weight.  The following is a list of changes:

    1.) the state name is now included in the sysfs interface
    2.) detection, hotplug, and available state modifications are handled by
    CPUIDLE drivers directly
    3.) the CPUIDLE idle handler is only ever installed when at least one
    cpuidle_device is enabled and ready
    4.) the menu governor BM code no longer overflows
    5.) the sysfs attributes are now printed as unsigned integers, avoiding
    negative values
    6.) a variety of other small cleanups

    Also, Idle drivers are no longer swappable during runtime through the
    CPUIDLE sysfs inteface.  On i386 and x86_64 most idle handlers (e.g.
    poll, mwait, halt, etc.) don't benefit from an infrastructure that
    supports multiple states, so I think using a more general case idle
    handler selection mechanism would be cleaner.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Acked-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit df25b6b56955714e6e24b574d88d1fd11f0c3ee5
Author: Len Brown <len.brown@intel.com>
Date:   Tue Jul 24 17:08:21 2007 -0400

    cpuidle: fix IA64 buid

    Signed-off-by: Len Brown <len.brown@intel.com>

commit fd6ada4c14488755ff7068860078c437431fbccd
Author: Adrian Bunk <bunk@stusta.de>
Date:   Mon Jul 9 11:33:13 2007 -0700

    cpuidle: static

    make cpuidle_replace_governor() static

    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit c1d4a2cebcadf2429c0c72e1d29aa2a9684c32e0
Author: Adrian Bunk <bunk@stusta.de>
Date:   Tue Jul 3 00:54:40 2007 -0400

    cpuidle: static

    This patch makes the needlessly global struct menu_governor static.

    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit dbf8780c6e8d572c2c273da97ed1cca7608fd999
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Tue Jul 3 00:49:14 2007 -0400

    export symbol tick_nohz_get_sleep_length

    ERROR: "tick_nohz_get_sleep_length" [drivers/cpuidle/governors/menu.ko] undefined!
    ERROR: "tick_nohz_get_idle_jiffies" [drivers/cpuidle/governors/menu.ko] undefined!

    And please be sure to get your changes to core kernel suitably reviewed.

    Cc: Adam Belay <abelay@novell.com>
    Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: john stultz <johnstul@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 29f0e248e7017be15f99febf9143a2cef00b2961
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Tue Jul 3 00:43:04 2007 -0400

    tick.h needs hrtimer.h

    It uses hrtimers.

    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit e40cede7d63a029e92712a3fe02faee60cc38fb4
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:40:34 2007 -0400

    cpuidle: first round of documentation updates

    Documentation changes based on Pavel's feedback.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 83b42be2efece386976507555c29e7773a0dfcd1
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:39:25 2007 -0400

    cpuidle: add rating to the governors and pick the one with highest rating by default

    Introduce a governor rating scheme to pick the right governor by default.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit d2a74b8c5e8f22def4709330d4bfc4a29209b71c
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:38:08 2007 -0400

    cpuidle: make cpuidle sysfs driver governor switch off by default

    Make default cpuidle sysfs to show current_governor and current_driver in
    read-only mode.  More elaborate available_governors and available_drivers with
    writeable current_governor and current_driver interface only appear with
    "cpuidle_sysfs_switch" boot parameter.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 1f60a0e80bf83cf6b55c8845bbe5596ed8f6307b
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:37:00 2007 -0400

    cpuidle: menu governor: change the early break condition

    Change the C-state early break out algorithm in menu governor.

    We only look at early breakouts that result in wakeups shorter than idle
    state's target_residency.  If such a breakout is frequent enough, eliminate
    the particular idle state upto a timeout period.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 45a42095cf64b003b4a69be3ce7f434f97d7af51
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:35:38 2007 -0400

    cpuidle: fix uninitialized variable in sysfs routine

    Fix the uninitialized usage of ret.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 80dca7cdba3e6ee13eae277660873ab9584eb3be
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:34:16 2007 -0400

    cpuidle: reenable /proc/acpi//power interface for the time being

    Keep /proc/acpi/processor/CPU*/power around for a while as powertop depends
    on it. It will be marked deprecated and removed in future. powertop can use
    cpuidle interfaces instead.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 589c37c2646c5e3813a51255a5ee1159cb4c33fc
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Jul 3 00:32:37 2007 -0400

    cpuidle: menu governor and hrtimer compile fix

    Compile fix for menu governor.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 0ba80bd9ab3ed304cb4f19b722e4cc6740588b5e
Author: Len Brown <len.brown@intel.com>
Date:   Thu May 31 22:51:43 2007 -0400

    cpuidle: build fix - cpuidle vs ipw2100 module

    ERROR: "acpi_set_cstate_limit" [drivers/net/wireless/ipw2100.ko] undefined!

    Signed-off-by: Len Brown <len.brown@intel.com>

commit d7d8fa7f96a7f7682be7c6cc0cc53fa7a18c3b58
Author: Adam Belay <abelay@novell.com>
Date:   Sat Mar 24 03:47:07 2007 -0400

    cpuidle: add the 'menu' governor

    Here is my first take at implementing an idle PM governor that takes
    full advantage of NO_HZ.  I call it the 'menu' governor because it
    considers the full list of idle states before each entry.

    I've kept the implementation fairly simple.  It attempts to guess the
    next residency time and then chooses a state that would meet at least
    the break-even point between power savings and entry cost.  To this end,
    it selects the deepest idle state that satisfies the following
    constraints:
         1. If the idle time elapsed since bus master activity was detected
            is below a threshold (currently 20 ms), then limit the selection
            to C2-type or above.
         2. Do not choose a state with a break-even residency that exceeds
            the expected time remaining until the next timer interrupt.
         3. Do not choose a state with a break-even residency that exceeds
            the elapsed time between the last pair of break events,
            excluding timer interrupts.

    This governor has an advantage over "ladder" governor because it
    proactively checks how much time remains until the next timer interrupt
    using the tick infrastructure.  Also, it handles device interrupt
    activity more intelligently by not including timer interrupts in break
    event calculations.  Finally, it doesn't make policy decisions using the
    number of state entries, which can have variable residency times (NO_HZ
    makes these potentially very large), and instead only considers sleep
    time deltas.

    The menu governor can be selected during runtime using the cpuidle sysfs
    interface like so:
    "echo "menu" > /sys/devices/system/cpu/cpuidle/current_governor"

    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit a4bec7e65aa3b7488b879d971651cc99a6c410fe
Author: Adam Belay <abelay@novell.com>
Date:   Sat Mar 24 03:47:03 2007 -0400

    cpuidle: export time until next timer interrupt using NO_HZ

    Expose information about the time remaining until the next
    timer interrupt expires by utilizing the dynticks infrastructure.
    Also modify the main idle loop to allow dynticks to handle
    non-interrupt break events (e.g. DMA).  Finally, expose sleep ticks
    information to external code.  Thomas Gleixner is responsible for much
    of the code in this patch.  However, I've made some additional changes,
    so I'm probably responsible if there are any bugs or oversights :)

    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 2929d8996fbc77f41a5ff86bb67cdde3ca7d2d72
Author: Adam Belay <abelay@novell.com>
Date:   Sat Mar 24 03:46:58 2007 -0400

    cpuidle: governor API changes

    This patch prepares cpuidle for the menu governor.  It adds an optional
    stage after idle state entry to give the governor an opportunity to
    check why the state was exited.  Also it makes sure the idle loop
    returns after each state entry, allowing the appropriate dynticks code
    to run.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 3a7fd42f9825c3b03e364ca59baa751bb350775f
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Apr 26 00:03:59 2007 -0700

    cpuidle: hang fix

    Prevent hang on x86-64, when ACPI processor driver is added as a module on
    a system that does not support C-states.

    x86-64 expects all idle handlers to enable interrupts before returning from
    idle handler.  This is due to enter_idle(), exit_idle() races.  Make
    cpuidle_idle_call() confirm to this when there is no pm_idle_old.

    Also, cpuidle look at the return values of attch_driver() and set
    current_driver to NULL if attach fails on all CPUs.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 4893339a142afbd5b7c01ffadfd53d14746e858e
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Thu Apr 26 10:40:09 2007 +0800

    cpuidle: add support for max_cstate limit

    With CPUIDLE framework, the max_cstate (to limit max cpu c-state)
    parameter is ingored. Some systems require it to ignore C2/C3
    and some drivers like ipw require it too.

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 43bbbbe1cb998cbd2df656f55bb3bfe30f30e7d1
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Thu Apr 26 10:40:13 2007 +0800

    cpuidle: add cpuidle_fore_redetect_devices API

    add cpuidle_force_redetect_devices API,
    which forces all CPU redetect idle states.
    Next patch will use it.

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit d1edadd608f24836def5ec483d2edccfb37b1d19
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Thu Apr 26 10:40:01 2007 +0800

    cpuidle: fix sysfs related issue

    Fix the cpuidle sysfs issue.
    a. make kobject dynamicaly allocated
    b. fixed sysfs init issue to avoid suspend/resume issue

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 7169a5cc0d67b263978859672e86c13c23a5570d
Author: Randy Dunlap <randy.dunlap@oracle.com>
Date:   Wed Mar 28 22:52:53 2007 -0400

    cpuidle: 1-bit field must be unsigned

    A 1-bit bitfield has no room for a sign bit.
    drivers/cpuidle/governors/ladder.c:54:16: error: dubious bitfield without explicit `signed' or `unsigned'

    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 4658620158dc2fbd9e4bcb213c5b6fb5d05ba7d4
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed Mar 28 22:52:41 2007 -0400

    cpuidle: fix boot hang

    Patch for cpuidle boot hang reported by Larry Finger here.
    http://www.ussg.iu.edu/hypermail/linux/kernel/0703.2/2025.html

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Cc: Larry Finger <larry.finger@lwfinger.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit c17e168aa6e5fe3851baaae8df2fbc1cf11443a9
Author: Len Brown <len.brown@intel.com>
Date:   Wed Mar 7 04:37:53 2007 -0500

    cpuidle: ladder does not depend on ACPI

    build fix for CONFIG_ACPI=n

    In file included from drivers/cpuidle/governors/ladder.c:21:
    include/acpi/processor.h:88: error: expected specifier-qualifier-list before ‘acpi_integer’
    include/acpi/processor.h:106: error: expected specifier-qualifier-list before ‘acpi_integer’
    include/acpi/processor.h:168: error: expected specifier-qualifier-list before ‘acpi_handle’

    Signed-off-by: Len Brown <len.brown@intel.com>

commit 8c91d958246bde68db0c3f0c57b535962ce861cb
Author: Adrian Bunk <bunk@stusta.de>
Date:   Tue Mar 6 02:29:40 2007 -0800

    cpuidle: make code static

    This patch makes the following needlessly global code static:
    - driver.c: __cpuidle_find_driver()
    - governor.c: __cpuidle_find_governor()
    - ladder.c: struct ladder_governor

    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Cc: Adam Belay <abelay@novell.com>
    Cc: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 0c39dc3187094c72c33ab65a64d2017b21f372d2
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed Mar 7 02:38:22 2007 -0500

    cpu_idle: fix build break

    This patch fixes a build breakage with !CONFIG_HOTPLUG_CPU and
    CONFIG_CPU_IDLE.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 8112e3b115659b07df340ef170515799c0105f82
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Mar 6 02:29:39 2007 -0800

    cpuidle: build fix for !CPU_IDLE

    Fix the compile issues when CPU_IDLE is not configured.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Cc: Adam Belay <abelay@novell.com>
    Cc: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 1eb4431e9599cd25e0d9872f3c2c8986821839dd
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Feb 22 13:54:57 2007 -0800

    cpuidle take2: Basic documentation for cpuidle

    Documentation for cpuidle infrastructure

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit ef5f15a8b79123a047285ec2e3899108661df779
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Feb 22 13:54:03 2007 -0800

    cpuidle take2: Hookup ACPI C-states driver with cpuidle

    Hookup ACPI C-states onto generic cpuidle infrastructure.

    drivers/acpi/procesor_idle.c is now a ACPI C-states driver that registers as
    a driver in cpuidle infrastructure and the policy part is removed from
    drivers/acpi/processor_idle.c. We use governor in cpuidle instead.

    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 987196fa82d4db52c407e8c9d5dec884ba602183
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Thu Feb 22 13:52:57 2007 -0800

    cpuidle take2: Core cpuidle infrastructure

    Announcing 'cpuidle', a new CPU power management infrastructure to manage
    idle CPUs in a clean and efficient manner.
    cpuidle separates out the drivers that can provide support for multiple types
    of idle states and policy governors that decide on what idle state to use
    at run time.
    A cpuidle driver can support multiple idle states based on parameters like
    varying power consumption, wakeup latency, etc (ACPI C-states for example).
    A cpuidle governor can be usage model specific (laptop, server,
    laptop on battery etc).
    Main advantage of the infrastructure being, it allows independent development
    of drivers and governors and allows for better CPU power management.

    A huge thanks to Adam Belay and Shaohua Li who were part of this mini-project
    since its beginning and are greatly responsible for this patchset.

    This patch:

    Core cpuidle infrastructure.
    Introduces a new abstraction layer for cpuidle:
    * which manages drivers that can support multiple idles states. Drivers
      can be generic or particular to specific hardware/platform
    * allows pluging in multiple policy governors that can take idle state policy
      decision
    * The core also has a set of sysfs interfaces with which administrato can know
      about supported drivers and governors and switch them at run time.

    Signed-off-by: Adam Belay <abelay@novell.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

Signed-off-by: Len Brown <len.brown@intel.com>
2007-10-10 00:12:41 -04:00