Commit 5800043 (cpufreq: convert cpufreq_driver to using RCU) causes
the following call trace to be spit on boot:
BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179
in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd
2 locks held by systemd-udevd/292:
#0: (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0
#1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0
Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323
Call Trace:
[<ffffffff81072c90>] __might_sleep+0x140/0x1f0
[<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0
[<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130
[<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110
[<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
[<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
[<ffffffff811e647d>] sysfs_add_file+0xd/0x10
[<ffffffff811e6541>] sysfs_create_file+0x21/0x30
[<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0
[<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
[<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor]
[<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq]
[<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
[<ffffffff8106c97e>] ? up_read+0x1e/0x40
[<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0
[<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0
[<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0
[<ffffffff81468569>] subsys_interface_register+0x99/0xe0
[<ffffffffa014d000>] ? 0xffffffffa014cfff
[<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200
[<ffffffffa014d000>] ? 0xffffffffa014cfff
[<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq]
[<ffffffff810002fa>] do_one_initcall+0x11a/0x170
[<ffffffff810b4b87>] load_module+0x1cf7/0x2920
[<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0
[<ffffffff816baee0>] ? retint_restore_args+0xe/0xe
[<ffffffff810b5887>] sys_init_module+0xd7/0x120
[<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b
which is quite obvious, because that commit put (multiple instances
of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(),
although sysfs_create_file() may cause memory to be allocated with
GFP_KERNEL and that may sleep, which is not permitted in RCU read
critical section.
Revert the buggy commit altogether along with some changes on top
of it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some cpufreq drivers implement their own governor and so don't need
us to call generic governors interface via __cpufreq_governor(). Few
recent commits haven't obeyed this law well and we saw some
regressions.
This patch is an attempt to fix the above issue.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__cpufreq_governor() must be called with a correct policy->cpus mask.
In __cpufreq_remove_dev() we initially clear policy->cpus with
cpumask_clear_cpu() and then call
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). If the governor
is doing some per-cpu stuff in EXIT callback, this can create
uncertain behavior.
Generic governors in drivers/cpufreq/ doesn't do any per-cpu stuff
in EXIT callback and so we don't face any issues currently. But its
better to keep the code clean, so we don't face any issues in future.
Now, we call cpumask_clear_cpu() only when multiple cpus are managed
by policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We eventually would like to remove the rwlock cpufreq_driver_lock or
convert it back to a spinlock and protect the read sections with RCU.
The first step in that direction is to make cpufreq_driver use RCU.
I don't see an easy wasy to protect the cpufreq_cpu_data structure
with RCU, so I am leaving it with the rwlock for now since under
certain configs __cpufreq_cpu_get is a hot spot with 256+ cores.
[rjw: Subject, changelog, white space]
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.
Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.
This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.
Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.
This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.
This patch uses the infrastructure provided by earlier patch and
implements init/exit routines for ondemand and conservative
governors.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.
Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.
This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.
This patch is inclined towards providing this infrastructure. Because
we are required to allocate governor's resources dynamically now, we
must do it at policy creation and end. And so got
CPUFREQ_GOV_POLICY_INIT/EXIT.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by the rcu.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq: (55 commits)
cpufreq / intel_pstate: Fix 32 bit build
cpufreq: conservative: Fix typos in comments
cpufreq: ondemand: Fix typos in comments
cpufreq: exynos: simplify .init() for setting policy->cpus
cpufreq: kirkwood: Add a cpufreq driver for Marvell Kirkwood SoCs
cpufreq/x86: Add P-state driver for sandy bridge.
cpufreq_stats: do not remove sysfs files if frequency table is not present
cpufreq: Do not track governor name for scaling drivers with internal governors.
cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()
cpufreq: Retrieve current frequency from scaling drivers with internal governors
cpufreq: Fix locking issues
cpufreq: Create a macro for unlock_policy_rwsem{read,write}
cpufreq: Remove unused HOTPLUG_CPU code
cpufreq: governors: Fix WARN_ON() for multi-policy platforms
cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold
cpufreq / stats: Get rid of CPUFREQ_STATDEVICE_ATTR
cpufreq: Don't check cpu_online(policy->cpu)
cpufreq: add imx6q-cpufreq driver
cpufreq: Don't remove sysfs link for policy->cpu
cpufreq: Remove unnecessary use of policy->shared_type
...
Scaling drivers that implement internal governors do not have governor
structures assocaited with them. Only track the name of the governor
associated with the CPU if the driver does not implement
cpufreq_driver.setpolicy()
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Scaling drivers that implement cpufreq_driver.setpolicy() have
internal governors that do not signal changes via
cpufreq_notify_transition() so the frequncy in the policy will almost
certainly be different than the current frequncy. Only call
cpufreq_out_of_sync() when the underlying driver implements
cpufreq_driver.target()
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Scaling drivers that implement the cpufreq_driver.setpolicy() versus
the cpufreq_driver.target() interface do not set policy->cur.
Normally policy->cur is set during the call to cpufreq_driver.target()
when the frequnecy request is made by the governor.
If the scaling driver implements cpufreq_driver.setpolicy() and
cpufreq_driver.get() interfaces use cpufreq_driver.get() to retrieve
the current frequency.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq core uses two locks:
- cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array.
- cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure
all cpufreq/hotplug/workqueue/etc related lock issues.
These locks were not used properly and are placed against their principle
(present before their definition) at various places. This patch is an attempt to
fix their use.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On the lines of macro: lock_policy_rwsem, we can create another macro for
unlock_policy_rwsem. Lets do it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because the sibling cpu of any online cpu is identified very early in
cpufreq_add_dev(), below code is never executed. And so can be removed.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On multi-policy systems there is a single instance of governor for both the
policies (if same governor is chosen for both policies). With the code update
from following patches:
8eeed09 cpufreq: governors: Get rid of dbs_data->enable field
b394058 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()
We are creating/removing sysfs directory of governor for for every call to
GOV_START and STOP. This would fail for multi-policy system as there is a
per-policy call to START/STOP.
This patch reuses the governor->initialized variable to detect total users of
governor.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
"cpufreq" directory in policy->cpu is never created using
sysfs_create_link(), but using kobject_init_and_add(). And so we
shouldn't call sysfs_remove_link() for policy->cpu(). sysfs stuff
for policy->cpu is automatically removed when we call kobject_put()
for dying policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event
we reset few tunables of governor. Which isn't correct, as this routine is
called for every cpu hot-[un]plugging event. We should actually be resetting
these only when the governor module is removed and re-installed.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently cpufreq_add_dev() firsts allocates policy, calls
driver->init() and then checks if this CPU is already managed or not.
And if it is already managed, its policy is freed.
We can save all this if we somehow know that CPU is managed or not in
advance. policy->related_cpus contains the list of all valid sibling
CPUs of policy->cpu. We can check this to see if the current CPU is
already managed.
From now on, platforms don't really need to set related_cpus from
their init() routines, as the same work is done by core too.
If a platform driver needs to set the related_cpus mask with some
additional CPUs, other than CPUs present in policy->cpus, they are
free to do it, though, as we don't override anything.
[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit 956f339 "cpufreq: Don't use cpu removed during
cpufreq_driver_unregister".
With the addition of the following commit, this change/variable is not
required any more:
commit b9ba2725343ae57add3f324dfa5074167f48de96
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon Jan 14 13:23:03 2013 +0000
cpufreq: Simplify __cpufreq_remove_dev()
[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a helper function to return cpufreq_driver->name.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When disable_cpufreq() is called some exported functions are still
being used that do not have a check for cpufreq being disabled.
Add a disabled check into cpufreq_cpu_get() to return NULL if
cpufreq is disabled this covers most of the exported functions. For
the exported functions that do not call cpufreq_cpu_get() add an
explicit check.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver
unregister and cpu removals.
Current implementation of this routine is overly complex without much need. If
the cpu to be removed is the policy->cpu, we remove the policy first and add all
other cpus again from policy->cpus and then finally call __cpufreq_remove_dev()
again to remove the cpu to be deleted. Haahhhh..
There exist a simple solution to removal of a cpu:
- Simply use the old policy structure
- update its fields like: policy->cpu, etc.
- notify any users of cpufreq, which depend on changing policy->cpu
Hence this patch, which tries to implement the above theory. It is tested well
by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both
A15's share same struct policy and all A7's share same policy structure.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is how the core works:
cpufreq_driver_unregister()
- subsys_interface_unregister()
- for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we
unregister.
cpufreq_remove_dev():
- Remove policy node
- Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu.
i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2,
we call it for cpu 3.
- cpufreq_add_dev() would call cpufreq_driver->init()
- init would return mask as AND of 2, 3 and 4 for cluster A7.
- cpufreq core would do online_cpu && policy->cpus
Here is the BUG(). Because cpu hasn't died but we have just unregistered
the cpufreq driver, online cpu would still have cpu 2 in it. And so thing
go bad again.
Solution: Keep cpumask of cpus that are registered with cpufreq core and clear
cpus when we get a call from subsys_interface_unregister() via
cpufreq_remove_dev().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because cpufreq core and governors worry only about the online cpus, if a cpu is
hot [un]plugged, we must notify governors about it, otherwise be ready to expect
something unexpected.
We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we
just need to call them now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.
We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release. Either
way you look at it, we are well past both, so push it off a cliff.
Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API. Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.
Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove an unnecessary initializer for the 'ret' variable in
__cpufreq_set_policy().
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__cpufreq_driver_target() must not pass target frequency beyond the
limits of current policy.
Today most of cpufreq platform drivers are doing this check in their
target routines. Why not move it to __cpufreq_driver_target()?
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid calling cpufreq driver's target() routine if new frequency is same as
policies current frequency.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_disabled() is a local function, so should be marked static.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no need to do cpufreq_get_cpu() and cpufreq_put_cpu() for drivers that
don't support getavg() routine.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With debug options on, it is difficult to locate cpufreq core's debug prints.
Fix this by prefixing debug prints with KBUILD_MODNAME.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Arrays for governer and driver name are of size CPUFREQ_NAME_LEN or 16.
i.e. 15 bytes for name and 1 for trailing '\0'.
When cpufreq driver print these names (for sysfs), it includes '\n' or ' ' in
the fmt string and still passes length as CPUFREQ_NAME_LEN. If the driver or
governor names are using all 15 fields allocated to them, then the trailing '\n'
or ' ' will never be printed. And so commands like:
root@linaro-developer# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
will print something like:
cpufreq_foodrvroot@linaro-developer#
Fix this by increasing print length by one character.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show_bios_limit is mistakenly written as show_scaling_driver in a comment
describing purpose of show_bios_limit() routine.
Fix it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Running one program that continuously hotplugs and replugs a cpu
concurrently with another program that continuously writes to the
scaling_setspeed node eventually deadlocks with:
=============================================
[ INFO: possible recursive locking detected ]
3.4.0 #37 Tainted: G W
---------------------------------------------
filemonkey/122 is trying to acquire lock:
(s_active#13){++++.+}, at: [<c01a3d28>] sysfs_remove_dir+0x9c/0xb4
but task is already holding lock:
(s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(s_active#13);
lock(s_active#13);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by filemonkey/122:
#0: (&buffer->mutex){+.+.+.}, at: [<c01a2230>] sysfs_write_file+0x28/0x140
#1: (s_active#13){++++.+}, at: [<c01a22f0>] sysfs_write_file+0xe8/0x140
stack backtrace:
[<c0014fcc>] (unwind_backtrace+0x0/0x120) from [<c00ca600>] (validate_chain+0x6f8/0x1054)
[<c00ca600>] (validate_chain+0x6f8/0x1054) from [<c00cb778>] (__lock_acquire+0x81c/0x8d8)
[<c00cb778>] (__lock_acquire+0x81c/0x8d8) from [<c00cb9c0>] (lock_acquire+0x18c/0x1e8)
[<c00cb9c0>] (lock_acquire+0x18c/0x1e8) from [<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180)
[<c01a3ba8>] (sysfs_addrm_finish+0xd0/0x180) from [<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4)
[<c01a3d28>] (sysfs_remove_dir+0x9c/0xb4) from [<c02d0e5c>] (kobject_del+0x10/0x38)
[<c02d0e5c>] (kobject_del+0x10/0x38) from [<c02d0f74>] (kobject_release+0xf0/0x194)
[<c02d0f74>] (kobject_release+0xf0/0x194) from [<c0565a98>] (cpufreq_cpu_put+0xc/0x24)
[<c0565a98>] (cpufreq_cpu_put+0xc/0x24) from [<c05683f0>] (store+0x6c/0x74)
[<c05683f0>] (store+0x6c/0x74) from [<c01a2314>] (sysfs_write_file+0x10c/0x140)
[<c01a2314>] (sysfs_write_file+0x10c/0x140) from [<c014af44>] (vfs_write+0xb0/0x128)
[<c014af44>] (vfs_write+0xb0/0x128) from [<c014b06c>] (sys_write+0x3c/0x68)
[<c014b06c>] (sys_write+0x3c/0x68) from [<c000e0e0>] (ret_fast_syscall+0x0/0x3c)
This is because store() in cpufreq.c indirectly calls
kobject_get() via cpufreq_cpu_get() and is the last one to call
kobject_put() via cpufreq_cpu_put(). Sysfs code should not call
kobject_get() or kobject_put() directly (see the comment around
sysfs_schedule_callback() for more information).
Fix this deadlock by introducing two new functions:
struct cpufreq_policy *cpufreq_cpu_get_sysfs(unsigned int cpu)
void cpufreq_cpu_put_sysfs(struct cpufreq_policy *data)
which do the same thing as cpufreq_cpu_{get,put}() but don't call
kobject functions.
To easily trigger this deadlock you can insert an msleep() with a
reasonably large value right after the fail label at the bottom
of the store() function in cpufreq.c and then write
scaling_setspeed in one task and offline the cpu in another. The
first task will hang and be detected by the hung task detector.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
useful for disabling cpufreq altogether. The cpu frequency
scaling drivers and cpu frequency governors will fail to register.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (23 commits)
[CPUFREQ] EXYNOS: Removed useless headers and codes
[CPUFREQ] EXYNOS: Make EXYNOS common cpufreq driver
[CPUFREQ] powernow-k8: Update copyright, maintainer and documentation information
[CPUFREQ] powernow-k8: Fix indexing issue
[CPUFREQ] powernow-k8: Avoid Pstate MSR accesses on systems supporting CPB
[CPUFREQ] update lpj only if frequency has changed
[CPUFREQ] cpufreq:userspace: fix cpu_cur_freq updation
[CPUFREQ] Remove wall variable from cpufreq_gov_dbs_init()
[CPUFREQ] EXYNOS4210: cpufreq code is changed for stable working
[CPUFREQ] EXYNOS4210: Update frequency table for cpu divider
[CPUFREQ] EXYNOS4210: Remove code about bus on cpufreq
[CPUFREQ] s3c64xx: Use pr_fmt() for consistent log messages
cpufreq: OMAP: fixup for omap_device changes, include <linux/module.h>
cpufreq: OMAP: fix freq_table leak
cpufreq: OMAP: put clk if cpu_init failed
cpufreq: OMAP: only supports OPP library
cpufreq: OMAP: dont support !freq_table
cpufreq: OMAP: deny initialization if no mpudev
cpufreq: OMAP: move clk name decision to init
cpufreq: OMAP: notify even with bad boot frequency
...
During scaling up of cpu frequency, loops_per_jiffy
is updated upon invoking PRECHANGE notifier.
If setting to new frequency fails in cpufreq driver,
lpj is left at incorrect value.
Hence update lpj only if cpu frequency is changed,
i.e. upon invoking POSTCHANGE notifier.
Penalty would be that during time period between
changing cpu frequency & invocation of POSTCHANGE
notifier, udelay(x) may not gurantee minimal delay
of 'x' us for frequency scaling up operation.
Perhaps a better solution would be to define
CPUFREQ_ABORTCHANGE & handle accordingly, but then
it would be more intrusive (using ABORTCHANGE may
help drivers also; if any has registered notifier
and expect POST for a PRECHANGE, their needs can
be taken care using ABORT)
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Since format string handling is part of request_module, there is no
need to construct the module name. As such, drop the redundant sprintf
and heap usage.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Dave Jones <davej@redhat.com>
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.
How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.
To enabled debugging during runtime, mount debugfs and
$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control
for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append
ddebug_query="module cpufreq +p"
as a boot parameter to the kernel of your choice.
For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
When we discover CPUs that are affected by each other's
frequency/voltage transitions, the first CPU gets a sysfs directory
created, and rest of the siblings get symlinks. Currently, when we
hotplug off only the first CPU, all of the symlinks and the sysfs
directory gets removed. Even though rest of the siblings are still
online and functional, they are orphaned, and no longer governed by
cpufreq.
This patch, given the above scenario, creates a sysfs directory for
the first sibling and symlinks for the rest of the siblings.
Please note the recursive call, it was rather too ugly to roll it
out. And the removal of redundant NULL setting (it is already taken
care of near the top of the function).
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.
To remove the confusion and to prepare for elimiating sysdev
suspend and resume operations from the kernel enirely, convernt
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
cpufreq_register_driver sets cpufreq_driver to a structure owned (and
placed) in the caller's memory. If cpufreq policy fails in its ->init
function, sysdev_driver_register returns nonzero in
cpufreq_register_driver. Now, cpufreq_register_driver returns an error
without setting cpufreq_driver back to NULL.
Usually cpufreq policy modules are unloaded because they propagate the
error to the module init function and return that.
So a later access to any member of cpufreq_driver causes bugs like:
BUG: unable to handle kernel paging request at ffffffffa00270a0
IP: [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
PGD 1805067 PUD 1809063 PMD 1c3f90067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/tun0/statistics/collisions
CPU 0
Modules linked in: ...
Pid: 5677, comm: thunderbird-bin Tainted: G W 2.6.38-rc4-mm1_64+ #1389 To be filled by O.E.M./To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8145eca3>] [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
RSP: 0018:ffff8801aec37d98 EFLAGS: 00010086
RAX: 0000000000000202 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffffa00270a0 RSI: 0000000000001000 RDI: ffffffff8199ece8
...
Call Trace:
[<ffffffff8145f490>] cpufreq_quick_get+0x10/0x30
[<ffffffff8103f12b>] show_cpuinfo+0x2ab/0x300
[<ffffffff81136292>] seq_read+0xf2/0x3f0
[<ffffffff8126c5d3>] ? __strncpy_from_user+0x33/0x60
[<ffffffff8116850d>] proc_reg_read+0x6d/0xa0
[<ffffffff81116e53>] vfs_read+0xc3/0x180
[<ffffffff81116f5c>] sys_read+0x4c/0x90
[<ffffffff81030dbb>] system_call_fastpath+0x16/0x1b
...
It's all cause by weird fail path handling in cpufreq_register_driver.
To fix that, shuffle the code to do proper handling with gotos.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Dave Jones <davej@redhat.com>
Add these new power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend
The old C-state/idle accounting events:
power:power_start
power:power_end
Have now a replacement (but we are still keeping the old
tracepoints for compatibility):
power:cpu_idle
and
power:power_frequency
is replaced with:
power:cpu_frequency
power:machine_suspend is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.
the type= field got removed from both, it was never
used and the type is differed by the event type itself.
perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>