Commit Graph

313 Commits

Author SHA1 Message Date
Viresh Kumar 3bc28ab6da cpufreq: remove CONFIG_CPU_FREQ_TABLE
CONFIG_CPU_FREQ_TABLE will be always enabled when cpufreq framework is used, as
cpufreq core depends on it. So, we don't need this CONFIG option anymore as it
is not configurable. Remove CONFIG_CPU_FREQ_TABLE and update its users.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:33 +02:00
Viresh Kumar 70e9e77833 cpufreq: create cpufreq_generic_init() routine
Many CPUFreq drivers for SMP system (where all cores share same clock lines), do
similar stuff in their ->init() part.

This patch creates a generic routine in cpufreq core which can be used by these
so that we can remove some redundant code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:33 +02:00
Viresh Kumar da60ce9f2f cpufreq: call cpufreq_driver->get() after calling ->init()
Almost all drivers set policy->cur with current CPU frequency in their ->init()
part. This can be done for all of them at core level and so they wouldn't need
to do it.

This patch adds supporting code in cpufreq core for calling get() after we have
called init() for a policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:28 +02:00
Viresh Kumar 0b981e7074 cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY
Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead
of a separate field within cpufreq_driver. This will save some bytes of
memory.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:23 +02:00
Viresh Kumar 037ce8397d cpufreq: rename __cpufreq_set_policy() as cpufreq_set_policy()
Earlier there used to be two functions named __cpufreq_set_policy() and
cpufreq_set_policy(), but now we only have a single routine lets name it
cpufreq_set_policy() instead of __cpufreq_set_policy().

This also removes some invalid comments or fixes some incorrect comments.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:23 +02:00
Viresh Kumar 27a862e983 cpufreq: remove __cpufreq_remove_dev()
Nobody except cpufreq_remove_dev() calls __cpufreq_remove_dev() and
so we don't need two separate routines here. Merge code from
__cpufreq_remove_dev() into cpufreq_remove_dev() and get rid of
__cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar 75949c9a1f cpufreq: don't break string in print statements
As a rule its better not to break string (quoted inside "") in a
print statement even if it crosses 80 column boundary as that may
introduce bugs and so this patch rewrites one of the print statements..

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar bbdd04ab1f cpufreq: Remove extra blank line
We don't need a blank line just at start of a block, lets remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar 67a29e558b cpufreq: remove invalid comment from __cpufreq_remove_dev()
Some section of kerneldoc comment for __cpufreq_remove_dev() is invalid now.
Remove it.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:21 +02:00
Viresh Kumar 1b750e3bda cpufreq: make return type of lock_policy_rwsem_{read|write}() as void
lock_policy_rwsem_{read|write}() currently has return type of int,
but it always returns zero and hence its return type should be void
instead. This patch makes that change and modifies all of the users
accordingly.

Reported-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-10 02:51:14 +02:00
Viresh Kumar 26ca869434 cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()
cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().

Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).

Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 03:24:02 +02:00
Yinghai Lu 4dea5806d3 cpufreq: return EEXIST instead of EBUSY for second registering
On systems that support intel_pstate, acpi_cpufreq fails to load, and
udev keeps trying until trace gets filled up and kernel crashes.

The root cause is driver return ret from cpufreq_register_driver(),
because when some other driver takes over before, it will return
EBUSY and then udev will keep trying ...

cpufreq_register_driver() should return EEXIST instead so that the
system can boot without appending intel_pstate=disable and still use
intel_pstate.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-20 00:37:10 +02:00
Viresh Kumar 8efd57657d cpufreq: unlock correct rwsem while updating policy->cpu
Current code looks like this:

        WARN_ON(lock_policy_rwsem_write(cpu));
        update_policy_cpu(policy, new_cpu);
        unlock_policy_rwsem_write(cpu);

{lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem.
Because cpu is changing with the call to update_policy_cpu(), the
unlock_policy_rwsem_write() will release the incorrect lock.

The right solution would be to release the same lock as was taken earlier. Also
update_policy_cpu() was also called from cpufreq_add_dev() without any locks and
so its better if we move this locking to inside update_policy_cpu().

This patch fixes a regression introduced in 3.12 by commit f9ba680d23
(cpufreq: Extract the handover of policy cpu to a helper function).

Reported-and-tested-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-18 00:01:52 +02:00
Viresh Kumar 9c8f1ee40b cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev()
into two parts" from Srivatsa.

Consider a scenario where we have two CPUs in a policy (0 & 1) and we are
removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1
from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read
cpumask_weight of policy->cpus, which will come as 1 and this code will behave
as if we are removing the last CPU from policy :)

Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of
__cpufreq_remove_dev_prepare().

Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-18 00:01:27 +02:00
Lan Tianyu 44871c9c7f cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
In cpufreq_policy_restore() before system suspend policy is read from
percpu's cpufreq_cpu_data_fallback.  It's a read operation rather
than a write one, so take the lock for reading in there.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:30:03 +02:00
Srivatsa S. Bhat cb38ed5cf1 cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
If update_policy_cpu() is invoked with the existing policy->cpu itself
as the new-cpu parameter, then a lot of things can go terribly wrong.

In its present form, update_policy_cpu() always assumes that the new-cpu
is different from policy->cpu and invokes other functions to perform their
respective updates. And those functions implement the actual update like
this:

per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
per_cpu(..., last_cpu) = NULL;

Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
references vanish into thin air! (memory leak). From there, it leads to more
problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
and hence tries to create a new sysfs-group; but sysfs already had created
the group earlier, so it complains that it cannot create a duplicate filename.
In short, the repercussions of a rather innocuous invocation of
update_policy_cpu() can turn out to be pretty nasty.

Ideally update_policy_cpu() should handle this situation (new == last)
gracefully, and not lead to such severe problems. So fix it by adding an
appropriate check.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat 61173f256a cpufreq: Restructure if/else block to avoid unintended behavior
In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.

The code looks like this:

        if (cpu != policy->cpu && !frozen) {
                sysfs_remove_link(&dev->kobj, "cpufreq");
        } else if (cpus > 1) {
		new_cpu = cpufreq_nominate_new_policy_cpu(...);
		...
		update_policy_cpu(..., new_cpu);
	}

The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:

1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat 0d66b91ebf cpufreq: Fix crash in cpufreq-stats during suspend/resume
Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
dereference during the second attempt to suspend a system. He also
pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
init/teardown during suspend/resume".

That commit actually ensured that the cpufreq-stats table and the
cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
suspend/resume, which makes it all the more surprising. However, it turns
out that the root-cause is not that we access an already freed memory, but
that the reference to the allocated memory gets moved around and we lose
track of that during resume, leading to the reported crash in a subsequent
suspend attempt.

In the suspend path, during CPU offline, the value of policy->cpu is updated
by choosing one of the surviving CPUs in that policy, as long as there is
atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
invoked to update the reference to the stats structure by assigning it to
the new CPU. However, in the resume path, during CPU online, we end up
assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
know about this. Thus the reference to the stats structure remains
(incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
during CPU offline, we end up accessing an incorrect location to get the
stats structure, which eventually leads to the NULL pointer dereference.

Fix this by letting cpufreq-stats know about the update of the policy->cpu
during CPU online in the resume path. (Also, move the update_policy_cpu()
function higher up in the file, so that __cpufreq_add_dev() can invoke
it).

Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Rafael J. Wysocki 798282a871 Revert "cpufreq: make sure frequency transitions are serialized"
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications.  However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:

 WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
 __cpufreq_notify_transition+0x238/0x260()
 In middle of another frequency transition

accompanied by a call trace similar to this one:

 [<ffffffff81720daa>] dump_stack+0x46/0x58
 [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
 [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
 [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
 [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
 [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
 [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
 [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
 [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
 [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
 [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
 [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
 [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
 [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
 [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
 [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
 [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
 [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
 [<ffffffff81081060>] process_one_work+0x170/0x4a0
 [<ffffffff81082121>] worker_thread+0x121/0x390
 [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
 [<ffffffff81088fe0>] kthread+0xc0/0xd0
 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0

For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.

Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:54:50 +02:00
Srivatsa S. Bhat 5136fa5658 cpufreq: Use signed type for 'ret' variable, to store negative error values
There are places where the variable 'ret' is declared as unsigned int
and then used to store negative return values such as -EINVAL. Fix them
by declaring the variable as a signed quantity.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:48 +02:00
Srivatsa S. Bhat 56d07db274 cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat 4f750c9308 cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
The functions that are used to write to cpufreq sysfs files (such as
store_scaling_max_freq()) are not hotplug safe. They can race with CPU
hotplug tasks and lead to problems such as trying to acquire an already
destroyed timer-mutex etc.

Eg:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
       policy->governor->governor(policy, CPUFREQ_GOV_STOP);
        cpufreq_governor_dbs()
         case CPUFREQ_GOV_STOP:
          mutex_destroy(&cpu_cdbs->timer_mutex)
          cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
        policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
         case CPUFREQ_GOV_LIMITS:
          mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
           if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
synchronize with CPU hotplug. However, there is an additional point to note
here: some parts of the CPU teardown in the cpufreq subsystem are done in
the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
get/put_online_cpus() functions alone is insufficient; we should also ensure
that we don't race with those latter steps in the hotplug sequence. We can
easily achieve this by checking if the CPU is online before proceeding with
the store, since the CPU would have been marked offline by the time the
CPU_POST_DEAD notifiers are executed.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat 1aee40ac9c cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
offline. But because we destroy the kobject towards the end of the CPU offline
phase, there are certain race windows where a task can try to write to a
cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
that CPU offline, and this can bump up the kobject refcount, which in turn might
hinder the CPU offline task from running to completion. (It can also cause
other more serious problems such as trying to acquire a destroyed timer-mutex
etc., depending on the exact stage of the cleanup at which the task managed to
take a new refcount).

To fix the race window, we will need to synchronize those store_*() call-sites
with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
in turn can cause a total deadlock because it can end up waiting for the
CPU offline task to complete, with incremented refcount!

Write to sysfs                            CPU offline task
--------------                            ----------------
kobj_refcnt++

                                          Acquire cpu_hotplug.lock

get_online_cpus();

					  Wait for kobj_refcnt to drop to zero

                     **DEADLOCK**

A simple way to avoid this problem is to perform the kobject cleanup in the
CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
released. Doing this helps us avoid deadlocks due to holding kobject refcounts
and waiting on each other on the cpu_hotplug.lock.

(Note: We can't move all of the cpufreq CPU offline steps to the
CPU_POST_DEAD stage, because certain things such as stopping the governors
have to be done before the outgoing CPU is marked offline. So retain those
parts in the CPU_DOWN_PREPARE stage itself).

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat cedb70afd0 cpufreq: Split __cpufreq_remove_dev() into two parts
During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
to perform work such as stopping the cpufreq governor, clearing the
CPU from the policy structure etc, and finally cleaning up the
kobject.

There are certain subtle issues related to the kobject cleanup, and
it would be much easier to deal with them if we separate that part
from the rest of the cleanup-work in the CPU offline phase. So split
the __cpufreq_remove_dev() function into 2 parts: one that handles
the kobject cleanup, and the other that handles the rest of the work.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:46 +02:00
Viresh Kumar 19c763031a cpufreq: serialize calls to __cpufreq_governor()
We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.

For example, consider following scenario:

__cpufreq_remove_dev()
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
   policy->governor->governor(policy, CPUFREQ_GOV_STOP);
    cpufreq_governor_dbs()
     case CPUFREQ_GOV_STOP:
      mutex_destroy(&cpu_cdbs->timer_mutex)
      cpu_cdbs->cur_policy = NULL;
  <PREEMPT>
store()
 __cpufreq_set_policy()
  __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
    policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
     case CPUFREQ_GOV_LIMITS:
      mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
       if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is
NULL at this point.

Introduce an additional variable which would guarantee serialization
here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:46 +02:00
Viresh Kumar f73d393384 cpufreq: don't allow governor limits to be changed when it is disabled
__cpufreq_governor() returns with -EBUSY when governor is already
stopped and we try to stop it again, but when it is stopped we must
not allow calls to CPUFREQ_GOV_LIMITS event as well.

This patch adds this check in __cpufreq_governor().

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:45 +02:00
Li Zhong 5025d628c8 cpufreq: fix bad unlock balance on !CONFIG_SMP
This patch tries to fix lockdep complaint attached below.

It seems that we should always read acquire the cpufreq_rwsem,
whether CONFIG_SMP is enabled or not.  And CONFIG_HOTPLUG_CPU
depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
code enabled by CONFIG_HOTPLUG_CPU.

[    0.504191] =====================================
[    0.504627] [ BUG: bad unlock balance detected! ]
[    0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
[    0.504627] -------------------------------------
[    0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
[    0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[    0.504627] but there are no more locks to release!
[    0.504627]
[    0.504627] other info that might help us debug this:
[    0.504627] 1 lock held by swapper/1:
[    0.504627]  #0:  (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0
[    0.504627]
[    0.504627] stack backtrace:
[    0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
[    0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[    0.504627]  ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
[    0.504627]  ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
[    0.504627]  00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
[    0.504627] Call Trace:
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff814c062b>] dump_stack+0x19/0x1b
[    0.504627]  [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310
[    0.504627]  [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120
[    0.504627]  [<ffffffff811510cb>] ? kfree+0xcb/0x1d0
[    0.504627]  [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8109d2a4>] lock_release+0xc4/0x250
[    0.504627]  [<ffffffff8106c9f3>] up_read+0x23/0x40
[    0.504627]  [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0
[    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[    0.504627]  [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0
[    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[    0.504627]  [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8
[    0.504627]  [<ffffffff810002ba>] do_one_initcall+0xda/0x180
[    0.504627]  [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb
[    0.504627]  [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c
[    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
[    0.504627]  [<ffffffff814b4dde>] kernel_init+0xe/0xf0
[    0.504627]  [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0
[    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-and-tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-21 02:04:31 +02:00
Viresh Kumar 1b27429446 cpufreq: Use cpufreq_policy_list for iterating over policies
To iterate over all policies we currently iterate over all online
CPUs and then get the policy for each of them which is suboptimal.
Use the newly created cpufreq_policy_list for this purpose instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar 474deff744 cpufreq: remove cpufreq_policy_cpu per-cpu variable
cpufreq_policy_cpu per-cpu variables are used for storing the ID of
the CPU that manages the given CPU's policy.  However, we also store
a policy pointer for each cpu in cpufreq_cpu_data, so the
cpufreq_policy_cpu information is simply redundant.

It is better to use cpufreq_cpu_data to retrieve a policy and get
policy->cpu from there, so make that happen everywhere and drop the
cpufreq_policy_cpu per-cpu variables which aren't necessary any more.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar 9e9fd80167 cpufreq: remove unnecessary check in __cpufreq_governor()
We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put
governor module as we are sure event can only be START/STOP here.

Remove the useless check.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar 9515f4d69b cpufreq: remove policy from cpufreq_policy_list during suspend
cpufreq_policy_list is a list of active policies.  We do remove
policies from this list when all CPUs belonging to that policy are
removed.  But during system suspend we don't really free a policy
struct as it will be used again during resume, so we didn't remove
it from cpufreq_policy_list as well..

However, this is incorrect.  We are saying this policy isn't valid
anymore and must not be referenced (though we haven't freed it), but
it can still be used by code that iterates over cpufreq_policy_list.

Remove policy from this list during system suspend as well.
Of course, we must add it back whenever the first CPU belonging to
that policy shows up.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar edab2fbc21 cpufreq: Fix white space in __cpufreq_remove_dev()
Align closing brace '}' of an if block.

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Rafael J. Wysocki 878f6e074e Revert "cpufreq: Use cpufreq_policy_list for iterating over policies"
Revert commit eb60852 (cpufreq: Use cpufreq_policy_list for iterating
over policies), because it breaks system suspend/resume on multiple
machines.

It either causes resume to block indefinitely or causes the BUG_ON()
in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
attributes.

Conflicts:
	drivers/cpufreq/cpufreq.c
2013-08-18 15:35:59 +02:00
Viresh Kumar 3de9bdeb28 cpufreq: improve error checking on return values of __cpufreq_governor()
The __cpufreq_governor() function can fail in rare cases especially
if there are bugs in cpufreq drivers.  Thus we must stop processing
as soon as this routine fails, otherwise it may result in undefined
behavior.

This patch adds error checking code whenever this routine is called
from any place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:48 +02:00
Viresh Kumar 6eed9404ab cpufreq: Use rwsem for protecting critical sections
Critical sections of the cpufreq core are protected with the help of
the driver module owner's refcount, which isn't the correct approach,
because it causes rmmod to return an error when some routine has
updated that refcount.

Let's use rwsem for this purpose instead.  Only
cpufreq_unregister_driver() will use write sem
and everybody else will use read sem.

[rjw: Subject & changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:47 +02:00
Viresh Kumar fe492f3f03 cpufreq: Fix broken usage of governor->owner's refcount
The cpufreq governor owner refcount usage is broken.  We should only
increment that refcount when a CPUFREQ_GOV_POLICY_INIT event has come
and it should only be decremented if CPUFREQ_GOV_POLICY_EXIT has come.

Currently, there can be situations where the governor is in use, but
we have allowed it to be unloaded which may result in undefined
behavior.  Let's fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:47 +02:00
Viresh Kumar eb608521f1 cpufreq: Use cpufreq_policy_list for iterating over policies
To iterate over all policies we currently iterate over all CPUs and
then get the policy for each of them.  Let's use the newly created
cpufreq_policy_list for this purpose.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:46 +02:00
Lukasz Majewski c88a1f8b96 cpufreq: Store cpufreq policies in a list
Policies available in the cpufreq framework are now linked together.
They are accessible via cpufreq_policy_list defined in the cpufreq
core.

[rjw: Fix from Yinghai Lu folded in]
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:06 +02:00
Viresh Kumar d5b73cd870 cpufreq: Use sizeof(*ptr) convetion for computing sizes
Chapter 14 of Documentation/CodingStyle says:

The preferred form for passing a size of a struct is the following:

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts
readability and introduces an opportunity for a bug when the pointer
variable type is changed but the corresponding sizeof that is passed
to a memory allocator is not.

This wasn't followed consistently in drivers/cpufreq, let's make it
more consistent by always following this rule.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar 3a3e9e06d0 cpufreq: Give consistent names to cpufreq_policy objects
They are called policy, cur_policy, new_policy, data, etc.  Just call
them policy wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar 5ff0a26803 cpufreq: Clean up header files included in the core
This patch addresses the following issues in the header files in the
cpufreq core:
 - Include headers in ascending order, so that we don't add same
   many times by mistake.
 - <asm/> must be included after <linux/>, so that they override
   whatever they need to.
 - Remove unnecessary includes.
 - Don't include files already included by cpufreq.h or
   cpufreq_governor.h.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:09 +02:00
Rafael J. Wysocki 1133bfa6dc Merge branch 'pm-cpufreq-ondemand' into pm-cpufreq
* pm-cpufreq:
  cpufreq: Remove unused function __cpufreq_driver_getavg()
  cpufreq: Remove unused APERF/MPERF support
  cpufreq: ondemand: Change the calculation of target frequency
2013-08-07 23:11:43 +02:00
Viresh Kumar d8d3b47112 cpufreq: Pass policy to cpufreq_add_policy_cpu()
The caller of cpufreq_add_policy_cpu() already has a pointer to the
policy structure and there is no need to look it up again in
cpufreq_add_policy_cpu().  Let's pass it directly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:50 +02:00
Rafael J. Wysocki 10659ab7b5 cpufreq: Avoid double kobject_put() for the same kobject in error code path
The only case triggering a jump to the err_out_unregister label in
__cpufreq_add_dev() is when cpufreq_add_dev_interface() fails.
However, if cpufreq_add_dev_interface() fails, it calls kobject_put()
for the policy kobject in its error code path and since that causes
the kobject's refcount to become 0, the additional kobject_put() for
the same kobject under err_out_unregister and the
wait_for_completion() following it are pointless, so drop them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-08-07 23:02:50 +02:00
Rafael J. Wysocki 71c3461ef7 cpufreq: Do not hold driver module references for additional policy CPUs
The cpufreq core is a little inconsistent in the way it uses the
driver module refcount.

Namely, if __cpufreq_add_dev() is called for a CPU that doesn't
share the policy object with any other CPUs, the driver module
refcount it grabs to start with will be dropped by it before
returning and will be equal to whatever it had been before that
function was invoked.

However, if the given CPU does share the policy object with other
CPUs, either cpufreq_add_policy_cpu() is called to link the new CPU
to the existing policy, or cpufreq_add_dev_symlink() is used to link
the other CPUs sharing the policy with it to the just created policy
object.  In that case, because both cpufreq_add_policy_cpu() and
cpufreq_add_dev_symlink() call cpufreq_cpu_get() for the given
policy (the latter possibly many times) without the balancing
cpufreq_cpu_put() (unless there is an error), the driver module
refcount will be left by __cpufreq_add_dev() with a nonzero value
(different from the initial one).

To remove that inconsistency make cpufreq_add_policy_cpu() execute
cpufreq_cpu_put() for the given policy before returning, which
decrements the driver module refcount so that it will be equal to its
initial value after __cpufreq_add_dev() returns.  Also remove the
cpufreq_cpu_get() call from cpufreq_add_dev_symlink(), since both the
policy refcount and the driver module refcount are nonzero when it is
called and they don't need to be bumped up by it.

Accordingly, drop the cpufreq_cpu_put() from __cpufreq_remove_dev(),
since it is only necessary to balance the cpufreq_cpu_get() called
by cpufreq_add_policy_cpu() or cpufreq_add_dev_symlink().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-08-07 23:02:50 +02:00
Viresh Kumar 308b60e715 cpufreq: Don't pass CPU to cpufreq_add_dev_{symlink|interface}()
Pointer to struct cpufreq_policy is already passed to these routines
and we don't need to send policy->cpu to them as well.  So, get rid
of this extra argument and use policy->cpu everywhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Viresh Kumar e8fdde1011 cpufreq: Remove extra variables from cpufreq_add_dev_symlink()
We call cpufreq_cpu_get() in cpufreq_add_dev_symlink() to increase usage
refcount of policy, but not to get a policy for the given CPU.  So, we
don't really need to capture the return value of this routine.  We can
simply use policy passed as an argument to cpufreq_add_dev_symlink().

Moreover debug print is rewritten to make it more clear.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat 5302c3fb2e cpufreq: Perform light-weight init/teardown during suspend/resume
Now that we have the infrastructure to perform a light-weight init/tear-down,
use that in the cpufreq CPU hotplug notifier when invoked from the
suspend/resume path.

This also ensures that the file permissions of the cpufreq sysfs files are
preserved across suspend/resume, something which commit a66b2e (cpufreq:
Preserve sysfs files across suspend/resume) originally intended to do, but
had to be reverted due to other problems.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat 8414809c6a cpufreq: Preserve policy structure across suspend/resume
To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
   critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its ->cpus and ->related_cpus masks, whereas the second part requires that
the policy->kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
   during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
   structure vs when to restore the policy from the saved fallback storage.
   Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat a82fab2928 cpufreq: Introduce a flag ('frozen') to separate full vs temporary init/teardown
During suspend/resume we would like to do a light-weight init/teardown of
CPUs in the cpufreq subsystem and preserve certain things such as sysfs files
etc across suspend/resume transitions. Add a flag called 'frozen' to help
distinguish the full init/teardown sequence from the light-weight one.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat f9ba680d23 cpufreq: Extract the handover of policy cpu to a helper function
During cpu offline, when the policy->cpu is going down, some other CPU
present in the policy->cpus mask is nominated as the new policy->cpu.
Extract this functionality from __cpufreq_remove_dev() and implement
it in a helper function. This helps in upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat e18f1682bc cpufreq: Extract non-interface related stuff from cpufreq_add_dev_interface
cpufreq_add_dev_interface() includes the work of exposing the interface
to the device, as well as a lot of unrelated stuff. Move the latter to
cpufreq_add_dev(), where it is more appropriate.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat e9698cc5d2 cpufreq: Add helper to perform alloc/free of policy structure
Separate out the allocation of the cpufreq policy structure (along with
its error handling) to a helper function. This makes the code easier to
read and also helps with some upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat 23d328994b cpufreq: Fix misplaced call to cpufreq_update_policy()
The call to cpufreq_update_policy() is placed in the CPU hotplug callback
of cpufreq_stats, which has a higher priority than the CPU hotplug callback
of cpufreq-core. As a result, during CPU_ONLINE/CPU_ONLINE_FROZEN, we end up
calling cpufreq_update_policy() *before* calling cpufreq_add_dev() !
And for uninitialized CPUs, it just returns silently, not doing anything.

To add to that, cpufreq_stats is not even the right place to call
cpufreq_update_policy() to begin with. The cpufreq core ought to handle
this in its own callback, from an elegance/relevance perspective.

So move the invocation of cpufreq_update_policy() to cpufreq_cpu_callback,
and place it *after* cpufreq_add_dev().

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Rafael J. Wysocki 2a99859932 cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
driver module refcount, __cpufreq_remove_dev() causes that refcount
to become negative for the cpufreq driver after a suspend/resume
cycle.

This is not the only bad thing that happens there, however, because
kobject_put() should only be called for the policy kobject at this
point if the CPU is not the last one for that policy.

Namely, if the given CPU is the last one for that policy, the
policy kobject's refcount should be 1 at this point, as set by
cpufreq_add_dev_interface(), and only needs to be dropped once for
the kobject to go away.  This actually happens under the cpu == 1
check, so it need not be done before by cpufreq_cpu_put().

On the other hand, if the given CPU is not the last one for that
policy, this means that cpufreq_add_policy_cpu() has been called
at least once for that policy and cpufreq_cpu_get() has been
called for it too.  To balance that cpufreq_cpu_get(), we need to
call cpufreq_cpu_put() in that case.

Thus, to fix the described problem and keep the reference
counters balanced in both cases, move the cpufreq_cpu_get() call
in __cpufreq_remove_dev() to the code path executed only for
CPUs that share the policy with other CPUs.

Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: 3.10+ <stable@vger.kernel.org>
2013-07-30 00:32:00 +02:00
Stratos Karafotis cffe4e0e74 cpufreq: Remove unused function __cpufreq_driver_getavg()
The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the __cpufreq_driver_getavg() function and getavg
member of struct cpufreq_driver are not used any more, so drop them.

[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-26 01:06:44 +02:00
Linus Torvalds b7356abb9f Power management and ACPI fixes for 3.11-rc2
- Two cpufreq commits from the 3.10 cycle introduced regressions.
   The first of them was buggy (it did way much more than it needed
   to do) and the second one attempted to fix an issue introduced by
   the first one.  Fixes from Srivatsa S Bhat revert both.
 
 - If autosleep triggers during system shutdown and the shutdown
   callbacks of some device drivers have been called already, it may
   crash the system.  Fix from Liu Shuo prevents that from happening
   by making try_to_suspend() check system_state.
 
 - The ACPI memory hotplug driver doesn't clear its driver_data on
   errors which may cause a NULL poiter dereference to happen later.
   Fix from Toshi Kani.
 
 - The ACPI namespace scanning code should not try to attach scan
   handlers to device objects that have them already, which may confuse
   things quite a bit, and it should rescan the whole namespace branch
   starting at the given node after receiving a bus check notify event
   even if the device at that particular node has been discovered
   already.  Fixes from Rafael J Wysocki.
 
 - New ACPI video blacklist entry for a system whose initial backlight
   setting from the BIOS doesn't make sense.  From Lan Tianyu.
 
 - Garbage string output avoindance for ACPI PNP from Liu Shuo.
 
 - Two Kconfig fixes for issues introduced recently in the s3c24xx
   cpufreq driver (when moving the driver to drivers/cpufreq) from
   Paul Bolle.
 
 - Trivial comment fix in pm_wakeup.h from Chanwoo Choi.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR6Sq+AAoJEKhOf7ml8uNsrGQP/0HRDW+QmTGM8znDTHXngbn9
 X3pqlpjEOiCtcmJaSJlD7GwLHMscwWcHKEezteaZ7KUI4mcnysJX6EY5YVbNriDC
 xlLcDQn9c6Xx1maCSfp+WMygvqItxZwuc8veRjrT3XtOfCNWS/FlX40Voh63BCAe
 GbfQ/HesmUg5CKplyD8/XypLWh5OFXmHzCe8IhrKGfhsZukXdSgSBjwQZMRrEMsQ
 kJjDCF8zUu0JisiWqL+xE6IFSKme9i6LBlHpzU0Y1g4RqAqkIbuS0Z3vezOYzoTD
 oZjBNa9XAgCS3x0l5g3G0ChgDAU+Mpji/imXA7JysrwbirGFbtPHtQYh2HzpAtnw
 Hkah/0ocBM7/w7VTsUQiRsFPdIJTCBLlm6J38x8yh7n84h4nJgOpK69dBLrMwCUZ
 f3kid6KIPVLBvnC3QSULrCAKUcUcVVWYtNho+sfXBMjP+cPwTmc3DvATnpru6twa
 0KjR5o585UOcciq7EWAoMrCFCfZYF5C4XGaZAxHI/SWooxeCQH84S8vfNLL2epVC
 ixmLYo4X2ANDsnfbUV+ewhB0/L2905Et6NhPUgPD/1rm15MEZbowbB2K0pzr0QL9
 /1hTL61InXx3jLxducJJFKN+HZ0zfDQdTkyafKrR9jb+GsdmnzYJ/vnfDG8MfPjp
 GZ281YBqVmUeYJh5CPU+
 =IUmn
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes collected over the last week, most importnatly two
  cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
  fix preventing systems using it from crashing during shutdown and two
  ACPI scan fixes related to hotplug.

  Specifics:

   - Two cpufreq commits from the 3.10 cycle introduced regressions.
     The first of them was buggy (it did way much more than it needed to
     do) and the second one attempted to fix an issue introduced by the
     first one.  Fixes from Srivatsa S Bhat revert both.

   - If autosleep triggers during system shutdown and the shutdown
     callbacks of some device drivers have been called already, it may
     crash the system.  Fix from Liu Shuo prevents that from happening
     by making try_to_suspend() check system_state.

   - The ACPI memory hotplug driver doesn't clear its driver_data on
     errors which may cause a NULL poiter dereference to happen later.
     Fix from Toshi Kani.

   - The ACPI namespace scanning code should not try to attach scan
     handlers to device objects that have them already, which may
     confuse things quite a bit, and it should rescan the whole
     namespace branch starting at the given node after receiving a bus
     check notify event even if the device at that particular node has
     been discovered already.  Fixes from Rafael J Wysocki.

   - New ACPI video blacklist entry for a system whose initial backlight
     setting from the BIOS doesn't make sense.  From Lan Tianyu.

   - Garbage string output avoindance for ACPI PNP from Liu Shuo.

   - Two Kconfig fixes for issues introduced recently in the s3c24xx
     cpufreq driver (when moving the driver to drivers/cpufreq) from
     Paul Bolle.

   - Trivial comment fix in pm_wakeup.h from Chanwoo Choi"

* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
  PNP / ACPI: avoid garbage in resource name
  cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
  cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
  cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
  PM / Sleep: Fix comment typo in pm_wakeup.h
  PM / Sleep: avoid 'autosleep' in shutdown progress
  cpufreq: Revert commit a66b2e to fix suspend/resume regression
  ACPI / memhotplug: Fix a stale pointer in error path
  ACPI / scan: Always call acpi_bus_scan() for bus check notifications
  ACPI / scan: Do not try to attach scan handlers to devices having them
2013-07-19 09:59:06 -07:00
Paul Gortmaker 2760984f65 cpufreq: delete __cpuinit usage from all cpufreq files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the drivers/cpufreq uses of the __cpuinit macros
from all C files.

[1] https://lkml.org/lkml/2013/5/20/589

[v2: leave 2nd lines of args misaligned as requested by Viresh]
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: cpufreq@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:57 -04:00
Srivatsa S. Bhat aae760ed21 cpufreq: Revert commit a66b2e to fix suspend/resume regression
commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume)
has unfortunately caused several things in the cpufreq subsystem to
break subtly after a suspend/resume cycle.

The intention of that patch was to retain the file permissions of the
cpufreq related sysfs files across suspend/resume.  To achieve that,
the commit completely removed the calls to cpufreq_add_dev() and
__cpufreq_remove_dev() during suspend/resume transitions.  But the
problem is that those functions do 2 kinds of things:
  1. Low-level initialization/tear-down that are critical to the
     correct functioning of cpufreq-core.
  2. Kobject and sysfs related initialization/teardown.

Ideally we should have reorganized the code to cleanly separate these
two responsibilities, and skipped only the sysfs related parts during
suspend/resume.  Since we skipped the entire callbacks instead (which
also included some CPU and cpufreq-specific critical components),
cpufreq subsystem started behaving erratically after suspend/resume.

So revert the commit to fix the regression.  We'll revisit and address
the original goal of that commit separately, since it involves quite a
bit of careful code reorganization and appears to be non-trivial.

(While reverting the commit, note that another commit f51e1eb
 (cpufreq: Fix cpufreq regression after suspend/resume) already
 reverted part of the original set of changes.  So revert only the
 remaining ones).

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Paul Bolle <pebolle@tiscali.nl>
Cc: 3.10+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-15 01:31:36 +02:00
Viresh Kumar 266c13d767 cpufreq: Fix serialization of frequency transitions
Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized")
interacts poorly with systems that have a single core freqency for all
cores.  On such systems we have a single policy for all cores with
several CPUs.  When we do a frequency transition the governor calls the
pre and post change notifiers which causes cpufreq_notify_transition()
per CPU.  Since the policy is the same for all of them all CPUs after
the first and the warnings added are generated by checking a per-policy
flag the warnings will be triggered for all cores after the first.

Fix this by allowing notifier to be called for n times. Where n is the number of
cpus in policy->cpus.

Reported-and-tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-04 13:12:44 +02:00
Lan Tianyu f4fd379784 acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52
(cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init())
changed the contents of the "related_cpus" sysfs attribute on systems
where acpi-cpufreq is used and user space can't get the list of CPUs
which are in the same hardware coordination CPU domain (provided by
the ACPI AML method _PSD) via "related_cpus" any more.

To make up for that loss add a new sysfs attribute "freqdomian_cpus"
for the acpi-cpufreq driver which exposes the list of CPUs in the
same domain regardless of whether it is coordinated by hardware or
software.

[rjw: Changelog, documentation]
References: https://bugzilla.kernel.org/show_bug.cgi?id=58761
Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27 21:51:09 +02:00
Viresh Kumar 7c30ed532c cpufreq: make sure frequency transitions are serialized
Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
shouldn't be called twice contiguously.

This can happen due to bugs in users of __cpufreq_driver_target() or actual
cpufreq drivers who are sending these notifiers.

This patch adds some protection against this. Now, we keep track of the last
transaction and see if something went wrong.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27 21:49:55 +02:00
Viresh Kumar 0956df9c84 cpufreq: make __cpufreq_notify_transition() static
__cpufreq_notify_transition() is used only in cpufreq.c,
make it static.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21 01:08:16 +02:00
Viresh Kumar bb176f7d03 cpufreq: Fix minor formatting issues
There were a few noticeable formatting issues in core cpufreq code.
This cleans them up to make code look better.  The changes include:
 - Whitespace cleanup.
 - Rearrangements of code.
 - Multiline comments fixes.
 - Formatting changes to fit 80 columns.

Copyright information in cpufreq.c is also updated to include my name
for 2013.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21 01:06:34 +02:00
Xiaoguang Chen 95731ebb11 cpufreq: Fix governor start/stop race condition
Cpufreq governors' stop and start operations should be carried out
in sequence.  Otherwise, there will be unexpected behavior, like in
the example below.

Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked
to CPU0.  The normal sequence is:

 1) Current governor is userspace.  An application tries to set the
    governor to ondemand.  It will call __cpufreq_set_policy() in
    which it will stop the userspace governor and then start the
    ondemand governor.

 2) Current governor is userspace.  The online of CPU3 runs on CPU0.
    It will call cpufreq_add_policy_cpu() in which it will first
    stop the userspace governor, and then start it again.

If the sequence of the above two cases interleaves, it becomes:

 1) Application stops userspace governor
 2)                                  Hotplug stops userspace governor

which is a problem, because the governor shouldn't be stopped twice
in a row.  What happens next is:

 3) Application starts ondemand governor
 4)                                  Hotplug starts a governor

In step 4, the hotplug is supposed to start the userspace governor,
but now the governor has been changed by the application to ondemand,
so the ondemand governor is started once again, which is incorrect.

The solution is to prevent policy governors from being stopped
multiple times in a row.  A governor should only be stopped once for
one policy.  After it has been stopped, no more governor stop
operations should be executed.

Also add a mutex to serialize governor operations.

[rjw: Changelog.  And you owe me a beverage of my choice.]
Signed-off-by: Xiaoguang Chen <chenxg@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21 00:56:04 +02:00
Viresh Kumar a262e94cdc cpufreq: remove unnecessary cpufreq_cpu_{get|put}() calls
struct cpufreq_policy is already passed as argument to some routines
like: __cpufreq_driver_getavg() and so we don't really need to do
cpufreq_cpu_get() before and cpufreq_cpu_put() in them to get a
policy structure.

Remove them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-04 14:26:27 +02:00
Viresh Kumar 2361be2366 cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory
When we don't have any file in cpu/cpufreq directory we shouldn't
create it. Specially with the introduction of per-policy governor
instance patchset, even governors are moved to
cpu/cpu*/cpufreq/governor-name directory and so this directory is
just not required.

Lets have it only when required.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:24:02 +02:00
Viresh Kumar 72a4ce340a cpufreq: Move get_cpu_idle_time() to cpufreq.c
Governors other than ondemand and conservative can also use
get_cpu_idle_time() and they aren't required to compile
cpufreq_governor.c. So, move these independent routines to
cpufreq.c instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:20:56 +02:00
Viresh Kumar 944e9a0316 cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c
get_governor_parent_kobj() can be used by any governor, generic
cpufreq governors or platform specific ones and so must be present in
cpufreq.c instead of cpufreq_governor.c.

This patch moves it to cpufreq.c. This also adds
EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use
this function too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:20:56 +02:00
Viresh Kumar 3f869d6d41 cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy
This patch adds: EXPORT_SYMBOL_GPL(have_governor_per_policy), so that
this routine can be used by modules too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27 13:20:55 +02:00
Viresh Kumar 955ef48335 cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
With the rwsem lock around
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT), we
get circular dependency when we call sysfs_remove_group().

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.9.0-rc7+ #15 Not tainted
 -------------------------------------------------------
 cat/2387 is trying to acquire lock:
  (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<c02f6179>] lock_policy_rwsem_read+0x25/0x34

 but task is already holding lock:
  (s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

-> #1 (s_active#41){++++.+}:
        [<c0055a79>] lock_acquire+0x61/0xbc
        [<c00fabf1>] sysfs_addrm_finish+0xc1/0x128
        [<c00f9819>] sysfs_hash_and_remove+0x35/0x64
        [<c00fbe6f>] remove_files.isra.0+0x1b/0x24
        [<c00fbea5>] sysfs_remove_group+0x2d/0xa8
        [<c02f9a0b>] cpufreq_governor_interactive+0x13b/0x35c
        [<c02f61df>] __cpufreq_governor+0x2b/0x8c
        [<c02f6579>] __cpufreq_set_policy+0xa9/0xf8
        [<c02f6b75>] store_scaling_governor+0x61/0x100
        [<c02f6f4d>] store+0x39/0x60
        [<c00f9b81>] sysfs_write_file+0xed/0x114
        [<c00b3fd1>] vfs_write+0x65/0xd8
        [<c00b424b>] sys_write+0x2f/0x50
        [<c000cdc1>] ret_fast_syscall+0x1/0x52

-> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
        [<c0055253>] __lock_acquire+0xef3/0x13dc
        [<c0055a79>] lock_acquire+0x61/0xbc
        [<c03ee1f5>] down_read+0x25/0x30
        [<c02f6179>] lock_policy_rwsem_read+0x25/0x34
        [<c02f6edd>] show+0x21/0x58
        [<c00f9c0f>] sysfs_read_file+0x67/0xcc
        [<c00b40a7>] vfs_read+0x63/0xd8
        [<c00b41fb>] sys_read+0x2f/0x50
        [<c000cdc1>] ret_fast_syscall+0x1/0x52

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(s_active#41);
                                lock(&per_cpu(cpu_policy_rwsem, cpu));
                                lock(s_active#41);
   lock(&per_cpu(cpu_policy_rwsem, cpu));

  *** DEADLOCK ***

 2 locks held by cat/2387:
  #0:  (&buffer->mutex){+.+.+.}, at: [<c00f9bcd>] sysfs_read_file+0x25/0xcc
  #1:  (s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc

 stack backtrace:
 [<c0011d55>] (unwind_backtrace+0x1/0x9c) from [<c03e9a09>] (print_circular_bug+0x19d/0x1e8)
 [<c03e9a09>] (print_circular_bug+0x19d/0x1e8) from [<c0055253>] (__lock_acquire+0xef3/0x13dc)
 [<c0055253>] (__lock_acquire+0xef3/0x13dc) from [<c0055a79>] (lock_acquire+0x61/0xbc)
 [<c0055a79>] (lock_acquire+0x61/0xbc) from [<c03ee1f5>] (down_read+0x25/0x30)
 [<c03ee1f5>] (down_read+0x25/0x30) from [<c02f6179>] (lock_policy_rwsem_read+0x25/0x34)
 [<c02f6179>] (lock_policy_rwsem_read+0x25/0x34) from [<c02f6edd>] (show+0x21/0x58)
 [<c02f6edd>] (show+0x21/0x58) from [<c00f9c0f>] (sysfs_read_file+0x67/0xcc)
 [<c00f9c0f>] (sysfs_read_file+0x67/0xcc) from [<c00b40a7>] (vfs_read+0x63/0xd8)
 [<c00b40a7>] (vfs_read+0x63/0xd8) from [<c00b41fb>] (sys_read+0x2f/0x50)
 [<c00b41fb>] (sys_read+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52)

This lock isn't required while calling __cpufreq_governor(policy,
CPUFREQ_GOV_POLICY_EXIT). Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-22 00:23:54 +02:00
Srivatsa S. Bhat a66b2e503f cpufreq: Preserve sysfs files across suspend/resume
The file permissions of cpufreq per-cpu sysfs files are not preserved
across suspend/resume because we internally go through the CPU
Hotplug path which reinitializes the file permissions on CPU online.

But the user is not supposed to know that we are using CPU hotplug
internally within suspend/resume (IOW, the kernel should not silently
wreck the user-set file permissions across a suspend cycle).
Therefore, we need to preserve the file permissions as they are
across suspend/resume.

The simplest way to achieve that is to just not touch the sysfs files
at all - ie., just ignore the CPU hotplug notifications in the
suspend/resume path (_FROZEN) in the cpufreq hotplug callback.

Reported-by: Robert Jarzmik <robert.jarzmik@intel.com>
Reported-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-15 21:47:17 +02:00
Viresh Kumar d96038e0fa cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping policy refcount
We must call __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT) before
calling cpufreq_cpu_put(data), so that policy kobject have valid
fields. Otherwise, removing last online cpu of policy->cpus causes
this crash for ondemand/conservative governor.

 [<c00fb076>] (sysfs_find_dirent+0xe/0xa8) from [<c00fb1bd>] (sysfs_get_dirent+0x21/0x58)
 [<c00fb1bd>] (sysfs_get_dirent+0x21/0x58) from [<c00fc259>] (sysfs_remove_group+0x85/0xbc)
 [<c00fc259>] (sysfs_remove_group+0x85/0xbc) from [<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0)
 [<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0) from [<c02f66d7>] (__cpufreq_governor+0x2b/0x8c)
 [<c02f66d7>] (__cpufreq_governor+0x2b/0x8c) from [<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250)
 [<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250) from [<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c)
 [<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c) from [<c0036fe1>] (notifier_call_chain+0x45/0x54)
 [<c0036fe1>] (notifier_call_chain+0x45/0x54) from [<c001e611>] (__cpu_notify+0x1d/0x34)
 [<c001e611>] (__cpu_notify+0x1d/0x34) from [<c03e5833>] (_cpu_down+0x63/0x1ac)
 [<c03e5833>] (_cpu_down+0x63/0x1ac) from [<c03e5997>] (cpu_down+0x1b/0x30)
 [<c03e5997>] (cpu_down+0x1b/0x30) from [<c03e60eb>] (store_online+0x27/0x54)
 [<c03e60eb>] (store_online+0x27/0x54) from [<c0295629>] (dev_attr_store+0x11/0x18)
 [<c0295629>] (dev_attr_store+0x11/0x18) from [<c00f9edd>] (sysfs_write_file+0xed/0x114)
 [<c00f9edd>] (sysfs_write_file+0xed/0x114) from [<c00b42a9>] (vfs_write+0x65/0xd8)
 [<c00b42a9>] (vfs_write+0x65/0xd8) from [<c00b4523>] (sys_write+0x2f/0x50)
 [<c00b4523>] (sys_write+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52)

Of course this only impacted drivers which have
have_governor_per_policy set to true. i.e. big LITTLE cpufreq driver.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12 14:04:16 +02:00
Rafael J. Wysocki 1c3d85dd4e cpufreq: Revert incorrect commit 5800043
Commit 5800043 (cpufreq: convert cpufreq_driver to using RCU) causes
the following call trace to be spit on boot:

 BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179
 in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd
 2 locks held by systemd-udevd/292:
  #0:  (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0
  #1:  (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0
 Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323
 Call Trace:
  [<ffffffff81072c90>] __might_sleep+0x140/0x1f0
  [<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0
  [<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130
  [<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110
  [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
  [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
  [<ffffffff811e647d>] sysfs_add_file+0xd/0x10
  [<ffffffff811e6541>] sysfs_create_file+0x21/0x30
  [<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0
  [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0
  [<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor]
  [<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq]
  [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80
  [<ffffffff8106c97e>] ? up_read+0x1e/0x40
  [<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0
  [<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0
  [<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0
  [<ffffffff81468569>] subsys_interface_register+0x99/0xe0
  [<ffffffffa014d000>] ? 0xffffffffa014cfff
  [<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200
  [<ffffffffa014d000>] ? 0xffffffffa014cfff
  [<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq]
  [<ffffffff810002fa>] do_one_initcall+0x11a/0x170
  [<ffffffff810b4b87>] load_module+0x1cf7/0x2920
  [<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0
  [<ffffffff816baee0>] ? retint_restore_args+0xe/0xe
  [<ffffffff810b5887>] sys_init_module+0xd7/0x120
  [<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b

which is quite obvious, because that commit put (multiple instances
of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(),
although sysfs_create_file() may cause memory to be allocated with
GFP_KERNEL and that may sleep, which is not permitted in RCU read
critical section.

Revert the buggy commit altogether along with some changes on top
of it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-29 00:08:16 +02:00
Viresh Kumar 820c6ca293 cpufreq: Don't call __cpufreq_governor() for drivers without target()
Some cpufreq drivers implement their own governor and so don't need
us to call generic governors interface via __cpufreq_governor(). Few
recent commits haven't obeyed this law well and we saw some
regressions.

This patch is an attempt to fix the above issue.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-22 00:48:03 +02:00
Viresh Kumar e4969ebac8 cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
__cpufreq_governor() must be called with a correct policy->cpus mask.
In __cpufreq_remove_dev() we initially clear policy->cpus with
cpumask_clear_cpu() and then call
__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). If the governor
is doing some per-cpu stuff in EXIT callback, this can create
uncertain behavior.

Generic governors in drivers/cpufreq/ doesn't do any per-cpu stuff
in EXIT callback and so we don't face any issues currently. But its
better to keep the code clean, so we don't face any issues in future.

Now, we call cpumask_clear_cpu() only when multiple cpus are managed
by policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-11 22:50:09 +02:00
Nathan Zimmer 5800043b24 cpufreq: convert cpufreq_driver to using RCU
We eventually would like to remove the rwlock cpufreq_driver_lock or
convert it back to a spinlock and protect the read sections with RCU.
The first step in that direction is to make cpufreq_driver use RCU.
I don't see an easy wasy to protect the cpufreq_cpu_data structure
with RCU, so I am leaving it with the rwlock for now since under
certain configs __cpufreq_cpu_get is a hot spot with 256+ cores.

[rjw: Subject, changelog, white space]
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-10 13:19:26 +02:00
Viresh Kumar b43a7ffbf3 cpufreq: Notify all policy->cpus in cpufreq_notify_transition()
policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.

Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.

This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02 15:24:00 +02:00
Viresh Kumar 4d5dcc4211 cpufreq: governor: Implement per policy instances of governors
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and
implements init/exit routines for ondemand and conservative
governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Viresh Kumar 7bd353a995 cpufreq: Add per policy governor-init/exit infrastructure
Currently, there can't be multiple instances of single governor_type.
If we have a multi-package system, where we have multiple instances
of struct policy (per package), we can't have multiple instances of
same governor. i.e. We can't have multiple instances of ondemand
governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one
instance of a governor_type in the system.

This is a bottleneck for multicluster system, where we want different
packages to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because
we are required to allocate governor's resources dynamically now, we
must do it at policy creation and end. And so got
CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Nathan Zimmer 0d1857a1b9 cpufreq: Convert the cpufreq_driver_lock to a rwlock
This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by the rcu.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:11:34 +02:00
Rafael J. Wysocki 4419fbd4b4 Merge branch 'pm-cpufreq'
* pm-cpufreq: (55 commits)
  cpufreq / intel_pstate: Fix 32 bit build
  cpufreq: conservative: Fix typos in comments
  cpufreq: ondemand: Fix typos in comments
  cpufreq: exynos: simplify .init() for setting policy->cpus
  cpufreq: kirkwood: Add a cpufreq driver for Marvell Kirkwood SoCs
  cpufreq/x86: Add P-state driver for sandy bridge.
  cpufreq_stats: do not remove sysfs files if frequency table is not present
  cpufreq: Do not track governor name for scaling drivers with internal governors.
  cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()
  cpufreq: Retrieve current frequency from scaling drivers with internal governors
  cpufreq: Fix locking issues
  cpufreq: Create a macro for unlock_policy_rwsem{read,write}
  cpufreq: Remove unused HOTPLUG_CPU code
  cpufreq: governors: Fix WARN_ON() for multi-policy platforms
  cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold
  cpufreq / stats: Get rid of CPUFREQ_STATDEVICE_ATTR
  cpufreq: Don't check cpu_online(policy->cpu)
  cpufreq: add imx6q-cpufreq driver
  cpufreq: Don't remove sysfs link for policy->cpu
  cpufreq: Remove unnecessary use of policy->shared_type
  ...
2013-02-15 13:59:07 +01:00
Dirk Brandewie fa69e33f7d cpufreq: Do not track governor name for scaling drivers with internal governors.
Scaling drivers that implement internal governors do not have governor
structures assocaited with them.  Only track the name of the governor
associated with the CPU if the driver does not implement
cpufreq_driver.setpolicy()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:53 +01:00
Dirk Brandewie f6b0515b07 cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target()
Scaling drivers that implement cpufreq_driver.setpolicy() have
internal governors that do not signal changes via
cpufreq_notify_transition() so the frequncy in the policy will almost
certainly be different than the current frequncy.  Only call
cpufreq_out_of_sync() when the underlying driver implements
cpufreq_driver.target()

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:47 +01:00
Dirk Brandewie 9e21ba8bd8 cpufreq: Retrieve current frequency from scaling drivers with internal governors
Scaling drivers that implement the cpufreq_driver.setpolicy() versus
the cpufreq_driver.target() interface do not set policy->cur.

Normally policy->cur is set during the call to cpufreq_driver.target()
when the frequnecy request is made by the governor.

If the scaling driver implements cpufreq_driver.setpolicy() and
cpufreq_driver.get() interfaces use cpufreq_driver.get() to retrieve
the current frequency.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 12:55:03 +01:00
Viresh Kumar 2eaa3e2df1 cpufreq: Fix locking issues
cpufreq core uses two locks:
- cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array.
- cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure
  all cpufreq/hotplug/workqueue/etc related lock issues.

These locks were not used properly and are placed against their principle
(present before their definition) at various places. This patch is an attempt to
fix their use.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:22:57 +01:00
Viresh Kumar fa1d8af47f cpufreq: Create a macro for unlock_policy_rwsem{read,write}
On the lines of macro: lock_policy_rwsem, we can create another macro for
unlock_policy_rwsem. Lets do it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:22:06 +01:00
Viresh Kumar 65922465b5 cpufreq: Remove unused HOTPLUG_CPU code
Because the sibling cpu of any online cpu is identified very early in
cpufreq_add_dev(), below code is never executed. And so can be removed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:21:37 +01:00
Viresh Kumar 8e53695f7f cpufreq: governors: Fix WARN_ON() for multi-policy platforms
On multi-policy systems there is a single instance of governor for both the
policies (if same governor is chosen for both policies). With the code update
from following patches:

8eeed09 cpufreq: governors: Get rid of dbs_data->enable field
b394058 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()

We are creating/removing sysfs directory of governor for for every call to
GOV_START and STOP. This would fail for multi-policy system as there is a
per-policy call to START/STOP.

This patch reuses the governor->initialized variable to detect total users of
governor.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:21:13 +01:00
Viresh Kumar 3361b7b173 cpufreq: Don't check cpu_online(policy->cpu)
policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-09 01:18:34 +01:00
Viresh Kumar 73bf0fc2b0 cpufreq: Don't remove sysfs link for policy->cpu
"cpufreq" directory in policy->cpu is never created using
sysfs_create_link(), but using kobject_init_and_add(). And so we
shouldn't call sysfs_remove_link() for policy->cpu().  sysfs stuff
for policy->cpu is automatically removed when we call kobject_put()
for dying policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-05 22:21:14 +01:00
Viresh Kumar b394058f06 cpufreq: governors: Reset tunables only for cpufreq_unregister_governor()
Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event
we reset few tunables of governor. Which isn't correct, as this routine is
called for every cpu hot-[un]plugging event. We should actually be resetting
these only when the governor module is removed and re-installed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 01:29:31 +01:00
Viresh Kumar fcf8058296 cpufreq: Simplify cpufreq_add_dev()
Currently cpufreq_add_dev() firsts allocates policy, calls
driver->init() and then checks if this CPU is already managed or not.
And if it is already managed, its policy is freed.

We can save all this if we somehow know that CPU is managed or not in
advance.  policy->related_cpus contains the list of all valid sibling
CPUs of policy->cpu. We can check this to see if the current CPU is
already managed.

From now on, platforms don't really need to set related_cpus from
their init() routines, as the same work is done by core too.

If a platform driver needs to set the related_cpus mask with some
additional CPUs, other than CPUs present in policy->cpus, they are
free to do it, though, as we don't override anything.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:16 +01:00
Viresh Kumar b26f72042e cpufreq: Revert "cpufreq: Don't use cpu removed during cpufreq_driver_unregister"
This reverts commit 956f339 "cpufreq: Don't use cpu removed during
cpufreq_driver_unregister".

With the addition of the following commit, this change/variable is not
required any more:

commit b9ba2725343ae57add3f324dfa5074167f48de96
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Mon Jan 14 13:23:03 2013 +0000

    cpufreq: Simplify __cpufreq_remove_dev()

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:16 +01:00
Borislav Petkov 9d95046e5d cpufreq: Add a get_current_driver helper
Add a helper function to return cpufreq_driver->name.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:15 +01:00
Dirk Brandewie d5aaffa9dd cpufreq: handle cpufreq being disabled for all exported function.
When disable_cpufreq() is called some exported functions are still
being used that do not have a check for cpufreq being disabled.

Add a disabled check into cpufreq_cpu_get() to return NULL if
cpufreq is disabled this covers most of the exported functions. For
the exported functions that do not call cpufreq_cpu_get() add an
explicit check.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar b8eed8af94 cpufreq: Simplify __cpufreq_remove_dev()
__cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver
unregister and cpu removals.

Current implementation of this routine is overly complex without much need. If
the cpu to be removed is the policy->cpu, we remove the policy first and add all
other cpus again from policy->cpus and then finally call __cpufreq_remove_dev()
again to remove the cpu to be deleted. Haahhhh..

There exist a simple solution to removal of a cpu:
- Simply use the old policy structure
- update its fields like: policy->cpu, etc.
- notify any users of cpufreq, which depend on changing policy->cpu

Hence this patch, which tries to implement the above theory. It is tested well
by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both
A15's share same struct policy and all A7's share same policy structure.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar 6954ca9c8b cpufreq: Don't use cpu removed during cpufreq_driver_unregister
This is how the core works:
cpufreq_driver_unregister()
 - subsys_interface_unregister()
   - for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we
     unregister.

cpufreq_remove_dev():
 - Remove policy node
 - Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu.
   i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2,
   we call it for cpu 3.
   - cpufreq_add_dev() would call cpufreq_driver->init()
   - init would return mask as AND of 2, 3 and 4 for cluster A7.
   - cpufreq core would do online_cpu && policy->cpus
     Here is the BUG(). Because cpu hasn't died but we have just unregistered
     the cpufreq driver, online cpu would still have cpu 2 in it. And so thing
     go bad again.

Solution: Keep cpumask of cpus that are registered with cpufreq core and clear
	  cpus when we get a call from subsys_interface_unregister() via
	  cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar f6a7409cab cpufreq: Notify governors when cpus are hot-[un]plugged
Because cpufreq core and governors worry only about the online cpus, if a cpu is
hot [un]plugged, we must notify governors about it, otherwise be ready to expect
something unexpected.

We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we
just need to call them now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00
Viresh Kumar 643ae6e81d cpufreq: Manage only online cpus
cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.

We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-02 00:01:14 +01:00